Some question on the semantics of the Arith dialect

I’m quite new to this forum, so I hope I’m posting in the right section and haven’t overlooked any relevant discussions.

I’ve been looking into the documentation for the Arith dialect lately and I’ve noticed a few details that I’d like to clarify -

  1. The fact that the shli operation has no shlui/shlsi counterparts seems to imply a 2s complement representation of the integer types - but this doesn’t appear to be mentioned in the spec. Should this inference be explicitly stated in the specifications for better clarity?
  2. A related query pertains to the cmpi operation—when a signed integer is compared using an “unsigned” comparison operation, or vice versa, what are the intended semantics?
  3. Additionally, the requirement for the left-hand side (LHS) and right-hand side (RHS) in shift left/right operations to be of the same type seems somewhat restrictive. Is there a specific reason behind this constraint?
  4. Again on the shift operations, it seems like the behaviour is not specified in the documentation when the shift exceeds the bit width. Wondering what’s supposed to be the behaviour here?

I had a look through the previous posts on this topic, and it looks like last year the semantics clarity of Arith was raised ([RFC] Define precise arith semantics) where people seemed to agree it was an issue, but I couldn’t find any subsequent updates. I’m curious about the current status of this.

In that discussion we realized that we need poison semantics to express the desired semantics of some of the ops. We now have the ub dialect with poison constants. This work has been primarily driven by @Hardcode84, @zero9178, and @KFAF.

We had a roundtable on poison semantics at the US LLVM Dev Mtg last month, where @regehr and I discussed the next steps for the arith dialect. The summary of the discussion is:

  • Currently the poison attribute support for basic scalar types but we’d want to add support for aggregates (vector, tuple, maybe tensor) so that we can support per-element poison values.
  • Currently the arith folders only propagate poison but don’t produce it. We probably want to keep it as the default because not all lowering targets support poison semantics (SPIR-V doesn’t). Any folders/canonicalizes that introduce poison at this levels should probably be opt in.
  • Regardless of how folding is implemented, we can go ahead and start using poison in the semantics (textual definition) of the airth dialect, e.g., in the definition of shift ops like you pointed out.

All of these have been on my radar for a while, but because none of my other work is blocked by this, something else tends to take the priority. Contributions very very welcome :slight_smile:.

Right, AFAIK the arith dialect assumes signless integers and 2s complement. We should probably mention this in the intro paragraph.

The same could be asked about other ops like addi, muli, etc. I’d say that putting these constraints on types reduces the number of possible corner cases in op semantics – all sign/zero extension are explicit. This also implies fewer corner cases in transforms that operate on arith ops.

Some clarification: while we do not support per-element poison for vector type, full poison seems to work just fine, the only issue I found so far was the extractelement/insertelement folder bug.

Thank you for the clarifications, it’s really helpful!

I’m interested in contributing too - although I’ve used MLIR for the last few months I haven’t made any contributions to MLIR itself yet, and I’m keen to make a first PR :slight_smile:

Wondering which of these are higher priority, how much work would you estimate for these as a relative newbie?

Regardless of how folding is implemented, we can go ahead and start using poison in the semantics (textual definition) of the airth dialect, e.g., in the definition of shift ops like you pointed out.

I’m a little confused by this. I understand this as that it’s now a part of arith’s semantics to create poison, but then the constant folders don’t introduce those since not all targets support poison semantics. What would the semantics of these operations be (e.g. shift past width) when translating to targets that don’t support poison? Would it be a different semantics or perhaps just immediate UB?

The same could be asked about other ops like addi, muli, etc. I’d say that putting these constraints on types reduces the number of possible corner cases in op semantics – all sign/zero extension are explicit.

Might shifts be a little different to e.g. addi, muli cases since it is not Commutative?

Awesome! I’d say that in general these sub-projects (per-element poison, missing folds/canon patterns, op semantics) should parallelize pretty well. It might be easiest to start in the reverse order to what I listed, so that you can develop a good understanding of the semantics and existing mechanisms within MLIR before having to solve problems we do not have full answers for.

Happy to chat if you, or anyone else, is interested in this work and would like more guidance (via email / chat / video call).

We could say that these ops produce poison and use this to justify (the semantic correctness of) our rewrite patterns, but make any fold/canons that actually create poison values opt-in. The issue with SPIR-V specifically is that it only has OpUndef which is difficult to reason and optimize around, and we would really like to avoid having to lower ub.poison to spirv.Undef.

Similarly, we can make the transformations more defined than what the semantics, and, for instance, define division by zero as immediate UB but fold it to ub.poison.

It might be easiest to start in the reverse order to what I listed, so that you can develop a good understanding of the semantics and existing mechanisms within MLIR before having to solve problems we do not have full answers for.

Great! That’s a good idea to start from the last point.

I’ll start by updating the documentation to include the recent changes (e.g. poison semantics, int representation, and any other stuff I come across). I’ll probably come back to you when I get stuck :stuck_out_tongue:

Thanks, the explanation made sense. So it seems like it might be better to define the semantics of these “error” instances as immediate UB rather than producing poisons, which would leave room for transformations to make it more defined. E.g. folders could actually produce a poison value like you said rather than UB. Should I proceed with that then for the doc updates?

That would be too restrictive and prevent us from speculating on ops that are known to succeed (come back with some result without trapping etc.) on most hardware. An example of an op where we’d want immediate UB is division by zero or out of bounds memory accesses. We already disallow have some wording around the latter being UB in the vector dialect, e.g., 'vector' Dialect - MLIR.

You can follow the original proposal for poison semantics to see the full rationale: [RFC] Poison semantics for MLIR.

I see - so it’s more of a case-by-case basis on when we need immediate UB vs. poison. Thanks for the link and I’ll go through it tomorrow before starting to work on the doc updates! :slight_smile:

1 Like

Currently, the arith shift ops are documented as returning poison of the shift amount is greater than the bitwidth - this is in line with the LLVM Support lib which defines the result of shifting by an amount equal to the bitwdith as zero. However, the LLVM instructions return poison even when the shift amount is equal to the bitwidth. I would suggest adapting the poison behavior from the LLVM instructions because otherwise we have to catch this special case in the lowering which seems suboptimal for fast bitshift operations.

I have created a PR which changes the documentation accordingly but leaves the folders alone for now, since producing Poison in the folders is not yet implemented.

1 Like

@kuhar (who is on vacation at this moment)