I am currently implementing additional runtime verification for various memref dialect ops and took a closer look at memref.subview. We have this example in the TableGen documentation:
%1 = memref.subview %0[%i, %j][4, 4][%x, %y] :
  memref<?x?xf32, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + d1 * s2 + s0)>> to
  memref<4x4xf32, affine_map<(d0, d1)[r0, r1, r2] -> (d0 * r1 + d1 * r2 + r0)>>
// Note that the subview op does not guarantee that the result
// memref is "inbounds" w.r.t. the base memref. It is up to the client
// to ensure that the subview is accessed in a manner that is
// in-bounds.
I’d like to discuss whether this subview is allowed to go out-of-bounds at runtime. There are two ways in which the comment could be interpreted:
1. The memref.subview result may go out-of-bounds with respect to the source memref. In particular, this would imply that memref.subview %m[0][10][1] : memref<5xf32> to memref<10xf32> is a valid operation.
2. The memref.subview result must not go out-of-bounds. However, in the context of dynamic shapes, this cannot be statically verified.
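To make the difference concrete, here is a hedged IR sketch (value names are my own; the dynamic source shape is an assumption):

```mlir
// Static case: the verifier can prove this subview runs past the end
// of the 5-element source, so under Option 2 it can reject the op.
%oob = memref.subview %m[0] [10] [1] : memref<5xf32> to memref<10xf32>

// Dynamic case: whether [%off, %off + 10) fits in the source depends on
// runtime values, so the verifier must accept the op either way; under
// Option 2 an out-of-bounds %off is undefined behavior at runtime.
%dyn = memref.subview %d[%off] [10] [1]
    : memref<?xf32> to memref<10xf32, strided<[1], offset: ?>>
```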
The comment seems to imply Option 1. Does this still make sense today or is the comment outdated? (For reference: We strengthened the verifier of tensor.extract_slice recently, so that out-of-bounds extractions are no longer allowed.) It is also suspicious that these out-of-bounds semantics are mentioned only towards the end of the documentation next to an example.
I don’t really have that much experience here, but to me the example that you shared should be rejected by the Op verifier. In fact, that’s what should happen according to the Op description (extra emphasis by me):
The “subview” operation converts a memref type to another memref type which represents a reduced-size view of the original memref as specified by the operation’s offsets, sizes and strides arguments.
In the case of dynamic shapes/sizes, it would be up to the client to make sure that everything stays in-bounds. At least that’s the assumption I’ve been making. That said, having an option to enable a runtime check would be nice.
I’m not sure if the landed change was a good idea. Before this change, you could create an out-of-bounds memref using memref.subview and mask the out-of-bounds part of the memref. It was up to the user to mask the out-of-bounds part properly. After this change, this is impossible to do.
A really nice thing you could do for masking was: given
%mem : memref<257xf32>
to do a masked vectorized read of vector<8xf32>, you could:
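A sketch of the pattern as I understand it (my reconstruction; the rounded-up view size 264 = 8 * 33 and the value names are assumptions):

```mlir
%c1   = arith.constant 1 : index
%c256 = arith.constant 256 : index
%pass = arith.constant dense<0.0> : vector<8xf32>
// %mem has 257 elements; round the view up to 264 so that every
// vector<8xf32> chunk starts in-bounds of the *view*. The last 7
// elements of the view are out-of-bounds w.r.t. %mem and are masked off.
%view = memref.subview %mem[0] [264] [1] : memref<257xf32> to memref<264xf32>
// Last chunk starts at element 256; only 1 of the 8 lanes is valid.
%mask = vector.create_mask %c1 : vector<8xi1>
%v = vector.maskedload %view[%c256], %mask, %pass
       : memref<264xf32>, vector<8xi1>, vector<8xf32> into vector<8xf32>
```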
For tensors, this is not a big problem, because tensors have value semantics and you can just pad. But you need to create a view from memrefs and mask over it.
I would recommend reverting to the old state.
Have you experimented with memref.reinterpret_cast? memref.subview lowers to that operation and it allows you to do the masked_load from your example.
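For the 257-element example above, something along these lines should work (a sketch; names are assumptions):

```mlir
// reinterpret_cast addresses the underlying allocation directly, so a
// view that is larger than the source memref type is not rejected by
// the verifier the way an out-of-bounds subview now is.
%view = memref.reinterpret_cast %mem to
    offset: [0], sizes: [264], strides: [1]
    : memref<257xf32> to memref<264xf32>
// %view can then be read with a masked vector load as before.
```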
The reason why I tightened the out-of-bounds semantics is that I wanted better compile-time and run-time op verification that checks for out-of-bounds accesses. I noticed that the memref sizes were effectively not utilized at all, which raises the question of why memrefs have sizes in the first place. (You can do all index computations with just the strides.)
memref.reinterpret_cast works on the base underlying memory. I’m not sure if you can fold the indexing created by a memref.reinterpret_cast. A memref.subview can always be folded into something like a vector.maskedload.
This should probably be a pass run before FoldMemRefAliasOps in a pipeline that wants to avoid out-of-bounds subviews. I’m not sure it makes sense to change the verifier to check this for static shapes, because the subview can be folded away into indexing. (Note that it’s weird that you can do out-of-bounds subviews for dynamic shapes, but you can’t do it for static shapes.)
Also, out-of-bounds accesses should be checked on the user op, like a memref.load/vector.load; doing an out-of-bounds subview shouldn’t be undefined behavior.
I’m not sure what this means. How would you do expand_shape/collapse_shape of a memref with no sizes?
You cannot do out-of-bounds subviews on dynamic shapes either. It’s just that the verifier cannot detect it:
The offset, size and stride operands must be in-bounds with respect to the
source memref. When possible, the static operation verifier will detect
out-of-bounds subviews. Subviews that cannot be confirmed to be in-bounds
or out-of-bounds based on compile-time information are valid. However,
performing an out-of-bounds subview at runtime is undefined behavior.
memref.load/vector.load do not accept static indices, so we cannot compare the designs directly. Maybe (probably?) they would also have been designed to reject out-of-bounds indices in the op verifier if they supported static indices.
Note that my commit just improved the verifier but did not change the op semantics. Out-of-bounds accesses were already forbidden according to the op documentation before my commit:
Rolling back this commit (and changing the op semantics in the op documentation) actually means rolling back 4-5 stacked commits that were built on top of it. Which I don’t mind per se. But I’d like to make sure that these are actually the semantics that we want.
In particular, we probably would have to change the semantics/documentation of memref.load and some other ops:
The `load` op reads an element from a memref at the specified indices.
The number of indices must match the rank of the memref. The indices must
be in-bounds: `0 <= idx < dim_size`.
Whether a load/store/access is valid is then no longer a property of the operation in question and its memref operand. We would need to introduce the concept of an “underlying allocation” or “allocated pointer”.
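To make the problem concrete, consider this sketch (my own construction, not from the thread): a load that is out-of-bounds w.r.t. its memref operand but in-bounds w.r.t. the underlying allocation.

```mlir
%alloc = memref.alloc() : memref<16xf32>
%view  = memref.subview %alloc[0] [4] [1] : memref<16xf32> to memref<4xf32>
// Index 8 is in-bounds w.r.t. the allocation (16 elements) but
// out-of-bounds w.r.t. the memref operand %view (4 elements). If such
// loads were allowed, validity would depend on the underlying
// allocation, not just on the op and its memref operand.
%c8 = arith.constant 8 : index
%x  = memref.load %view[%c8] : memref<4xf32>
```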
The distinction between static and dynamic on the same op (one is invalid IR, the other one is UB) is a design smell to me: I don’t believe it is desirable to do this (vs having runtime UB).
Also this op is defined as Pure, which seems like a bug since it is incompatible with UB.
Can you elaborate? Why is this not desirable? We currently have many ops with mixed static/dynamic shape semantics: tensor.extract_slice, tensor.insert_slice, memref.subview, tensor.expand_shape, tensor.collapse_shape, memref.expand_shape, memref.collapse_shape, tensor.pad, tensor.parallel_insert_slice, [memref.reinterpret_cast]
This in itself is not desirable but my remark above was about the verifier, not the design of the ops.
On the op design: there were long discussions about it years ago (which should still exist in the archive), but in a nutshell it clobbers the API and does not really bring benefits, or at least they don’t pay for themselves. Inherent Attributes / Properties are useful when actually needed for codegen; otherwise regular constants just work. If we have more profound issues with constants in the infra, we should address them directly instead of hacking around them on these ops. Folding/canonicalization is more involved on these ops as well, and this is where having a verifier becomes iffy.
%cst = constant 10
some_op %cst
is valid but:
some_op 10
becomes invalid IR? That means they are not equivalent? How can you explain this from an operational-semantics point of view on the operation’s execution?
+1. Now that we have TypedValue<X>, is it finally time to move to a generalized Quantity<X> that can seamlessly encode a Value or an Attr (rather than all the OpFoldResult weirdness and the lack of first-class printer/parser support)?
I’d certainly welcome those cleanups, they are long standing warts.