Right. I can look into a patch to see what kind of impact it has. Like with ElementsAttr recently, a lot of these should really just be interfaces. My main desire here is that we figure out the right way forward medium/long term, so that these types of things can be done more easily and in a less controversial way. Adding things to the builtin dialect naturally gets a lot of pushback and reservation (it affects a lot of people, many of whom likely don’t care about e.g. tensors), and we really shouldn’t need the builtin dialect as a funnel point.
+1 - let me file an issue to track. For old, pervasive things like this, it is likely to cause a fair amount of migration pain for our users. It’d be good to get an idea of just how bad that is and whether it needs some pre-staging or something. The initial split may not be that bad. I’m not relishing the idea of writing !tensor.tensor everywhere, though (if we go so far as to move the existing types).
In general I’m happy to see a type being added now; this was raised originally when init_tensor and the general mechanism in linalg were introduced. I’m wondering what the relationship is with the shape dialect?
It seems like there is non-trivial overlap between manipulating “shapes” and what is sought here. The initial example in the RFC, the %windows used in linalg.pooling_nhwc_sum, does not seem any different to me from manipulating the concept of “a shape”.
On the “builtin” aspect: this is a long-standing issue that is under-invested. I’m with River in the sense that we only seem to make progress on this kind of layering / software architecture aspect when we push back on adding more technical debt. This kind of event (the need for an addition) is always a good forcing function to at least make a plan (like we did with the std dialect a while ago).
My understanding of the !shape.shape type (which is quite dated, as I haven’t looked at it in a while) is that it is a more “analysis-centric” type that lives as part of a DSL for constructing standalone representations of shape transfer functions. Critically, it has error-or semantics and a definition that is quite different from tensor. There may be an undiscovered relationship between the two in terms of transferring in and out of “shape DSL” land to a concrete representation, but I haven’t given it a lot of thought.
From my perspective, the interesting thing about this type is that it fills a gap in the existing type hierarchy and is coherent with it. We don’t want to switch to a type system with different semantics just to represent the metadata of a tensor as part of a regular program.
I suppose the other usage we see a lot is tensor<?xindex> to represent a truly data-dependent sequence of dimensions. I think what the proposed abstract_tensor is trying to capture is not a sequence of dimensions so much as a partial shape specialization, in the same way that tensor does:
abstract_tensor<3x4xf32> would yield shape dimensions of type tensor<2xindex> or a !shape.shape if bridging to a shape transfer function
abstract_tensor<*xf32> would yield shape dimensions of type tensor<?xindex>
abstract_tensor<2x?xf32> would yield shape dimensions of type tensor<2xindex>, but representing it as the latter exclusively would be a loss of information.
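To make “partial shape specialization” concrete, here is a toy Python model of it (the representation and the `refine` name are my own for illustration, not anything in MLIR): `None` stands in for a `?` dimension, and a fully unknown (unranked) shape is modeled as `None` itself. Refining two partial shapes keeps the most specific information from each side, which is exactly the information a `tensor<2xindex>` representation alone would lose.

```python
def refine(a, b):
    """Join two partial shapes, returning the most specific combination.

    A shape is None (unranked) or a list of dims, where a dim is an int
    or None (unknown, i.e. '?'). Raises ValueError on a conflict.
    """
    if a is None:  # unranked: the other side is at least as specific
        return b
    if b is None:
        return a
    if len(a) != len(b):
        raise ValueError("rank mismatch")
    out = []
    for da, db in zip(a, b):
        if da is None:
            out.append(db)          # take the (possibly unknown) other dim
        elif db is None or da == db:
            out.append(da)          # known dim wins, or both agree
        else:
            raise ValueError(f"dim mismatch: {da} vs {db}")
    return out
```

For example, refining `2x?` against `?x5` gives `2x5`, while refining an unranked shape against anything just returns the other side.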
I imagine that with the type, we would then immediately find need of various casting ops to get in/out of these other adjacent representations.
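To illustrate what such casts might look like (everything here is hypothetical: neither the !tensor.abstract_tensor spelling nor any of these cast ops exist, and I’m using generic op syntax purely as a sketch):

```mlir
func @bridge(%at: !tensor.abstract_tensor<2x?xf32>) {
  // To a data-dependent sequence of dimensions; the static `2` is lost.
  %dims = "abstract_tensor.to_dims"(%at)
      : (!tensor.abstract_tensor<2x?xf32>) -> tensor<2xindex>
  // Into "shape DSL" land, when bridging to a shape transfer function.
  %shape = "abstract_tensor.to_shape"(%at)
      : (!tensor.abstract_tensor<2x?xf32>) -> !shape.shape
  return
}
```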
This semantic difference is not clear to me from the doc describing the type: 'shape' Dialect - MLIR (could be a doc issue).
I think this RFC is also an opportunity to clarify the role of the shape dialect and what it includes. As much as I can sympathize with “We don’t want to switch to a type system with different semantics just to represent the metadata of a tensor as part of a regular program”, the counterpart would be my concern that we create two parallel worlds that don’t interop seamlessly; that would make me question the shape dialect itself.
There might be good reasons for having two hermetic worlds, but I don’t feel they have been clearly spelled out so far.
Another aspect: if an abstract_tensor is the tensor without the data, it also raises the question of the “encoding” attribute.
We should definitely get the opinions of @jpienaar and @herhut. A lot of good and careful thought has gone into the shape dialect for certain uses. I am aware of, but cannot fully represent, the viewpoints on this, which also include @_sean_silva’s. IIRC, in multiple attempts to use the !shape.shape type for this part of the problem, the sticking points were:
The “[invalid] for an invalid shape” part of the definition and whether you were dealing with the problem before or post validation of shapes. After a certain point, you assume that program constraints have been met and having such a high level type mixed in at those lower levels does not compose well.
As noted in this RFC and other cases I have seen in practice, the shape is often not needed on its own but in conjunction at least with the element type (and I am not sure about the encoding, as you point out).
The shape dialect is mixing three different concerns, none of which have in-tree uses, making it hard to reason about: definition of a shape type and supporting ops, a constraint system for establishing some invariants, and a concept of “shape libraries” for expressing shape transfer functions (i.e. for custom ops).
It is my understanding that the “shape library” functionality and related lowerings are what the rest exists in support of. If that is indeed the case, then I think these are different levels of the problem and we should be talking about how to interop between the different representations.
While it would be great to split up the linalg dialect, moving a single op this way really doesn’t address that. As you detail below, you have other objectives behind moving this op out. If you really would like to split up the linalg dialect, that could be discussed in a separate thread, but I would recommend splitting along these lines as a starting point:
ops used for representational purposes for tensor/memref compute (named or generic)
ops being used for transformational purposes (generated by in-dialect passes, etc. and used by transformation pipelines)
There could be a finer split or another class, but it looks like ops used for transformational purposes were dumped into the same dialect. LinalgOps.cpp already takes 20 s to build (even in release mode and on a modern, fast processor), and I believe this is really affecting build times for everyone, including those who aren’t using linalg (e.g., changes to memref, scf, affine, or other dialects would slow the build critical path). I’m surprised that the file grew this big! There are of course other ways to split (e.g., TF uses tensorflow/compiler/mlir/tensorflow/ir/tf_ops_a_m.cc and tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc), but you might as well split the dialect.
A lot of discussion since yesterday; I only have a few moments now but will unpack more later.
Making ShapedType an interface: +1. I did this knowing it was not optimal, but we needed something. I think we could switch to an interface with zero downstream user changes. Even changing Tensor etc. afterwards could require few downstream changes, but pushing on type interfaces so that ShapedType is simply swapped sounds great to me.
!shape.shape is indeed the type-as-value part of the world. And in that world the shape is part of the value, so making it part of the type has always felt awkward. E.g., it would be as if one had tensor<2xi64, [?, 5]> and encoded part of the value in the type. That doesn’t mean we couldn’t add !shape.dims<?x5> or the like.
Splitting the shape dialect sounds interesting here: moving the types out into their own dialect, keeping it lightweight with zero to minimal ops, and having tensor, vector, and memref depend on it (while also pushing on how we represent dependent dialects so that this isn’t a big change). The computation and constraint representation ops can then be moved out.
Then we could also move the dim attribute Mehdi has in a different change here, out of builtin.
Thanks everyone for all the responses. I read through the whole thread, and will try to answer some of the points raised here. (Thanks @stellaraccident for responding to the concerns here in the meantime).
Having ShapedType as an interface would be really good. I am happy to wait for that and use the interface for the abstract_tensor type; indeed it layers better. In terms of putting it in the builtin dialect, I suggested that dialect because tensor is in it. IMO, this type should be in the same dialect as the tensor type, because any place you would use tensor you would probably need to use abstract_tensor as well. That was the intent. I am not super tied to putting this in the builtin dialect; I am happy to put it in some other dialect (say, the tensor dialect) if that makes more sense.
Stella answered it to some extent. I can’t find the words for it, but as she said, there is a gap in the current system that the shape dialect does not seem to address. That’s what this RFC is trying to fill.
I am a bit apprehensive of trying to force-fit what currently exists in the shape dialect to this use case. Also +1 on clarifying the role of the shape dialect (I would really like to know too, because it is not entirely clear to me). I’m just hoping that we don’t get into a cyclic dependency between having to untangle things w.r.t. the shape dialect and this RFC. My understanding (as Stella explained above) is that there is no redundancy here w.r.t. what is in the shape dialect, and they are decoupled issues.
What is the encoding attribute?
No, it doesn’t solve the whole problem. Indeed, I am not the person to solve that problem. What is really at the core of this RFC is that I see linalg.init_tensor was added as a stop-gap solution because we did not have the type that is proposed here. So it’s really technical debt. AFAICS, before we even start thinking about graduating things out of Linalg, this technical debt needs to be addressed. The uses of linalg.init_tensor have already gone past its initial use case. Without decoupling the current use cases and addressing them appropriately, this will become a blocker to the overall effort of graduating things from Linalg. For example, a couple of solutions here which I think would be down a wrong path would be either defining an undef tensor (see all the issues w.r.t. undef semantics in LLVM), or having side-effecting tensor ops to prevent CSE (which is what this RFC proposed).
That’d be great if it works out that way! I have tried this patch before and found it to spider a lot more than that, but that was a long time ago and the infra has improved (and I may have been holding it wrong).
I don’t quite get how it is different from how this is modeled on tensor: when the shape is fully encoded in the value, it is unranked (and there is a gradient where you can make some of the dimensions dynamic).
This wasn’t quite clear to me from Stella’s previous posts so far (or Jacques’). Maybe I’ll re-read them over the weekend.
+1 for making ShapedType an interface and moving MemrefType to memref and TensorType to tensor and VectorType to vector to clean up the layering of that (very) old legacy.
Regarding the proposal itself, wouldn’t we be better off adding an abstract encoding on the tensor type?
This would be almost completely transparent to the infrastructure and all transformations, without the need to duplicate or extend specific ops.
This would avoid the confusion re. “the encoding of an abstract tensor”: abstract is the encoding.
The difference between an abstract tensor and a notional shape is that you can take a subset of an abstract tensor at a certain offset.
An abstract tensor could convert to a shape where needed but not the converse.
I don’t think it makes sense to take a shape within a shape at a certain offset?
I’m +1 on this as an end goal. Ultimately, I believe that we should have a stable, high level programming model/op set at its own layer of the stack. I don’t think we’re ready to take this apart yet, though. And if we were, I would want to spend a bit of time asking the question whether this is just a move or if we are reworking it along the way. Given that such an op set is destined to expand without limit, we might be better off planning for it and doing something that is more space and compile time efficient than generating full (highly redundant) c++ code for everything. Right now, the source of truth for all of that is a relatively simple yaml file, which then gets turned into something that takes a long time to compile. It seems like there may be a better way.
My understanding of the encoding is that everywhere a tensor would be used, the tensor:abstract would also be a valid use. I’d rather not do that. I think it would be better to be more explicit about where abstract_tensor is valid (instead of it being valid everywhere by default). For example, for transformations (like, say, tiling, which is probably the main thing that needs to be addressed), explicit handling of abstract_tensor to convert it to a real tensor is better IMO than it happening automatically. It will force us to think about where a real tensor is instantiated (for now using linalg.init_tensor).
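A sketch of the kind of explicit bridge I have in mind (the !tensor.abstract_tensor spelling and the way it would flow into a tiled op are hypothetical; linalg.init_tensor is the existing op):

```mlir
func @materialize(%at: !tensor.abstract_tensor<?x?xf32>,
                  %d0: index, %d1: index) -> tensor<?x?xf32> {
  // %at only carries the shape/element type of the result; this is the
  // one explicit point where a real tensor is instantiated (today via
  // linalg.init_tensor), rather than an implicit abstract -> tensor
  // conversion being valid everywhere.
  %t = linalg.init_tensor [%d0, %d1] : tensor<?x?xf32>
  return %t : tensor<?x?xf32>
}
```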
I think adding such an attribute would require us to tighten that up and be more explicit. Adding a type to the shaped type hierarchy is not free from this perspective either and will require an audit and tightening of constraints.
I’m lost right now: what does “adding an abstract encoding on the tensor type” mean?
Is it a type/attribute added to the tensor type? If so would we need an abstract_tensor type or would this abstract encoding be itself a type?
I don’t quite get how it makes it transparently used everywhere in the infrastructure either?
I don’t see any major objections here. The main thing to start this work is to decide whether the move from ShapedType to a type interface happens before adding the abstract_tensor type or not. @River707 I was under the impression that you already have a WIP patch for this. If that is the case, I can wait for your changes to land before adding this type.
Just to clarify, I am planning to add this as a separate type and not as an encoding attribute on tensor type. I want to actually have no transformations work with abstract_tensor (at least to start with) and make the bridge from abstract_tensor to tensor explicit where needed.