linalg.generic on tensors seems to have weird semantics

Cool, thanks for your questions! Here are a few more detailed answers.
Note that we also plan some ODM presentations on these topics once we have a good grasp on everything, probably within the next 2 months.

Because 1) tensors are immutable and 2) we need to preserve SSA use-def chain properties.
In the tensor world, the mechanisms are very similar to those for other value types: operations in the vector dialect, or on llvm.struct and llvm.vector, for instance, all produce new values rather than mutating their operands.
In fact, I have found that thinking of tensors as “very big vectors that have to go through memory to avoid catastrophic spills when we reach the register level” is a good first approximation. Here is a post with more details about how transformations operate on these abstractions. In the grander picture, these compiler abstractions are designed with a transformations-first model in mind.

Regarding linalg-bufferize and other passes, I am not fully up to speed with how they have evolved these days. Last I looked, I found too many surprising behaviors that I was not able to unify for my purpose (codegen transformations with strong inplace guarantees). These days the thinking is that there are two notions: a) a graph-level bufferization that requires refcounting, does not obey scoping rules, and is more conservative; and b) a high-performance version with inplace guarantees that works well in the presence of tiling, fusion, and other transformations on tensor SSA values but is less generally applicable.

Yes, that is one of the key insights that ComprehensiveBufferize uses, in addition to keeping SSA use-def chains intact as buffers are created. You can view this as the inverse of the design decision “allocate conservatively and perform later optimizations based on alias analysis”. The posts/RFCs linked above explain this in more detail.
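
As a rough sketch of what this looks like in IR (modulo the exact syntax of your MLIR version): the result of a linalg op on tensors is tied to its outs operand, and an inplace bufferization drops the result and writes into the buffer backing that operand, preserving the use-def chain structure along the way.

```mlir
// Tensor form: the result %r is tied to the outs operand %t.
%r = linalg.fill ins(%cst : f32) outs(%t : tensor<8xf32>) -> tensor<8xf32>

// After an inplace bufferization decision: no result anymore, the op
// writes directly into the buffer %m backing %t.
linalg.fill ins(%cst : f32) outs(%m : memref<8xf32>)
```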

This is a very good point/distinction: linalg ops have “inplaceable” semantics in addition to “value” semantics. This means the op semantics defines that an operand and a result “can be the same thing after bufferization”; this is a weird way of saying they have the same “runtime type” (i.e. the data structures and all sizes of everything are bitwise equal / mirror images of each other).
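
Concretely, a minimal sketch of this tying (again, modulo the exact linalg.generic syntax of your MLIR version): the outs operand and the result have the same tensor type, so bufferization may later back them with a single buffer.

```mlir
#id = affine_map<(d0) -> (d0)>
// outs(%init) and the result are tied: same "runtime type", so after
// bufferization they may become one and the same buffer.
%r = linalg.generic
    {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
    ins(%a : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %s = arith.addf %in, %in : f32
    linalg.yield %s : f32
} -> tensor<8xf32>
```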

As to your question: we cannot use full “inplace” semantics because deciding whether a tensor can be bufferized inplace in a given program is a global property of the inplace decisions made for the surrounding ops: it is easy to shoot oneself in the foot and write incorrect programs. Instead, the comprehensive bufferization process makes its inplace decisions greedily.
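
Here is a minimal illustration of the footgun: if the tensor written by a linalg op has another later use, bufferizing that op inplace clobbers the value the later use still needs, so the decision cannot be made by looking at one op in isolation.

```mlir
// If this op writes its result into %t's buffer inplace...
%r = linalg.fill ins(%cst : f32) outs(%t : tensor<8xf32>) -> tensor<8xf32>
// ...then this later read of %t, which must observe the *old* value of
// %t, silently sees the filled data instead: a miscompile.
%x = tensor.extract %t[%c0] : tensor<8xf32>
```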

Yes, as I mentioned above, the two are orthogonal but still related. The design of linalg on tensors is driven by the need for transformations on tensors.

Yes, this is the case in practice: linalg.inplaceable is an attribute that is not part of the op semantics and is only used for bufferization. The problem is the one I mentioned in reply to Mehdi’s comment: the interface between MLIR and the outside world is not yet defined on tensors (at the ABI / API level). So for the entry points (e.g. a C++ caller) and exit points (e.g. calls from MLIR into library calls) of the non-MLIR parts of the program, we cannot make magic decisions, and this bufferization-specific annotation is needed to tell comprehensive bufferization what to do at these function boundaries. This is an implementation detail that leaks until the ABI on tensors is defined in MLIR. The alternatives are either a) external assumptions (like IREE and XLA have) or b) miscompiles.
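
For illustration, the annotation sits on tensor function arguments roughly like this (a sketch; the exact spelling and placement may differ across MLIR revisions):

```mlir
// The attribute tells comprehensive bufferization that the buffer backing
// %A may be written inplace at this function boundary; without it, the
// conservative choice is to allocate and copy.
func @f(%A: tensor<8xf32> {linalg.inplaceable = true}, %v: f32)
    -> tensor<8xf32> {
  %r = linalg.fill ins(%v : f32) outs(%A : tensor<8xf32>) -> tensor<8xf32>
  return %r : tensor<8xf32>
}
```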

Bonus point, re controlling inplace / inplaceable decisions: in principle, comprehensive bufferize is modular enough that you could control what gets bufferized inplace by annotating the ops yourself and calling the pass to fill in the blanks and perform the actual allocations. I’ve been using that mode for debugging purposes in very limited cases. If that sounds like something that could be useful, it may be possible to harden it and make it available as a feature.
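
For a flavor of what that debugging mode looks like (the attribute name below comes from the pass's internal bookkeeping and may well have changed, so treat it as an assumption): you pre-mark the inplace decision on specific ops and let the pass decide the rest.

```mlir
// Hypothetical pre-annotation: pin this op's result as inplace;
// comprehensive bufferize fills in the remaining decisions and
// materializes the actual allocations.
%r = linalg.fill {__inplace_results_attr__ = ["true"]}
       ins(%v : f32) outs(%t : tensor<8xf32>) -> tensor<8xf32>
```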