The tensor dialect has two operations that create a new tensor with specified contents:

a) `tensor.generate`

b) `tensor.from_elements`

During bufferization, these are lowered to:

a) a buffer allocation and an `scf.parallel` loop filling the buffer

b) a buffer allocation and a sequence of stores into the buffer

These lowerings jump abstraction gaps. In particular, the `tensor.generate` lowering parallelizes the code. Some users may prefer a sequential version using `scf.for` (or perhaps yet another lowering). In the case of `tensor.from_elements`, there may be vectorization opportunities instead of creating scalar stores.
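For illustration, the current `tensor.from_elements` bufferization turns a small constructor like this (the element values `%a`, `%b`, `%c` are hypothetical):

```
%t = tensor.from_elements %a, %b, %c : tensor<3xindex>
```

into an allocation followed by one scalar store per element, roughly:

```
%m = memref.alloc() : memref<3xindex>
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
memref.store %a, %m[%c0] : memref<3xindex>
memref.store %b, %m[%c1] : memref<3xindex>
memref.store %c, %m[%c2] : memref<3xindex>
```

A `memref.from_elements` with the same semantics would keep the element list as operands of a single op, so a later pass could choose to vectorize the stores instead.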

These issues could be solved by introducing two new ops in the memref dialect:

a) `memref.generate`

b) `memref.from_elements`

The exact naming of these ops could be different. The key point is that both have the same semantics as their respective tensor counterparts, but operate on a memref value that is passed into the op as an additional operand.

Example:

```
func @tensor.generate_static_and_dynamic(%arg0: index) -> tensor<16x?xindex> {
  %result = tensor.generate %arg0 {
  ^bb0(%i: index, %j: index):
    %sum = arith.addi %i, %j : index
    tensor.yield %sum : index
  } : tensor<16x?xindex>
  return %result : tensor<16x?xindex>
}
```

During bufferization, this would lower to:

```
func @tensor.generate_static_and_dynamic(%arg0: index) -> memref<16x?xindex> {
  %result = memref.alloc(%arg0) : memref<16x?xindex>
  memref.generate %result {
  ^bb0(%i: index, %j: index):
    %sum = arith.addi %i, %j : index
    memref.yield %sum : index
  } : memref<16x?xindex>
  return %result : memref<16x?xindex>
}
```

Another pass could then lower the `memref.generate` to `scf.parallel` or `scf.for`. We currently have `BufferizeGenerateOp`, which lowers from `tensor.generate` all the way to `memref.alloc` + `scf.parallel`, so we would effectively split this pattern into two patterns and add a new op for the intermediate step.
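For instance, a sequential lowering of the `memref.generate` from the example above could look roughly like this (a sketch, not the output of an existing pattern):

```
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c16 = arith.constant 16 : index
%d1 = memref.dim %result, %c1 : memref<16x?xindex>
scf.for %i = %c0 to %c16 step %c1 {
  scf.for %j = %c0 to %d1 step %c1 {
    %sum = arith.addi %i, %j : index
    memref.store %sum, %result[%i, %j] : memref<16x?xindex>
  }
}
```

The parallel variant would use `scf.parallel (%i, %j) = (%c0, %c0) to (%c16, %d1) step (%c1, %c1)` with the same loop body.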

What are your opinions on this?