In the context of lowering the shape dialect, we have come across the issue that we need to create a small tensor of index values with dynamic length to represent the shape of an unranked array. Currently, we use a workaround like

```
%1 = alloca(%0) : memref<?xindex>
scf.for %arg1 = %c0 to %0 step %c1 {
  %9 = dim %arg0, %arg1 : tensor<*xf32>
  store %9, %1[%arg1] : memref<?xindex>
}
%2 = tensor_load %1 : memref<?xindex>
```

to model this. However, this introduces memrefs at an early stage, which we would like to avoid. It is also difficult to lower correctly: semantically, `tensor_load` makes a copy of the provided memref, leading to extra allocations in the lowering that we would need to optimize away.

For statically sized tensors, the standard dialect already has a `tensor_from_elements` operation, but that cannot be used here because the number of elements is only known at runtime.

We could model this similarly to LLVM by introducing an `std.undef` operation to create a tensor of undefined values and an `std.insert_element` operation to fill said tensor. That is also inconvenient to lower, however, as it requires some analysis to avoid temporaries, and it would introduce undefined values.

Instead, we could have a more tailored operation like

```
%2 = generate_tensor %0 {
^bb0(%arg1 : index):
  %9 = dim %arg0, %arg1 : tensor<*xf32>
  yield %9 : index
} : tensor<?xindex>
```

The operands to `generate_tensor` specify the dynamic sizes of the result, and the body region defines the value at each position of the tensor.
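To illustrate how the operands and block arguments would line up for more than one dynamic dimension, here is a rough sketch of a rank-2 use. Since `generate_tensor` is only proposed here, the multi-operand syntax is an assumption extrapolated from the rank-1 form above, and `%rows`, `%cols`, `%i`, and `%j` are hypothetical names.

```
// Hypothetical rank-2 form: one size operand and one index block
// argument per dynamic dimension of the result type.
%t = generate_tensor %rows, %cols {
^bb0(%i : index, %j : index):
  // Element at position (i, j) is i + j.
  %sum = addi %i, %j : index
  yield %sum : index
} : tensor<?x?xindex>
```

Under that reading, the number of operands would match the number of `?` dimensions in the result type, and the body would be evaluated once per element.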

This form is fairly easy to lower (it maps directly to an `scf.for` nest) and also allows nice canonicalizations with `extract_element`. For example, we have a use case

```
%2 = generate_tensor %0 {
^bb0(%arg1 : index):
  %9 = dim %arg0, %arg1 : tensor<*xf32>
  yield %9 : index
} : tensor<?xindex>
%3 = dim %2, %c0 : tensor<?xindex>
%4 = scf.for %arg1 = %c0 to %3 step %c1
    iter_args(%arg2 = %c1) -> (index) {
  %9 = extract_element %2[%arg1] : tensor<?xindex>
  %10 = muli %9, %arg2 : index
  scf.yield %10 : index
}
```

where the `extract_element` can be trivially canonicalized into `dim %arg0, %arg1` (assuming `extract_element` has undefined behavior for out-of-bounds reads).
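As a sketch of what the use case might look like after that canonicalization: the `extract_element` is replaced by the body of the region with the loop index substituted for the block argument. This additionally assumes that `dim %2, %c0` folds to the size operand `%0`, which is not stated above but seems a natural companion pattern.

```
// Assumed result after canonicalization: %2 is no longer read, so the
// generate_tensor itself becomes dead if it has no other uses.
%4 = scf.for %arg1 = %c0 to %0 step %c1
    iter_args(%arg2 = %c1) -> (index) {
  %9 = dim %arg0, %arg1 : tensor<*xf32>
  %10 = muli %9, %arg2 : index
  scf.yield %10 : index
}
```

In this form no intermediate shape tensor is materialized at all.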

This approach also avoids undefined values and the need to optimize updates on tensors.