Hi, @stevenvar and I are taking over @giuseros's work on SVE handling in MLIR (Linalg and masking). His work targeted Axpy, but we would now like to target multi-ranked operations.
I managed to generate LLVM dialect from the first example by running mlir-opt with -lower-vector-mask -test-vector-transferop-opt -convert-vector-to-llvm="enable-arm-sve" -convert-func-to-llvm. I get this output, which looks good to me.
llvm.func @lower_scalable(%arg0: i64, %arg1: !llvm.array<4 x vector<[4]xf32>>) -> !llvm.array<4 x vector<[4]xf32>> {
llvm.return %arg1 : !llvm.array<4 x vector<[4]xf32>>
}
But mlir-translate --mlir-to-llvmir fails, saying those vector types are not suitable. Indeed, we have:
We thought about implementing such a feature, inspired by the VectorFMAOpNDRewritePattern in ConvertVectorToLLVM.cpp, but that pattern does not completely get rid of the first dimension; it just extracts 1-D vectors out of it. You’d still have a multi-dimensional scalable vector somewhere. It also raises another problem for types where several dimensions are scalable, such as vector<[2x8]xi8>. Would you unroll it twice and then have “mask on the execution”?
I don’t know the details yet. At higher abstraction you probably need an extract op to reduce dimensionality. There is an extract op for the interplay between scalable and fixed length vectors.
To target LLVM, you have to find a way to end up with 1-D scalable vectors with predicates. Write the whole thing into memory and read 1-D scalable vectors from it?
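One possible reading of that memory round trip, as a minimal Python sketch rather than MLIR or LLVM code. The helper names (`make_mask`, `masked_load`, `read_rows_as_1d`) and the `base_lanes=4` default (standing in for vector<[4]xf32>) are assumptions made for illustration only. The 2-D data lives in a flat row-major buffer, and each row is read back as 1-D chunks of `base_lanes * vscale` lanes, with a predicate disabling the lanes that run past the end of the row.

```python
def make_mask(remaining, lanes):
    """Predicate with the first `remaining` lanes active, capped at `lanes`."""
    active = max(0, min(remaining, lanes))
    return [i < active for i in range(lanes)]


def masked_load(memory, base, mask, passthru=0.0):
    """Load one 1-D vector; inactive lanes are never read and get `passthru`."""
    return [memory[base + i] if m else passthru for i, m in enumerate(mask)]


def read_rows_as_1d(memory, rows, cols, vscale, base_lanes=4):
    """Read a rows x cols row-major buffer as lists of 1-D predicated vectors."""
    lanes = base_lanes * vscale  # runtime vector length, unknown at compile time
    result = []
    for r in range(rows):
        chunks = []
        for c in range(0, cols, lanes):
            mask = make_mask(cols - c, lanes)
            chunks.append(masked_load(memory, r * cols + c, mask))
        result.append(chunks)
    return result


# A 2 x 6 buffer read with vscale = 1 (4 lanes per vector).
buffer_2x6 = [float(i) for i in range(12)]
vectors = read_rows_as_1d(buffer_2x6, rows=2, cols=6, vscale=1)
```

Only 1-D vectors and 1-D masks ever exist in this sketch; the outer dimension becomes an ordinary loop over memory, which is the property the LLVM lowering needs.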
Hello, I have a follow-up question about the lowering to LLVM: are LLVM arrays containing multiple scalable vectors really conceptually impossible to have?
I am far from an expert in the specifics of the LLVM backend, so I wonder if this limitation would be easily (or not) removed, and if allowing arrays of scalable vectors is feasible? Surely if structs can have it then arrays can have it too?
EDIT: From what I understand, the decision to reject scalable vectors inside structs and arrays was mainly motivated by performance issues (cf. D64079 Scalable Vector IR Type (Try 3) (llvm.org)), rather than by an actual conceptual reason. Is that correct?
There’s no conceptual reason not to allow arrays and structs of scalable vectors, but there is a significant body of code that assumes the offset required to walk such data structures (especially when considering structs that contain both scalable and fixed-size types) can be represented as a single integer. Whilst this is fixable, the cost exceeded the benefit, especially given we felt we’d be unlikely to get a similar change into C/C++.
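To make the offset problem concrete, here is a minimal Python sketch (not LLVM code). The `Offset` class, the `field_offsets` helper and the struct layout are hypothetical, and alignment padding is deliberately ignored. It models a byte offset as a fixed part plus a part multiplied by the runtime `vscale`, and shows that once a struct mixes fixed and scalable members, the offset of a later field carries both parts and cannot be folded into one compile-time integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offset:
    """Byte offset of the form fixed + scalable * vscale (illustrative model)."""
    fixed: int = 0
    scalable: int = 0

    def __add__(self, other):
        return Offset(self.fixed + other.fixed, self.scalable + other.scalable)

    def resolve(self, vscale):
        # A concrete byte count exists only once vscale is known, at run time.
        return self.fixed + self.scalable * vscale


def field_offsets(field_sizes):
    """Offset of each field in a packed struct (alignment ignored for brevity)."""
    offsets, current = [], Offset()
    for size in field_sizes:
        offsets.append(current)
        current = current + size
    return offsets


# Hypothetical layout: { i32, <vscale x 4 x float>, i64 }
I32 = Offset(fixed=4)
NXV4F32 = Offset(scalable=16)  # 4 floats * 4 bytes, times vscale
I64 = Offset(fixed=8)

offsets = field_offsets([I32, NXV4F32, I64])
last_field = offsets[2]  # has both a fixed and a scalable component
```

In this model the last field sits at `4 + 16 * vscale` bytes, so any code that stores field offsets as a plain integer would need to be reworked to carry the pair instead.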
D94142 is really a soft enablement of supporting structs of scalable vectors because there’s a requirement that such structs are effectively in-register only. This was done to allow multi-vector returns without having to resort to using silly vector types that cause unnecessary complication during code generation. The expectation being the only supported operations for such types are getelement and setelement.