matrix type conversion

Hi Florian (+cfe-dev for visibility),

I was playing a little bit with the Clang matrix language extension and wanted to check with you to see whether I am missing something about matrix type conversion. The draft spec says:

"A value of matrix type can be converted to another matrix type if the number of rows and columns are the same and the value’s elements can be converted to the element type of the result type."

I have tried a few different variants of this:

typedef char m2x8_t __attribute__((matrix_type(2, 8)));
typedef char m8x2_t __attribute__((matrix_type(8, 2)));
typedef char m2x2_char_t __attribute__((matrix_type(2, 2)));
typedef int m2x2_int_t __attribute__((matrix_type(2, 2)));

m2x2_int_t f(m2x8_t a, m8x2_t b) {
  return static_cast<m2x2_int_t>(a * b);
}

but I am getting errors that this conversion is not allowed. Unless I am doing something very silly here, I am guessing that this is because the matrix extension is a work in progress?

The draft spec also says that implicit conversions don’t apply, but perhaps they would be convenient? I haven’t given any thought yet to whether that could be problematic.

Moving on a bit to lowering this to the matrix multiply intrinsics. I think it would be convenient if the matrix multiply could accumulate in a wider type (because that’s what some instructions do). While there are probably different approaches possible, the LLVM intrinsic uses the vector type for both its return value and its arguments:

vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, ...)

So perhaps we can relax this?

Cheers,
Sjoerd.

Hi,

Hi Florian (+cfe-dev for visibility),

I was playing a little bit with the Clang matrix language extension and wanted to check with you to see whether I am missing something about matrix type conversion. The draft spec says:

"A value of matrix type can be converted to another matrix type if the number of rows and columns are the same and the value’s elements can be converted to the element type of the result type."

I have tried a few different variants of this:

typedef char m2x8_t __attribute__((matrix_type(2, 8)));
typedef char m8x2_t __attribute__((matrix_type(8, 2)));
typedef char m2x2_char_t __attribute__((matrix_type(2, 2)));
typedef int m2x2_int_t __attribute__((matrix_type(2, 2)));

m2x2_int_t f(m2x8_t a, m8x2_t b) {
  return static_cast<m2x2_int_t>(a * b);
}

but I am getting errors that this conversion is not allowed. Unless I am doing something very silly here, I am guessing that this is because the matrix extension is a work in progress?

This should work according to the spec, but the conversion has not been implemented yet I think. I’ve created https://bugs.llvm.org/show_bug.cgi?id=47141 and linked it to https://bugs.llvm.org/show_bug.cgi?id=46163 which should act as an umbrella issue to track the missing pieces.
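To make the quoted rule concrete, here is a minimal sketch in plain C++ that models it outside of Clang's matrix extension. The `Matrix` struct and the `matrix_cast` helper are hypothetical names invented for this illustration, and the column-major storage is an arbitrary choice; none of this is the extension's implementation. The cast only compiles when the source and destination share the same rows and columns and the element type is convertible, which is the condition the draft spec states:

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a matrix type (NOT Clang's matrix_type attribute).
template <typename T, std::size_t Rows, std::size_t Cols>
struct Matrix {
  std::array<T, Rows * Cols> elems{};  // column-major, chosen arbitrarily here

  T& at(std::size_t r, std::size_t c) { return elems[c * Rows + r]; }
  const T& at(std::size_t r, std::size_t c) const { return elems[c * Rows + r]; }
};

// Hypothetical explicit conversion: Rows and Cols are shared between the
// argument and the result, so the shape cannot change, and the element
// types must be convertible.
template <typename To, typename From, std::size_t Rows, std::size_t Cols>
Matrix<To, Rows, Cols> matrix_cast(const Matrix<From, Rows, Cols>& m) {
  static_assert(std::is_convertible<From, To>::value,
                "element types must be convertible");
  Matrix<To, Rows, Cols> out;
  for (std::size_t i = 0; i < Rows * Cols; ++i)
    out.elems[i] = static_cast<To>(m.elems[i]);  // element-wise conversion
  return out;
}
```

With this model, `matrix_cast<int>(m)` on a `Matrix<char, 2, 2>` yields a `Matrix<int, 2, 2>`, which corresponds to the `m2x2_char_t` to `m2x2_int_t` conversion from the example above.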

The draft spec also says that implicit conversions don’t apply, but perhaps they would be convenient? I haven’t given any thought yet to whether that could be problematic.

I think currently we match the behavior for vector types and only convert scalar operands for binary operators implicitly to matrices. If there’s a strong need for implicit conversions, this is certainly something that can be revisited.

Moving on a bit to lowering this to the matrix multiply intrinsics. I think it would be convenient if the matrix multiply could accumulate in a wider type (because that’s what some instructions do). While there are probably different approaches possible, the LLVM intrinsic uses the vector type for both its return value and its arguments:

vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, ...)

So perhaps we can relax this?

Yes, we can certainly extend this to allow use cases to map to hardware instructions that implement an extension step, like AArch64’s udot. IIRC it extends the sums, which I think would make the most sense to use, as otherwise it should be sufficient to extend the operands/result.

I think the more interesting question here would be how this fits into the C/C++ spec. I guess it would be possible to specify it so a multiply that gets extended lowers to the widening intrinsic, but this would seem quite surprising/awkward. In my opinion, a separate builtin would be a cleaner solution.

Cheers,
Florian

Hi,

This should work according to the spec, but the conversion has not been implemented yet I think. I’ve created https://bugs.llvm.org/show_bug.cgi?id=47141 and linked it to https://bugs.llvm.org/show_bug.cgi?id=46163 which should act as an umbrella issue to track the missing pieces.

Ah, I was unaware of that umbrella ticket. Thanks for that, and for raising the ticket.

I think currently we match the behavior for vector types and only convert scalar operands for binary operators implicitly to matrices. If there’s a strong need for implicit conversions, this is certainly something that can be revisited.

Not sure if there’s a strong need, but from writing my first examples yesterday, I can see that it would be convenient and possibly cleaner too (i.e. less text/clutter). I am not sure about this one, but it’s also what people would expect perhaps?

Yes, we can certainly extend this to allow use cases to map to hardware instructions that implement an extension step, like AArch64’s udot. IIRC it extends the sums, which I think would make the most sense to use, as otherwise it should be sufficient to extend the operands/result.

Yes, or the v8.6 matrix multiply accumulate instructions, which multiply 8-bit values and store the results as 32-bit values.

I think the more interesting question here would be how this fits into the C/C++ spec. I guess it would be possible to specify it so a multiply that gets extended lowers to the widening intrinsic, but this would seem quite surprising/awkward.

As I said, I haven’t given this too much thought yet, so just for my understanding, what exactly is the surprising/awkward bit of the C/C++ spec here? I was guessing that the assignment of a result from a matrix operation, using an implicit/explicit conversion, would take care of this?

Cheers,
Sjoerd.

Hi,

This should work according to the spec, but the conversion has not been implemented yet I think. I’ve created https://bugs.llvm.org/show_bug.cgi?id=47141 and linked it to https://bugs.llvm.org/show_bug.cgi?id=46163 which should act as an umbrella issue to track the missing pieces.

Ah, I was unaware of that umbrella ticket. Thanks for that, and for raising the ticket.

I think currently we match the behavior for vector types and only convert scalar operands for binary operators implicitly to matrices. If there’s a strong need for implicit conversions, this is certainly something that can be revisited.

Not sure if there’s a strong need, but from writing my first examples yesterday, I can see that it would be convenient and possibly cleaner too (i.e. less text/clutter). I am not sure about this one, but it’s also what people would expect perhaps?

I think as part of the discussion for the RFC we decided to be more explicit. But as I said, if there’s consensus that it would be better to provide implicit conversion, it is easy to change. But it would be good to implement explicit conversion to start with ;)

Yes, we can certainly extend this to allow use cases to map to hardware instructions that implement an extension step, like AArch64’s udot. IIRC it extends the sums, which I think would make the most sense to use, as otherwise it should be sufficient to extend the operands/result.

Yes, or the v8.6 matrix multiply accumulate instructions, which multiply 8-bit values and store the results as 32-bit values.

Oh right, I just had a look at those. It seems like the matrix multiply accumulate instructions widen the result of the matrix multiplication. I don’t think we need any changes to the intrinsic to model that. We should be able to model this by just extending the result vector of the matrix multiplication. And the extension instructions would be generated naturally from implicit/explicit conversion to a matrix with a wider element type.

What I was referring to in the statement below was related to instructions where the results of the intermediate multiplications get widened, which are then accumulated using the wider type. To model that, I think we would need a ‘widening’ version of the matrix multiply intrinsic. And mapping this extension ‘in the middle’ to implicit/explicit conversion of the final result would be confusing/surprising IMO. But I might be missing something.
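The difference between the two cases can be sketched in plain C++. This is only a model of the arithmetic, not the LLVM intrinsic or any particular instruction; the `Mat` alias and both function names are hypothetical and exist only for this illustration. The first function multiplies and accumulates in the 8-bit element type (wrapping modulo 256) and extends the finished result, while the second widens each intermediate product and accumulates in 32 bits:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper alias for this sketch: a row-major R x C matrix.
template <typename T, std::size_t R, std::size_t C>
using Mat = std::array<std::array<T, C>, R>;

// Variant 1: multiply and accumulate in 8 bits (wrapping modulo 256),
// then extend each element of the finished result to 32 bits.
template <std::size_t R, std::size_t K, std::size_t C>
Mat<std::uint32_t, R, C> multiply_then_extend(const Mat<std::uint8_t, R, K>& a,
                                              const Mat<std::uint8_t, K, C>& b) {
  Mat<std::uint32_t, R, C> out{};
  for (std::size_t i = 0; i < R; ++i)
    for (std::size_t j = 0; j < C; ++j) {
      std::uint8_t acc = 0;
      for (std::size_t k = 0; k < K; ++k)
        acc = static_cast<std::uint8_t>(acc + a[i][k] * b[k][j]);  // truncate to 8 bits
      out[i][j] = acc;  // extension happens only at the very end
    }
  return out;
}

// Variant 2: widen each intermediate product to 32 bits and accumulate
// in the wider type ("widening in the middle").
template <std::size_t R, std::size_t K, std::size_t C>
Mat<std::uint32_t, R, C> widening_multiply(const Mat<std::uint8_t, R, K>& a,
                                           const Mat<std::uint8_t, K, C>& b) {
  Mat<std::uint32_t, R, C> out{};
  for (std::size_t i = 0; i < R; ++i)
    for (std::size_t j = 0; j < C; ++j) {
      std::uint32_t acc = 0;
      for (std::size_t k = 0; k < K; ++k)
        acc += static_cast<std::uint32_t>(a[i][k]) * static_cast<std::uint32_t>(b[k][j]);
      out[i][j] = acc;
    }
  return out;
}
```

For a 2x8 and an 8x2 matrix filled with the value 16, every product is 256, which wraps to 0 in 8 bits. The first variant therefore produces 0 in every element, while the second produces 8 * 256 = 2048. Converting the final result cannot recover the bits lost in the first variant, which is why the two cases need to be distinguished.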

I think the more interesting question here would be how this fits into the C/C++ spec. I guess it would be possible to specify it so a multiply that gets extended lowers to the widening intrinsic, but this would seem quite surprising/awkward.

As I said, I haven’t given this too much thought yet, so just for my understanding, what exactly is the surprising/awkward bit of the C/C++ spec here? I was guessing that the assignment of a result from a matrix operation, using an implicit/explicit conversion, would take care of this?

Cheers,
Florian

This topic also came up during the Q&A for the "Matrix Support in Clang and LLVM" talk. I filed https://bugs.llvm.org/show_bug.cgi?id=47764.