I was playing a little with the Clang matrix language extension and wanted to check with you whether I am missing something about matrix type conversions. The draft spec says:

"A value of matrix type can be converted to another matrix type if the number of rows and columns are the same and the value’s elements can be converted to the element type of the result type."

m2x2_int_t f(m2x8_t a, m8x2_t b) {
  return static_cast<m2x2_int_t>(a * b);
}

but I am getting errors that this conversion is not allowed. Unless I am doing something very silly here, I am guessing this is because the matrix extension is still a work in progress?

The draft spec also says that implicit conversions don’t apply, but perhaps they would be convenient? I haven’t yet given any thought to whether that could be problematic, though.
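For reference, the conversion rule quoted above can be modeled in portable C++. This is a minimal sketch, not Clang's implementation: the `Matrix` template and the `matrix_cast` helper are hypothetical stand-ins for the `matrix_type` attribute and for a cast between matrix types, and the element types chosen for `m2x8_t`, `m8x2_t` and `m2x2_int_t` are assumptions, since the example does not show their typedefs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Clang matrix type: R x C elements, column-major.
template <typename T, std::size_t R, std::size_t C>
struct Matrix {
  std::array<T, R * C> data{};
  T &operator()(std::size_t r, std::size_t c) { return data[c * R + r]; }
  const T &operator()(std::size_t r, std::size_t c) const { return data[c * R + r]; }
};

// Models the draft-spec rule: rows and columns must match (enforced by the
// shared R and C parameters) and each element is converted to the result type.
template <typename To, typename From, std::size_t R, std::size_t C>
Matrix<To, R, C> matrix_cast(const Matrix<From, R, C> &m) {
  Matrix<To, R, C> out;
  for (std::size_t i = 0; i < R * C; ++i)
    out.data[i] = static_cast<To>(m.data[i]);
  return out;
}

// Plain matrix product in the operands' element type (no widening).
template <typename T, std::size_t R, std::size_t K, std::size_t C>
Matrix<T, R, C> operator*(const Matrix<T, R, K> &a, const Matrix<T, K, C> &b) {
  Matrix<T, R, C> out;
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t k = 0; k < K; ++k)
        out(r, c) += a(r, k) * b(k, c);
  return out;
}

// Assumption: float operands and an int result.
using m2x8_t = Matrix<float, 2, 8>;
using m8x2_t = Matrix<float, 8, 2>;
using m2x2_int_t = Matrix<int, 2, 2>;

m2x2_int_t f(const m2x8_t &a, const m8x2_t &b) {
  return matrix_cast<int>(a * b);  // 2x2 float -> 2x2 int, element-wise
}
```

Because `matrix_cast` keeps `R` and `C` fixed, trying to convert, say, a 2x8 matrix into `m2x2_int_t` fails to compile in this model, which mirrors the same-dimensions requirement in the spec.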

Moving on to lowering this to the matrix multiply intrinsics: I think it would be convenient if the matrix multiply could accumulate in a wider type, because that’s what some instructions do. There are probably several possible approaches, but the LLVM intrinsic uses the same vector type for its return value and its arguments.
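A rough C++ model of what `llvm.matrix.multiply` computes may make that constraint concrete. It is based on my reading of the LLVM LangRef and assumes the default column-major layout: the operands and the result are flat vectors, and all three share one element type `T`, so every partial sum is kept in `T` and there is no wider accumulator. The name `matrix_multiply` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rough model of llvm.matrix.multiply(A, B, OuterRows, Inner, OuterColumns):
// A is rows x inner and B is inner x cols, both flattened column-major.
// A, B and the result share the element type T, so sums are accumulated in T.
template <typename T>
std::vector<T> matrix_multiply(const std::vector<T> &A, const std::vector<T> &B,
                               std::size_t rows, std::size_t inner, std::size_t cols) {
  assert(A.size() == rows * inner && B.size() == inner * cols);
  std::vector<T> result(rows * cols, T{});
  for (std::size_t c = 0; c < cols; ++c)
    for (std::size_t r = 0; r < rows; ++r) {
      T acc{};
      for (std::size_t k = 0; k < inner; ++k)
        // The cast back to T truncates/wraps when T is narrow (e.g. uint8_t).
        acc = static_cast<T>(acc + A[k * rows + r] * B[c * inner + k]);
      result[c * rows + r] = acc;
    }
  return result;
}
```

With `uint8_t` elements, multiplying a 2x8 matrix of 20s by an 8x2 matrix of 20s gives 3200 per element, which wraps to 128; that wraparound is exactly what accumulating in a wider type would avoid.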

I think we currently match the behavior for vector types and only implicitly convert scalar operands of binary operators to matrices. If there is a strong need for implicit conversions, this can certainly be revisited.
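As a rough analogy for that behavior (a sketch, not Clang's code), the hypothetical `Mat2x2` type below accepts a scalar operand of any arithmetic type, converting it to the element type and applying it to every element, while `+` between two matrices is only defined when both element types match.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical 2x2 matrix with element type T (column-major storage).
template <typename T>
struct Mat2x2 {
  std::array<T, 4> e{};
};

// Matrix + matrix: only defined for identical element types, mirroring the
// absence of implicit matrix-to-matrix conversions.
template <typename T>
Mat2x2<T> operator+(const Mat2x2<T> &a, const Mat2x2<T> &b) {
  Mat2x2<T> out;
  for (std::size_t i = 0; i < 4; ++i) out.e[i] = a.e[i] + b.e[i];
  return out;
}

// Matrix + scalar: the scalar is converted to the element type and splatted,
// mirroring the implicit conversion of scalar operands.
template <typename T, typename S,
          typename std::enable_if<std::is_arithmetic<S>::value, int>::type = 0>
Mat2x2<T> operator+(const Mat2x2<T> &a, S s) {
  Mat2x2<T> out;
  for (std::size_t i = 0; i < 4; ++i) out.e[i] = a.e[i] + static_cast<T>(s);
  return out;
}
```

In this model, `Mat2x2<int>{} + Mat2x2<float>{}` does not compile, so the caller has to convert one operand explicitly, which is the analogue of the draft spec not applying implicit matrix conversions.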

Yes, we can certainly extend this to let use cases map to hardware instructions that implement an extension step, like AArch64’s udot. IIRC it extends the sums, which I think would make the most sense to support, as otherwise it should be sufficient to extend the operands/result.
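For reference, here is a simplified scalar model of one 32-bit lane of AArch64 `UDOT`, based on my understanding of the instruction: four unsigned 8-bit products are computed at 32-bit width, summed, and added to a 32-bit accumulator, so the widening happens inside the dot product rather than on a final narrow result. The function name `udot_lane` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of one 32-bit lane of AArch64 UDOT: four u8 x u8 products
// are widened to 32 bits, summed, and added to the 32-bit accumulator.
std::uint32_t udot_lane(std::uint32_t acc, const std::uint8_t a[4], const std::uint8_t b[4]) {
  for (int j = 0; j < 4; ++j)
    acc += static_cast<std::uint32_t>(a[j]) * static_cast<std::uint32_t>(b[j]);
  return acc;
}
```

Four products of 255 * 255 give 260100, a value that would be lost entirely if the sums were kept in 8 bits.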

I think the more interesting question is how this fits into the C/C++ spec. It would be possible to specify it so that a multiply whose result gets extended lowers to the widening intrinsic, but that would seem quite surprising/awkward. In my opinion, a separate builtin would be a cleaner solution.

Ah, I was unaware of that umbrella ticket. Thanks for that, and for raising the ticket.

I'm not sure there’s a strong need, but from writing my first examples yesterday, I can see that implicit conversions would be convenient and possibly cleaner too (i.e. less text/clutter). I am less sure about this one, but perhaps it is also what people would expect?

Yes, or the v8.6 matrix multiply-accumulate instructions, which multiply 8-bit values and accumulate the results into 32-bit elements.
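The operand shapes of `UMMLA` happen to match the `m2x8_t`/`m8x2_t` example above. Here is a simplified scalar model, as I understand the instruction and ignoring how the operands are packed into 128-bit registers: a 2x8 matrix of unsigned 8-bit values is multiplied by an 8x2 matrix of unsigned 8-bit values, and the 2x2 product is added to a 2x2 matrix of 32-bit accumulators. The name `ummla_model` and the plain-array layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of Armv8.6 UMMLA semantics (register packing ignored):
// acc(2x2, u32) += a(2x8, u8) * b(8x2, u8), with every product computed in 32 bits.
void ummla_model(std::uint32_t acc[2][2], const std::uint8_t a[2][8], const std::uint8_t b[8][2]) {
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 8; ++k)
        acc[i][j] += static_cast<std::uint32_t>(a[i][k]) * b[k][j];
}
```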

As I said, I haven’t given this much thought yet, so just for my understanding: what exactly is surprising/awkward with respect to the C/C++ spec here? I was guessing that assigning the result of a matrix operation, with an implicit/explicit conversion, would take care of this.

I think during the discussion of the RFC we decided to be more explicit. But as I said, if there’s consensus that implicit conversions would be better, it is easy to change. Still, it would be good to implement explicit conversions to start with.

Oh right, I just had a look at those. It seems like the matrix multiply-accumulate instructions widen the result of the matrix multiplication. I don’t think we need any changes to the intrinsic to model that: we should be able to model it by simply extending the result vector of the matrix multiplication, and the extension instructions would be generated naturally from an implicit/explicit conversion to a matrix with a wider element type.

What I was referring to in my earlier statement about the C/C++ spec was instructions where the results of the intermediate multiplications get widened and are then accumulated in the wider type. To model that, I think we would need a ‘widening’ version of the matrix multiply intrinsic, and mapping this extension ‘in the middle’ to an implicit/explicit conversion of the final result would be confusing/surprising IMO. But I might be missing something.
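To illustrate the distinction, here is a small C++ sketch of the two lowerings, assuming unsigned 8-bit inputs, 32-bit results and row-major storage; the helper names are hypothetical. `extend_result` multiplies and accumulates in the 8-bit element type and only converts the final result, which is what a conversion of the result expresses; `widen_products` widens every intermediate product and accumulates in 32 bits, which is what a ‘widening’ multiply intrinsic would have to express.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Both helpers compute (n x k) * (k x m) on row-major data.

// Strategy 1: multiply and accumulate in the narrow element type, then
// zero-extend each element of the finished result.
std::vector<std::uint32_t> extend_result(const std::vector<std::uint8_t> &a,
                                         const std::vector<std::uint8_t> &b,
                                         std::size_t n, std::size_t k, std::size_t m) {
  std::vector<std::uint32_t> out(n * m);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j) {
      std::uint8_t acc = 0;  // wraps modulo 256
      for (std::size_t p = 0; p < k; ++p)
        acc = static_cast<std::uint8_t>(acc + a[i * k + p] * b[p * m + j]);
      out[i * m + j] = acc;  // the extension happens only here
    }
  return out;
}

// Strategy 2: widen every intermediate product and accumulate in 32 bits.
std::vector<std::uint32_t> widen_products(const std::vector<std::uint8_t> &a,
                                          const std::vector<std::uint8_t> &b,
                                          std::size_t n, std::size_t k, std::size_t m) {
  std::vector<std::uint32_t> out(n * m, 0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      for (std::size_t p = 0; p < k; ++p)
        out[i * m + j] += static_cast<std::uint32_t>(a[i * k + p]) * b[p * m + j];
  return out;
}
```

The two agree for small inputs but diverge once the 8-bit accumulator wraps: with all elements equal to 16, each entry of a 2x8 by 8x2 product is 8 * 16 * 16 = 2048, while the narrow version returns 0.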