- Overview
This RFC is a first step towards adding support for a new type known as “cooperative matrix”, along with new built-in operations, to the OpenCL C language, as proposed by the Cooperative Matrix Extension from the Khronos OpenCL working group. The new type and its built-in operations allow matrices and optimized matrix multiply operations to be represented in OpenCL.
- Background
A “cooperative matrix” type was first introduced in the VK_KHR_cooperative_matrix extension for Vulkan (VK_KHR_cooperative_matrix(3)). The extension added support for “cooperative matrix” types in SPIR-V which are primarily supported in compute shaders. Such types are used to represent matrices, the storage for which is spread across all invocations in some scope (usually a subgroup) and those invocations cooperate to efficiently perform matrix multiplies.
For the sake of brevity, we refer readers to the Khronos Group presentation “Cooperative Matrix Multiply”, which explains the extension in detail.
An initial version of the OpenCL extension for cooperative matrices can be found here. Changes proposed in this RFC would add to the initial extension by enabling OpenCL C language support for cooperative matrices.
Also, we would like to acknowledge Imagination Technologies® for their contributions to this document.
- History of cooperative matrix in LLVM project
SPV_KHR_cooperative_matrix is a SPIR-V extension introduced by Khronos, and it is supported in the official LLVM SPIR-V backend. Our proposal aims to generate LLVM IR that the SPIR-V backend can consume without any further modifications.
- Proposed Approach
We propose changes to the clang OpenCL front-end to represent (and lower to LLVM IR) a new “cooperative matrix” type that can be defined in a kernel source along with built-in operations proposed by our extension. The following sections describe our proposed changes in detail.
4.1. Cooperative Matrix Type
A “cooperative matrix” type can be defined in a kernel source as:
<component_type> <var_name> __attribute__((coop_mat(_scope_, _row_, _column_, _use_)));
Example 1: float vmat1 __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 8, 8, CLK_COOPERATIVE_MATRIX_A)));
Example 2: typedef half chmat44 __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 4, 4, CLK_COOPERATIVE_MATRIX_B)));
chmat44 vmat2;
Here, __attribute__((coop_mat(_scope_, _row_, _column_, _use_))) is our proposed custom type attribute for declaring matrix types, with any trivial data type already available in OpenCL C as the base/component type. In the examples above, the base/component type is a scalar numerical type (float/half). The _scope_ parameter is one of the supported memory scopes; the matrix is spread across all the invocations in that memory scope. Currently the only supported _scope_ value is:
CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP
Our implementation can be extended to add more memory scopes in future as needed.
The _use_ parameter defines the intended use of the matrix variable and is one of the following predefined enums (introduced by the extension):
CLK_COOPERATIVE_MATRIX_A
CLK_COOPERATIVE_MATRIX_B
CLK_COOPERATIVE_MATRIX_ACCUMULATOR
4.2. Lowering to Clang AST
This custom attribute is then parsed and lowered to the Clang AST by deriving a class called clang::CooperativeMatrixType from the upstream clang::MatrixType. We used the clang::ConstantMatrixType class as a reference and extended it to store the _scope_ and _use_ information.
The extension introduces the __opencl_c_ext_cooperative_matrix feature name to guard the cooperative matrix type generation.
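The relationship described in this section can be pictured with a small standalone model. This is only a sketch: the classes below are simplified stand-ins written for illustration, not Clang's actual clang::MatrixType or the proposed clang::CooperativeMatrixType, and the accessor names, the enumerators, and the type-identity rule are all assumptions.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-ins for illustration only; these are NOT the Clang classes.
enum class CoopMatScope { Subgroup };          // the only scope currently proposed
enum class CoopMatUse { A, B, Accumulator };   // mirrors the three _use_ values

// Plays the role of clang::MatrixType: it only knows the element type.
class MatrixTypeModel {
public:
  explicit MatrixTypeModel(std::string elementType)
      : elementType_(std::move(elementType)) {}
  virtual ~MatrixTypeModel() = default;
  const std::string &elementType() const { return elementType_; }

private:
  std::string elementType_;
};

// Plays the role of the proposed clang::CooperativeMatrixType: like a constant
// matrix type (fixed rows and columns), extended with scope and use.
class CooperativeMatrixTypeModel : public MatrixTypeModel {
public:
  CooperativeMatrixTypeModel(std::string elementType, CoopMatScope scope,
                             unsigned rows, unsigned columns, CoopMatUse use)
      : MatrixTypeModel(std::move(elementType)), scope_(scope), rows_(rows),
        columns_(columns), use_(use) {
    assert(rows_ > 0 && columns_ > 0 && "matrix dimensions must be non-zero");
  }

  CoopMatScope scope() const { return scope_; }
  unsigned rows() const { return rows_; }
  unsigned columns() const { return columns_; }
  CoopMatUse use() const { return use_; }

  // Assumption: two cooperative matrix types are the same type only when
  // every parameter, including scope and use, matches.
  bool isSameType(const CooperativeMatrixTypeModel &other) const {
    return elementType() == other.elementType() && scope_ == other.scope_ &&
           rows_ == other.rows_ && columns_ == other.columns_ &&
           use_ == other.use_;
  }

private:
  CoopMatScope scope_;
  unsigned rows_;
  unsigned columns_;
  CoopMatUse use_;
};
```

Under this model, two 4x4 half matrices that differ only in _use_ are distinct types, which is the extra information a plain constant matrix type would not carry.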
4.3. Lowering to LLVM IR
This cooperative matrix type is then lowered to LLVM IR using the Target Extension Type, as target("spirv.CooperativeMatrixKHR", <component_type>, <row>, <column>, <scope>, <use>). We chose this representation largely so that a cooperative matrix type in the IR remains a target-agnostic type that can be lowered further for a particular target architecture. The name of this target extension type was also chosen to match the existing cooperative matrix implementation in the SPIR-V backend. For example, the following defines a cooperative matrix called m, followed by the LLVM IR representation of its data type.
typedef half matA __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 2, 4, CLK_COOPERATIVE_MATRIX_A)));
matA m;
target("spirv.CooperativeMatrixKHR", half, 2, 4, 0, 0) // LLVM IR representation of data type for m
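To make the mapping from attribute parameters to the IR type concrete, here is a minimal sketch of a helper that prints the target extension type as text. The function name coopMatIRType is hypothetical, and the integer encodings (subgroup scope as 0; uses A, B, and accumulator as 0, 1, and 2) are assumptions read off the IR examples in this RFC rather than values taken from the extension text.

```cpp
#include <cassert>
#include <string>

// Assumed integer encodings, inferred from the IR examples in this RFC.
enum CoopMatScope : unsigned { ScopeSubgroup = 0 };
enum CoopMatUse : unsigned { UseA = 0, UseB = 1, UseAccumulator = 2 };

// Hypothetical helper: spells the target extension type for a cooperative
// matrix in the order component type, rows, columns, scope, use.
std::string coopMatIRType(const std::string &componentType, unsigned rows,
                          unsigned columns, CoopMatScope scope,
                          CoopMatUse use) {
  assert(rows > 0 && columns > 0 && "matrix dimensions must be non-zero");
  return "target(\"spirv.CooperativeMatrixKHR\", " + componentType + ", " +
         std::to_string(rows) + ", " + std::to_string(columns) + ", " +
         std::to_string(static_cast<unsigned>(scope)) + ", " +
         std::to_string(static_cast<unsigned>(use)) + ")";
}
```

Calling coopMatIRType("half", 2, 4, ScopeSubgroup, UseA) reproduces the type shown for m above.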
4.4. Built-in Functions
The following Built-in Functions are introduced by the extension:
<coop_mat_type> coop_mat_load(const gentype *p, const coop_matrix_layout_t layout, const size_t stride) // Load a cooperative matrix from p.
void coop_mat_store(const gentype *p, <coop_mat_type> m, const coop_matrix_layout_t layout, const size_t stride) // Store a cooperative matrix to p.
<coop_mat_type> coop_matmul_add(<coop_mat_type> A, <coop_mat_type> B, <coop_mat_type> C, <coop_mat_operands> coop_mat_operand) // Matrix multiply of A by B and then component-wise add C.
As described in sections 4.2 and 4.3, every <coop_mat_type> is represented using CooperativeMatrixType and then lowered to the Target Extension Type in LLVM IR.
The layout argument indicates the layout of the matrix values in memory. It accepts one of the following predefined enum values introduced by the extension:
CLK_COOPERATIVE_MATRIX_LAYOUT_ROW_MAJOR
CLK_COOPERATIVE_MATRIX_LAYOUT_COLUMN_MAJOR
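The effect of the layout and stride arguments can be illustrated with a scalar reference model of a load. This is a sketch under a stated assumption: stride is taken to be the number of elements between the starts of consecutive rows (row-major) or consecutive columns (column-major), in the spirit of the SPIR-V cooperative matrix load. The names Layout, elementOffset, and referenceLoad are hypothetical and are not part of the extension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Layout { RowMajor, ColumnMajor };

// Offset (in elements) of matrix element (row, col) relative to the pointer,
// assuming stride counts elements between consecutive rows or columns.
std::size_t elementOffset(Layout layout, std::size_t stride, std::size_t row,
                          std::size_t col) {
  return layout == Layout::RowMajor ? row * stride + col : col * stride + row;
}

// Scalar reference load: gathers a rows x cols matrix from mem and returns it
// as a dense row-major vector. A real implementation would distribute the
// elements across the invocations of the subgroup instead.
std::vector<float> referenceLoad(const std::vector<float> &mem, Layout layout,
                                 std::size_t stride, std::size_t rows,
                                 std::size_t cols) {
  std::vector<float> m(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      const std::size_t off = elementOffset(layout, stride, r, c);
      assert(off < mem.size() && "load would read out of bounds");
      m[r * cols + c] = mem[off];
    }
  }
  return m;
}
```

With a stride larger than the row length, the same model also covers loading a tile out of a larger matrix held in memory.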
The coop_mat_operand argument carries additional information about the input and output matrices and about the operation being performed. The extension introduces the following predefined values for this operand:
CLK_COOPERATIVE_MATRIX_OPERAND_NONE
CLK_COOPERATIVE_MATRIX_OPERAND_MATRIX_A_SIGNED
CLK_COOPERATIVE_MATRIX_OPERAND_MATRIX_B_SIGNED
CLK_COOPERATIVE_MATRIX_OPERAND_MATRIX_C_SIGNED
CLK_COOPERATIVE_MATRIX_OPERAND_MATRIX_RESULT_SIGNED
CLK_COOPERATIVE_MATRIX_OPERAND_SATURATING_ACCUMULATION
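If these values form a bit mask in the same way as the SPIR-V Cooperative Matrix Operands, they could be combined and queried as sketched below. The numeric values are an assumption that mirrors SPIR-V; only the value 8 for the result-signed operand is corroborated by the IR shown in this RFC. The enumerator and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Assumed encodings, mirroring the SPIR-V Cooperative Matrix Operands mask.
enum CoopMatOperand : std::uint32_t {
  OperandNone = 0x0,
  MatrixASigned = 0x1,
  MatrixBSigned = 0x2,
  MatrixCSigned = 0x4,
  MatrixResultSigned = 0x8,
  SaturatingAccumulation = 0x10
};

constexpr std::uint32_t kAllOperandBits = 0x1F;

// Combines several operand flags into one mask.
std::uint32_t makeOperands(std::initializer_list<CoopMatOperand> flags) {
  std::uint32_t mask = OperandNone;
  for (CoopMatOperand f : flags)
    mask |= f;
  assert((mask & ~kAllOperandBits) == 0 && "unknown operand bit set");
  return mask;
}

// Tests whether a given flag is present in a mask.
constexpr bool hasFlag(std::uint32_t mask, CoopMatOperand flag) {
  return (mask & flag) != 0;
}
```

For an integer multiply with signed A and B inputs, a caller would pass something like makeOperands({MatrixASigned, MatrixBSigned}) under these assumptions.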
4.5. Lowering to LLVM IR:
All operations are lowered to SPIR-V friendly LLVM IR builtin function calls. Again, the names of these functions have been selected to be in sync with the names found in the existing implementation for cooperative matrix in the SPIR-V backend.
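The callee names used in this RFC follow a simple pattern: the prefix _Z, the decimal length of the function name, then the name itself, which starts with __spirv_. A minimal sketch of that pattern is shown here. The helper name spirvFriendlyName is hypothetical, and the parameter-type suffix of a full Itanium-style mangled name is deliberately left out, matching how the names are written in this RFC.

```cpp
#include <cassert>
#include <string>

// Builds "_Z" + <length of "__spirv_" + opName> + "__spirv_" + opName.
// No parameter-type encoding is appended.
std::string spirvFriendlyName(const std::string &opName) {
  assert(!opName.empty() && "operation name must not be empty");
  const std::string base = "__spirv_" + opName;
  return "_Z" + std::to_string(base.size()) + base;
}
```

For instance, spirvFriendlyName("CooperativeMatrixLoadKHR") yields _Z32__spirv_CooperativeMatrixLoadKHR.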
coop_mat_load
typedef float matA __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 8, 4, CLK_COOPERATIVE_MATRIX_A)));
matA m;
m = coop_mat_load(A, CLK_COOPERATIVE_MATRIX_LAYOUT_ROW_MAJOR, 3);
// results in
%2 = call target("spirv.CooperativeMatrixKHR", float, 8, 4, 0, 0) @_Z32__spirv_CooperativeMatrixLoadKHR(ptr %1, i32 0, i32 3)
store target("spirv.CooperativeMatrixKHR", float, 8, 4, 0, 0) %2, ptr %m
coop_mat_store
typedef half matA __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 2, 4, CLK_COOPERATIVE_MATRIX_A)));
matA m;
coop_mat_store(A, m, CLK_COOPERATIVE_MATRIX_LAYOUT_ROW_MAJOR, 3);
// results in
%1 = load target("spirv.CooperativeMatrixKHR", half, 2, 4, 0, 0), ptr %m
call void @_Z33__spirv_CooperativeMatrixStoreKHR(ptr %2, target("spirv.CooperativeMatrixKHR", half, 2, 4, 0, 0) %1, i32 0, i32 3)
coop_matmul_add
typedef float matA __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 4, 8, CLK_COOPERATIVE_MATRIX_A)));
typedef float matB __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 8, 2, CLK_COOPERATIVE_MATRIX_B)));
typedef float matC __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 4, 2, CLK_COOPERATIVE_MATRIX_ACCUMULATOR)));
matA a;
matB b;
matC c, d;
d = coop_matmul_add(a, b, c, CLK_COOPERATIVE_MATRIX_OPERAND_MATRIX_RESULT_SIGNED);
// results in
%3 = call target("spirv.CooperativeMatrixKHR", float, 4, 2, 0, 2) @_Z34__spirv_CooperativeMatrixMulAddKHR(target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %0, target("spirv.CooperativeMatrixKHR", float, 8, 2, 0, 1) %1, target("spirv.CooperativeMatrixKHR", float, 4, 2, 0, 2) %2, i8 8)
store target("spirv.CooperativeMatrixKHR", float, 4, 2, 0, 2) %3, ptr %d
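For readers who want the arithmetic spelled out, the following is a scalar reference model of what coop_matmul_add computes: D = A * B + C, with A of shape M x K, B of shape K x N, and C and D of shape M x N. It is a sketch for checking results on the host, not the proposed implementation; the Matrix struct and referenceMatMulAdd are hypothetical names, and the operand flags are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix used only for this host-side reference model.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;

  float &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Computes D = A * B + C component by component.
Matrix referenceMatMulAdd(const Matrix &a, const Matrix &b, const Matrix &c) {
  assert(a.cols == b.rows && "inner dimensions of A and B must match");
  assert(c.rows == a.rows && c.cols == b.cols && "C must be M x N");
  Matrix d = c;  // start from the accumulator
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j)
      for (std::size_t k = 0; k < a.cols; ++k)
        d.at(i, j) += a.at(i, k) * b.at(k, j);
  return d;
}
```

The shape checks correspond to the typedefs in the example above, where A is 4x8, B is 8x2, and the accumulator is 4x2.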
4.6. Operators
The operators supported on a cooperative matrix type are the arithmetic binary operators add (+), subtract (-), multiply (*), and divide (/), along with the arithmetic unary operator negate (-). Each operation is performed component-wise on operands of identical type. The binary multiply operator (*) can additionally be applied to a cooperative matrix and a scalar, provided the scalar type matches the component type of the matrix. Each operation is parsed and lowered to LLVM IR as a built-in function call.
Example:
The example code snippet below shows how supported operators on a “cooperative matrix” type are lowered into LLVM-IR. We again use the same naming strategy as earlier.
typedef float matA __attribute__((coop_mat(CLK_COOPERATIVE_MATRIX_SCOPE_SUBGROUP, 4, 8, CLK_COOPERATIVE_MATRIX_A)));
matA c = a * 4;
c = -c;
c = c + c;
// Results in
%2 = call target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) @_Z34__spirv_CooperativeMatrixScalarMul(target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %1, i32 4)
store target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %2, ptr %c
%4 = call target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) @_Z34__spirv_CooperativeMatrixScalarNeg(target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %3)
store target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %4, ptr %c
%6 = load target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0), ptr %c
%7 = call target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) @_Z34__spirv_CooperativeMatrixBinaryAdd(target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %5, target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %6)
store target("spirv.CooperativeMatrixKHR", float, 4, 8, 0, 0) %7, ptr %c
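The component-wise meaning of these operators can be modelled on the host as follows. This is only a reference sketch over a flat list of components; the function names echo the built-in names used in the IR but are otherwise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All components of one matrix, in any fixed order; component-wise operators
// do not depend on the matrix shape.
using Components = std::vector<float>;

// Models matrix + matrix: operands must have identical types (here, sizes).
Components binaryAdd(const Components &x, const Components &y) {
  assert(x.size() == y.size() && "operands must have identical types");
  Components out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i] = x[i] + y[i];
  return out;
}

// Models unary negate.
Components scalarNeg(const Components &x) {
  Components out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i] = -x[i];
  return out;
}

// Models matrix * scalar, with the scalar matching the component type.
Components scalarMul(const Components &x, float s) {
  Components out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i] = x[i] * s;
  return out;
}
```

Applying scalarMul, scalarNeg, and binaryAdd in sequence mirrors the three statements in the kernel snippet above.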
- Next steps
Submission of a patch that covers the following:
- Extension of the existing clang::MatrixType class to represent cooperative matrices. We would welcome feedback from the community on any alternative approaches, or on whether a new class would be preferred.
- Lowering to LLVM IR using the Target Extension Type.
- Introduction of OpenCL CTS tests and LIT tests to support this implementation.
The implementation will be introduced and maintained by multiple members of the Khronos OpenCL working group, including, but not limited to, Imagination Technologies and Qualcomm Inc.
Thanks