Before generating actual MLIR code, I found it easier to get the code generation structure right by first emitting pseudo-code to the debugging output.
The kernel:
x(i) = a(i) + b(i) * c(i)
looks as follows when expressed as a generic Linalg operation:
func @generic_op_vec1d(%arga: tensor<?xf32>,
                       %argb: tensor<?xf32>,
                       %argc: tensor<?xf32>) -> tensor<?xf32> {
  %0 = linalg.generic #trait_vec1d
    ins(%arga,
        %argb,
        %argc : tensor<?xf32>, tensor<?xf32>, tensor<?xf32>) {
    ^bb(%a: f32, %b: f32, %c : f32):
      %0 = mulf %b, %c : f32
      %1 = addf %a, %0 : f32
      linalg.yield %1 : f32
  } -> tensor<?xf32>
  return %0 : tensor<?xf32>
}
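The #trait_vec1d attribute is not shown above; it is where the indexing maps and iterator types of the operation are defined. A sketch of what it could look like is given below (the exact trait fields have shifted between MLIR versions, e.g. older forms also carried args_in/args_out):

#trait_vec1d = {
  indexing_maps = [
    affine_map<(i) -> (i)>,  // a
    affine_map<(i) -> (i)>,  // b
    affine_map<(i) -> (i)>,  // c
    affine_map<(i) -> (i)>   // x (output)
  ],
  iterator_types = ["parallel"]
}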
The newly implemented code generation recursively traverses the merge lattices (built from the SSA form of the kernel) and emits the following pseudo-code on the debugging output. Here i_00, i_01, and i_02 denote the per-tensor iterators of tensors 0, 1, and 2 along index i: each while co-iterates the listed iterators as long as all of them still have entries, and each if covers the case where exactly those iterators coincide at the current index.
while ( i_00 i_01 i_02 )
  if ( i_00 i_01 i_02 )
    tensor_out := (tensor_0 + (tensor_1 * tensor_2));
  if ( i_01 i_02 )
    tensor_out := (tensor_1 * tensor_2);
  if ( i_00 )
    tensor_out := tensor_0;
while ( i_01 i_02 )
  if ( i_01 i_02 )
    tensor_out := (tensor_1 * tensor_2);
while ( i_00 )
  tensor_out := tensor_0;
A more elaborate kernel like:
x(i) = a(i) + b(i) + c(i)
generates the following pseudo-code:
while ( i_00 i_01 i_02 )
  if ( i_00 i_01 i_02 )
    tensor_out := (tensor_2 + (tensor_0 + tensor_1));
  if ( i_00 i_02 )
    tensor_out := (tensor_2 + tensor_0);
  if ( i_01 i_02 )
    tensor_out := (tensor_2 + tensor_1);
  if ( i_00 i_01 )
    tensor_out := (tensor_0 + tensor_1);
  if ( i_02 )
    tensor_out := tensor_2;
  if ( i_00 )
    tensor_out := tensor_0;
  if ( i_01 )
    tensor_out := tensor_1;
while ( i_00 i_02 )
  if ( i_00 i_02 )
    tensor_out := (tensor_2 + tensor_0);
  if ( i_02 )
    tensor_out := tensor_2;
  if ( i_00 )
    tensor_out := tensor_0;
while ( i_01 i_02 )
  if ( i_01 i_02 )
    tensor_out := (tensor_2 + tensor_1);
  if ( i_02 )
    tensor_out := tensor_2;
  if ( i_01 )
    tensor_out := tensor_1;
while ( i_00 i_01 )
  if ( i_00 i_01 )
    tensor_out := (tensor_0 + tensor_1);
  if ( i_00 )
    tensor_out := tensor_0;
  if ( i_01 )
    tensor_out := tensor_1;
while ( i_02 )
  tensor_out := tensor_2;
while ( i_00 )
  tensor_out := tensor_0;
while ( i_01 )
  tensor_out := tensor_1;
Since these look good to me (does anyone spot anything strange?), the next step is to link this to an actual MLIR representation of the tensors, in either dense or sparse format. This will probably take a while…
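Just to sketch where this should end up (my own illustration, not output of the pass): if all three operands are dense, the pseudo-code above should collapse into a single loop over i, roughly like the buffer-form MLIR below, whereas sparse operands would turn the while/if structure into co-iteration over their compressed index arrays.

// Hypothetical all-dense lowering of x(i) = a(i) + b(i) * c(i), buffer form.
func @vec1d_dense(%argx: memref<?xf32>, %arga: memref<?xf32>,
                  %argb: memref<?xf32>, %argc: memref<?xf32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %n = dim %arga, %c0 : memref<?xf32>
  scf.for %i = %c0 to %n step %c1 {
    %a = load %arga[%i] : memref<?xf32>
    %b = load %argb[%i] : memref<?xf32>
    %c = load %argc[%i] : memref<?xf32>
    %0 = mulf %b, %c : f32
    %1 = addf %a, %0 : f32
    store %1, %argx[%i] : memref<?xf32>
  }
  return
}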