Hi everyone! I'm testing the loopschedule dialect and found the following problem. First, I create this function with affine ops:
```mlir
func.func @elementwise_add(%a: memref<4xi8>, %b: memref<4xi8>, %c: memref<4xi8>) {
  affine.for %i = 0 to 4 {
    %a_elem = affine.load %a[%i] : memref<4xi8>
    %b_elem = affine.load %b[%i] : memref<4xi8>
    %sum = arith.addi %a_elem, %b_elem : i8
    affine.store %sum, %c[%i] : memref<4xi8>
  }
  return
}
```
After running the pass, I get this:
```mlir
func.func @elementwise_add(%arg0: memref<4xi8>, %arg1: memref<4xi8>, %arg2: memref<4xi8>) {
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : index
  %c1 = arith.constant 1 : index
  loopschedule.pipeline II = 1 trip_count = 4 iter_args(%arg3 = %c0) : (index) -> () {
    %0 = arith.cmpi ult, %arg3, %c4 : index
    loopschedule.register %0 : i1
  } do {
    %0:3 = loopschedule.pipeline.stage start = 0 {
      %2 = memref.load %arg0[%arg3] : memref<4xi8>
      %3 = memref.load %arg1[%arg3] : memref<4xi8>
      %4 = arith.addi %arg3, %c1 : index
      loopschedule.register %2, %3, %4 : i8, i8, index
    } : i8, i8, index
    %1 = loopschedule.pipeline.stage start = 1 {
      %2 = arith.addi %0#0, %0#1 : i8
      memref.store %2, %arg2[%arg3] : memref<4xi8>
      loopschedule.register %2 : i8
    } : i8
    loopschedule.terminator iter_args(%0#2), results() : (index) -> ()
  }
  return
}
```
However, I think this output is incorrect. In the last stage, the `memref.store` uses the result of the `arith.addi` computed in that same stage, and the later pass `--lower-loopschedule-to-calyx` fails because of this. If I manually rewrite the generated code so that the store moves into its own stage, like this:
```mlir
func.func @elementwise_add(%arg0: memref<4xi8>, %arg1: memref<4xi8>, %arg2: memref<4xi8>) {
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : index
  %c1 = arith.constant 1 : index
  loopschedule.pipeline II = 1 trip_count = 4 iter_args(%arg3 = %c0) : (index) -> () {
    %0 = arith.cmpi ult, %arg3, %c4 : index
    loopschedule.register %0 : i1
  } do {
    %0:3 = loopschedule.pipeline.stage start = 0 {
      %2 = memref.load %arg0[%arg3] : memref<4xi8>
      %3 = memref.load %arg1[%arg3] : memref<4xi8>
      %4 = arith.addi %arg3, %c1 : index
      loopschedule.register %2, %3, %4 : i8, i8, index
    } : i8, i8, index
    %1 = loopschedule.pipeline.stage start = 1 {
      %2 = arith.addi %0#0, %0#1 : i8
      // memref.store %2, %arg2[%arg3] : memref<4xi8>
      loopschedule.register %2 : i8
    } : i8
    loopschedule.pipeline.stage start = 2 {
      memref.store %1, %arg2[%arg3] : memref<4xi8>
      loopschedule.register
    }
    loopschedule.terminator iter_args(%0#2), results() : (index) -> ()
  }
  return
}
```
then lowering to Calyx works. Could someone take a look at this problem?
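To make the pattern concrete, here is a minimal Python sketch over a hypothetical, simplified model of pipeline stages. The `Op` and `Stage` classes and the `same_stage_stores` / `split_stores` helpers are illustrative only, not CIRCT APIs. It flags a `memref.store` whose stored value is produced by another op in the same stage, then moves that store into a new stage at `start + 1` that reads the stage's registered result instead, mirroring the manual rewrite above. It assumes the `start + 1` slot is free (true here because the store sits in the last stage) and that the stored value is registered by its stage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical, simplified stand-in for an MLIR operation."""
    name: str              # e.g. "arith.addi" or "memref.store"
    result: Optional[str]  # SSA value defined by the op, if any
    operands: List[str]    # SSA values used; for a store: [value, memref, index]


@dataclass
class Stage:
    """Hypothetical model of a loopschedule.pipeline.stage."""
    start: int
    ops: List[Op]
    registered: List[str] = field(default_factory=list)  # operands of loopschedule.register
    results: List[str] = field(default_factory=list)     # stage result names seen outside


def same_stage_stores(stage: Stage) -> List[Op]:
    """Return stores whose stored value is computed by another op in the same stage."""
    local = {op.result for op in stage.ops if op.result is not None}
    return [op for op in stage.ops if op.name == "memref.store" and op.operands[0] in local]


def split_stores(stages: List[Stage]) -> List[Stage]:
    """Move each offending store into a new stage at start + 1.

    Assumption: start + 1 is not already used by another stage, and the stored
    value is registered by its stage (list.index raises ValueError otherwise).
    """
    out: List[Stage] = []
    for stage in stages:
        bad = same_stage_stores(stage)
        bad_ids = {id(op) for op in bad}
        kept = [op for op in stage.ops if id(op) not in bad_ids]
        out.append(Stage(stage.start, kept, stage.registered, stage.results))
        if bad:
            moved = []
            for op in bad:
                # Map the in-stage value (e.g. %2) to the registered stage result (e.g. %1).
                outer = stage.results[stage.registered.index(op.operands[0])]
                moved.append(Op(op.name, None, [outer] + op.operands[1:]))
            # The new stage registers nothing, like the hand-written start = 2 stage.
            out.append(Stage(stage.start + 1, moved))
    return out


# Stage 1 of the generated IR: %2 = addi, store %2 in the same stage, register %2 as %1.
stage1 = Stage(
    start=1,
    ops=[
        Op("arith.addi", "%2", ["%0#0", "%0#1"]),
        Op("memref.store", None, ["%2", "%arg2", "%arg3"]),
    ],
    registered=["%2"],
    results=["%1"],
)

fixed = split_stores([stage1])
for s in fixed:
    print(s.start, [(op.name, op.operands) for op in s.ops])
# 1 [('arith.addi', ['%0#0', '%0#1'])]
# 2 [('memref.store', ['%1', '%arg2', '%arg3'])]
```

The sketch leaves the store's index operand (`%arg3`) unchanged, exactly as in the hand-written version; it only demonstrates the stage split, not a full legality check.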
Besides that, I still have the following problems:
- The pass `-lower-calyx-to-fsm` doesn't work because there is a `ParOp`.
- The pass `-lower-calyx-to-hw` doesn't work because there are `GroupOp`s, and the Calyx pass `-calyx-remove-groups` doesn't work either.