Dear all,

Imagine we have the following code:

    #define ny 10
    #define Batch_Size 10

    typedef float data_t;

    void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);

    void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
                            data_t out[ny][Batch_Size]) {

      data_t max[Batch_Size];

    SA_MAX2:
      for (int i = 0; i < Batch_Size; i++) {
        max[i] = 0;
      SA_MAX1:
        for (int j = 0; j < ny; j++) {
          if (l_Z2[j][i] > max[i])
            max[i] = l_Z2[j][i];
        }
      }
      foo(out, max);
    }

We can see that the address of ‘max[i]’ is invariant in loop ‘SA_MAX1’ (it does not depend on ‘j’), so I want to know which pass can perform the following transformation/optimization:

    #define ny 10
    #define Batch_Size 10

    typedef float data_t;

    void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);

    void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
                            data_t out[ny][Batch_Size]) {

      data_t max[Batch_Size];

    SA_MAX2:
      for (int i = 0; i < Batch_Size; i++) {
        data_t Max = 0;
      SA_MAX1:
        for (int j = 0; j < ny; j++) {
          if (l_Z2[j][i] > Max)
            Max = l_Z2[j][i];
        }
        max[i] = Max;
      }
      foo(out, max);
    }

This uses a local scalar ‘Max’ in place of the original ‘max[i]’ and sinks the original store out of loop ‘SA_MAX1’.
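
To make the intended equivalence concrete, here is a minimal sketch that runs both forms of the loop nest on the same input and compares the results. The function names (max_in_memory, max_in_scalar, check_equivalent) and the test data are made up for illustration only; they are not part of the original code, and foo is left out because it plays no role in the comparison.

```c
#define ny 10
#define Batch_Size 10

typedef float data_t;

/* Original form: the running maximum is read from and written to max[i]
   inside the inner loop. */
static void max_in_memory(data_t l_Z2[ny][Batch_Size],
                          data_t max[Batch_Size]) {
  for (int i = 0; i < Batch_Size; i++) {
    max[i] = 0;
    for (int j = 0; j < ny; j++) {
      if (l_Z2[j][i] > max[i])
        max[i] = l_Z2[j][i];
    }
  }
}

/* Desired form: the running maximum lives in a local scalar, and the store
   to max[i] happens once, after the inner loop. */
static void max_in_scalar(data_t l_Z2[ny][Batch_Size],
                          data_t max[Batch_Size]) {
  for (int i = 0; i < Batch_Size; i++) {
    data_t Max = 0;
    for (int j = 0; j < ny; j++) {
      if (l_Z2[j][i] > Max)
        Max = l_Z2[j][i];
    }
    max[i] = Max;
  }
}

/* Hypothetical helper: fills an input with arbitrary positive and negative
   values, runs both versions, and returns 1 if every result matches. */
static int check_equivalent(void) {
  data_t in[ny][Batch_Size];
  data_t a[Batch_Size];
  data_t b[Batch_Size];

  for (int j = 0; j < ny; j++)
    for (int i = 0; i < Batch_Size; i++)
      in[j][i] = (data_t)((j * 7 + i * 3) % 11) - 5.0f;

  max_in_memory(in, a);
  max_in_scalar(in, b);

  /* Both versions perform the same comparisons in the same order, so the
     results should match exactly, not just approximately. */
  for (int i = 0; i < Batch_Size; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Calling check_equivalent() from a small main should return 1. This only shows that the two forms compute the same values on one input; it says nothing about whether the compiler is able to prove the transformation legal.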

I did some experiments on godbolt, and it looks like we currently do not perform this kind of optimization:

https://godbolt.org/z/9PK3hYvPs

Do you know which pass can do this? Or is it not necessary for CPU targets?

Thanks,

Fangqing

Xilinx Inc.