I ran the SNAP application as a benchmark on my x86-64 build machine, using a modified 2d_mms_st.inp as input (nx = ny = 80 instead of 20). In both cases the source files were compiled with a release build of flang-new at -O3.
This was measured with OMP=off and MPI=off.
I also have a small local fix, swapping two lines, to work around "[FLANG] Use after free in "mask" when using WHERE construct" (llvm/llvm-project issue #56921).
The Difference % column is the Optimised SUM time divided by the Standard time, formatted as a percentage, so values below 100% mean the optimised build is faster.
The overall summary is a significant improvement in the total time it takes to run the benchmark, in line with earlier measurements.
I ran both versions a few times. Individual timings vary up and down by a percent or two, but the total execution time with the optimised SUM is consistently around 57% of Standard, and roughly 37% in the Inner Iterations (where you’d expect SUM to make a difference).
Measure | Standard | Optimised SUM | Difference % |
---|---|---|---|
Parallel Setup | 0.000005 | 0.000005 | 103.98% |
Input | 0.000233 | 0.000237 | 101.75% |
Setup | 3.596800 | 3.526600 | 98.05% |
Solve | 8.155600 | 3.114100 | 38.18% |
Parameter Setup | 0.022620 | 0.022232 | 98.28% |
Outer Source | 0.068606 | 0.068472 | 99.80% |
Inner Iterations | 8.061600 | 3.020500 | 37.47% |
Inner Source | 0.033129 | 0.032798 | 99.00% |
Transport Sweeps | 8.017600 | 2.977000 | 37.13% |
Inner Misc Ops | 0.010913 | 0.010674 | 97.81% |
Solution Misc Ops | 0.002813 | 0.002879 | 102.35% |
Output | 0.300240 | 0.288240 | 96.00% |
Total Execution time | 12.059000 | 6.935800 | 57.52% |
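
As a quick sanity check, the Difference % column can be recomputed from the two timing columns. This is just a minimal sketch in Python; the helper name `ratio_percent` is made up for illustration, and the numbers are copied from the table above.

```python
def ratio_percent(standard: float, optimised: float) -> str:
    """Optimised time as a percentage of the Standard time (hypothetical helper)."""
    return f"{optimised / standard:.2%}"


# (Standard, Optimised SUM) timings copied from the table above.
timings = {
    "Solve": (8.155600, 3.114100),
    "Inner Iterations": (8.061600, 3.020500),
    "Transport Sweeps": (8.017600, 2.977000),
    "Total Execution time": (12.059000, 6.935800),
}

for name, (standard, optimised) in timings.items():
    print(f"{name}: {ratio_percent(standard, optimised)}")
```

For the larger timings this gives the same percentages as the table. The very small rows (for example Parallel Setup) will not reproduce exactly this way, presumably because the table shows rounded timings.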
The patch that makes SUM inlinable is here. It only generates a simplified, inlinable version of the function; the MLIR and/or LLVM IR optimisation passes then inline it once it is available as MLIR.
https://reviews.llvm.org/D125407