Performance comparison with Simplified and Inlined intrinsics

Leporacanthicus · August 4, 2022, 4:21pm

I ran the SNAP application as a benchmark on my x86-64 build machine, using a modified 2d_mms_st.inp (nx = ny = 80 instead of 20) as input. In both cases the source files were compiled with a release build of flang-new at -O3. This is measured with OMP=off, MPI=off.

I have a small local fix, where I’ve swapped two lines, for llvm/llvm-project issue #56921: [FLANG] Use after free in "mask" when using WHERE construct.

The “Difference %” column is the Optimised SUM time divided by the Standard time, formatted as a percentage, so values below 100% mean the Optimised SUM build is faster.

The overall summary is a significant improvement in the total time it takes to run the benchmark, in line with earlier measurements.

I ran both versions a few times, and all of the times vary up and down by a percent or two, but the overall result is solid: the Optimised SUM build takes around 57% of the Standard time in total, and roughly 37% in the Inner Iterations (where you’d expect SUM to make a difference).

Measure	Standard	Optimised SUM	Difference %
Parallel Setup	0.000005	0.000005	103.98%
Input	0.000233	0.000237	101.75%
Setup	3.596800	3.526600	98.05%
Solve	8.155600	3.114100	38.18%
Parameter Setup	0.022620	0.022232	98.28%
Outer Source	0.068606	0.068472	99.80%
Inner Iterations	8.061600	3.020500	37.47%
Inner Source	0.033129	0.032798	99.00%
Transport Sweeps	8.017600	2.977000	37.13%
Inner Misc Ops	0.010913	0.010674	97.81%
Solution Misc Ops	0.002813	0.002879	102.35%
Output	0.300240	0.288240	96.00%
Total Execution time	12.059000	6.935800	57.52%
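
As a cross-check on the last column, here is a minimal Python sketch that recomputes the percentage for a few rows, using timings copied from the table above. The helper names `ratio_percent` and `speedup` are made up for illustration and are not part of SNAP or flang.

```python
# Timings copied from the table above: (Standard, Optimised SUM).
timings = {
    "Solve": (8.155600, 3.114100),
    "Inner Iterations": (8.061600, 3.020500),
    "Transport Sweeps": (8.017600, 2.977000),
    "Total Execution time": (12.059000, 6.935800),
}


def ratio_percent(standard, optimised):
    """Optimised time as a percentage of the Standard time."""
    return 100.0 * optimised / standard


def speedup(standard, optimised):
    """How many times faster the optimised build is (Standard / Optimised)."""
    return standard / optimised


for name, (std, opt) in timings.items():
    print(f"{name:22s} {ratio_percent(std, opt):6.2f}%  {speedup(std, opt):.2f}x")
```

The same ratio can also be read as a speedup factor (Standard divided by Optimised SUM), which is what the second helper prints.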

The patch for making SUM inline is here. It just generates an inlineable/simplified version of the function; the MLIR and/or LLVM IR optimisation then inlines it once it’s available as MLIR.
https://reviews.llvm.org/D125407

tschuett · August 4, 2022, 8:38pm

Great work!

The LLD guys use ministat to show performance improvements.
https://www.freebsd.org/cgi/man.cgi?query=ministat
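
In the same spirit, repeated runs can be summarised before comparing them. The following is only a rough Python sketch of that idea, not a replacement for ministat: it reports the mean and relative standard deviation per build, and does not do the significance testing ministat does. The function names `summarise` and `compare` are hypothetical, and each list passed in is assumed to hold wall-clock times from at least two runs of the same build.

```python
import statistics


def summarise(times):
    """Return (mean, relative sample standard deviation in percent)."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)  # needs at least two samples
    return mean, 100.0 * stdev / mean


def compare(standard_runs, optimised_runs):
    """Summarise two sets of repeated timings and their mean ratio."""
    std_mean, std_rel = summarise(standard_runs)
    opt_mean, opt_rel = summarise(optimised_runs)
    return {
        "standard_mean": std_mean,
        "standard_rel_stdev_percent": std_rel,
        "optimised_mean": opt_mean,
        "optimised_rel_stdev_percent": opt_rel,
        # Same convention as the table: Optimised / Standard, as a percentage.
        "ratio_percent": 100.0 * opt_mean / std_mean,
    }
```

Calling `compare` with the list of total execution times from each build gives a ratio in the same convention as the table, plus an indication of how much each build varies from run to run.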
