This is an old bug, but I decided to use some modern hardware to do some analysis on it for fun. I updated the Bugzilla report, and it was suggested that I also share it with llvmdev for broader exposure. Text from the bug report copied below, and PPT attached to mail.
Useful for anyone interested in or troubled by code alignment issues on IA.
bugzilla-5615-presentation-public.pptx (162 KB)
Interesting findings, thanks for sharing.
I’d be interested in seeing any prototype patches you have for this. My frontend (Java) is likely to generate code that is more branch-heavy than typical C code, so I’d be curious to see whether the tradeoffs are different. I’d be happy to apply a patch locally and report back on the big-picture impact.
If it does turn out to be profitable to nop pad in the way you describe, we could potentially apply this only to hot loops. Using profile data to guide when we pad vs don’t pad, we might be able to avoid excessive code bloat while still getting the improvements you describe.
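A minimal sketch of that profile-guided idea, purely for illustration. The struct, the function names, the hotness threshold, and the alignment value are all hypothetical and are not taken from the bug report or from any LLVM API; the sketch only shows the decision "pad hot loops, leave cold ones alone".

```cpp
#include <cstdint>

// Hypothetical per-loop summary; not an LLVM data structure.
struct LoopSketch {
    std::uint64_t headerOffset; // byte offset of the loop header within the function
    std::uint64_t execCount;    // execution count taken from profile data
};

// Number of nop bytes needed to move `offset` up to the next multiple of
// `alignment`. Assumes `alignment` is a power of two.
inline std::uint64_t paddingFor(std::uint64_t offset, std::uint64_t alignment) {
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Pad only loops the profile marks as hot; cold loops get no padding,
// which keeps the code-size cost bounded.
inline std::uint64_t nopBytesToInsert(const LoopSketch& loop,
                                      std::uint64_t hotThreshold,
                                      std::uint64_t alignment) {
    if (loop.execCount < hotThreshold) {
        return 0;
    }
    return paddingFor(loop.headerOffset, alignment);
}
```

With an assumed threshold of 1000 executions and 16-byte alignment, a loop header at offset 0x13 would get 13 bytes of padding if it is hot and none if it is cold. Where the threshold should sit is exactly the tradeoff that would need measuring.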
As someone who has been writing hand-optimized assembly for Intel x86 since the 80486 era (I slacked off for years and have only recently started reading the optimization manuals again), I found this very interesting. Thanks for posting it to the list.