How can I identify “less hard optimization failures”?
This is hard to answer generally, so I’ll try to go through a couple examples, based on recent issues.
Let’s pick the newest reported issue Suboptimal code generated for alignment checks and similar via `number.trailing_zeros() >= bit_count` · Issue #107554 · rust-lang/rust · GitHub as an example. (Edit: This has already been fixed by ⚙ D143368 [InstCombine] Look through truncate to fold icmp with intrinsics since I originally wrote this.)
- Inspect LLVM IR on Godbolt: Compiler Explorer
- Create an LLVM IR test case:Compiler Explorer Here we see that it fails to optimize if there is a truncation between the cttz and the comparison. The basic pattern itself is already supported.
- As such, we have two ways this might be fixed. First one is to just add support for the extra trunc, which would be this transform: Compiler Explorer Second one is to instead remove the trunc and then reuse the existing transform: Compiler Explorer
- In either case, this is a simple peephole transform that can be added to the InstCombine pass.
This is the hallmark of an “easy” optimization problem. It doesn’t get much easier than adding a peephole transform to InstCombine. An experienced LLVM developer would be able to implement and test this within an hour. For a new contributor this would take a few more hours, both in implementation time and review cycles. I have written a contribution guide that targets precisely this kind of InstCombine change as a general guidance.
Let’s look at the second issue: Missed optimization opportunity when trivially moving tuple of slices · Issue #107436 · rust-lang/rust · GitHub
This one already has an LLVM IR sample posted in the comment, but here’s the Godbolt link: Compiler Explorer. Here we perform a memcpy to an alloca and then only use it in a
readonly noalias argument. This sounds kind of similar to a byval argument optimization LLVM already implements.
I would tentatively classify this as medium – this is not a simple peephole transform and would require some careful thinking about when exactly it is correct. But at least there is an existing transform to follow.
However, after implementing that transform one would find that it doesn’t actually fix the original issue (though it does fix others, so it’s valuable in any case). That’s because Rust’s LLVM IR is missing an
align annotation on the
%val argument for some reason. So a full fix would also need a change on the Rust side. That could be done as part of the project or not, depending on whether there is interest in working on Rust itself.
The third issue is missed(?) optimization with a const array of same item · Issue #107208 · rust-lang/rust · GitHub. This is about a load from a constant that contains a repeating pattern.
This one already has some analysis on the issue, and points out that we currently fail to handle even the basic case of a load from an all-zero global. Here is a reduced LLVM IR example of the missing transform: Compiler Explorer Fixing this would be another “easy” optimization.
Fixing the more general case of a load from a pattern-filled global I would classify as “medium”. It’s pretty clear what that transform has to do in abstract, but it would take a lot more implementation effort.
(I’m out of office for the next two weeks and have to catch a flight now, I’ll finish responding some other time.)