Full context is in the GitHub issue "Clang produces individual memclr calls when setting large data structures" (llvm/llvm-project, Issue #62813).
We have a function that sets all members of a large struct to zero except a handful of members. In clang, this results in many __aeabi_memclr calls to these individual members, whereas gcc just memset(0)'s the whole struct and then manually sets the handful of members to their non-zero values. This results in a very large size difference between the two functions. Is there a reason clang doesn’t do what gcc does at -Oz?
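For illustration, here is a hand-written sketch of the strategy gcc is described as using: clear the whole struct in one call, then store the few non-zero members. The struct `Config`, its fields, and the function `reset_bulk` are hypothetical and are not taken from the issue; this only shows the shape of the code, not what either compiler actually emits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical large struct: mostly zero-initialized buffers with a few
// scalar members in between that need non-zero values.
struct Config {
    std::uint8_t  mode;
    std::uint32_t buffer_a[64];
    std::uint16_t retries;
    std::uint32_t buffer_b[64];
    std::uint8_t  flags;
    std::uint32_t buffer_c[64];
};

// One bulk clear (padding included), followed by a handful of scalar stores.
// This is valid because Config is trivially copyable.
void reset_bulk(Config &c) {
    std::memset(&c, 0, sizeof c);
    c.mode = 3;
    c.retries = 5;
    c.flags = 1;
}
```

Clearing each buffer separately instead would need one call per zeroed region, which is where the size difference comes from.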
As far as I know, nobody has tried to implement that optimization. Maybe memcpyopt could be improved to handle it.
Updating with Richard Smith’s response:
Seems likely that the IR emitted by Clang doesn’t let LLVM know that it’s allowed to overwrite the padding.
If you make a constexpr variable to hold the initializer and assign from that instead of using the macro, clang generates a memcpy instead.
Can confirm using a constexpr variable allows clang to clear the whole struct then set individual non-zero values to members.
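A minimal sketch of the workaround, assuming a struct shaped like the one described. The names `Config`, `kDefaultConfig`, `reset_inline`, and `reset_from_constant` are made up for this example, and the braced initializer stands in for whatever the macro in the original code expands to.

```cpp
#include <cstdint>

// Hypothetical large struct; fields and sizes are illustrative only.
struct Config {
    std::uint8_t  mode;
    std::uint32_t buffer_a[64];
    std::uint16_t retries;
    std::uint32_t buffer_b[64];
    std::uint8_t  flags;
    std::uint32_t buffer_c[64];
};

// Before: assign from a braced initializer written at the point of use
// (the kind of expression an initializer macro would expand to).
void reset_inline(Config &c) {
    c = Config{3, {}, 5, {}, 1, {}};
}

// After: the same initializer held in a constexpr object. Assigning from it
// is a copy of the whole object, which is the form reported to give a memcpy.
constexpr Config kDefaultConfig{3, {}, 5, {}, 1, {}};

void reset_from_constant(Config &c) {
    c = kDefaultConfig;
}
```

Both functions leave the struct in the same state; only the generated code is expected to differ, and that depends on the compiler version and flags.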
Padding is an issue in some related cases, but for this particular construct, there is enough information. The only instructions that write to the class are two memcpys from allocas, and we can safely zero-fill an alloca if we want. LLVM's optimizations don’t really try to take advantage of that at the moment, though.
Alternatively, maybe clang could be improved to be a bit more clever here (although that’s a less general optimization).