Hi all,
I’m working on a new pass to optimize comparison chains.
Motivation
Clang currently generates inefficient code when a struct is compared member by member, even though the members are contiguous in memory. Consider:
#include <cstdint>

struct A {
  bool operator==(const A& o) const { return i == o.i && j == o.j; }
  uint32_t i;
  uint32_t j;
};
This generates:
mov eax, dword ptr [rdi]
cmp eax, dword ptr [rsi]
jne .LBB0_1
mov eax, dword ptr [rdi + 4]
cmp eax, dword ptr [rsi + 4]
sete al
ret
.LBB0_1:
xor eax, eax
ret
I’ve been working on an LLVM pass that detects this pattern at the IR level and turns it into a call to memcmp(). This generates more efficient code:
mov rax, qword ptr [rdi]
cmp rax, qword ptr [rsi]
sete al
ret
And thanks to recent improvements in the memcmp codegen, this can be made to work for all sizes.
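For reference, the transformation is roughly equivalent to hand-writing the comparison with memcmp at the source level. The sketch below is illustrative only (the pass operates on the IR, not on the source), and it is only valid because the compared members are contiguous with no padding between them:

#include <cstdint>
#include <cstring>

struct A {
  uint32_t i;
  uint32_t j;
  // A single memcmp over the two contiguous 4-byte fields replaces the two
  // separate compare-and-branch steps shown in the first assembly listing.
  bool operator==(const A& o) const {
    return std::memcmp(this, &o, sizeof(A)) == 0;
  }
};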
Impact of the change
I’ve measured the change on std::pair/std::tuple comparisons. The pass typically makes the code 2-3x faster and 2-3x smaller.
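As an illustration (my own example, not taken from the measurements), std::pair’s operator== compares first and then second, which is exactly the chained pattern above, so it benefits directly:

#include <cstdint>
#include <utility>

// Evaluates a.first == b.first && a.second == b.second under the hood,
// the same chained pattern as struct A, so the pass can merge it into a
// single 8-byte load-and-compare.
bool eq(const std::pair<uint32_t, uint32_t>& a,
        const std::pair<uint32_t, uint32_t>& b) {
  return a == b;
}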
A more detailed description can be found here and a proof of concept can be seen here.
Do you see any aspect of this that I may have missed?
For now I’ve implemented this as a separate pass. Would there be a better way to integrate it?
Thanks!