Hi All,
I have a two-part de-virtualization enhancement that I’m considering
working on and am looking for any feedback on how feasible it is. The two
parts are:
1. llvm: Extending inter-procedural SCCP (or some other IPO module
pass) to propagate llvm.assume’s across function calls. The basic idea
would be to collect the set of assumptions for each argument at each call
site and compute the intersection across all call sites, then duplicate
the computations for the intersected assumptions in the callee. The reason
I’m starting with SCCP is that it already tracks when all of a function’s
possible call sites are known, and it already merges values in a
lattice.
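To make the intersection step in (1) concrete, here is a minimal sketch in plain C++. It is not LLVM code: assumptions are modelled as strings, and the names Assumption, AssumptionSet and intersectAcrossCallSites are made up for illustration. A real pass would work on IR values and would also have to prove that every call site is known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one assumption about an argument, for example
// "vptr of arg0 is the vtable for Derived". Real code would not use strings.
using Assumption = std::string;
using AssumptionSet = std::set<Assumption>;

// Intersect the assumption sets seen for one argument at every call site.
// Only assumptions that hold at all call sites survive, so only those could
// be duplicated into the callee. With no call sites, nothing is assumed.
AssumptionSet intersectAcrossCallSites(const std::vector<AssumptionSet> &sites) {
  if (sites.empty())
    return {};
  AssumptionSet result = sites.front();
  for (std::size_t i = 1; i < sites.size(); ++i) {
    AssumptionSet tmp;
    std::set_intersection(result.begin(), result.end(),
                          sites[i].begin(), sites[i].end(),
                          std::inserter(tmp, tmp.begin()));
    result = std::move(tmp);
  }
  return result;
}
```

A single call site with no assumptions empties the result, which is the conservative outcome you would want.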
Given that we use !invariant.group loads when loading vptrs, what
additional value do you think you can get from this? An example of a case
where you could do better than the current approach of
-fstrict-vtable-pointers with this technique would help a lot in
understanding this.
2. clang: Emitting llvm.assume vtable load sequences for each
global variable of a class type with virtual functions that is referenced
inside a function. This is similar to what is currently done for local
variables and would produce more vtable load assumptions to be propagated
by (1).
Given that it's valid to placement new another object on top of a global,
there are some limits on what we can do here -- we can only emit these
assumption loads at places in the code where we know the original vptr is
present. For instance, we can do this at any point where we emit a member
access or member function call on an object of known dynamic type (whether
it's local or global), but we cannot do so when such an object is passed by
reference into a function or when its address is taken (those operations
don't require the object to be within its lifetime).
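A small C++ sketch of the distinction above. The names Base, Derived, g and the three functions are hypothetical and only illustrate the language rule; they say nothing about what clang emits today.

```cpp
#include <cassert>
#include <new>

struct Base {
  virtual int id() { return 1; }
};
struct Derived : Base {
  int id() override { return 2; }
};
static_assert(sizeof(Derived) == sizeof(Base), "Derived must fit in g's storage");
static_assert(alignof(Derived) <= alignof(Base), "g's storage must be aligned for Derived");

Base g; // global whose dynamic type is Base, at least initially

// A member call on g itself: g must be within its lifetime here, so this is
// a point where the original (Base) vptr is known to be present.
int callOnGlobal() { return g.id(); }

// Only the address of g is taken. Nothing here requires g to be within its
// lifetime, so nothing can be assumed about the vptr at this point.
Base *addressOfGlobal() { return &g; }

// Reuses g's storage for a Derived. The call goes through the pointer
// returned by placement new, so it dispatches to Derived::id.
int callAfterReplacement() {
  Base *p = new (addressOfGlobal()) Derived;
  int result = p->id();
  new (p) Base; // put a Base back so later code does not find a Derived here
  return result;
}
```

An assumption emitted in callOnGlobal would be fine, while one emitted in addressOfGlobal could be wrong for callers such as callAfterReplacement.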
Related to (2), does anyone know what the status is of enabling clang’s
-fstrict-vtable-pointers by default? Are there known issues with this code
that would need to be resolved as well?
There are two known issues:
1) At the IR level (but not at the object code level), it introduces an ABI
break: for LTO, all modules must be built with the same setting of the flag
or the necessary invariant barriers may be missing, resulting in incorrect
devirtualization in rare cases. (If you try to LTO modules with different
settings for the flag, we trap the problem and issue an error.)
2) Not all optimization passes have been updated to understand
@llvm.invariant.group.barrier, so inserting it can sometimes result in a
pessimization when a pass is unable to reason about it correctly. Thus the
flag may degrade performance.
Plus, of course, it can cause existing code that breaks the language rules
to start misbehaving (as with any of the -fstrict-* flags that optimize on
UB).