LLVM IR bitcode to show pointer provenance


Various optimizations in LLVM rely on determining pointer provenance for alias analysis.
Is there any way to output an LLVM IR module in human-readable form that includes the provenance information? Something like the .ll output of llvm-dis, but annotated with provenance.
I could then view that information, and if a pointer looks like it has the wrong provenance, I could re-craft the C++ code until it does, and as a result get better-optimized code.

If there is not such a feature, how would I go about adding it? Any ideas/suggestions on where to start?
Would it be a good idea to add this sort of information to LLVM IR bitcode?
It might highlight edge cases where introducing new provenance-controlling LLVM IR instructions would help.
It might also expose bugs in LLVM, where everyone assumes X has provenance Y but the compiler actually treats it as something else.

Kind regards


Hi James,

This is not my area of expertise, but I happened to be looking into this recently. You may want to check the following:

TL;DR: Jeroen Dobbelaere has been working on this for a while (google “full restrict support”). AFAIK, some of what was proposed has been merged, but not everything.


Right, there is no way of printing pointer provenance ATM.
However, you can print the results of alias analysis, which sounds sufficient for what you want. See, for example, the “LLVM Alias Analysis Infrastructure” page in the LLVM 15.0.0git documentation.
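As a rough sketch of how that could look in practice, here is a small input (the function names are hypothetical) whose IR could be fed to LLVM's AA evaluator pass. Assuming a recent Clang and opt (flags may differ between versions), something like `clang++ -O1 -Xclang -disable-llvm-passes -S -emit-llvm aa.cpp -o aa.ll` followed by `opt -passes=sroa,aa-eval -print-all-alias-modref-info -disable-output aa.ll` should print, per function, which pointer pairs AA considers MayAlias or NoAlias. Note that `__restrict` is a Clang/GCC extension, not standard C++.

```cpp
// Hypothetical test input for inspecting alias-analysis results.

// Plain pointers: a and b may point to the same int, so AA generally
// has to answer MayAlias for *a vs *b.
int store_may_alias(int *a, int *b) {
  *a = 1;
  *b = 2;     // may overwrite *a
  return *a;  // so *a must be reloaded here
}

// __restrict parameters: Clang marks them `noalias` in the IR, so AA
// can answer NoAlias for *a vs *b (passing overlapping pointers is UB).
int store_no_alias(int *__restrict a, int *__restrict b) {
  *a = 1;
  *b = 2;     // cannot touch *a
  return *a;  // an optimizer may fold this to 1
}
```

Comparing the evaluator's output for the two functions is a cheap way to check whether a code change actually gave AA the information you intended.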

Encoding pointer provenance in the IR is not trivial. If you add a new instruction/intrinsic to the IR, then you need to patch all optimizations to learn about it so you don’t get regressions. E.g. optimizations need to know that the instruction doesn’t clobber memory, has no cost (for inlining), can be moved around, etc. It’s a lot of work.

Furthermore, alias analysis (AA) can return “noalias” even when it doesn’t know that two pointers have different provenance. AA uses rules beyond provenance. For example, it can use the offsets and the overflow semantics of “gep inbounds” to separate accesses.
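A minimal sketch of that point (function names are hypothetical): in both functions below every access derives from the same pointer `p`, so provenance cannot tell the accesses apart. With constant indices, BasicAA can typically still prove the two stores disjoint from their offsets; with run-time indices it generally cannot.

```cpp
#include <cstddef>

// Same provenance, constant offsets: p[0] and p[1] are sizeof(int) bytes
// apart and each access is sizeof(int) bytes wide, so they cannot overlap.
// Clang typically emits `getelementptr inbounds` for these subscripts.
int constant_offsets(int *p) {
  p[0] = 1;
  p[1] = 2;     // disjoint from p[0] by offset alone
  return p[0];  // an optimizer may fold this to 1
}

// Same provenance, variable offsets: p[i] and p[j] overlap exactly when
// i == j, which is unknown at compile time, so AA generally says MayAlias.
int variable_offsets(int *p, std::size_t i, std::size_t j) {
  p[i] = 1;
  p[j] = 2;     // may overwrite p[i]
  return p[i];  // must be reloaded
}
```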