[GSoC] Fixing fundamental issues in LLVM IR

Hi all,

My proposal “Fixing fundamental issues in LLVM IR: Introducing a byte type to solve load type punning” was accepted for GSoC 2021. My project addresses an issue in which optimizations load a pointer as an integer, introducing an implicit ptrtoint that breaks alias analysis. My full proposal is here: https://docs.google.com/document/d/1C6WLgoqoDJCTTSFGK01X8sR2sEccXV5JMabvTHlpOGE/edit?usp=sharing.
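For readers new to the problem, here is a minimal C sketch of the kind of source-level copy involved. It is an illustration, not code from the proposal, and `copy_via_int` is a hypothetical name. The sketch assumes `uintptr_t` is available and has the same size as `void *`. In C this copy is well-defined, because `memcpy` copies the object representation. The concern is at the IR level: once the byte-wise copies are turned into an integer load and store, loading the pointer's bits as an integer behaves like an implicit ptrtoint, and alias analysis can no longer tell that the stored value is a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a pointer fits exactly in a uintptr_t. */
static_assert(sizeof(uintptr_t) == sizeof(void *),
              "sketch assumes pointer-sized uintptr_t");

/* Hypothetical helper: copies a pointer-sized object through an
   integer temporary. After optimization, the two memcpy calls may
   become an integer load and store, i.e. the pointer is "loaded as
   an integer" (the implicit ptrtoint described above). */
static void copy_via_int(void **dst, void *const *src) {
    uintptr_t tmp;                   /* integer temporary */
    memcpy(&tmp, src, sizeof tmp);   /* read the pointer's bytes as an integer */
    memcpy(dst, &tmp, sizeof tmp);   /* write those bytes back out */
}
```

After the call, `*dst` compares equal to `*src` and can be dereferenced as usual. The proposed byte type is meant to give the optimizer a way to express such copies without pretending the bytes are an integer.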

I am looking forward to working with my mentors Nuno Lopes and Juneyoung Lee, and I am excited to contribute to the LLVM community.



This project looks very interesting. Could you describe how it intersects with the opaque pointer work? Your motivating example would lose the bitcasts with opaque pointers, but if I understand correctly, the underlying issue remains and would just be more obvious.

The SelectionDAG impact also looks interesting. In the CHERI back ends, we have to be very careful in the lowering because integer loads/stores do not carry pointer provenance, so they lose the tag bits if used to copy structures that contain pointers. If we could distinguish a 128-bit integer copy from a 128-bit type-oblivious copy (which must conservatively go via capability registers if it is 128-bit aligned), that would probably lead to better codegen.


Hi David,

Yes, the bitcasts would go away, but the underlying issue would remain the same. The opaque pointers work does not really intersect with this project, but it definitely makes the issue more visible.

Regarding the SelectionDAG part and CHERI, loads/stores from unknown memory would still use the byte type, so CHERI would lower them as carefully as before. At the same time, integer loads/stores would be guaranteed not to come from implicit pointer casts, enabling more aggressive integer optimizations and arguably better codegen!