Preserving the type of structure


LLVM IR flattens a structure with a single integer element to i32. Is there any way to disable this? I want to preserve the type information of the structure. I tried compiling the program with -O0; it maintains the type for most of the instructions, but not for all.


No - if you really want this sort of information you’d likely find it in the debug info or maybe TBAA data.

Hi David,

Thanks for the response. I was wondering if you have any concrete steps to get this information from TBAA data?
If you can point me towards some source or example then it would be really helpful.


Disabling the SROA pass might help? But even if it works, it is unlikely to be a reliable approach in general.

What you’ll observe depends on the front end you’re using. If you’re using clang, you may need to get in very early to see the TBAA metadata. Like all metadata, it can be dropped whenever the IR is transformed.

For example:

In this case, the front end itself transforms the structure to a single i32 for the purposes of argument passing. This may vary, depending on the target ABI, but for x86-64 on Linux this is the way small structures are passed by value. The function body still shows the TBAA information when the IR is generated by clang, but SROA will optimize that away very early in the pipeline. Disabling SROA is a bad idea for anything other than experimentation, and even if you did disable it, some other optimization might do the same thing.
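To make this concrete, here is a minimal C sketch of the kind of input being discussed. It is not the original example from this thread; the names Wrapper, get_value and make_wrapper are hypothetical and chosen only for illustration.

```c
/* Hypothetical illustration: a struct with a single integer member. */
struct Wrapper {
    int value; /* the only field */
};

/* Takes the struct by value. With clang targeting x86-64 Linux, the
 * parameter is typically lowered to a plain i32 in the function
 * signature, so the struct type does not appear there. */
int get_value(struct Wrapper w) {
    return w.value;
}

/* Returns the struct by value; the return type is typically lowered
 * in the same way on that target. */
struct Wrapper make_wrapper(int v) {
    struct Wrapper w = { v };
    return w;
}
```

Saving this as, say, wrapper.c and running `clang -O0 -S -emit-llvm wrapper.c -o wrapper.ll` should let you compare the function signatures with the function bodies. The exact IR depends on the clang version and the target.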

In general, LLVM IR operates on values (which have semantically significant type information) and memory locations (which do not). Although LLVM IR has a type system that represents composite data types such as structs, there is no guarantee that these have any correspondence to the source language data types. If you need source type information, that needs to be generated by the front end in some way (such as TBAA), but if the front end uses metadata to represent this information, it may not be preserved. Any method that is guaranteed to be preserved would most likely block some optimizations.
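A rough source-level analogy for the point about memory, sketched in C with hypothetical names (Boxed, boxed_as_int): the bytes of a one-int struct are just the bytes of an int, and nothing in memory records which type they came from.

```c
#include <string.h>

/* Hypothetical single-member struct used only for illustration. */
struct Boxed {
    int value;
};

/* Copies the leading bytes of a Boxed into a plain int. The stored
 * bytes are identical; only the type used to read them differs. */
int boxed_as_int(struct Boxed b) {
    int out;
    memcpy(&out, &b, sizeof out);
    return out;
}
```

This is only an analogy for why an optimizer is free to treat such a struct as a bare integer. It says nothing about which passes actually do so.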

I’ve run into this problem in a couple of different contexts, so I’d be curious to hear more about why you want this information if you are able to share.



P.S. If you look at the IR above, you can probably figure out the TBAA metadata. It’s described here:

I don’t have a lot more detail, sorry.

I will say: generally LLVM IR doesn’t preserve this sort of information. If you’re building optimizations, please revisit the basis/premise of your optimizations. If you’re implementing source analysis tools - consider doing that using Clang tooling and/or the Clang Static Analyzer.