Handling First Element of a Struct in LLVM Store Instruction after O3 Optimization

ankit.anand · September 24, 2024, 9:34am

Hello everyone,

I’m encountering an issue while working on an LLVM pass that processes StoreInst instructions. Specifically, I’m having trouble detecting accesses to the first element of a structure after LLVM’s optimization phase. When the first element of a structure is stored to, LLVM optimizes away the GetElementPtr (GEP) instruction and accesses the base address directly. That makes it difficult for my pass to recognize the store as a write to the first element of the struct and to deduce the structure name and field.

Additionally, I noticed that in newer versions of LLVM, both getPointerElementType() and getNonOpaquePointerElementType() have been deprecated. I am aware that LLVM now uses opaque pointers, and I need to adapt my pass to this change, but I’m struggling to find the best approach.

Issue Details:

In debug mode, when I store to the first element of a structure, the LLVM IR uses %global, and I can handle it as expected with a GEP instruction. However, in release mode, the IR uses %0, and there’s no GEP, making it impossible to detect that I’m storing to the first element of a struct.

Here’s an example of the difference:

•	Debug Mode:

store i32 %val, ptr %global_structtypedef

•	Release Mode:

store i32 %val, ptr %0, align 4

As you can see, the GEP instruction for the first element is removed, and LLVM directly uses %0 as the pointer to the base address. This prevents my pass from detecting that this is the first element of the structure.
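The missing GEP follows from memory layout: the first field sits at byte offset 0, so a pointer to the struct and a pointer to its first field are the same address, and with opaque pointers a GEP whose indices are all zero adds no information. The following is a standalone C++ sketch of that layout fact, not LLVM API code; `Config` and `fieldOffset` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct used only for illustration.
struct Config {
  std::int32_t mode;    // first field
  std::int32_t retries; // second field
};

// The first field of a standard-layout struct starts at offset 0.
static_assert(offsetof(Config, mode) == 0,
              "first field shares the struct's base address");

// Byte distance from the start of the struct to one of its fields.
std::ptrdiff_t fieldOffset(const Config &c, const std::int32_t &field) {
  return reinterpret_cast<const char *>(&field) -
         reinterpret_cast<const char *>(&c);
}
```

Calling `fieldOffset(c, c.mode)` yields 0 while `fieldOffset(c, c.retries)` yields 4, which is why only the second field needs an explicit address computation in the IR.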

Steps I Have Taken:

1.	I’ve tried using the now-deprecated getPointerElementType() and getNonOpaquePointerElementType(), but I understand that I need to adopt a new approach since opaque pointers do not carry type information.
2.	I’ve attempted to directly analyze the pointer (getOperand(1) in the store instruction), but without the GEP for the first element, I’m unsure how to correctly associate it with the structure’s first field.

My Questions:

1.	What is the recommended approach in newer LLVM versions to detect the type of the memory being written to?
2.	How can I manually identify that a store is targeting the first element of a structure when LLVM optimizes away the GEP?
3.	Is there a way to adjust the LLVM optimization to retain more debug information about the first element of a struct, even in release mode?

Any suggestions or insights on how to address this issue would be greatly appreciated!

Thank you!

nikic · September 24, 2024, 10:12am

The general approach is to iterate over the users of the pointer, and collect at which offset and with which type it is accessed. If you have a unique type being used at offset 0, then that would be the type of the first “struct member”.

Of course, it can happen that one offset is accessed with different types, or that there are overlapping accesses. This means that a clean segregation into struct members is not possible – for example, because this is actually a C union or a Rust enum.
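As a rough illustration of the offset + type decomposition described above, here is a toy model in plain C++. It does not use the LLVM API: `Access` and `decompose` are hypothetical names, and the access records are assumed to have been collected already by walking the users of the base pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one load/store reached from a base pointer.
struct Access {
  std::int64_t offset; // constant byte offset from the base pointer
  std::string type;    // accessed type, e.g. "i32"
  std::int64_t size;   // access size in bytes
};

// Fills `fields` (offset -> type) and returns true when every offset is
// accessed with a single type and no two accesses overlap. Returns false
// for union-like or overlapping patterns and leaves `fields` empty.
bool decompose(const std::vector<Access> &accesses,
               std::map<std::int64_t, std::string> &fields) {
  fields.clear();
  std::map<std::int64_t, std::set<std::string>> typesAt;
  std::map<std::int64_t, std::int64_t> sizeAt;
  for (const Access &a : accesses) {
    assert(a.size > 0);
    typesAt[a.offset].insert(a.type);
    sizeAt[a.offset] = std::max(sizeAt[a.offset], a.size);
  }
  std::map<std::int64_t, std::string> result;
  bool first = true;
  std::int64_t prevEnd = 0;
  for (const auto &entry : typesAt) {
    if (entry.second.size() != 1)
      return false; // one offset, several types
    if (!first && entry.first < prevEnd)
      return false; // overlaps the previous field
    prevEnd = entry.first + sizeAt[entry.first];
    first = false;
    result[entry.first] = *entry.second.begin();
  }
  fields = result;
  return true;
}
```

With accesses `{0, "i32", 4}` and `{4, "i32", 4}` the model reports two clean fields, and the entry at offset 0 plays the role of the first struct member. Adding `{0, "float", 4}` makes it report an ambiguous layout, matching the union case.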

ankit.anand · September 24, 2024, 11:25am

I understand that iterating over the users of the base pointer and checking the offset and type can help deduce the first struct member, as explained earlier. However, my challenge is reliably identifying the structure and its fields in optimized LLVM IR, where the GEP for the first field gets optimized out.

For all other fields, we can easily use getelementptr (GEP) to retrieve the structure’s name and the member index, but for the first element, the base pointer is used directly, which complicates tracking. I can use getSourceElementType to retrieve structure information for subsequent fields, but is there a reliable way to obtain the structure name and first element information when no GEP is generated? Essentially, I’m looking for a workaround or LLVM utility that can associate a first-element access with its structure, along with the name of the struct it belongs to.

nikic · September 24, 2024, 12:18pm

The only information you can determine at the LLVM IR level is the offset + type decomposition I already mentioned. You should not be directly inspecting the GEP source element type, and instead use methods like stripAndAccumulateConstantOffsets() to look through GEP instructions and determine the accessed offsets.
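To make the idea of looking through GEPs concrete, here is a toy model in plain C++ of what accumulating constant offsets means. It is only a sketch and not the LLVM implementation: `PtrNode`, `Decomposed`, and `stripAndAccumulate` are hypothetical stand-ins, with each node representing either an underlying object or a constant byte offset applied to another pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IR pointer value: a root object when
// base == nullptr, otherwise `base` advanced by a constant byte offset.
struct PtrNode {
  const PtrNode *base;
  std::int64_t offset;
};

// Result of the walk: the underlying object and the total byte offset.
struct Decomposed {
  const PtrNode *root;
  std::int64_t offset;
};

// Walks towards the root, summing the constant offsets along the way.
Decomposed stripAndAccumulate(const PtrNode *p) {
  assert(p != nullptr);
  std::int64_t total = 0;
  while (p->base != nullptr) {
    total += p->offset;
    p = p->base;
  }
  return {p, total};
}
```

In this model a store whose pointer operand is the root itself decomposes to offset 0, the same result a zero-index GEP would give, so the first field needs no special case.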

It is architecturally impossible to reliably determine the name of a struct from LLVM IR – because struct names do not carry any semantic meaning. This kind of information can only be extracted from debug records in the IR.

It would be helpful if you could share what your end goal here is.

ankit.anand · September 24, 2024, 12:54pm

Thank you for your insights regarding the retrieval of struct member information at the LLVM IR level. I understand that the primary approach involves utilizing offset + type decomposition and employing methods like stripAndAccumulateConstantOffsets() to analyze GEP instructions for determining accessed offsets.

We aim to use our LLVM pass to count, across the whole project, how often each structure member is written by a store instruction. Our end goal is to maintain an accurate mapping from struct members to store counts, including the first element whose GEP gets optimized away, so that we can build an optimization on top of that data. The main issue I am facing is with the first element of a structure: it does not appear as a GEP when I iterate over the store instructions, whereas every other field does.
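One way to frame the counting so that the first element is not lost is to key the counters on (underlying object, byte offset) rather than on the presence of a GEP. The sketch below is plain C++ under that assumption, not LLVM API code: `StoreRecord`, `FieldKey`, and `countStores` are hypothetical names, and each record is assumed to come from an offset decomposition of the store's pointer operand.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one store after offset decomposition.
struct StoreRecord {
  std::string root;    // identifier of the underlying object
  std::int64_t offset; // accumulated byte offset; 0 when there is no GEP
};

// A field is identified by its object and its byte offset.
using FieldKey = std::pair<std::string, std::int64_t>;

// Counts stores per (object, offset) pair.
std::map<FieldKey, unsigned> countStores(const std::vector<StoreRecord> &stores) {
  std::map<FieldKey, unsigned> counts;
  for (const StoreRecord &s : stores)
    ++counts[FieldKey(s.root, s.offset)];
  return counts;
}
```

A store directly through the base pointer lands in the `(root, 0)` bucket, the same bucket an explicit access to the first field would use. Mapping offsets back to source-level field names would still have to come from debug information, as noted earlier in the thread.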
