Generally speaking (not specific to LLVM), a C programmer prefers structure types because they are a simple way to hold several basic data types together. It is not just about size and alignment: the language constructs let the programmer access the fields while hiding the layout, that is, which field sits at which offset. The programmer could instead allocate a chunk of memory (i8*) or an array, keep multiple fields in it, and track by hand which field is at which byte offset, but that would be terribly complicated, right? Simplicity is what has driven the compiler world, where languages have grown from
1) machine code to assembly,
2) assembly to IR,
3) IR to high-level
4) and so on.
And I believe that beautiful 20-year-old doc was highlighting exactly these special features, which no other IR was providing.
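To make the struct-versus-raw-bytes point concrete, here is a minimal C sketch. The Record type, the RAW_* offsets and all function names are made up for illustration; they are not taken from SM1.c. The typed version never mentions an offset, while the raw version only works as long as the hand-maintained constants stay in sync with every reader and writer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: the compiler owns the layout. */
struct Record {
    int32_t id;
    int64_t weight;
};

/* Typed access: no byte offset appears anywhere in the source. */
int64_t typed_roundtrip(int64_t w) {
    struct Record r = { 1, 0 };
    r.weight = w;
    return r.weight;
}

/* Hand-rolled layout in a byte buffer: the programmer owns every offset. */
enum { RAW_ID_OFFSET = 0, RAW_WEIGHT_OFFSET = 4, RAW_RECORD_SIZE = 12 };

void raw_set_weight(unsigned char *buf, int64_t w) {
    /* If someone changes the layout and forgets a constant, only this
       check stands between us and a silent out-of-bounds write. */
    assert(RAW_WEIGHT_OFFSET + sizeof w <= (size_t)RAW_RECORD_SIZE);
    memcpy(buf + RAW_WEIGHT_OFFSET, &w, sizeof w);
}

int64_t raw_get_weight(const unsigned char *buf) {
    int64_t w;
    memcpy(&w, buf + RAW_WEIGHT_OFFSET, sizeof w);
    return w;
}

/* Store and reload a weight through the hand-rolled layout. */
int64_t raw_roundtrip(int64_t w) {
    unsigned char buf[RAW_RECORD_SIZE] = { 0 };
    raw_set_weight(buf, w);
    return raw_get_weight(buf);
}
```

Both round trips return the value they were given; the difference is only in who has to remember where the field lives.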
Answering 1 and 2
- Struct type allocation is not just about size and alignment but also about layout (which field is at which offset), and about the abstract constructs (GEPs) that help us access the fields.
- Once a struct type and its accesses are ascertained, we have ways to change its layout, for example:
a) reorder fields,
b) remove dead fields,
c) change a field's size or type,
etc.
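As a small illustration of (a), here is a C sketch of what field reordering can buy. The two struct types and the function name are hypothetical, and the exact sizes depend on the target ABI; on a typical LP64 target the first layout takes 24 bytes and the reordered one 16, but the code only relies on the reordered layout not being larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout as a programmer might write it:
   padding is needed after 'tag' and after 'flag'. */
struct Padded {
    char tag;
    long count;
    char flag;
};

/* Same fields, with the widest one first so the two chars share one slot. */
struct Reordered {
    long count;
    char tag;
    char flag;
};

/* Bytes saved per object by the reordering (may be 0 on some ABIs). */
size_t bytes_saved_by_reordering(void) {
    /* Assumption: the target ABI never makes the reordered layout larger. */
    assert(sizeof(struct Reordered) <= sizeof(struct Padded));
    return sizeof(struct Padded) - sizeof(struct Reordered);
}
```

A transformation like this is only safe when every access to the type is known and goes through field accesses, which is exactly the information being discussed here.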
Collecting offsets and reconstructing the picture of structure accesses would be complicated in the opaque-pointer world or with PtrAdds. Intermittent bitcasts would complicate the process further (how far would you want to go to understand things?). I do agree that it may well be possible with PtrAdds, by tracking the offsets and reconstructing the picture, but that would be more complicated than what we can do in the typed world.
I have attached a sample test case, SM1.c, and the two IRs:
- typed IR
- Opaque IR
In the typed IR
%14 = call noalias i8* @calloc(i64 noundef %13, i64 noundef 24) #13
%15 = bitcast i8* %14 to %struct.Node1*
These two instructions can be treated as a single instruction, and this gives a clear clue that ‘calloc’ is allocating an object of type %struct.Node1.
The same applies to the following:
%48 = bitcast %struct.Node1* %15 to i8*
call void @free(i8* noundef %48) #11
Apart from the above there are no
a) bitcast instructions to %struct.Node1*,
b) bitcast instructions from %struct.Node1*,
which gives a clear clue that %struct.Node1* values are accessed only as %struct.Node1*.
Now any GEP instruction in the entire program operating on %struct.Node1* would be consistent. Consistent meaning:
- %struct.Node1* would be incremented or decremented only in multiples of the %struct.Node1 size when operated on through GEP.
- Any access to fields would be through GEP instructions only.
(Note: care has to be taken to ensure that pointers to fields are not used as arrays.)
And the GEP would readily provide the types of the fields as well.
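The two consistency properties above can be sketched at the C level. This is only an illustration under assumptions: the struct mirrors %struct.Node1 = type { i32, i64, %struct.Node1* } from the test case, but the field names and the helper functions are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors %struct.Node1 = type { i32, i64, %struct.Node1* };
   the field names are invented for this sketch. */
struct Node1 {
    int32_t key;
    int64_t value;
    struct Node1 *next;
};

/* Byte distance covered by stepping a Node1 pointer by i elements, roughly
   what "getelementptr %struct.Node1, %struct.Node1* %p, i64 %i" computes. */
ptrdiff_t step_bytes(struct Node1 *p, ptrdiff_t i) {
    return (char *)(p + i) - (char *)p;
}

/* Byte offset of the second field, roughly what
   "getelementptr inbounds %struct.Node1, %struct.Node1* %p, i64 0, i32 1"
   computes. */
ptrdiff_t value_offset(struct Node1 *p) {
    return (char *)&p->value - (char *)p;
}

/* Returns 1 when pointer steps are whole multiples of the struct size. */
int access_is_disciplined(void) {
    struct Node1 nodes[4] = { { 0, 0, NULL } };
    /* Field accesses always land on the offset the type dictates. */
    assert(value_offset(&nodes[0]) == (ptrdiff_t)offsetof(struct Node1, value));
    return step_bytes(nodes, 3) == (ptrdiff_t)(3 * sizeof(struct Node1));
}
```

In the typed IR both facts can be read directly off the GEP, because its pointer operand already names the struct type.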
Here the analysis has not traversed the complete path of how %struct.Node1* flows, but it has still been able to quickly establish its disciplined access in this case. Further, we are not doing lots of bookkeeping for
a) which instruction or pointer is of which type,
b) which function has a pointer argument of %struct.Node1*,
c) which global is of %struct.Node1*,
d) which calloc is of %struct.Node1*,
e) which pointer field in structure is of %struct.Node1*
The following picture is readily available in the IR:
{
%struct.Node1 = type { i32, i64, %struct.Node1* }
define internal i32 @func_check(%struct.Node1* nocapture noundef readonly %0, i32 noundef %1)
define internal void @func(%struct.Node1* nocapture noundef writeonly %0, i32 noundef %1)
define internal void @func_1(%struct.Node1* noundef %0, i32 noundef %1)
define internal i32 @func_1_check(%struct.Node1* noundef readonly %0, i32 noundef %1)
define internal i32 @func_12_check(%struct.Node1* noundef readonly %0, i32 noundef %1)
}
All we would be doing is confirming or ascertaining that whatever information is present in the IR is correct (no bookkeeping needed). With this consistent picture of disciplined access, it should be easier to change the layout of %struct.Node1.
In the Opaque IR
Here it would not be readily evident that
%13 = call noalias ptr @calloc(i64 noundef %12, i64 noundef 24) #13
is allocating an object of type %struct.Node1.
There are no explicit bitcast instructions anywhere, so we have to traverse the whole flow of %13 to check whether it is accessed in the %struct.Node1* way. We need to ensure that no other object gets mixed up in the usage path of %13. Thank god we at least still have GEPs, which give some indication that an access follows the %struct.Node1 protocol.
Now we need to do bookkeeping on
a) which instruction or pointer is of which type,
b) which function has a pointer argument of %struct.Node1*,
c) which global is of %struct.Node1*,
d) which calloc is of %struct.Node1*,
e) which pointer field in structure is of %struct.Node1*, ...
since none of this information is readily available in the opaque world:
{
%struct.Node1 = type { i32, i64, ptr }
define internal i32 @func_check(ptr nocapture noundef readonly %0, i32 noundef %1)
define internal void @func(ptr nocapture noundef writeonly %0, i32 noundef %1)
define internal void @func_1(ptr noundef %0, i32 noundef %1)
define internal i32 @func_1_check(ptr noundef readonly %0, i32 noundef %1)
define internal i32 @func_12_check(ptr noundef readonly %0, i32 noundef %1)
}
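To give a feel for the bookkeeping described above, here is a toy C model of it. Everything in it is hypothetical (the ElemType enum, the Use record, the function names); it does not use or imitate any LLVM API. It assumes each use of a pointer value either carries a source element type, as a GEP does, or carries no type hint at all, and it accepts the pointer as a Node1 pointer only if every typed use agrees.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the element type named by a GEP that uses the pointer. */
enum ElemType { TY_UNKNOWN = 0, TY_NODE1, TY_OTHER };

/* One use of the pointer; TY_UNKNOWN means the use gives no type hint
   (for example a call argument or a plain store of the pointer). */
struct Use {
    enum ElemType gep_source_type;
};

/* Returns the single element type all typed uses agree on, or TY_UNKNOWN
   when there is no typed use or the uses disagree. */
enum ElemType infer_pointee(const struct Use *uses, size_t n) {
    enum ElemType seen = TY_UNKNOWN;
    assert(uses != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        enum ElemType t = uses[i].gep_source_type;
        if (t == TY_UNKNOWN)
            continue;              /* no hint from this use */
        if (seen == TY_UNKNOWN)
            seen = t;              /* first typed use sets the candidate */
        else if (seen != t)
            return TY_UNKNOWN;     /* another object type got mixed in */
    }
    return seen;
}

/* All typed uses name Node1, so the pointer can be treated as a Node1 pointer. */
enum ElemType demo_consistent(void) {
    struct Use uses[] = { { TY_NODE1 }, { TY_UNKNOWN }, { TY_NODE1 } };
    return infer_pointee(uses, sizeof uses / sizeof uses[0]);
}

/* One use names a different type, so nothing can be concluded. */
enum ElemType demo_mixed(void) {
    struct Use uses[] = { { TY_NODE1 }, { TY_OTHER } };
    return infer_pointee(uses, sizeof uses / sizeof uses[0]);
}
```

A real analysis would also have to follow the pointer through calls, globals, and struct fields, which is where the bookkeeping in the list above comes from.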
Once PtrAdd comes in, even guessing which structure types are in play may be like moving around in a dark room to examine something!
I feel that opaque pointers and PtrAdds could have been implemented as an optimization pass that kicks in at some final stage of compilation or at the LTO stage, rather than being the default right from the start in the front end.
SM1.c (2.8 KB)
temp_op_ll.c (11.2 KB)
temp_ty_ll.c (13.1 KB)