Structure Types and ABI sizes

Hi!

I followed the discussion on structure types with the example

struct I {
   int a;
   char b;
};

struct J : I {
   char c;
};

Dave said that this translates to

%I = type { i32, i8, i16 }
%J = type { %I, i8, i16 }

because the frontend has to communicate the ABI to llvm since llvm is language agnostic.
What I really wonder is why it isn't

%I = type { i32, i8 }
%J = type { %I, i16, i8 }

because llvm at least knows alignment rules by

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16...

Therefore llvm has no other choice than assigning %I a size of 8
since an array may consist of %I elements and size of 5 would violate
the alignment of the i32 member.
If the ABI requires that member c has an offset of 8 instead of 5 then
of course a padding behind %I is necessary in %J.

-Jochen

Jochen Wilhelmy <j.wilhelmy@arcor.de> writes:

struct I {
   int a;
   char b;
};

struct J : I {
   char c;
};

Dave said that this translates to

%I = type { i32, i8, i16 }
%J = type { %I, i8, i16 }

It translates to that in OUR compiler. It's not the only answer.

because the frontend has to communicate the ABI to llvm since llvm is
language agnostic.

Correct.

What I really wonder is why it isn't

%I = type { i32, i8 }
%J = type { %I, i16, i8 }

because llvm at least knows alignment rules by

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16...

Therefore llvm has no other choice than assigning %I a size of 8
since an array may consist of %I elements and size of 5 would violate
the alignment of the i32 member.

I can't quite parse this. %I doesn't get "assigned" a size by anyone.
Do you mean the size of struct I is eight bytes? Yes, that's true.

If the ABI requires that member c has an offset of 8 instead of 5 then
of course a padding behind %I is necessary in %J.

Yes, the padding is required. I believe %J = type { %I, i16, i8 } would
work just as well as long as %I = type { i32, i8 } as in your example.

Our frontend is far from "perfect" in the sense of aesthetics. :-)

                          -Dave

What I really wonder is why it isn't

%I = type { i32, i8 }
%J = type { %I, i16, i8 }

because llvm at least knows alignment rules by

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16...

Therefore llvm has no other choice than assigning %I a size of 8
since an array may consist of %I elements and size of 5 would violate
the alignment of the i32 member.

I can't quite parse this. %I doesn't get "assigned" a size by anyone.
Do you mean the size of struct I is eight bytes? Yes, that's true.

Yes I mean that

%I = type { i32, i8 }
is 8 bytes given the alignment rules (i.e. llvm "assigns" a size of
8 bytes to this struct after parsing it)

Yes, the padding is required. I believe %J = type { %I, i16, i8 } would
work just as well as long as %I = type { i32, i8 } as in your example.

Yes but given the ABI requires the last member to be at offset 5, which may happen
(i.e. no tail padding if I is derived from), then your solution

%I = type { i32, i8, i16 }
is problematic. Or do you switch struct generation depending on the ABI?
I ask because I would prefer an "always working" solution
(with no case distinction), but of course I'm not deep enough in the matter.

-Jochen

Jochen Wilhelmy <j.wilhelmy@arcor.de> writes:

Yes, the padding is required. I believe %J = type { %I, i16, i8 } would
work just as well as long as %I = type { i32, i8 } as in your example.

Yes but given the ABI requires the last member to be at offset 5,
which may happen
(i.e. no tail padding if I is derived from), then your solution

No, this is not true for this example. This is getting into extremely
delicate areas of the Itanium C++ ABI.

In this example, %I is a "POD for the purposes of layout" type. Such
types cannot have their tail padding overlapped when they are inherited
from. So %I is eight bytes in all contexts.

If %I is not a "POD for the purposes of layout" type, then its tail
padding MUST be overlapped when inherited from. In this case, we
end up creating two types for %I, %I and %I' and use %I' as the
type when it is inherited from.

Fun, eh? :-/

                           -Dave

Some other fun examples...

** POD-layout:

struct I { int, char }; // size 8 = { i32, i8 };
struct J : I { char }; // size 12 = { %I, i8 };
struct K : J { char }; // size 12 = { [9 x i8], i8, [2 x i8] };

I is POD-layout, but J is NOT.

** Default C-tor:

struct I { int, char }; in C++ gets its default and copy constructors
generated automatically, right? So I is POD-layout.

But struct A { int, char, A(){} }; declares a default constructor that
does exactly the same thing, and yet A is no longer POD-layout.
So:

struct A { int, char, A(){} }; // size 8 = { i32, i8 };
struct B : A { char }; // size 8 = { [5 x i8], i8, [2 x i8] }
struct C : B { char }; // size 8 = { [6 x i8], i8, i8 }

Of course, as David said, all those types have their "normal sized"
components, so there is B-full (8 bytes) and B-inheritable (6
bytes)...

cheers,
--renato
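
The two cases above can be checked on a real compiler. What follows is only a sketch: the member names a, b, c, d and the helpers Layout, offset_in and layout_of_* are made up for illustration, and it assumes a compiler that follows the Itanium C++ ABI with a 4-byte int. It measures the offset of the last member on a live object instead of using offsetof, because offsetof is only conditionally supported for derived types like these.

```cpp
#include <cassert>
#include <cstddef>

// POD case from the thread: no user-provided constructor.
struct I { int a; char b; };
struct J : I { char c; };
struct K : J { char d; };

// Non-POD case: the user-provided constructor changes how the ABI treats A.
struct A { int a; char b; A() {} };
struct B : A { char c; };
struct C : B { char d; };

// Hypothetical result type: total size and offset of the most-derived member.
struct Layout {
    std::size_t size;
    std::size_t last_member_offset;
};

// Byte offset of `member` inside `obj`, measured by pointer difference.
template <typename Obj, typename Member>
std::size_t offset_in(const Obj& obj, const Member& member) {
    const char* base = reinterpret_cast<const char*>(&obj);
    const char* field = reinterpret_cast<const char*>(&member);
    assert(field >= base && field < base + sizeof(Obj));
    return static_cast<std::size_t>(field - base);
}

// Figures quoted in the thread: J is 12 bytes, K is 12 bytes with d at 9.
inline Layout layout_of_J() { J j{}; return {sizeof(J), offset_in(j, j.c)}; }
inline Layout layout_of_K() { K k{}; return {sizeof(K), offset_in(k, k.d)}; }

// Figures quoted in the thread: B is 8 bytes with c at 5, C is 8 bytes with d at 6.
inline Layout layout_of_B() { B b{}; return {sizeof(B), offset_in(b, b.c)}; }
inline Layout layout_of_C() { C c{}; return {sizeof(C), offset_in(c, c.d)}; }
```

Printing layout_of_B().last_member_offset on your own target is a quick way to see whether the tail padding of A is being reused. An ABI that never reuses tail padding would report sizeof(A) there instead.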

If %I is not a "POD for the purposes of layout" type, then its tail
padding MUST be overlapped when inherited from. In this case, we
end up creating two types for %I, %I and %I' and use %I' as the
type when it is inherited from.
   

But that is exactly my question: why two types in this case?

if

%I = type { i32, i8 };

then %I has 8 bytes if used directly and when used in %J

%J = type { %I, i8 }

then %I has only 5 bytes. Of course %I' could be

%I' = type { i32, i8, i16 };
or
%I' = type { i32, i8, i8, i16 };

but I don't see the point of this, since %I already does the job.
Or am I missing something?

-Jochen

but I don't see the point of this, since %I already does the job.
Or am I missing something?

If you're saying that:

%I = type { i32, i8 };

has size 5, yes, you're missing the alignment.

According to the standard, the alignment of a structure is the
alignment of its most-aligned member (and some other cases in the ABI,
too).

So, %I has an int (align 4) and a char (align 1), so the final
alignment is 4, so the size is rounded up to 8. LLVM knows that, and
the size of:

%I = type { i32, i8 };

is 8, not 5.

To get size 5 you need the "packed" keyword (or similar attributes) or
transform it to a [5 x i8].

cheers,
--renato
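
The rounding rule described above can be written out as plain arithmetic. This is a sketch only, not how LLVM computes layouts internally: Field, round_up, natural_size and packed_size are hypothetical names, and sizes and alignments are passed in by hand rather than read from a datalayout string.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one struct member, in bytes.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Round `offset` up to the next multiple of `align`.
inline std::size_t round_up(std::size_t offset, std::size_t align) {
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

// Size of a non-packed struct: place every field at its aligned offset,
// then round the end up to the struct alignment (the largest field alignment).
inline std::size_t natural_size(const std::vector<Field>& fields) {
    std::size_t offset = 0;
    std::size_t struct_align = 1;
    for (const Field& f : fields) {
        offset = round_up(offset, f.align) + f.size;
        if (f.align > struct_align) struct_align = f.align;
    }
    return round_up(offset, struct_align);
}

// Size of the same fields when packed: no padding between or after fields.
inline std::size_t packed_size(const std::vector<Field>& fields) {
    std::size_t total = 0;
    for (const Field& f : fields) total += f.size;
    return total;
}
```

With the fields of { i32, i8 } given as {4, 4} and {1, 1}, natural_size rounds the 5 bytes of data up to the 4-byte struct alignment, while packed_size simply adds the field sizes.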

If you're saying that:

%I = type { i32, i8 };
     

has size 5, yes, you're missing the alignment.
   

Ah, now I see. But I didn't say that

%I = type { i32, i8 };

has 5 bytes (because it has 8) but I thought that it has
5 bytes when being a member of %J, i.e.

%J = type { %I, i8 }

In this case %I also has 8 bytes, right?
I was thinking too much in terms of C++ inheritance.

Then perhaps the tailpadding should be specified explicitly ;-)

%I = type { i32, i8 }; // 5 bytes
%I' = type { %I, tailpad}; // 8 bytes
%J = type { %I, i8 } // 6 bytes

-Jochen

That would break C code (and whatever else relies on alignment).

I don't see a way of specifying two structures, but I like the idea of
using a packed structure for inheritance and the "normal" one for
types.

cheers,
--renato

%I = type { i32, i8 }; // 5 bytes
%I' = type { %I, tailpad}; // 8 bytes
%J = type { %I, i8 } // 6 bytes
     

That would break C code (and whatever else relies on alignment).
   

Why would it break C code? Of course a C frontend should generate only tailpadded types.

I don't see a way of specifying two structures, but I like the idea of
using a packed structure for inheritance and the "normal" one for
types.
   

or something like

%J = type { inherit %I, i8 }

the inherit keyword before %I removes the tailpadding

-Jochen

Why would it break C code? Of course a C frontend should generate only
tailpadded types.

It's not about the size, but the offset. If you had a char field in
the inherited class:

%I' = type { %I, i8, tailpad};

The offset of that i8 has to be 8, not 5. If all structures are
packed, that would be 5, which is correct for non-POD in C++ but wrong
for everything else.

%J = type { inherit %I, i8 }

the inherit keyword before %I removes the tailpadding

That's what the packed is for.

%Base = type { i32, i8 }; // size = 8
%PODDerived = type { %Base, i8 }; // i8 offset = 8, size 12

%Basep = type <{ i32, i8 }>; // packed, size = 5
%nonPODDerived = type { %Basep, i8 }; // i8 offset = 5, size 8

cheers,
--renato

Why would it break C code? Of course a C frontend should generate only
tailpadded types.
     

It's not about the size, but the offset. If you had a char field in
the inherited class:

%I' = type { %I, i8, tailpad};

The offset of that i8 has to be 8, not 5. If all structures are
packed, that would be 5, which is correct for non-POD in C++ but wrong
for everything else.
   

I know; that is why %I has to be tail-padded in this case. But packing and
tail padding are different things, aren't they? In a packed type {i8, i32} the
i32 has offset 1, while in a type without tail padding it still has offset 4.

%J = type { inherit %I, i8 }

the inherit keyword before %I removes the tailpadding
     

That's what the packed is for.
   

I don't think so, because packing removes the alignment constraints of all members.

-Jochen

True.

cheers,
--renato
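
The difference between packing and dropping tail padding shows up as soon as a small member comes first. The sketch below is an illustration under assumptions: it relies on #pragma pack, a compiler extension accepted by GCC, Clang and MSVC, and the struct and function names are invented for this example.

```cpp
#include <cstddef>

// Natural layout: the int keeps its alignment, so padding follows the char.
struct NaturalPair {
    char tag;
    int value;
};

// Packed layout: pack(1) drops the alignment requirement of every member,
// not just the padding at the end of the struct.
#pragma pack(push, 1)
struct PackedPair {
    char tag;
    int value;
};
#pragma pack(pop)

inline std::size_t offset_of_value_natural() { return offsetof(NaturalPair, value); }
inline std::size_t offset_of_value_packed() { return offsetof(PackedPair, value); }
inline std::size_t size_of_natural_pair() { return sizeof(NaturalPair); }
inline std::size_t size_of_packed_pair() { return sizeof(PackedPair); }
```

Removing only the tail padding would leave value at its aligned offset, which is why a packed base type is not a drop-in substitute for a base type without tail padding.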