Struct padding

Hi,

I am wondering how I can tell whether a field of a struct is introduced by padding or not.

For example, if I have a struct:

struct foo1 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    long x;      /* 8 bytes */
};

clang may generate:

struct foo1 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    long x;      /* 8 bytes */
};

Is there any way that I can tell the “pad” array is generated by padding?

Thanks a lot
Hongbin

Hi Hongbin,

You can pass -Wpadded to clang. For your particular example it will print something along the lines of:

warning: padding struct 'foo1' with 7 bytes to align 'x' [-Wpadded]
long x;

Jonas

Hi Jonas,

Thanks a lot.
In an LLVM pass, how can I check the related information? Will clang emit some metadata table?

Thanks
Hongbin

What are you actually trying to achieve? LLVM knows the alignment and size of each component. You could iterate over the element types and identify where there is a difference between the “calculated total size so far and the current alignment requirement”, but note that LLVM pads structures automatically (unless you specifically ask it not to, i.e. with a packed struct).

Note that no actual field is added for padding; it is implied purely by the sizes and alignments themselves.
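Mats’s suggestion of walking the elements and comparing offsets can be illustrated at the C level. This is a minimal sketch, not an LLVM pass: it uses `offsetof` and `sizeof` on the `struct foo1` from the question instead of LLVM’s own layout queries, and `field_info`, `FIELD`, `report_padding` and `foo1_padding` are hypothetical helpers made up for this example. The exact byte counts depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The struct from the original question. */
struct foo1 {
    char *p;
    char c;
    long x;
};

/* Hypothetical descriptor: name, offset and size of one member. */
struct field_info {
    const char *name;
    size_t offset;
    size_t size;
};

/* Describe member m of type T; sizeof does not evaluate its operand. */
#define FIELD(T, m) { #m, offsetof(T, m), sizeof(((T *)0)->m) }

/* Walk the members in declaration order, print each gap between the end of
   one member and the start of the next (plus any tail padding), and return
   the total number of padding bytes. */
static size_t report_padding(const char *type_name, const struct field_info *fields,
                             size_t nfields, size_t type_size)
{
    size_t total = 0;
    size_t end = 0; /* first byte after the previous member */
    for (size_t i = 0; i < nfields; i++) {
        assert(fields[i].offset >= end); /* members must be listed in order */
        if (fields[i].offset > end) {
            printf("%s: %zu byte(s) of padding before '%s'\n",
                   type_name, fields[i].offset - end, fields[i].name);
            total += fields[i].offset - end;
        }
        end = fields[i].offset + fields[i].size;
    }
    if (type_size > end) {
        printf("%s: %zu byte(s) of tail padding\n", type_name, type_size - end);
        total += type_size - end;
    }
    return total;
}

static size_t foo1_padding(void)
{
    static const struct field_info fields[] = {
        FIELD(struct foo1, p), FIELD(struct foo1, c), FIELD(struct foo1, x),
    };
    return report_padding("struct foo1", fields, sizeof fields / sizeof fields[0],
                          sizeof(struct foo1));
}
```

On an LP64 target such as x86-64 Linux, calling `foo1_padding()` should report 7 bytes of padding before 'x', the same figure -Wpadded gives. The same idea carries over to an LLVM StructType by using the per-element offsets that the DataLayout computes instead of `offsetof`.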

Hi Mats,

When the struct is packed, an explicit byte array is introduced to pad it (I saw this happen in clang 3.9).

I want to check whether a byte or byte array in an LLVM struct was introduced for explicit padding or not.

I don’t need to worry about this problem if the newest clang no longer introduces a byte array.

Thanks
Hongbin

What do you mean by a byte array being added? At least in my experiments, I don’t see that:

struct A
{
int a;
char b;
long c;
};

struct A a;

produces:

; ModuleID = 'pad.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.A = type { i32, i8, i64 }

@a = common global %struct.A zeroinitializer, align 8

!llvm.ident = !{!0}

Adding a call to printf:

extern int printf(const char *fmt, ...);
void func(struct A *a)
{
    printf("c=%ld", a->c);
}

and emitting assembly, we can see that the offset of “c” in that struct is 8:

func:                                   # @func
        .cfi_startproc
# BB#0:                                 # %entry
        movq    8(%rdi), %rsi
        movl    $.L.str, %edi
        xorl    %eax, %eax
        jmp     printf                  # TAILCALL

So can you provide an example of this padding? I don’t see it. This is clang 3.8, but 3.9 did the same thing (I went back to 3.8 to check whether it was different).

There will be padding in the actual data, because of the alignment requirements (which give better performance even where the hardware does not strictly require them). For example, if we initialize the data:
struct A a = { 3, 'a', 4711 };

then there will be LLVM-code like this:
@a = global %struct.A { i32 3, i8 97, i64 4711 }, align 8

and in the machine code there will be:
a:
.long 3 # 0x3
.byte 97 # 0x61
.zero 3
.quad 4711 # 0x1267

Three bytes of zeros are needed to fill the gap between the 'a' and the long value 4711. But nowhere other than in the machine code is that padding anything more than the “difference between the theoretical closest offset and the aligned offset”.
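The `.zero 3` above can be cross-checked from C with `offsetof`. This sketch assumes a 4-byte int and a long aligned to 4 or 8 bytes (true for the usual x86-64, AArch64 and 32-bit ABIs); `gap_before_c` and `print_layout` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* struct A from the thread. */
struct A {
    int a;
    char b;
    long c;
};

/* c must land on a long-aligned offset, which is what creates the gap. */
static_assert(offsetof(struct A, c) % _Alignof(long) == 0, "c is long-aligned");

/* Bytes between the end of 'b' and the start of 'c': the gap that the
   assembler fills with ".zero 3" in the output above. */
static size_t gap_before_c(void)
{
    return offsetof(struct A, c) - (offsetof(struct A, b) + sizeof(char));
}

static void print_layout(void)
{
    printf("a at %zu, b at %zu, c at %zu, sizeof = %zu, gap before c = %zu\n",
           offsetof(struct A, a), offsetof(struct A, b), offsetof(struct A, c),
           sizeof(struct A), gap_before_c());
}
```

Note that the gap exists only as a difference of offsets; there is no member in `struct A` that names it, which matches the `%struct.A = type { i32, i8, i64 }` IR.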

The packed + aligned attributes will automatically introduce an explicit padding byte array:
https://godbolt.org/g/TlHX2g

Sometimes clang decides to pack a struct/class automatically in C++. I don’t know the details, but it looks like it is related to inheritance.

Thanks
Hongbin

In that particular example, it’s because the WHOLE structure needs to be aligned to 4 bytes, but the contents inside it are packed (because that’s what your attributes request: packed, and then aligned to 4).

So, yes, if you use attribute packed or attribute aligned to change the natural alignment WITHIN a structure, then you will (if necessary) get extra elements added to the struct. This is largely because LLVM doesn’t have a (good) way to express this in a StructType.
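The packed-plus-aligned situation can be reproduced with a small struct. This is a hypothetical example in the spirit of the godbolt link (whose exact contents are not reproduced here), using the GCC/Clang `__attribute__` syntax: `packed` drops the members’ alignment to 1, and `aligned(4)` then raises the alignment of the whole struct back to 4, so the size is rounded up and tail bytes appear that no member covers. Those are the bytes that, per the discussion above, end up as an extra element in the IR type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: members packed, whole struct aligned to 4. */
struct __attribute__((packed, aligned(4))) B {
    char a;
    int b;
};

static_assert(offsetof(struct B, b) == 1, "packed: b directly follows a");
static_assert(sizeof(struct B) == 8, "aligned(4): 5 bytes of members rounded up to 8");
static_assert(_Alignof(struct B) == 4, "the whole struct is 4-byte aligned");

/* Bytes after the last member that exist only because of aligned(4). */
static size_t tail_padding_B(void)
{
    return sizeof(struct B) - (offsetof(struct B, b) + sizeof(int));
}
```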

I still don’t understand what it is you are trying to do here. This is definitely something clang does, not something LLVM does. Also, I don’t think you can tell the difference between a manually padded and an automatically padded struct. If I add a char d[3]; to the struct with the packed, aligned(4) attributes, it produces the same type. The only difference is that d is zero-initialized, whereas the anonymous padding is undef (which, I think, allows the compiler to optimise it away at times).
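Mats’s point that manual and automatic padding are indistinguishable can be checked at the C level. In this sketch (GCC/Clang attribute syntax; `B_auto`, `B_manual` and `manual_padding_is_zeroed` are hypothetical names), both structs have the same size, alignment and member offsets. The one observable difference is that `d` is a real member, so an initializer that omits it still zero-initializes it, while the tail padding of `B_auto` has no member to initialize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Automatically padded: the 3 tail bytes come from aligned(4). */
struct __attribute__((packed, aligned(4))) B_auto {
    char a;
    int b;
};

/* Manually padded: the same 3 bytes spelled out as a named member. */
struct __attribute__((packed, aligned(4))) B_manual {
    char a;
    int b;
    char d[3];
};

/* Nothing in the layout records who wrote the trailing bytes. */
static_assert(sizeof(struct B_auto) == sizeof(struct B_manual), "same size");
static_assert(_Alignof(struct B_auto) == _Alignof(struct B_manual), "same alignment");
static_assert(offsetof(struct B_auto, b) == offsetof(struct B_manual, b), "same offset of b");

/* d is omitted from the initializer, so C zero-initializes it. */
static int manual_padding_is_zeroed(void)
{
    struct B_manual m = { 'x', 42 };
    static const char zeros[3] = { 0 };
    return memcmp(m.d, zeros, sizeof m.d) == 0;
}
```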

This is one of the underspecified corner cases in the C spec (and the subject of some ongoing WG14 discussions). In particular, for atomic structs to work, struct padding is required to be stable, so undef isn’t quite right: an optimiser is permitted to spot an atomic compare-and-exchange on a struct containing undef, assume undef != undef, and conclude that it will always fail. Some architectures (for example, Alpha) make sub-word stores much more expensive, so field updates on these architectures may modify the following padding. This is the reason for the vagueness in the C spec, and why sizeof(T) and sizeof(_Atomic(T)) are not required to be the same: on Alpha you’d likely want _Atomic(char) to be 64 bits.

It would be nice if LLVM had a way to differentiate between padding and non-padding struct fields (even if only as metadata, since losing the ‘padding’ attribute would impede optimisation but shouldn’t harm correctness).

David