How to distinguish between padding and real struct fields?

In some cases, the struct/class type definitions in LLVM IR contain byte-array padding fields. How can I distinguish those padding fields from the real members?

For example, consider the following two cases:
1.

struct A
{
    int a;
    char b;
    long c;
} __attribute__((packed, aligned(4)));

struct A a;

%struct.A = type <{ i32, i8, i64, [3 x i8] }>
@a = dso_local global %struct.A zeroinitializer, align 4

2.

struct A
{
    int a;
    char b;
    long c;
};

struct A a;

class B : A {
    char a;
};

B b;

%struct.A = type { i32, i8, i64 }
%class.B = type <{ %struct.A, i8, [7 x i8] }>

@a = dso_local global %struct.A zeroinitializer, align 8
@b = dso_local global %class.B zeroinitializer, align 8

I suspect there is no reliable way to identify padding at the IR level. You could look for bytes that are never accessed, but that won’t catch everything: accesses may be widened, copying a whole struct may also copy its padding, and so on.

I expect the only way to robustly identify padding is in the frontend, in Clang's RecordLayoutBuilder.

Thank you for your reply.
Do you mean I need to modify the Clang frontend to get information about padding?
How would I get that information and attach it to the IR?

Someone more familiar with the frontend would have to answer these questions.

Once you know where the padding is, what do you plan to do with that knowledge? If you explained your goals it might be easier to help.

Thanks for your reply. I would like to get the mapping between C++ source class members and the fields of the IR type definitions.

struct A
{
    int a;
    char b;
    long c;
};

%struct.A = type <{ i32, i8, i64, [3 x i8] }>

For example, a corresponds to i32, b corresponds to i8, c corresponds to i64.
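To make the desired mapping concrete, here is a minimal sketch in plain C++ (no LLVM APIs). It assumes the IR struct is packed, so each element starts at the running sum of the preceding element sizes, and that the byte offset of every source member is already known from some other source. `SourceMember` and `mapMembersToElements` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical input record for this sketch; not an LLVM type.
struct SourceMember {
    std::string name;
    uint64_t offsetBytes; // byte offset of the member within the struct
};

// For a *packed* IR struct, element i starts at the sum of the sizes of
// elements 0..i-1. Returns one entry per IR element: the name of the member
// that starts at that element's offset, or "" when no member starts there
// (a candidate padding element).
std::vector<std::string>
mapMembersToElements(const std::vector<uint64_t> &elementSizesBytes,
                     const std::vector<SourceMember> &members) {
    std::map<uint64_t, std::string> byOffset;
    for (const SourceMember &m : members)
        byOffset[m.offsetBytes] = m.name;

    std::vector<std::string> result;
    std::size_t matched = 0;
    uint64_t offset = 0;
    for (uint64_t size : elementSizesBytes) {
        auto it = byOffset.find(offset);
        if (it != byOffset.end()) {
            result.push_back(it->second);
            ++matched;
        } else {
            result.push_back(std::string());
        }
        offset += size;
    }
    // Assumption of this sketch: every member starts exactly at an element
    // boundary (not true for bit-fields, for example).
    assert(matched == byOffset.size() && "member does not start at an element");
    return result;
}
```

With element sizes 4, 1, 8, 3 (for `<{ i32, i8, i64, [3 x i8] }>`) and members `a`, `b`, `c` at byte offsets 0, 4, 5, the result names the first three elements and leaves the trailing `[3 x i8]` unnamed. An unnamed element is only a candidate for padding; non-packed structs would also need alignment taken into account.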

The debug-info metadata can provide that kind of information, although maybe not as directly as you would prefer.

@a = global %struct.A zeroinitializer, align 4, !dbg !0

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "a", scope: !2, file: !3, line: 6, type: !5, isLocal: false, isDefinition: true)
!2 = distinct !DICompileUnit(language: DW_LANG_C11, file: !3, producer: "clang version 17.0.6 (PS5 clang version 9.00.0.501 cdbd5f6a)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
!3 = !DIFile(filename: "padding.c", directory: "D:\\Dev\\ours\\scratch", checksumkind: CSK_MD5, checksum: "766809f898304fd25444f40ce70e17fa")
!4 = !{!0}
!5 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "A", file: !3, line: 2, size: 128, align: 32, elements: !6)
!6 = !{!7, !9, !11}
!7 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !5, file: !3, line: 3, baseType: !8, size: 32)
!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!9 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !5, file: !3, line: 4, baseType: !10, size: 8, offset: 32)
!10 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!11 = !DIDerivedType(tag: DW_TAG_member, name: "c", scope: !5, file: !3, line: 5, baseType: !12, size: 64, offset: 40)
!12 = !DIBasicType(name: "long", size: 64, encoding: DW_ATE_signed)

Variable @a points to the debug-info description at !0. This points to the variable at !1, which has its type at !5, which points to the list of members, and so on.

Sizes and offsets of members are in bits. There are no member descriptions for padding, so you can derive the size and location of padding bits from the parts of the struct that no member covers.
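The gap computation described above can be sketched in plain C++ as follows. This does not use LLVM's debug-info APIs: it assumes the `offset:` and `size:` values of the `DW_TAG_member` nodes and the `size:` of the composite type have already been extracted. `Member`, `BitRange`, and `findPadding` are hypothetical names for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record types for this sketch; they are not LLVM classes.
struct Member {
    std::string name;
    uint64_t offsetBits; // like `offset:` on a DW_TAG_member node (0 if absent)
    uint64_t sizeBits;   // like `size:` on a DW_TAG_member node
};

struct BitRange {
    uint64_t beginBits; // inclusive
    uint64_t endBits;   // exclusive
};

// Return the bit ranges of the struct that no member covers.
std::vector<BitRange> findPadding(std::vector<Member> members,
                                  uint64_t structSizeBits) {
    std::sort(members.begin(), members.end(),
              [](const Member &l, const Member &r) {
                  return l.offsetBits < r.offsetBits;
              });

    std::vector<BitRange> gaps;
    uint64_t covered = 0; // every bit below this position is accounted for
    for (const Member &m : members) {
        assert(m.offsetBits + m.sizeBits <= structSizeBits &&
               "member extends past the end of the struct");
        if (m.offsetBits > covered)
            gaps.push_back({covered, m.offsetBits});
        covered = std::max(covered, m.offsetBits + m.sizeBits);
    }
    if (covered < structSizeBits)
        gaps.push_back({covered, structSizeBits});
    return gaps;
}
```

For the metadata shown above (members at bit offsets 0, 32, and 40 with sizes 32, 8, and 64, in a 128-bit struct), this yields a single gap covering bits 104 to 128, i.e. 3 bytes, which matches the trailing `[3 x i8]` in the IR type. The sketch treats the member list as flat; base classes and nested aggregates would need extra handling.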

I am not deeply familiar with the APIs for navigating the debug info. If I were working on a project like this, I’d probably look first at the IR verifier to see how it walks the tree of debug-info metadata.
