Alignment of bitfield structs

Hi,

I'm trying to fix what appears to be incorrect codegen related to the
alignment of bitfield structs, but have hit a wall as to how to approach
it. Consider the following testcase:

extern struct T {
  int b0 : 8;
  int b1 : 24;
  int b2 : 1;
} g;

int foo() {
  return g.b1;
}

before optimization, Clang produces the following (for ARM, but I believe
this is just as valid elsewhere):

%struct.T = type { i40 }

@g = external global %struct.T

define arm_aapcscc i32 @foo() #0 {
entry:
  %bf.load = load i64, i64* bitcast (%struct.T* @g to i64*), align 4
   ...

This in itself seems correct: Clang has combined the bitfields into an
i40, but also noted an alignment of 4 to preserve the ABI alignment of
the bitfield. The record layout is as below, which again seems correct.

Layout: <CGRecordLayout
  LLVMType:%struct.T = type { i40 }
  IsZeroInitializable:1
  BitFields:[
    <CGBitFieldInfo Offset:0 Size:8 IsSigned:1 StorageSize:64
StorageAlignment:4>
    <CGBitFieldInfo Offset:8 Size:24 IsSigned:1 StorageSize:64
StorageAlignment:4>
    <CGBitFieldInfo Offset:32 Size:1 IsSigned:1 StorageSize:64
StorageAlignment:4>
]>

However, when this IR goes down to LLVM, no context about the bitfields
is left, so LLVM just sees an i64 load from an i40. InstCombine sees this
and blindly (but correctly, I believe, based on the information it has)
promotes this load to the ABI alignment of i64, which is now incorrect.

From what I can tell there are only two ways to fix this: either have
Clang not combine the bitfields into an i40, or somehow pass the bitfield
layout down to LLVM. Neither of these seems particularly easy to do.
What are others' thoughts on this; have I missed something? Thanks.

Regards,
Bradley Smith

> However, when this IR goes down to LLVM there is no context of the
> bitfields left so LLVM just sees an i64 load from an i40. InstCombine
> sees this and blindly (but correctly I believe, based on the information
> it has) converts this load to the ABI alignment of i64, which is now
> incorrect.

Why is this incorrect? It's only done because @g doesn't have an
explicit alignment, and will thus have the i64 ABI alignment. If
that's not the expected behavior, would adding an explicit ", align 4"
to @g solve the issue?
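Concretely, that would mean emitting the global with an explicit
alignment, something like this (a sketch of the suggested IR, not actual
Clang output):

```llvm
@g = external global %struct.T, align 4
```

With the alignment stated explicitly, InstCombine would have no reason to
fall back to the i64 ABI alignment when it revisits the load.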

-Ahmed

This seems suspicious to me. I always thought that 'load i32..., align 1'
was a supported way to ask the backend to do the load slicing if needed,
and this instcombine breaks that.

> > However, when this IR goes down to LLVM there is no context of the
> > bitfields left so LLVM just sees an i64 load from an i40. InstCombine
> > sees this and blindly (but correctly I believe, based on the
> > information it has) converts this load to the ABI alignment of i64,
> > which is now incorrect.
>
> This seems suspicious to me. I always thought that 'load i32..., align 1'
> was a supported way to ask the backend to do the load slicing if needed,
> and this instcombine breaks that.

Right, but it does that because it can see that @g will be
ABI-aligned, no? If I do "@g ..., align 4" the load alignment doesn't
change, as expected.

-Ahmed

I’m having trouble reproducing. Can you please give the IR you are feeding to InstCombine?

Hi Bradley,

We had similar issues when we did the EDG bridge and IIRC, the issue
depends on how you lower the bitfield access logic.

Your two proposals are similar in that they suggest keeping the
information down the pipeline. Either way, keeping that information
valid for too long is going to be a pain, mainly because the current
passes don't account for anything special from bitfields. In theory,
bitfields are just a blob with complicated accessors, and no pass
should break that, unless the pass itself is broken.

Can you give an example of a piece of IR that breaks, and maybe find out
whether there is an optimisation pass that, if omitted, produces good
code?

cheers,
--renato

> > However, when this IR goes down to LLVM there is no context of the
> > bitfields left so LLVM just sees an i64 load from an i40. InstCombine
> > sees this and blindly (but correctly I believe, based on the
> > information it has) converts this load to the ABI alignment of i64,
> > which is now incorrect.
>
> Why is this incorrect? It's only done because @g doesn't have an
> explicit alignment, and will thus have the i64 ABI alignment. If
> that's not the expected behavior, would adding an explicit ", align 4"
> to @g solve the issue?

From what I understand, the ABI alignment of (the actual) @g should be 4, since it is a struct containing int bitfields; Clang has invented the i40 type. Although, as I said, from InstCombine's perspective the ABI alignment is indeed 8, as all it sees is the i40, so I don't believe InstCombine itself is wrong.

Regards,
Bradley Smith

Right, I may be oversimplifying things, but I think Clang is
incorrect here. It should have created something like { i32, i32 }
instead of { i40 }; InstCombine would then probably not mess the
alignment up.
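For concreteness, the { i32, i32 } lowering might look roughly like this
for the original testcase (a hand-written sketch assuming little-endian
bit placement, with b1 in bits 8-31 of the first container; not actual
Clang output):

```llvm
%struct.T = type { i32, i32 }

@g = external global %struct.T, align 4

define arm_aapcscc i32 @foo() {
entry:
  ; b1 is a signed 24-bit field in bits 8-31 of the first i32, so a
  ; naturally aligned i32 load plus an arithmetic shift extracts it
  %w = load i32, i32* getelementptr inbounds (%struct.T, %struct.T* @g, i32 0, i32 0), align 4
  %bf.val = ashr i32 %w, 8
  ret i32 %bf.val
}
```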

Is that what you mean?

cheers,
--renato

> > From what I can tell there are only 2 ways to fix this: either have
> > clang not combine the bitfield into an i40, or somehow pass the
> > bitfield layout down to LLVM. Neither of these seem particularly
> > easy to do, what are others thoughts on this, have I missed
> > something? Thanks.
>
> Hi Bradley,
>
> We had similar issues when we did the EDG bridge and IIRC, the issue
> depends on how you lower the bitfield access logic.
>
> Your two proposals are similar in that they suggest keeping the
> information down the pipeline. Either way, keeping that information
> valid for too long is going to be a pain, mainly because the current
> passes don't account for anything special from bitfields. In theory,
> bitfields are just a blob with complicated accessors, and no pass
> should break that, unless the pass itself is broken.

Perhaps an alternative is to somehow mark the load as 'this is really only align 4, please don't increase it', although that could just cause cases where things can't be optimized as well as they could be.

> Can you give an example of a piece of IR that breaks and maybe finding
> if there is an optimisation pass that, if omitted, produces good code?

Attached is the IR I am using (produced from the C code in my original mail); when run through opt -instcombine, the behaviour I noted is observed.

Regards,
Bradley Smith

test.ll (1021 Bytes)

This could indeed be the cause, although from what I understand of the comments in CGRecordLayoutBuilder this is entirely intentional behaviour. (Although I could be wrong.)

Regards,
Bradley Smith

I can fathom why this was intentional, but as you said, it can't
possibly take ABI alignment into account. What about the GNU alignment
attribute? Can't we lower the variable with a fake attribute,
generated by Clang, that forces ABI alignment?

cheers,
--renato

PS: I love using bitfields. I hate supporting bitfields. Damn dichotomy!

Is “ABI alignment” just the alignment of an i64 as specified in DataLayout, or is it something else? Maybe we are computing that alignment incorrectly, and the alignment of an i40 should be 4.

Alternatively, we could have Clang slap ‘align 4’ on the global and call it fixed.

> Is "ABI alignment" just the alignment of an i64 as specified in
> DataLayout, or is it something else? Maybe we are computing that
> alignment incorrectly, and the alignment of an i40 should be 4.

Alignment of bitfields in ARM depends on the alignment of the declared
type. In this case, int, or 4 bytes. What happened in this example is
that there are more bits than 32, but the ABI states that you must use
containers aligned to the declared type, so even if you use more than
32 bits, you still need to align to 4 bytes. This essentially means
that you just keep using 4-byte integer containers until all bits fit
inside.

Since i40 doesn't exist on any target, LLVM does the obvious thing and
promotes it to i64, but again, on ARM, long long (which is what i64
means) aligns to 8 bytes, and then you have the conflict. I cannot
imagine a pass in LLVM that would see an i40 and split it into i32+i8
and convert it to { i32, i32 } just because it used to be a bitfield.
Too many things can go wrong, including alignment.

> Alternatively, we could have Clang slap 'align 4' on the global and
> call it fixed.

Would LLVM understand that correctly? Would the ARM backend interpret
that correctly and "realise" it's supposed to extend only the extra i8
to an i32? Would that work on other backends? Or would it be better to
let Clang, which knows about the ARM ABI, emit { i32, i32 }?

I think those would be the questions that I'd be trying to answer
before proposing any changes to how we represent bitfields in the IR.

cheers,
--renato

> > Is "ABI alignment" just the alignment of an i64 as specified in
> > DataLayout, or is it something else? Maybe we are computing that
> > alignment incorrectly, and the alignment of an i40 should be 4.
>
> Alignment of bitfields in ARM depends on the alignment of the declared
> type. In this case, int, or 4 bytes. What happened in this example is
> that there are more bits than 32, but the ABI states that you must use
> containers aligned to the declared type, so even if you use more than
> 32 bits, you still need to align to 4 bytes. This essentially means
> that you just keep using 4-byte integer containers until all bits fit
> inside.
>
> Since i40 doesn't exist in any target, LLVM does the obvious thing and
> promotes it to i64, but again, on ARM, long long (which is what i64
> means) aligns to 8 bytes, then you have the conflict. I cannot imagine
> a pass in LLVM that would see an i40 and split it into i32+i8 and
> convert to { i32, i32 } just because it used to be a bitfield. Too
> many things can go wrong, including alignment.
>
> > Alternatively, we could have Clang slap 'align 4' on the global and
> > call it fixed.
>
> Would LLVM understand that correctly?

Why wouldn't it? It looks like the problem is shallow: we happen to emit a
global in such a way that LLVM thinks it should be 8 byte aligned and Clang
wanted it to be 4 byte aligned. If the alignment inferred from the type
isn't the one we wanted, emitting an explicit alignment is exactly the
right thing to do.

> Would the ARM backend interpret that correctly and "realise" it's
> supposed to extend only the extra i8 to an i32? Would that work on
> other backends? Or would it be better to let Clang, which knows about
> the ARM ABI, to emit { i32, i32 }?

The type we use for a global variable, beyond the size and implied
alignment, is not supposed to matter. If some backend is using the type of
a global variable for something ABI-relevant, we have a problem (we assume
that we can change the type on a whim, for instance to make it match the
type of a constant initializer).

The type does matter, for instance, in a big- vs. little-endian scenario, as to where the final bits end up. A literal i40 may work for big-endian, but has to be converted to i32 + i32 for little-endian.

That depends on the type used for loads and stores, not on the type used
for the global, no?

Yes. I agree that having "{ i40 } align 4" will probably do the trick,
this is why I proposed it back then. I just think we should try both
solutions and use what's better.

Making the back-end understand the subtleties of types and alignments
due to the bitfield ABI may not be the right place, but it can work. And
Clang already knows about ABI issues and bitfields, which is why I think
it's a better place to fix this. But I'm not against doing it in the
back-end and adding align 4 to int bitfield types, at least on ARM.

cheers,
--renato