I’m working on improving clang’s lowering of bitfields to underlying storage units. The current algorithm is making the assumption that unaligned memory accesses are cheap. For instance, given[*]:
struct S { char a : 8; char b : 8; };
it’ll place those two bitfields into a single i16 storage unit. Ignoring that the structure itself is char-aligned. This causes difficulties on targets where an unaligned i16 memory access has to be synthesized from pairs of i8 accesses.
There doesn’t appear to be an existing TargetInfo predicate for ‘are unaligned accesses cheap’ – either in TargetInfo
(clang/include/clang/Basic/TargetInfo.h), or TargetCodeGenInfo
(clang/lib/CodeGen/TargetInfo.h).
- did I miss it?
- where might be best to add this? As I need it from
CGRecordLowering::accumulateBitFields
, TargetInfo
seems better – it already has some non-ABI target stuff like getRegisterWidth
.
[*] presume these are bunches of small bitfields, and ignore that it would be better in this case to make these unsigned short
.
A number of years ago, Clang was intentionally changed to lower such bitfields into large integer types in order to give the backend the ability to do a better job of codegen. I don’t recall the rationale/justification anymore, but I’m a bit skeptical of a proposal to change it back.
Do you have an example of bad lowering arising from this struct? Trying simple examples seem to show appropriate codegen for this, emitting an appropriate single load-byte. (Including on architectures that don’t support misaligned loads).
That’s interesting. I do have examples of badness. There are a couple of issues.
-
The motivating case was similar to struct X { char a : 8; char b : 8; char c:8 };
We ended up with an i24, with alignment 1. That was really awkward to generate code for.
-
We’ll create storage units that are (unnecessarily) larger than a register (for example, consider extending the above with a bunch more bitfields). Even if you have register-pair load/stores, that increases register pressure.
While sometimes, the larger unit is accessed at smaller widths, we have observed some quite poor codegen involving unaligned loads, bitmanipulation and then unaligned stores, where some of those loads/stores never touch the bitfields that are being altered.
The current code is confusing, and repeats work – see ⚙ D157914 [clang][NFC]: bitfield storage units which does some refactoring (something that I think’s good in its own right). It’s trying to do two different things in a single pass, and I don’t think that’s possible.
-
figure out which bitfields must be in the same storage unit – i.e. bitfields that share parts of a single byte.
-
which storage units can be merged.
Currently that merging decision is made simply by looking at the next bitfield, which has insufficient information. For instance why should struct Neat { char a : 8; char b : 8;};
have a single i16, but struct Gap { char a : 7; char b : 7;};
have two i8s? IMHO those should be the same.
I think paying attention to register size, and the complexity of unaligned accesses can guide storage unit merging. Do you have particular benchmarks/examples of interest.
Thanks, my grep-fu failed me