> By default, clang emits all bitfield load/store operations using the width of the entire sequence of bitfield members. If you look at the LLVM IR for your testcase, all the bitfield operations are i16. (For thread safety, the C/C++ standards treat a sequence of bitfield members as a single "field".)
> If you look at the assembly, though, an "andb $-2, (%rdi)" slips in. This is specific to the x86 backend: it's narrowing the store to save a couple bytes in the encoding, and a potential decoding stall due to a 2-byte immediate. Maybe we shouldn't do that, or we should guard it with a better heuristic.
Through the end of the LLVM IR phase, all of the operations still have type i16. In DAGCombine, some of the load/store and related operations are narrowed to a byte but others are not, so we end up with a narrow store followed by a wide load, and that load stalls for a long time. The narrowing is done in the function DAGCombiner::ReduceLoadOpStoreWidth. Can we guard it with a new command line option, -reduce-load-op-store-width=true?
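To make the pattern concrete, here is a minimal sketch in C. It is not the testcase from this thread: the struct layout and the function names are made up for illustration, and whether the narrow store actually appears depends on the compiler version, target, and optimization level.

```c
/* Hypothetical example: 16 bits of adjacent bitfields, which clang
 * is expected to treat as one i16 unit in the IR. The layout and
 * names are assumptions for illustration only. */
struct Flags {
    unsigned short a : 1;
    unsigned short b : 7;
    unsigned short c : 8;
};

/* Clearing `a` is a load/and/store on the bitfield storage. This is
 * the kind of operation that may be narrowed to a byte-wide
 * read-modify-write in the backend. */
void clear_a(struct Flags *f) {
    f->a = 0;
}

/* Reading `c` touches the same storage; if this access stays 16 bits
 * wide, it follows the narrower store to an overlapping address. */
unsigned read_c(const struct Flags *f) {
    return f->c;
}

/* Narrow store immediately followed by a (possibly) wide load. */
unsigned clear_then_read(struct Flags *f) {
    clear_a(f);
    return read_c(f);
}
```

Compiling something like this with `clang -O2 -S` on x86-64 and checking whether a byte-wide `andb` to memory is followed by a 16-bit load from the same address should show whether the mismatch is present for a given compiler build.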