Summary
GEPs into vectors of overaligned elements are currently allowed and generated by generic passes like SROA. However, such GEPs are broken, because offsets are incorrectly computed in many cases.
I’d like to discuss how to improve the situation.
See also https://discourse.llvm.org/t/status-of-overaligned-i8 for a recent similar discussion regarding overaligned types.
Details
Vectors can contain overaligned elements, i.e. elements whose ABI alignment exceeds their size. In that case the elements are tightly packed in the vector and do not respect the ABI alignment (see the LLVM Language Reference Manual, LLVM 16.0.0git documentation).
GEPs into such vectors are currently allowed. However, offsets of such GEPs are inconsistently computed, as many places (including GEPOperator itself) incorrectly use getTypeAllocSize() for element sizes, which respects ABI alignment.
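To make the mismatch concrete, here is a minimal C++ sketch of the two ways an element offset can be computed. It does not use LLVM itself: `ElemType`, `allocSize`, `packedOffset` and `paddedOffset` are hypothetical names made up for illustration, and the model assumes byte-sized elements and a power-of-two ABI alignment.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a scalar element type; this is NOT llvm::Type.
struct ElemType {
  uint64_t StoreSizeInBytes; // bytes actually occupied by one value
  uint64_t ABIAlignInBytes;  // ABI alignment from a hypothetical data layout
};

// Round Size up to the next multiple of Align (Align must be a power of two).
inline uint64_t alignTo(uint64_t Size, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "expected a power of two");
  return (Size + Align - 1) & ~(Align - 1);
}

// Models a padded "alloc size": the store size rounded up to the ABI alignment.
inline uint64_t allocSize(const ElemType &T) {
  return alignTo(T.StoreSizeInBytes, T.ABIAlignInBytes);
}

// Offset of element Idx when elements are tightly packed, as in a vector.
inline uint64_t packedOffset(const ElemType &T, uint64_t Idx) {
  return Idx * T.StoreSizeInBytes;
}

// Offset of element Idx when the padded size is used as the stride.
inline uint64_t paddedOffset(const ElemType &T, uint64_t Idx) {
  return Idx * allocSize(T);
}
```

For an i8 with 32-bit ABI alignment, modeled as `ElemType{1, 4}`, `packedOffset(T, 2)` gives 2 while `paddedOffset(T, 2)` gives 8, so the two computations only agree for index 0.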
The dedicated GEP guide (The Often Misunderstood GEP Instruction — LLVM 16.0.0git documentation) mentions that GEPs into vectors are not recommended, and that GEPs into vectors might be outright disallowed in the future.
Even if frontends avoid GEPs into vectors, generic passes introduce them, so there is a need for change: For example, SROA tries to rewrite byte-based accesses as "natural GEP"s using DataLayout::getGEPIndexForOffset, which (correctly) returns GEP indices into such a vector if the byte access happens to match a vector element. However, later steps (e.g. GetElementPtrInst::accumulateConstantOffset) compute incorrect offsets.
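The round trip described above (byte offset to element index, then index back to byte offset) can be sketched as follows. This is a toy model under stated assumptions, not the LLVM implementation: `VectorLayout`, `indexForOffset`, `offsetViaAllocSize` and `roundTrips` are hypothetical helpers, with the first step using the packed stride and the second using the padded size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a vector's layout; not an LLVM type.
struct VectorLayout {
  uint64_t NumElems;      // number of vector elements
  uint64_t ElemStoreSize; // packed stride between elements, in bytes
  uint64_t ElemAllocSize; // element size padded to ABI alignment, in bytes
};

// Step 1: map a byte offset to an element index using the packed stride.
// Returns false if the offset is not an in-range element boundary.
inline bool indexForOffset(const VectorLayout &V, uint64_t ByteOffset,
                           uint64_t &Idx) {
  assert(V.ElemStoreSize != 0 && "element size must be nonzero");
  if (ByteOffset % V.ElemStoreSize != 0)
    return false;
  uint64_t Candidate = ByteOffset / V.ElemStoreSize;
  if (Candidate >= V.NumElems)
    return false;
  Idx = Candidate;
  return true;
}

// Step 2: map the index back to a byte offset, but with the padded size.
inline uint64_t offsetViaAllocSize(const VectorLayout &V, uint64_t Idx) {
  return Idx * V.ElemAllocSize;
}

// True if both steps agree on the byte offset.
inline bool roundTrips(const VectorLayout &V, uint64_t ByteOffset) {
  uint64_t Idx = 0;
  if (!indexForOffset(V, ByteOffset, Idx))
    return false;
  return offsetViaAllocSize(V, Idx) == ByteOffset;
}
```

With `VectorLayout{4, 1, 4}` (four i8 elements, each overaligned to 4 bytes), byte offset 3 maps to index 3, which the second step turns into offset 12, well outside the 4-byte vector. When store size and alloc size coincide, as in `VectorLayout{4, 4, 4}`, the round trip is consistent.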
See the test case overaligned-datalayout.ll in D139034 [IR] GEP: Fix byte-offsets in vectors of overaligned types for a miscompilation caused by this issue.
Options
I see the following options to improve the situation:
Fix offset calculations
I recently tried to fix these GEP offsets in D139034 [IR] GEP: Fix byte-offsets in vectors of overaligned types.
But @nikic correctly pointed out that there are far more places in LLVM that rely on the same assumption, and suggested adding some sort of gep_offset_iterator that could be used everywhere instead. It seems this would be possible, but it’s not clear to me whether the nontrivial work for that is actually needed.
Forbid GEPs into vectors of overaligned elements
Given the currently broken state of such GEPs, it seems unlikely there are any users depending on such GEPs. For DXIL, DXC seems to replace vectors by arrays in case objects are alloca’ed (@beanz, can you comment on this?).
So we could forbid such GEPs instead, which I’d personally prefer. I’m not sure how such a rule could be enforced though, except for updating the LangRef and adding a few asserts?
Forbid GEPs into vectors
This would formally also be an option. I don’t have an opinion on this, but it seems to be a fairly large change that is not sufficiently motivated by this corner case?