GEPs into vectors of overaligned elements are currently allowed and generated by generic passes like SROA. However, such GEPs are broken, because offsets are incorrectly computed in many cases.
I’d like to discuss how to improve the situation.
See also https://discourse.llvm.org/t/status-of-overaligned-i8 for a recent similar discussion regarding overaligned types.
Vectors can contain overaligned elements, in which case the elements are tightly packed in the vector and do not respect the ABI alignment (see the LLVM Language Reference Manual).
GEPs into such vectors are currently allowed. However, the offsets of such GEPs are computed inconsistently, because many places (including GEPOperator itself) incorrectly use getTypeAllocSize() for element sizes, which respects ABI alignment.
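To make the mismatch concrete, here is a minimal standalone sketch. It does not use LLVM's API: `ElemLayout` and the helper functions are hypothetical stand-ins, and the layout (an 8-bit element with 4-byte ABI alignment, as a datalayout containing `i8:32` would give) is an assumption chosen for illustration. It only models the two ways of computing an element stride, packed size versus alloc size.

```cpp
#include <cstdint>

// Hypothetical stand-in for a scalar type's layout; not LLVM's DataLayout API.
struct ElemLayout {
  uint64_t sizeInBits;     // e.g. 8 for i8
  uint64_t abiAlignBytes;  // e.g. 4 for an assumed datalayout with "i8:32"
};

// Round v up to the next multiple of a.
constexpr uint64_t alignTo(uint64_t v, uint64_t a) {
  return (v + a - 1) / a * a;
}

// Store size: bit size rounded up to whole bytes.
constexpr uint64_t storeSize(ElemLayout e) {
  return (e.sizeInBits + 7) / 8;
}

// Alloc size: store size padded to the ABI alignment.
// This models the value getTypeAllocSize() reports for the element type.
constexpr uint64_t allocSize(ElemLayout e) {
  return alignTo(storeSize(e), e.abiAlignBytes);
}

// Byte offset of element idx in a tightly packed vector
// (assumes element sizes that are a multiple of 8 bits).
constexpr uint64_t packedVectorOffset(ElemLayout e, uint64_t idx) {
  return idx * e.sizeInBits / 8;
}

// Byte offset of element idx when the stride is taken from the alloc size.
constexpr uint64_t allocSizeOffset(ElemLayout e, uint64_t idx) {
  return idx * allocSize(e);
}

// Assumed example: i8 with 32-bit ABI alignment.
constexpr ElemLayout OveralignedI8{8, 4};

static_assert(packedVectorOffset(OveralignedI8, 2) == 2,
              "element 2 of a packed <4 x i8> lives at byte 2");
static_assert(allocSizeOffset(OveralignedI8, 2) == 8,
              "an alloc-size stride places it at byte 8 instead");
```

Under these assumptions the whole `<4 x i8>` vector occupies 4 bytes, so an offset of 8 for element 2 points past the end of the vector.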
The dedicated GEP guide (The Often Misunderstood GEP Instruction) mentions that GEPs into vectors are not recommended, and that they might be disallowed outright in the future.
Even if frontends avoid GEPs into vectors, generic passes introduce them, so a change is needed. For example, SROA tries to rewrite byte-based accesses as "natural GEP"s using DataLayout::getGEPIndexForOffset, which (correctly) returns GEP indices into such a vector if the byte access happens to match a vector element. However, later steps (e.g. GetElementPtrInst::accumulateConstantOffset) compute incorrect offsets.
See the test case overaligned-datalayout.ll in D139034 ([IR] GEP: Fix byte-offsets in vectors of overaligned types) for a miscompilation caused by this issue.
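The round trip that goes wrong can be sketched as follows. This is a simplified model, not LLVM code: `indexForByteOffset` and `byteOffsetForIndex` are hypothetical helpers that only mimic the two strides involved (packed when deriving the index, alloc size when turning the index back into bytes), and the i8 element with 4-byte ABI alignment is again an assumed example.

```cpp
#include <cstdint>
#include <optional>

// Assumed layout: i8 elements with 32-bit ABI alignment ("i8:32").
constexpr uint64_t kElemStoreBytes = 1;  // packed stride inside the vector
constexpr uint64_t kElemAllocBytes = 4;  // store size padded to ABI alignment

// Hypothetical model of mapping a byte offset to a vector element index,
// using the packed stride. Returns no index if the offset does not land
// exactly on an element of the vector.
std::optional<uint64_t> indexForByteOffset(uint64_t byteOffset,
                                           uint64_t numElems) {
  if (byteOffset % kElemStoreBytes != 0)
    return std::nullopt;
  uint64_t idx = byteOffset / kElemStoreBytes;
  if (idx >= numElems)
    return std::nullopt;
  return idx;
}

// Hypothetical model of a later step that converts the index back into a
// byte offset but takes the stride from the alloc size.
uint64_t byteOffsetForIndex(uint64_t idx) {
  return idx * kElemAllocBytes;
}
```

In this model, a byte access at offset 2 into a `<4 x i8>` becomes index 2, and index 2 is then converted back to byte offset 8, so the rewritten access no longer refers to the same memory.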
I see the following options to improve the situation:
I recently tried to fix these GEP offsets in D139034 ([IR] GEP: Fix byte-offsets in vectors of overaligned types).
But @nikic correctly pointed out that far more places in LLVM rely on the same assumption, and suggested adding some sort of gep_offset_iterator that could be used everywhere instead. This seems possible, but it's not clear to me whether the nontrivial work it requires is actually needed.
Given the currently broken state of such GEPs, it seems unlikely there are any users depending on such GEPs. For DXIL, DXC seems to replace vectors by arrays in case objects are alloca’ed (@beanz, can you comment on this?).
So we could forbid such GEPs instead, which I’d personally prefer. I’m not sure how such a rule could be enforced though, except for updating the LangRef and adding a few asserts?
This would formally also be an option. I don’t have an opinion on this, but it seems to be a fairly large change that is not sufficiently motivated by this corner case?