Memory alignment model on AVX, AVX2 and AVX-512 targets

Hi,

I think that
def FeatureVectorUAMem : SubtargetFeature<"vector-unaligned-mem",
                                          "HasVectorUAMem", "true",
                 "Allow unaligned memory operands on vector/SIMD instructions">;

should be switched on for AVX and AVX-512 targets, because:

According to the AVX spec:
“Most arithmetic and data processing instructions encoded using the VEX prefix and
performing memory accesses have more flexible memory alignment requirements
than instructions that are encoded without the VEX prefix. Specifically,
• With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions,
most VEX-encoded, arithmetic and data processing instructions operate in
a flexible environment regarding memory address alignment, i.e. VEX-encoded
instruction with 32-byte or 16-byte load semantics will support unaligned load
operation by default. Memory arguments for most instructions with VEX prefix
operate normally without causing #GP(0) on any byte-granularity alignment
(unlike Legacy SSE instructions).”

The same applies to the EVEX-encoded AVX-512 instructions.

We already do not require any alignment when folding loads during peephole optimizations on these targets.

  • Elena

FWIW, this makes sense to me. I’d be interested to hear from folks supporting the AMD processors that implement AVX, to make sure there isn’t an undue runtime penalty for these.

AFAIK, there is no additional penalty for AMD processors.

Ok, if there are no objections, I’ll add the “HasVectorUAMem” feature to all AVX processors.
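
For concreteness, a minimal sketch of what that could look like in X86.td (the processor entry is illustrative and its existing feature list is elided, so this is not the exact in-tree definition):

// Sketch only: append FeatureVectorUAMem to an AVX-capable processor.
def : ProcessorModel<"corei7-avx", SandyBridgeModel,
                     [FeatureAVX, // ... existing features elided ...
                      FeatureVectorUAMem]>;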

I’ll allow folding an unaligned load into an instruction.

This is the memop definition:

def memop : PatFrag<(ops node:$ptr), (load node:$ptr), [{
  // Fold the load if the subtarget allows unaligned vector memory operands,
  // or if the load is known to be at least 16-byte aligned.
  return Subtarget->hasVectorUAMem()
      || cast<LoadSDNode>(N)->getAlignment() >= 16;
}]>;
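
The instruction patterns consume memop through per-type wrappers; as a reminder, they look roughly like this (a sketch following the X86InstrFragmentsSIMD.td convention; the exact in-tree set may differ):

def memopv4f32 : PatFrag<(ops node:$ptr), (v4f32 (memop node:$ptr))>;
def memopv2f64 : PatFrag<(ops node:$ptr), (v2f64 (memop node:$ptr))>;
def memopv8f32 : PatFrag<(ops node:$ptr), (v8f32 (memop node:$ptr))>;
def memopv4f64 : PatFrag<(ops node:$ptr), (v4f64 (memop node:$ptr))>;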

I want to fold all unaligned loads into instructions on AVX architectures.
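
To illustrate the kind of folding this enables, here is a hypothetical standalone pattern (the in-tree AVX patterns are generated from multiclasses, and VADDPSYrm’s exact operand order should be double-checked, so treat this purely as a sketch):

// With HasVectorUAMem set, memopv8f32 matches a load of any alignment,
// so the load below can be folded into the reg-mem form of VADDPS.
def : Pat<(v8f32 (fadd VR256:$src1, (memopv8f32 addr:$src2))),
          (VADDPSYrm VR256:$src1, addr:$src2)>;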

I’ll write some tests for this.