Instcombine, struct w/ padding & aggregate loads/stores as scalars

Hi,

LLVM sometimes optimizes small aggregate load/store into a single integer load/store.
Is that a valid transformation if such an aggregate contains uninitialized field padding?

This popped up during investigation of a miscompilation in CUDA - Incorrect code generated when -O1 or more is enabled · Issue #59156 · llvm/llvm-project · GitHub

The root cause is that instcombine eliminates the store when the padding is uninitialized:

My question is – who did the wrong thing here? Some preceding pass (SROA?) which packed uninitialized struct fields into an int which left it with poison bits, or the instcombine which discarded the write with the data we expected to read back later?

–Artem

Poison is byte-wise, so whatever pass forgot to freeze the bits
before inserting them into larger type is at fault.
Can you give minimal end-to-end example?

Roman

AFAICT, the sequence of events looks roughly like this:

  • struct is set up as alloca + per-field stores to init the fields. The pad is left uninitialized.
  • it’s memcpy’ed into another alloca
  • SROA converts memcpy into partially filled i64 + store i64
  • instcombine eliminates the i64 store.

I believe the faulty transform here is InstCombine converting a memcpy into an integer load/store pair: llvm-project/InstCombineCalls.cpp at d93eb3a942d83029e4cc68354e3e441db957bf1b · llvm/llvm-project · GitHub

This transform is “well known” to be incorrect, because it will propagate poison in individual bytes across the whole value. It sounds like you’re the lucky person to hit this in a real-world miscompile.

Nominally, the solution to this is simple, which is to convert the memcpy into a <8 x i8> style load/store instead, which preserves poison. The reality is that this does not optimize well in practice.

An alternative is the introduction of a byte type ([RFC] Introducing a byte type to LLVM), which solves this problem plus questions around provenance.

cc @nlopes

1 Like

Actually, there is a second culprit here, which is this transform: Compiler Explorer

We currently remove insertvalue of undef, which is not correct if the original value at that position was poison.

Fixing that issue should avoid the problem as well.

I’ve fixed the insertvalue simplification bug in [InstSimplify] Do not remove insertvalue of undef into poison · llvm/llvm-project@8a09875 · GitHub.

This does appear to fix your original issue, the resulting IR is now:

define dso_local void @_Z4bad1PiiP7Wrapper(ptr nocapture noundef writeonly %0, i32 noundef %1, ptr nocapture noundef %2) local_unnamed_addr #0 !dbg !11 {
  %.sroa.04.0.insert.ext = zext i32 %1 to i64, !dbg !15
  %.sroa.04.0.insert.insert = or i64 %.sroa.04.0.insert.ext, 4294967296, !dbg !15
  store i64 %.sroa.04.0.insert.insert, ptr %2, align 4, !dbg !20
  store i32 %1, ptr %0, align 4, !dbg !21, !tbaa !22
  ret void, !dbg !26
}

I also put up ⚙ D139615 [InstCombine] Expand memcpy to vector instead of integer load/store for the memcpy expansion issue, but we probably don’t want to pursue this at this time. For example, this is the mess it produces in this case:

define dso_local void @_Z4bad1PiiP7Wrapper(ptr nocapture noundef writeonly %0, i32 noundef %1, ptr nocapture noundef %2) local_unnamed_addr #0 !dbg !11 {
  %4 = bitcast i32 %1 to <4 x i8>, !dbg !15
  %.sroa.04.0.vecblend = shufflevector <4 x i8> %4, <4 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>, !dbg !15
  %.sroa.04.7.vec.insert = shufflevector <8 x i8> %.sroa.04.0.vecblend, <8 x i8> <i8 poison, i8 poison, i8 poison, i8 poison, i8 1, i8 0, i8 0, i8 poison>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 undef>, !dbg !15
  store <8 x i8> %.sroa.04.7.vec.insert, ptr %2, align 4, !dbg !20
  %bc = bitcast <8 x i8> %.sroa.04.7.vec.insert to <2 x i32>, !dbg !21
  %5 = extractelement <2 x i32> %bc, i64 0, !dbg !21
  store i32 %5, ptr %0, align 4, !dbg !22, !tbaa !23
  ret void, !dbg !27
}

This is something we’ll have to address before we switch uninitialized memory to be poison, but probably we need the byte type to make that a reality anyway.

Thank you!

I can confirm that things do indeed work now.