[RFC] Making Bit-field Codegen Poison Compatible

Abstract

Current memory semantics state that loading from an uninitialized memory location yields undef. This prevents the optimization of single-branch initializations, because the undef of the uninitialized variable cannot soundly be merged with other values that may be poison. The solution is to make loads of uninitialized memory yield poison instead. However, this is problematic for bit-fields, where initializing a single member always loads the surrounding, possibly uninitialized, memory. This proposal outlines the current limitations with respect to bit-fields and potential solutions for making clang's bit-field codegen compatible with uninitialized memory loads yielding poison.

This work is being proposed under the Google Summer of Code 2022 topic "Remove undef: move uninitialized memory to poison".

Mentee: John McIver @jmciver
Mentor: Nuno Lopes @nlopes

Introduction

Currently, LLVM IR yields undef when loading from an uninitialized memory location. This prevents the optimization of single-branch initializations. The following is an example of a single-branch initialization:

int someFunction();

int phi_example(bool cond) {
  int a;

  if (cond) {
    // No assumptions can be made about the value assigned.
    a = someFunction();
  }

  return a;
}

After optimization, the following IR is produced:

define i32 @phi_example(i1 zeroext %cond) {
entry:
  br i1 %cond, label %if.then, label %if.end

if.then:
  %call = call i32 @someFunction()
  br label %if.end

if.end:
  %a.0 = phi i32 [ %call, %if.then ], [ undef, %entry ]
  ret i32 %a.0
}

However, because %call may be poison and poison is stronger than undef, we cannot replace the phi with %call: doing so could make %a.0 less defined on the %entry path, where it was previously undef.
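Under the proposed semantics, where the load of uninitialized memory yields poison rather than undef, the incoming value for %entry becomes poison, and the blocker disappears. The following sketch (value names illustrative) shows the same function with the new semantics; replacing a poison incoming value never makes the result less defined, so the phi may now be simplified:

```llvm
define i32 @phi_example(i1 zeroext %cond) {
entry:
  br i1 %cond, label %if.then, label %if.end

if.then:
  %call = call i32 @someFunction()
  br label %if.end

if.end:
  ; With poison instead of undef, substituting any value for the
  ; %entry operand is sound, so optimizers are free to fold this phi.
  %a.0 = phi i32 [ %call, %if.then ], [ poison, %entry ]
  ret i32 %a.0
}
```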

The solution is to convert loading of uninitialized memory locations to emit poison. This proposal outlines implementation changes to clang codegen of loads related to bit-fields. This work will contribute to the eventual retirement of undef within the LLVM IR.

Bit-field Initialization

Bit-field initialization can occur either through a struct initializer or by individual member assignment. However, when members are initialized individually through assignment, or a struct initializer with non-constant values is used, codegen emits a load, clear, set, and store sequence on the storage unit containing the bit-field.

The following code provides a simple example of a two bit-field based struct:

struct Bitfields {
  unsigned int a : 3;
  unsigned int b : 3;
};

unsigned int load_example() {
  Bitfields s;
  s.a = 4;
  s.b = 6;
  return s.a + s.b; // Result is 10
}

This results in the following IR, annotated with poison semantics:

%struct.Bitfields = type { i8, [3 x i8] }

define i32 @load_example() {
entry:
  %s = alloca %struct.Bitfields, align 4
  ; No store has been performed on %s, so memory is poison

  ; Attempt to store 4 into “a” bit-field
  %bf.load1 = load i8, ptr %s, align 4    ; Load contiguous bit-field memory, result poison
  %bf.clear1 = and i8 %bf.load1, -8       ; Propagate poison
  %bf.set1 = or i8 %bf.clear1, 4          ; Propagate poison
  store i8 %bf.set1, ptr %s, align 4      ; Store poison

  ; Attempt to store 6 into “b” bit-field
  %bf.load2 = load i8, ptr %s, align 4    ; Previous store was poison, result poison
  %bf.clear2 = and i8 %bf.load2, -57      ; Propagate poison
  %bf.set2 = or i8 %bf.clear2, 48         ; Propagate poison
  store i8 %bf.set2, ptr %s, align 4      ; Store poison

  ; Perform addition of bit-fields
  %bf.load3 = load i8, ptr %s, align 4    ; Load poison
  %bf.clear3 = and i8 %bf.load3, 7
  %bf.cast3 = zext i8 %bf.clear3 to i32
  %bf.load4 = load i8, ptr %s, align 4    ; Load poison
  %bf.lshr4 = lshr i8 %bf.load4, 3
  %bf.clear4 = and i8 %bf.lshr4, 7
  %bf.cast4 = zext i8 %bf.clear4 to i32
  %add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is poison
  ret i32 %add
}

In the C++ example, the expectation is that both bit-field locations have been initialized. Once the semantics of uninitialized memory loads are changed to yield poison, the IR instead computes poison, contradicting that expectation. Without modification, all subsequent bit-field load, modify, and store cycles will produce poison due to poison propagation semantics in the IR.

Implementation Steps

Three incremental implementation steps are proposed. The first makes clang compliant with the future poison memory semantics; the second and third reduce the resulting performance impact.

Emit Freeze on all Bit-field Loads

The first step is to emit a freeze instruction after every bit-field load. This resolves the initial load of an uninitialized memory location by converting poison into an arbitrary, but fixed, value of the loaded type. A side effect is that freeze also hides poison arising in the computation of the stored value, not only poison from uninitialized memory loads. The following IR shows the effect this modification would have on the original IR listing from the previous section.

%struct.Bitfields = type { i8, [3 x i8] }

define i32 @load_example() {
entry:
  %s = alloca %struct.Bitfields, align 4
  ; No store has been performed on %s, so memory is poison

  ; Attempt to store 4 into “a” bit-field
  %bf.load1 = load i8, ptr %s, align 4    ; Load contiguous bit-field memory, result poison
  %bf.freeze1 = freeze i8 %bf.load1       ; Inserted freeze instruction, result is an integer
  %bf.clear1 = and i8 %bf.freeze1, -8     ; Zeroize first three bits
  %bf.set1 = or i8 %bf.clear1, 4          ; 0bxxxxx100
  store i8 %bf.set1, ptr %s, align 4      ; Store integer value

  ; Attempt to store 6 into “b” bit-field
  %bf.load2 = load i8, ptr %s, align 4    ; Load contiguous bit-field memory
                                          ; 0bxxxxx100
  %bf.freeze2 = freeze i8 %bf.load2       ; Inserted freeze instruction, result is 0bxxxxx100
  %bf.clear2 = and i8 %bf.freeze2, -57    ; Zeroize bits 3 through 5
  %bf.set2 = or i8 %bf.clear2, 48         ; 0bxx110100
  store i8 %bf.set2, ptr %s, align 4      ; Store integer value

  ; Perform addition of bit-fields
  %bf.load3 = load i8, ptr %s, align 4
  %bf.freeze3 = freeze i8 %bf.load3
  %bf.clear3 = and i8 %bf.freeze3, 7
  %bf.cast3 = zext i8 %bf.clear3 to i32
  %bf.load4 = load i8, ptr %s, align 4
  %bf.freeze4 = freeze i8 %bf.load4
  %bf.lshr4 = lshr i8 %bf.freeze4, 3
  %bf.clear4 = and i8 %bf.lshr4, 7
  %bf.cast4 = zext i8 %bf.clear4 to i32
  %add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is 10
  ret i32 %add
}

This solution is compatible with both stack- and heap-allocated objects. However, the added freeze instructions may mask some undefined behavior and prevent LLVM from performing related optimizations. Once this modification is in place, the LLVM test suite will be used to measure the runtime impact.

Zero-Initialize and Forgo Freeze on Stack-Allocated Bit-fields

The second implementation step is to evaluate zero-initializing all stack-allocated bit-fields. Because a zeroing store is emitted for each bit-field storage unit, the loads induced by individual bit-field assignments no longer read uninitialized memory and therefore no longer produce poison. The following listing adds the zeroing store to the original IR. A limitation of this approach is that it cannot be applied to heap-allocated bit-field objects.

%struct.Bitfields = type { i8, [3 x i8] }

define i32 @load_example() {
entry:
  %s = alloca %struct.Bitfields, align 4
  store i8 0, ptr %s, align 4             ; Insert zeroization store to bit-field
  ; Zeroization has been performed

  ; Attempt to store 4 into “a” bit-field
  %bf.load1 = load i8, ptr %s, align 4    ; Load contiguous bit-field memory, result 0b0
  %bf.clear1 = and i8 %bf.load1, -8
  %bf.set1 = or i8 %bf.clear1, 4          ; Result 0b00000100
  store i8 %bf.set1, ptr %s, align 4      ; Store 0b00000100

  ; Attempt to store 6 into “b” bit-field
  %bf.load2 = load i8, ptr %s, align 4    ; Load 0b00000100
  %bf.clear2 = and i8 %bf.load2, -57
  %bf.set2 = or i8 %bf.clear2, 48         ; Result 0b00110100
  store i8 %bf.set2, ptr %s, align 4      ; Store 0b00110100

  ; Perform addition of bit-fields
  %bf.load3 = load i8, ptr %s, align 4
  %bf.clear3 = and i8 %bf.load3, 7
  %bf.cast3 = zext i8 %bf.clear3 to i32
  %bf.load4 = load i8, ptr %s, align 4
  %bf.lshr4 = lshr i8 %bf.load4, 3
  %bf.clear4 = and i8 %bf.lshr4, 7
  %bf.cast4 = zext i8 %bf.clear4 to i32
  %add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is 10
  ret i32 %add
}

Freeze Based on Access Analysis

The third implementation step is to limit the emission of freeze instructions based on the conjunction of two conditions: the allocation is heap-allocated or of unknown origin, and the access is the first to the memory location containing the bit-field. A freeze instruction is emitted only when both conditions hold.

This will require a data structure mapping each group of adjacent bit-fields (a storage unit) to its initialization state. The following example shows the reduction in freeze instructions for a bit-field whose allocation is heap or unknown:

%struct.Bitfields = type { i8, [3 x i8] }

define i32 @load_example(ptr %s) {
entry:
  ; %s may be heap- or stack-allocated; whether the memory has prior
  ; initialization is unknown.

  ; Attempt to store 4 into "a" bit-field
  %bf.load1 = load i8, ptr %s, align 4    ; Load contiguous bit-field memory, not initialized,
                                          ; maybe poison
  %bf.freeze1 = freeze i8 %bf.load1       ; Inserted freeze instruction, result is an integer
  %bf.clear1 = and i8 %bf.freeze1, -8     ; Zeroize first three bits
  %bf.set1 = or i8 %bf.clear1, 4          ; 0bxxxxx100
  store i8 %bf.set1, ptr %s, align 4      ; Store integer value

  ; Attempt to store 6 into "b" bit-field
  %bf.load2 = load i8, ptr %s, align 4    ; Load contiguous bit-field memory
                                          ; 0bxxxxx100
  %bf.clear2 = and i8 %bf.load2, -57      ; Zeroize bits 3 through 5
  %bf.set2 = or i8 %bf.clear2, 48         ; 0bxx110100
  store i8 %bf.set2, ptr %s, align 4      ; Store integer value

  ; Perform addition for bit-fields
  %bf.load3 = load i8, ptr %s, align 4
  %bf.clear3 = and i8 %bf.load3, 7
  %bf.cast3 = zext i8 %bf.clear3 to i32
  %bf.load4 = load i8, ptr %s, align 4
  %bf.lshr4 = lshr i8 %bf.load4, 3
  %bf.clear4 = and i8 %bf.lshr4, 7
  %bf.cast4 = zext i8 %bf.clear4 to i32
  %add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is 10
  ret i32 %add
}

For " Emit Freeze on all Bit-field Loads", I assume you’re specifically freezing loads that that are generated as part of assignment to a bitfield? I’m not sure why you’d need to freeze a simple load. Unless I’m forgetting something…

For “Freeze Based on Access Analysis”, it looks like you’re planning to write a generic IR pass that does freeze removal for everything, not just bitfields?

I think inserting freeze for bitfield manipulation is fine – nowadays, we are quite good at managing freeze in the middle-end optimizer, so I’m not too concerned about this causing performance regressions anymore.

I’m not sure you really need any optimization beyond that. Stack-allocated objects will generally be SROAd, at which point freeze poison will fold to zero anyway. For objects that cannot be SROAd (not on stack or escaped), GVN will perform store-to-load forwarding, and redundant freezes will be dropped based on guaranteed-not-poison reasoning.

Yes, that’s a typo, sorry. You only need to freeze the loads corresponding to bit-field stores. No need for freeze in bitfield loads.

No, it’s something simpler :) The idea is just to avoid introducing freezes in clang codegen when generating code for bit-field stores. You can skip the freeze after the load if you know that specific field has already been initialized before. But as Nikita says, it may not be needed at all. We will measure perf impact.

Slightly related, it would be good to have an LLVM doc that talks about migrating from undef to poison for out-of-tree users and mentions strategies like this for when we do eventually get rid of undef.


I’m not sure how you identify that a field is initialized without some sort of alias analysis. Even if the field was initialized at some point in the past, it could be overwritten with a poison value. For example:

struct S { int a; int b: 10; };
union U {int a; struct S s; };
int f() {
  union U a = {10}, b = {10}; // Field "b" is poison
  a.s.b = 10; // Freeze initial bitfield load
  a = b; // Overwrite with poison
  a.s.b = 10; // Need freeze again
  return a.s.b;
}

The target scope I am working toward is getting the bit-fields to a state of full initialization so that optimizations could be performed. If poison was introduced upon later re-assignment I would consider that intentional. That is an interesting case (nice example) and I agree that would require alias analysis in order for the assignment to work. Emitting a freeze on every bit-field load, modify, and store cycle would resolve issues like this, but will cover up legitimate poison propagation.

PS. I apologize for the typo in the section heading. As @nlopes stated this proposal only applies to the loads in the service of bit-field initialization.

You are right that one needs alias analysis, but I was hoping some simple pattern matching on the clang side would cover half of the cases.
We lose information when we go to LLVM IR, especially in the case of bit-fields, so getting rid of freezes for bit-fields is tricky.
But anyway, point taken, thank you; we have to study this optimization further.