Abstract
Current memory semantics state that loading an uninitialized memory location yields undef. This prevents optimization of single branch initializations due to a mismatch in undefined behavior type between the undef of the uninitialized variable and other variables which may be poison. The solution is to convert uninitialized memory accesses to poison. However, this is problematic for bit-fields where single member variable initialization always results in loading uninitialized memory. This proposal outlines the current limitations with respect to bit-fields and potential solutions to making the clang bit-field codegen compatible with uninitialized memory loads generating poison.
This work is being proposed under Google Summer of Code 2022 topic Remove undef: move uninitialized memory to poison.
Mentee: John McIver @jmciver
Mentor: Nuno Lopes @nlopes
Introduction
Currently the LLVM IR uses undef when dereferencing an uninitialized memory location. This prevents optimization of single branch initializations from being performed. The following is an example of single branch initialization:
int someFunction();
int phi_example(bool cond) {
int a;
if (cond) {
// No assumptions can be made about the value assigned.
a = someFunction();
}
return a;
}
After optimization, the following IR is produced:
define i32 @phi_example(i1 zeroext %cond) {
entry:
br i1 %cond, label %if.then, label %if.end
if.then:
%call = call i32 @someFunction()
br label %if.end
if.end:
%a.0 = phi i32 [ %call, %if.then ], [ undef, %entry ]
ret i32 %a.0
}
However, because %call
can be poison and poison is stronger than undef, we cannot replace the phi
with %call
as it may make %a
less defined for the else case.
The solution is to convert loading of uninitialized memory locations to emit poison. This proposal outlines implementation changes to clang codegen of loads related to bit-fields. This work will contribute to the eventual retirement of undef within the LLVM IR.
Bit-field Initialization
Bit-fields initialization can occur either by a struct initializer or individual member assignment. However, when struct members are individually initialized through assignment or a struct initializer with non-constants is used, a load, value, shift, clear, and set sequence occurs.
The following code provides a simple example of a two bit-field based struct
:
struct Bitfields {
unsigned int a : 3;
unsigned int b : 3;
};
unsigned int load_example() {
Bitfields s;
s.a = 4;
s.b = 6;
return s.a + s.b; // Result is 10
}
Which results in the following IR annotated with poison semantics:
%struct.Bitfields = type { i8, [3 x i8] }
define i32 @load_example() {
entry:
%s = alloca %struct.Bitfields, align 4
; No store has been performed on %s, so memory is poison
; Attempt to store 4 into “a” bit-field
%bf.load1 = load i8, ptr %s, align 4 ; Load contiguous bit-field memory, result poison
%bf.clear1 = and i8 %bf.load1, -8 ; Propagate poison
%bf.set1 = or i8 %bf.clear1, 4 ; Propagate poison
store i8 %bf.set1, ptr %s, align 4 ; Store poison
; Attempt to store 6 into “b” bit-field
%bf.load2 = load i8, ptr %s, align 4 ; Previous store was poison, result poison
%bf.clear2 = and i8 %bf.load2, -57 ; Propagate poison
%bf.set2 = or i8 %bf.clear2, 48 ; Propagate poison
store i8 %bf.set2, ptr %s, align 4 ; Store poison
; Perform addition of bit-fields
%bf.load3 = load i8, ptr %s, align 4 ; Load poison
%bf.clear3 = and i8 %bf.load3, 7
%bf.cast3 = zext i8 %bf.clear3 to i32
%bf.load4 = load i8, ptr %s, align 4 ; Load poison
%bf.lshr4 = lshr i8 %bf.load4, 3
%bf.clear4 = and i8 %bf.lshr4, 7
%bf.cast4 = zext i8 %bf.clear4 to i32
%add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is poison
ret i32 %add
}
In the C++ example the expectation is the bit-field locations have been initialized. When the semantics of uninitialized memory loads is converted to yield poison the IR assumes poison, resulting in discontinuity. Without modification all subsequent bit-field load, modify, and store cycles will result in poison due to poison propagation semantics in IR.
Implementation Steps
Three incremental implementation steps are proposed. The first is to make clang compliant with future poison memory semantics. The second and third are to reduce the performance impact.
Emit Freeze on all Bit-field Loads
The first implementation is to apply a freeze instruction post every bit-field load. This resolves the initial load of an uninitialized memory address by converting poison to a value of the target type. A side effect is hiding poison in the computation of the value stored rather than only uninitialized memory loads. The following IR shows the effect this modification would have on the original IR listing from the previous section.
%struct.Bitfields = type { i8, [3 x i8] }
define i32 @load_example() {
entry:
%s = alloca %struct.Bitfields, align 4
; No store has been performed on %s, so memory is poison
; Attempt to store 4 into “a” bit-field
%bf.load1 = load i8, ptr %s, align 4 ; Load contiguous bit-field memory, result poison
%bf.freeze1 = freeze i8 %bf.load1 ; Inserted freeze instruction, result is an integer
%bf.clear1 = and i8 %bf.freeze1, -8 ; Zeroize first three bits
%bf.set1 = or i8 %bf.clear1, 4 ; 0bxxxxx100
store i8 %bf.set1, ptr %s, align 4 ; Store integer value
; Attempt to store 6 into “b” bit-field
%bf.load2 = load i8, ptr %s, align 4 ; Load contiguous bit-field memory
; 0bxxxxx100
%bf.freeze2 = freeze i8 %bf.load2 ; Inserted freeze instruction, result is 0bxxxxx100
%bf.clear2 = and i8 %bf.freeze2, -57 ; Zeroize bits 3 through 5
%bf.set2 = or i8 %bf.clear2, 48 ; 0bxx110100
store i8 %bf.set2, ptr %s, align 4 ; Store integer value
; Perform addition of bit-fields
%bf.load3 = load i8, ptr %s, align 4
%bf.freeze3 = freeze i8 %bf.load3
%bf.clear3 = and i8 %bf.freeze3, 7
%bf.cast3 = zext i8 %bf.clear3 to i32
%bf.load4 = load i8, ptr %s, align 4
%bf.freeze4 = freeze i8 %bf.load4
%bf.lshr4 = lshr i8 %bf.freeze4, 3
%bf.clear4 = and i8 %bf.lshr4, 7
%bf.cast4 = zext i8 %bf.clear4 to i32
%add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is 10
ret i32 %add
}
This solution is compatible with both stack and heap allocated objects. However, the added freeze instructions may mask some undefined behavior and prevent LLVM from performing related optimizations. Once this modification is in place the test suite will be used to determine runtime impact.
Zero Initialize and Forego Freeze on Stack Allocated Bit-fields
The second implementation step is to test zero initialization of all stack-allocated bit-fields. Because a store is performed per adjacent bit-fields, loads induced by individual bit-field initializations do not contain poison. The following adds a zeroization store instruction to the original IR listing. A limitation to this solution is that zeroization of heap allocated bit-field objects cannot be accomplished.
%struct.Bitfields = type { i8, [3 x i8] }
define i32 @load_example() {
entry:
%s = alloca %struct.Bitfields, align 4
store i8 0, ptr %s, align 4 ; Insert zeroization store to bit-field
; Zeroization has been performed
; Attempt to store 4 into “a” bit-field
%bf.load1 = load i8, ptr %s, align 4 ; Load contiguous bit-field memory, result 0b0
%bf.clear1 = and i8 %bf.load1, -8
%bf.set1 = or i8 %bf.clear1, 4 ; Result 0b00000100
store i8 %bf.set1, ptr %s, align 4 ; Store 0b00000100
; Attempt to store 6 into “b” bit-field
%bf.load2 = load i8, ptr %s, align 4 ; Load 0b00000100
%bf.clear2 = and i8 %bf.load2, -57
%bf.set2 = or i8 %bf.clear2, 48 ; Result 0b00110100
store i8 %bf.set2, ptr %s, align 4 ; Store 0b00110100
; Perform addition of bit-fields
%bf.load3 = load i8, ptr %s, align 4
%bf.clear3 = and i8 %bf.load3, 7
%bf.cast3 = zext i8 %bf.clear3 to i32
%bf.load4 = load i8, ptr %s, align 4
%bf.lshr4 = lshr i8 %bf.load4, 3
%bf.clear4 = and i8 %bf.lshr4, 7
%bf.cast4 = zext i8 %bf.clear4 to i32
%add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is 10
ret i32 %add
}
Freeze Based on Access Analysis
The third implementation step is to limit the emission of freeze instructions based on the conjunction of two factors. First, is the allocation of type heap or type unknown? Second, is this the first access to the memory location containing the bit-field? If both statements are true, then a freeze instruction is produced.
This will require the creation of a data structure to map adjacent bit-fields to initialization state of the element of the type list. The following example shows the reduction in freeze instructions for a bit-field with allocation type heap or type unknown:
%struct.Bitfields = type { i8, [3 x i8] }
define i32 @load_example(ptr %s) {
entry:
; Cannot assume heap or stack allocation type of bit-field and if memory has
; prior initialization.
; Attempt to store 4 into "a" bit-field
%bf.load1 = load i8, ptr %s, align 4 ; Load contiguous bit-field memory, not initialized,
; maybe poison
%bf.freeze1 = freeze i8 %bf.load1 ; Inserted freeze instruction, result is an integer
%bf.clear1 = and i8 %bf.freeze1, -8 ; Zeroize first three bits
%bf.set1 = or i8 %bf.clear1, 4 ; 0bxxxxx100
store i8 %bf.set1, ptr %s, align 4 ; Store integer value
; Attempt to store 6 into "b" bit-field
%bf.load2 = load i8, ptr %s, align 4 ; Load contiguous bit-field memory
; 0bxxxxx100
%bf.clear2 = and i8 %bf.load2, -57 ; Zeroize bits 3 through 5
%bf.set2 = or i8 %bf.clear2, 48 ; 0bxx110100
store i8 %bf.set2, ptr %s, align 4 ; Store integer value
; Perform addition for bit-fields
%bf.load3 = load i8, ptr %s, align 4
%bf.clear3 = and i8 %bf.load3, 7
%bf.cast3 = zext i8 %bf.clear3 to i32
%bf.load4 = load i8, ptr %s, align 4
%bf.lshr4 = lshr i8 %bf.load4, 3
%bf.clear4 = and i8 %bf.lshr4, 7
%bf.cast4 = zext i8 %bf.clear4 to i32
%add = add nsw i32 %bf.cast3, %bf.cast4 ; Result is 10
ret i32 %add
}