[RFC] Load Instruction: Uninitialized Memory Semantics

Abstract

Current memory semantics state that loading an uninitialized memory location yields undef. This prevents optimization of single branch initializations due to a mismatch in undefined behavior type between the undef of the uninitialized variable and other variables which may be poison. Converting uninitialized memory accesses to poison introduces issues with bit-fields where single member variable initialization always results in loading uninitialized memory (see GSoC topic post). This proposal outlines the current limitations of uninitialized memory loads returning undef and a potential solution of uninitialized memory loads returning poison or a nondeterministic value.

Mentee: John McIver @jmciver

Mentor: Nuno Lopes @nlopes

Introduction

Interactions between undef and poison are nuanced and can be difficult to reason about. Additionally, the interaction between the two types can prevent or make existing optimizations incorrect. For these reasons we would like to replace undef with poison wherever possible.

The constant undef was created to represent the return value of uninitialized memory but was later used as any arbitrary constant in other constructs. Later, poison was added to represent signed overflow as undef was insufficient for the desired optimizations and undefined behavior (UB) was too strong. UB is used whenever an operation may trap the CPU (e.g., divide by zero), for example.

Phi with Undef and Poison

Currently the LLVM IR uses undef when dereferencing an uninitialized memory location. The following is an example of single branch initialization:

int phi_example(bool cond, int* ptr) {
  int a;
  if (cond) {
    a = *ptr;
  }
  return a;
}

After optimization, the following IR is produced:

define i32 @phi_example(i1 %cond, ptr dereferenceable(4) %ptr) {
  br i1 %cond, label %if.then, label %if.end

if.then:
  %load = load i32, ptr %ptr
  br label %if.end

if.end:
  %a.0 = phi i32 [ %load, %if.then ], [ undef, %entry ]
  ret i32 %a.0
}

Ideally, the phi could be folded to:

define i32 @phi_example(i1 %cond, ptr %ptr) {
  %load = load i32, ptr %ptr
  ret i32 %load
}

However, this is not correct if %cond were false and load were to yield poison then we would be replacing undef with poison. Poison is less defined than undef and therefore the fold is not correct.

The solution is to convert loading of uninitialized memory locations to emit poison.

Current Uninitialized Memory Semantics

The following figure shows current uninitialized memory load semantics using undef. Cases 2 shows that a uninitialized byte (0x??) results in an i8 undef. Case 3 is similar to case 4, but with 4 bytes of uninitialized memory resulting in an i32 undef. Case 4 shows one defined byte and three uninitialized bytes results in a i32 of both defined and undef values.

undef-load-semantics

For example the following xor_a function can be optimized into a single return statement of zero. If %load.2 were to come from a different pointer variable dereferencing the same or different uninitialized memory location then the return value of the optimized function would be undef.

; Example of multiple loads of the same address with undef semantics

; Unoptimized
define i32 @xor_a(ptr %a.addr) {
entry:
  %load.1 = load i32, ptr %a.addr
  %load.2 = load i32, ptr %a.addr
  ; Note: Values of %load.1 and %load.2 are the same nondeterministic value.
  %xor = xor i32 %load.1, %load.2
  ret i32 %xor
}

; Optimized
define i32 @xor_a(ptr %a.addr) {
  ret i32 0
}

Semantic Proposal

For a majority of use cases initialized loads can be converted to poison. However, this is not possible with language features such as bit-fields. Additionally, any semantic changes need to be backwards compatible with bitcode generated with undef.

Our proposal is to add a new version of the load instruction via an unique opcode. This will provide an easily detectable trigger by which automatic semantic conversion can occur. Additionally, a flag will be added to the new load instruction allowing uninitialized memory to be converted to a nondeterministic value, this provides the semantic capability to mimic undef.

The new flag !uninit_is_nondet will be used to convert poison to a nondeterministic value. Like undef-semantics, this nondeterminism will be applied at the byte level. The following figure shows examples of the byte level nondeterminism for single and multi-byte types.

poison-load-semantics

Behavioral Discrepancies

Current uninitiated memory load semantics and load duplication results in the same nondeterministic value. Duplicate loads using the proposed load semantics and !uninint_is_nondet may potentially not result in the same nondeterministic value. This is because each load is effectively performing a series of byte level nondeterministic generations.

Another consideration is load instruction relocation. If a load is using the !uninit_is_nondet flag it cannot be pushed inside a loop. This is due to the potential increase in nondeterministic results. The load outside a loop will result in a single nondeterministic value as where inside the loop body the load
can potentially generate many differing nondeterministic results.

Emplace Upgrade Path

Bitcode generated using load semantics with undef can be automatically upgraded using AutoUpgrade translation. Loads performed with the earlier opcode will be converted to the new opcode and the !uninint_is_nondet attribute will be applied.

loadv1 <ty>, ptr <pointer> ā†’ loadv2 <ty>, ptr <pointer>, !uninit_is_nondet !{}

Mem2reg Example

Taking the phi with undef and poison example with poison semantics, if the load were decorated with !uninit_is_nondet then the result would always be a nondeterministic value. This would result in a definedness mismatch at the phi as poison is less defined.

define i32 @phi_example(i1 %cond, ptr dereferenceable(4) %ptr) {
  br i1 %cond, label %if.then, label %if.end

if.then:
  %load = load i32, ptr %ptr, !uninit_is_nondet ; Causes definedness mismatch
                                                ;   at phi
  br label %if.end

if.end:
  %a.0 = phi i32 [ %load, %if.then ], [ poison, %entry ]
  ret i32 %a.0
}

To resolve this issue a freeze poison would need to be inserted in the entry:

define i32 @phi_example(i1 %cond, ptr dereferenceable(4) %ptr) {
  %freeze = freeze i32 poison                   ; Add nondeterministic value to
                                                ;   match definedness at phi
  br i1 %cond, label %if.then, label %if.end

if.then:
  %load = load i32, ptr %ptr, !uninit_is_nondet
  br label %if.end

if.end:
  %a.0 = phi i32 [ %load, %if.then ], [ %freeze, %entry ]
  ret i32 %a.0
}

Bit-field Examples

struct Bitfields {
  unsigned int a : 3;
  unsigned int b : 3;
};

unsigned int load_example() {
  Bitfields s;
  s.a = 4;
  s.b = 6;
  ...
}

With undef

The following example shows generated IR using the current undef semantics. The initial load instruction results in a nondeterministic value, which can be cleared, set, and stored. Subsequent loads for the bit-field are a combination of the nondeterministic value and set fields.

; Attempt to store 4 into ā€œaā€ bit-field
%bf.load1 = load i8, ptr %s, align 4 ; Load of uninitialized memory
                                     ;   results in sampled undef,
                                     ;   assume  0bXXXXXXXX
%bf.clear1 = and i8 %bf.load1, -8    ; Propagate 0bXXXXX000
%bf.set1 = or i8 %bf.clear1, 4       ; Propagate 0bXXXXX100
store i8 %bf.set1, ptr %s, align 4   ; Store     0bXXXXX100

; Attempt to store 6 into ā€œbā€ bit-field
%bf.load2 = load i8, ptr %s, align 4 ; Previous store was
                                     ;           0bXXXXX100
%bf.clear2 = and i8 %bf.load2, -57   ; Propagate 0bXX000100
%bf.set2 = or i8 %bf.clear2, 48      ; Propagate 0bXX110100
store i8 %bf.set2, ptr %s, align 4   ; Store     0bXX110100

With poison

The following example shows the previous IR, but with loading of uninitialized memory using poison semantics. The initial load instruction will result in poison rather than undef. Because poison propagates until it encounters a freeze instruction, subsequent loads will result in poison. This
will result in a reduced number of optimizations.

; Attempt to store 4 into ā€œaā€ bit-field
%bf.load1 = load i8, ptr %s, align 4    ; Load contiguous bit-field
                                        ;   memory, result poison
%bf.clear1 = and i8 %bf.load1, -8       ; Propagate poison
%bf.set1 = or i8 %bf.clear1, 4          ; Propagate poison
store i8 %bf.set1, ptr %s, align 4      ; Store poison

; Attempt to store 6 into ā€œbā€ bit-field
%bf.load2 = load i8, ptr %s, align 4    ; Previous store was poison,
                                        ;   result poison
%bf.clear2 = and i8 %bf.load2, -57      ; Propagate poison
%bf.set2 = or i8 %bf.clear2, 48         ; Propagate poison
store i8 %bf.set2, ptr %s, align 4      ; Store poison

With poison and !uninit_is_nondet

The following adds the !uninit_is_nondet to the with poison example on the first load. This provides a similar nondeterministic behavior to that of the undef example, which allows the clear, set, and store cycle to represent concrete values.

; Attempt to store 4 into ā€œaā€ bit-field
%bf.load1 = load i8, ptr %s, align 4, !uninit_is_nondet ; Semantically equivalent to
                                                        ;   freeze i8 poison, assume
                                                        ;           0bXXXXXXXX
%bf.clear1 = and i8 %bf.load1, -8                       ; Propagate 0bXXXXX000
%bf.set1 = or i8 %bf.clear1, 4                          ; Propagate 0bXXXXX100
store i8 %bf.set1, ptr %s, align 4                      ; Store     0bXXXXX100

; Attempt to store 6 into ā€œbā€ bit-field
%bf.load2 = load i8, ptr %s, align 4                    ; Previous store was
                                                        ;           0bXXXXX100
%bf.clear2 = and i8 %bf.load2, -57                      ; Propagate 0bXX000100
%bf.set2 = or i8 %bf.clear2, 48                         ; Propagate 0bXX110100
store i8 %bf.set2, ptr %s, align 4                      ; Store     0bXX110100

Conclusion

We propose the use of a new load instruction that supports the use of attribute !uninit_is_nondet. This enables poison based semantics where possible, provides a nondeterministic capability similar to that of undef where needed, and allows for an automatic upgrade path to existing bitcode. However, load duplication with !uninit_is_nondet is not supported.

Thanks for taking the time to read and your feedback is greatly appreciated.

In this proposal, is ā€œuninitializedā€ memory different from memory initialized with poison? Distinguishing between the two was mentioned in some of the earlier discussions, but itā€™s not clear if thatā€™s intended to be part of the proposal.

Itā€™s probably worth explicitly describing how mem2reg handles !uninit_is_nondet loads from allocas. And which LLVM IR constructs produce uninitialized memory. (I guess just alloca and calls to malloc/malloc-equivalents.)

The premise is uninitialized memory is different from memory initialized with poison. With the introduction of a new load opcode this would need to be part of its semantic definition.

Thank you for the suggestion. Iā€™ll add mem2reg with !uninit_is_nondet content to the proposal.

The implications of making uninitialized memory act like poison seem pretty scary to me, and Iā€™d expect such a change to break a lot more code than just bitfield manipulation generated internally to the compiler. I think thereā€™s a lot of other code out there which does similar sorts of things at source level ā€“ with the expectation that math on the result of an uninitialized memory load is going to at least give some non-deterministic result.

E.g. look at something like https://github.com/bminor/musl/blob/master/src/string/strlen.c ā€“ it reads the string word-by-word until it sees one with a zero byte. If thereā€™s uninitialized data after the end-of-string zero, no problem under the undef-bits semantics. But if the entire load turns to poison, thatā€™ll be totally busted.

With undef semantics, multiple loads of the same address results in the same nondeterministic value.

Are you sure about that? It doesnā€™t sound right.

For example the following xor_a function can be optimized into a single return statement of zero.

It can be optimized to zero without that guarantee, because xor undef, <anything> is undef, and undef can be correctly replaced by any arbitrary value ā€“ such as zero.

There are two separable questions here; what LLVM IR semantics we want here, and what clang should generate. If we want, we can make clang tag all generated loads !uninit_is_nondet, instead of just bitfields. (Or add a flag to do that.)

Proposal is fixed. Thanks.

@efriedma-quic you are correct that alloca and builtin functions of the malloc family create uninitialized memory. Functions decorated with the allockind attribute can create it as well.

We could. But, Iā€™m not seeing why weā€™d want to ā€“ is there good enough motivation for introducing new load semantics treating uninitialized data as poison? My baseline assumption is that we donā€™t actually need (or want) those semantics at all.

@jyknight The motivation is to improve overall correctness. There are a significant number of optimizations that are incorrect due to undef (see alive2 run). Based on this we believe it would be advantageous to remove undef in favor of poison.

Given the goal of removal of undef in favor of poison, do you have any ideas for an alternative solution I can investigate?

I think this code is anyway UB since the load might go out-of-bounds. Imagine I have an alloca of size 1 that stores a 0 byte. Calling strlen on that is perfectly legal, but (assuming suitable alignment) that strlen will do a word-sized load from this alloca, which is UB.

On top of that, doing such a load at word type probably violates the effective type rule (strict aliasing), making it UB in C for yet another reason.

I brought up this concern in Remove undef: move uninitialized memory to poison, and was told that thereā€™d be a flag that can be set on load to add an implicit freeze on all individual bytes, meaning that we donā€™t have a single uninit byte turn the entire load into poison. It seems to me like !uninit_is_nondet is exactly that flag? I am confused that it does not have ā€˜freezeā€™ in the name. It also introduces a new concept to the IR, ā€œuninitā€, which is not something LLVM currently has (it has undef and poison, both of which are different kinds of uninit).

Why is it necessary to introduce an entirely new concept to the IR here? How is ā€œuninitā€ defined and how does it interact with all the other opcodes? For instance, if I do a (non-freezing) load of uninit, and then store that back, is that memory still uninit? Presumably it has to be because this load-store roundtrip could be optimized away. But this means that SSA values can now be ā€œuninitā€ besides being poison/undef, and we need to define how uninit propagates through arithmetic operations and so on. We already have enough issues with undef and poison, I donā€™t think we want more of that. IMO this proposal would be much improved by not adding a 3rd, new concept to the mix. :slight_smile: Instead it can build on the existing concepts of undef, poison, and freeze, making it much easier to explain and understand what happens.

I would suggest to make the semantics: each byte is loaded separately as if it was a regular i8 load, then freeze is called on them, and then the bytes are put together to form a value of the load type. (There are some subtleties around loads of pointer type and loading provenance that might make a proper description of the semantics more complicated, but the LangRef doesnā€™t really talk about those aspects of LLVM semantics currently so seems fine to also omit that here, for now.)

This behaves the same on poison and undef. We can then eventually drop undef entirely, say that new memory is initially poison, and everything should be coherent.

I donā€™t understand this example. What if cond == false and ptr == null? Then the optimization is clearly wrong since we are now dereferencing a null pointer!

Are you assuming a dereferenceable(4) or something like that?

Why are optimizations the concern here? The poison completely broke the bitfield! That is the main problem, right? clang using poison semantics for bitfield operations with the current load (v1) is just an incorrect lowering to LLVM IR ā€“ this is a lowering correctness issue, not an optimization opportunity issue.

uninit is just short for unitialized memory. Right now a load of unitialized yields undef.
Our proposal is to change that to yield poison by default, or a non-deterministic value (aka ā€˜freeze poisonā€™) when the load is tagged with !uninit_is_nondet (lacking a better name).
Thereā€™s no new concept and no freeze of data going on.

Include a freeze on every load is not desirable. It blocks a ton of optimizations. Itā€™s true that !uninit_is_nondet also blocks a few, but fewer, since the semantics is loser than freezing every bit all the time.

ā€œLoad from uninitialized memoryā€ is a new concept. Itā€™s not well-defined, or at least I have not seen it defined. We could of course say that every byte of memory tracks whether it has ever been written to, and ā€œuninitialized memoryā€ is memory that was never written do. But if we do that, then it is incorrect to optimize away

%x = load i32, ptr %ptr // loads poison
store i32 %x, %ptr // overwrites uninit with poison

The program as above marks the affected bytes as initialized, after removing the store memory would remain uninitialized, thus potentially changing program behavior down the road. Basically every byte of memory can now be poison, undef, and uninit, complicating everything.

What is the difference?

To implement the semantics we would like to use in Rust, we probably need a freezing load. (Well, ideally weā€™d have a multi-byte load where poison does not ā€œexpandā€ on load, something like a raw byte type.) We definitely want to perform loads of memory that explicitly had poison written into 1 byte, without losing the contents of the other bytes. I donā€™t know your definition of ā€œuninitialized memoryā€ but if this counts as initialized then that is a problem. (I think it is a problem for C as well but itā€™s hard to say since C does not precisely define this aspect of its semantics. De-initializing previously initialized memory can happen e.g. through padding bytes, or memcpy of uninitialized memory.)

Thatā€™s the intent of the proposal.

Under the proposed semantics, anything thatā€™s well-defined for poisoned memory is also well-defined for uninitialized memory, so removing the store is legal. Inserting a load/store pair like that isnā€™t legal, but thatā€™s less relevant. (Although not completely irrelevant; LICM store promotion currently can do that.)

This proposal is written under the assumption that memory should not contain poison. If unoptimized code stores poison to memory, the whole approach breaks down.

For IR generated by clang, memory should not contain poison. There are no C operators that produce poison, except in cases of undefined behavior, so itā€™s not possible to assign poison to a variable or field. Bitfield loads/stores use !uninit_is_nondet to avoid any assumptions about whether other fields are initialized.

poison only starts showing up once optimizations start moving code around.

I imagine Rust can enforce a similar model; memory operations involving types like the Rust MaybeUninit would use !uninit_is_nondet.

The proposal doesnā€™t currently address this. But I think any operation that copies the padding bytes of a C struct should use !uninit_is_nondet.

Oh I see, so poison is less defined that uninit memory.

I donā€™t quite see how it matters whether the code that stores poison in memory is original IR generated by the frontend, or IR that arises later after some optimizations? The rules are all the same. Every operation that works with memory needs to document what it does with all 3 cases: uninit, poison, undef.

I was so far working under the assumption that these would just all be the same, once undef is gone. So what is the reason for treating uninit as a separate state? This is a serious complication, what is the benefit?

Oh I see. Yeah it is true that the only way UB-free Rust generates poison is via uninit memory.

What exactly does the memcpy intrinsic in LLVM do when parts of the source are ā€œuninitā€?

  • Is the target memory always initialized, but filled with poison for the uninit parts of the source? That would mean well-defined C can have poison in its memory, so I doubt it.
  • Or does it pick some non-deterministic values to fill in the target? That would be a disaster for Rust, where copies are specified to preserve the init state exactly; if we have to turn this into uninit_is_nondet for LLVM that will lose some crucial optimizations.
  • Or does this perfectly copy the uninit state? That would mean that it is impossible to achieve the same effect as memcpy in a manual copy loop ā€“ there exist no load/store pair that can preserve the uninit state. I was hoping this proposal would at least leave the door open to adding a ā€œbyteā€ type in the future that is suited for this purpose, but if memory has a magic ā€œuninitā€ state that cannot be represented in SSA values, then that cannot work.

The answer to this question also affects the legality of converting between load/store pairs and memcpy.

I donā€™t mean that frontends are directly forbidden to store poison to memory. I just mean that since !uninit_is_nondet doesnā€™t freeze poisoned memory, memory that might be loaded via a !uninit_is_nondet canā€™t contain poison. So frontends need to ensure that poison is never stored to those locations. But in most languages, the only straightforward way to ensure that is to never store poison anywhere.

Optimizations on !uninit_is_nondet loads require inserting fewer freeze operations, compared to a hypothetical !freeze_poisoned_bits load.

I havenā€™t thought this through; the proposal should be updated to address it.

I am looking into this.

Regarding memcpy, it is a raw copy of memory. It cannot be emulated with a loop in current IR semantics indeed.
Only a ā€œbyteā€ type solves that problem. And it also solves all the pointer escapes issues we have.

This is a good point, thanks. I hadnā€™t thought about it. Eliā€™s reply makes sense.

Letā€™s see the semantics of the different loads (the argument is the value of the raw byte in memory):

  • load uninit ā†’ poison
  • load uninit, !uninit_is_nondet ā†’ nondet
  • load poison ā†’ poison
  • load poison, !uninit_is_nondet ā†’ poison

Removing a store that initializes memory, will transform poison into nondet in one of the cases. Thatā€™s a valid refinement, so we are good.
Example:

%x = load i32, ptr %ptr // loads poison
store i32 %x, %ptr      // overwrites uninit with poison
%x = load i32, ptr %ptr, !uninit_is_nondef // loads poison; removing the store makes it return nondet

I think generally this is something we should do. I really think we should take another look at the name.

@nikic You might want to chime in?

What sorts of optimizations would we miss if instead of moving toward
load uninit -> poison
load uninit, !uninit_is_nondet -> nondet
but rather just
load uninit -> nondet
?