[RFC] Load Instruction: Uninitialized Memory Semantics

RalfJung · January 10, 2023, 8:16am

I think this code is UB anyway, since the load might go out of bounds. Imagine I have an alloca of size 1 that stores a 0 byte. Calling strlen on that is perfectly legal, but (assuming suitable alignment) that strlen will do a word-sized load from this alloca, which reads past the end of the allocation and is therefore UB.

On top of that, doing such a load at word type probably violates the effective type rule (strict aliasing), making it UB in C for yet another reason.
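To make the strlen concern concrete, here is a minimal sketch in C of the kind of word-at-a-time scan I have in mind. The names `has_zero_byte` and `word_strlen` are made up for illustration, and this is not the code of any particular libc. To keep the sketch itself free of UB, it assumes the caller passes a buffer whose size is a multiple of 8 bytes, and it uses memcpy rather than a cast to a word pointer so the effective type rule is not involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic bit trick: the result is nonzero iff some byte of w is zero. */
static int has_zero_byte(uint64_t w) {
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Hypothetical word-at-a-time strlen.
 * ASSUMPTION (not in the original discussion): buf_size is the size of the
 * whole allocation and is a multiple of 8, so every 8-byte read below stays
 * in bounds. Returns buf_size if no 0 byte is found. */
static size_t word_strlen(const char *s, size_t buf_size) {
    assert(buf_size % sizeof(uint64_t) == 0);
    size_t i = 0;
    for (; i < buf_size; i += sizeof(uint64_t)) {
        uint64_t w;
        /* memcpy avoids the strict-aliasing problem of *(uint64_t *)(s + i). */
        memcpy(&w, s + i, sizeof w);
        if (has_zero_byte(w)) {
            break;  /* the terminator is somewhere in this word */
        }
    }
    /* Find the exact position of the 0 byte within the word. */
    while (i < buf_size && s[i] != '\0') {
        i++;
    }
    return i;
}
```

With a size-1 allocation holding only the 0 byte, the first 8-byte read would already touch 7 bytes outside the allocation. The padding assumption above is exactly what a real strlen cannot make, which is why the word-sized load is the problem.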

I brought up this concern in “Remove undef: move uninitialized memory to poison”, and was told that there’d be a flag that can be set on load to add an implicit freeze on all individual bytes, so that a single uninit byte does not turn the entire load into poison. It seems to me like !uninit_is_nondet is exactly that flag? I am confused that it does not have ‘freeze’ in the name. It also introduces a new concept to the IR, “uninit”, which is not something LLVM currently has (it has undef and poison, both of which are different kinds of uninit).

Why is it necessary to introduce an entirely new concept to the IR here? How is “uninit” defined and how does it interact with all the other opcodes? For instance, if I do a (non-freezing) load of uninit, and then store that back, is that memory still uninit? Presumably it has to be, because this load-store roundtrip could be optimized away. But this means that SSA values can now be “uninit” besides being poison/undef, and we need to define how uninit propagates through arithmetic operations and so on. We already have enough issues with undef and poison; I don’t think we want more of that. IMO this proposal would be much improved by not adding a third, new concept to the mix. Instead it can build on the existing concepts of undef, poison, and freeze, making it much easier to explain and understand what happens.

I would suggest the following semantics: each byte is loaded separately as if it were a regular i8 load, then freeze is applied to each of them, and then the bytes are put together to form a value of the load type. (There are some subtleties around loads of pointer type and loading provenance that might make a proper description of the semantics more complicated, but the LangRef doesn’t really talk about those aspects of LLVM semantics currently, so it seems fine to also omit that here, for now.)
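As a rough illustration of these semantics, here is a toy model in C. It is only a sketch under stated assumptions: `abs_byte`, `freeze_byte`, and `load_freeze_bytes` are hypothetical names, memory bytes are modeled as either a concrete value or poison, the target is assumed to be little-endian, and the "arbitrary" value that freeze picks is passed in as a parameter so the model stays deterministic. Pointer provenance is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy abstract memory byte: either a concrete value or poison. */
typedef struct {
    bool is_poison;
    uint8_t value;  /* meaningful only when is_poison is false */
} abs_byte;

/* freeze on a single byte: a poison byte becomes some fixed value.
 * ASSUMPTION: the model takes that value as `pick`; an actual freeze
 * may choose any value. */
static uint8_t freeze_byte(abs_byte b, uint8_t pick) {
    return b.is_poison ? pick : b.value;
}

/* Load n bytes: load each byte on its own, freeze it, then assemble the
 * frozen bytes into one integer (little-endian, an assumption here). */
static uint64_t load_freeze_bytes(const abs_byte *mem, size_t n, uint8_t pick) {
    assert(n <= sizeof(uint64_t));
    uint64_t result = 0;
    for (size_t i = 0; i < n; i++) {
        result |= (uint64_t)freeze_byte(mem[i], pick) << (8 * i);
    }
    return result;
}
```

The point of the model is that one poison byte only makes that byte's contribution arbitrary. The other bytes of the loaded value are exactly what was stored, so the result as a whole is never poison.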

This behaves the same on poison and undef. We can then eventually drop undef entirely, say that new memory is initially poison, and everything should be coherent.
