Remove undef: move uninitialized memory to poison

Hello,

I am a Ph.D. student at the University of New Mexico and am very interested in contributing to this project. I have made direct contact with @nlopes. Some key points from the discussion are:

The first step of the proposal may be studying the required components and developing a viable implementation plan.

Main implementation issues (scoped to GSoC):

  • Bit-fields
  • Scalar Replacement of Aggregates (SROA): replace phi(undef) with phi(poison) for conditional stores (see the sketch after this list)
  • Upgrade path: it must be possible to auto-upgrade old IR (which is not versioned)
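
For the SROA item, here is a minimal hand-written sketch of the kind of IR in question (function and value names are made up, this is not actual SROA output):

```llvm
; Result of scalarizing an alloca that is only stored to on one path.
; Today SROA materializes the not-stored path as undef; the proposal is
; to use poison instead.
define i32 @cond_store(i1 %c, i32 %v) {
entry:
  br i1 %c, label %then, label %join
then:
  br label %join
join:
  ; before: phi i32 [ %v, %then ], [ undef, %entry ]
  %val = phi i32 [ %v, %then ], [ poison, %entry ]
  ret i32 %val
}
```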

My understanding of undef and poison has been developed by reading:

I am currently reading “Reconciling high-level optimizations and low-level code in LLVM”.

Are there any other references on undef and poison that would be beneficial to review?

Thanks!
John


I think those references are a very good start. I would be impressed if you can parse all of them :slight_smile:

Out of curiosity, maybe have a look at this post about the Evolution of undef and poison over time in LLVM’s code base.

Thanks for the additional reference.

The GSoC site says projects can have the standard 12-week timeline or an extended 22-week timeline (same total hours). For this proposal, do you prefer a 12- or 22-week timeline?

I don’t have a strong preference either way, as long as it’s the large project size. Working hours per week are negotiable. We understand you may have other commitments.

Thanks for the feedback. I’ll go ahead and produce the schedule around a 12-week period of performance.

Question concerning bit-fields and in-place upgrades:

https://github.com/llvm/llvm-project/issues/52930 mentions that a possible way to resolve the introduction of poison on the first assignment to a bit-field is to freeze the loaded value and then mask, set, and store. Ideally there would be no further emission of freeze instructions on subsequent loads or stores, as further freezes would potentially mask poison induced within a user’s code. Issue #52930 also states that storing poison to one bit-field would spill poison into the other bit-fields that share the same storage unit within the struct.
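
A rough sketch of that lowering for the first store to a 3-bit field living in an i8 storage unit (hand-written; the field layout and names are assumptions, not what clang actually emits):

```llvm
define void @store_first(ptr %unit, i8 %val) {
  %old = load i8, ptr %unit      ; storage unit may still be uninitialized
  %frozen = freeze i8 %old       ; pin the neighbouring bits to a fixed value
  %cleared = and i8 %frozen, -8  ; clear the field's 3 bits (mask 0b11111000)
  %bits = and i8 %val, 7         ; keep only the field's 3 bits
  %new = or i8 %cleared, %bits   ; set the field
  store i8 %new, ptr %unit       ; store the updated storage unit
  ret void
}
```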

Thinking about this from the standpoint of automatic in-place upgrades of LLVM bitcode, I am not sure what would make a good trigger for the emission of freeze. If the struct allocation and its first assignment can both be detected, then freeze could be inserted there. Otherwise there are cases where a decisive conclusion cannot be reached, for example when compiling a library function that takes a struct containing bit-fields by reference: the library function may be performing the first store into a bit-field object, or that store may already have happened in the caller’s code.

Is there a straightforward way to detect first assignment of a bit-field? If not, is applying freeze on every bit-field store harmful?

If I am missing something please feel free to point me to a reference (paper, code, etc.).

Thanks in advance!
John

Upon re-reading issue #52930 and the associated Phabricator discussion, it looks like an automatic in-place upgrade would need a freeze instruction inserted on every bit-field load; i.e., there is no way to distinguish the first load from subsequent loads, which would limit optimization of old bitcode.

In general we cannot identify in clang whether a load is the first one or not. But we can try, as it’s not that infrequent to have multiple consecutive accesses to a bit-field. Note that we don’t need a definitive answer, just an approximation of whether an access to the bit-field has happened already or not.
Surely, using too many freezes is a bad thing. I’m not too concerned, as I don’t think bit-fields are common or performance sensitive. But we should optimize away freezes as much as possible, for sure.
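
As one example of the kind of freeze clean-up that is already possible, a freeze whose operand is known not to be undef or poison can simply be dropped (a minimal sketch):

```llvm
; %a is noundef, so it can be neither undef nor poison;
; InstCombine can fold the freeze away and return %a directly.
define i32 @redundant_freeze(i32 noundef %a) {
  %f = freeze i32 %a
  ret i32 %f
}
```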

I haven’t studied carefully the pros and cons of each solution in terms of development time, semantics & upgrade path, so I think that should be one of the first tasks of the project.

I have a question/concern about the ramifications of this change for Rust. In Rust, we consider the following code to be well-defined:

use std::mem::MaybeUninit;

fn main() { unsafe {
    let mut x: MaybeUninit<i32> = MaybeUninit::uninit();
    x.as_mut_ptr().cast::<u8>().write(0);
    let y = (&x as *const MaybeUninit<i32>).read(); // in LLVM, this is a load at type i32
    let val = y.as_ptr().cast::<u8>().read();
    assert_eq!(val, 0);
} }

This translates to LLVM IR (sketched after the list) that

  1. does alloca for an i32
  2. writes 0 to the first byte of that alloca
  3. then loads the entire alloca at type i32
  4. and truncates that i32 to an i8
  5. and finally asserts that that i8 is 0
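
A hand-written sketch of those five steps in IR (not actual rustc output; the assert is modelled as a branch to a hypothetical panic stub):

```llvm
define void @example() {
  %x = alloca i32, align 4       ; 1. alloca for an i32
  store i8 0, ptr %x             ; 2. write 0 to the first byte
  %y = load i32, ptr %x          ; 3. load the entire alloca at type i32
  %val = trunc i32 %y to i8      ; 4. truncate that i32 to an i8
  %ok = icmp eq i8 %val, 0       ; 5. assert that that i8 is 0
  br i1 %ok, label %cont, label %panic
cont:
  ret void
panic:
  call void @rust_panic()        ; hypothetical panic handler
  unreachable
}
declare void @rust_panic()
```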

It is my understanding that under current LLVM semantics (where alloca is filled with undef), this program is well-defined. However, if the initial value inside an alloca becomes poison, then (because poison “infects” the entire i32 value), step 4 will produce a poison of type i8 and step 5 will be UB.

So, looks like this is a breaking change for Rust. Is there any way that can be avoided?

The proposal is not to break clients. We will keep existing API users correct, but potentially with worse performance.
I guess that for safe Rust you’ll be able to migrate to a poison-uninit world. For unsafe Rust, the generated IR will be migrated automatically to use a bunch of freezes (or freezing loads).

Uninit values are the last big remaining user of undef, hence it’s critical that we move towards changing the uninit value.

We have one codegen for both safe+unsafe Rust; given things like inlining there is not really a meaningful distinction between them this late in the compilation pipeline.

I understand and I am fully in favor. :slight_smile: Uninit memory in Rust is much closer to poison than to undef anyway. However, we have types like MaybeUninit<i32> that need to be fully ABI-compatible with a 32-bit integer in C while also preserving initialization bytewise. (We tried defining it as an aggregate with a 32-bit field, and the perf regression from that is way too big. This gets even worse when you consider MaybeUninit of a SIMD type; it absolutely needs to use SIMD registers.)

So it seems to me that this transition requires an LLVM type that can hold “bytewise poison data” – either by changing iN to track poison bytewise (or even bitwise), or by introducing a new type that does that, or some other way I cannot think of right now. Currently it seems like a feature is being removed from LLVM, and it is a feature that Rust has come to rely on quite a bit.

I suppose we could make all our MaybeUninit loads be freezing loads (they would have to freeze before exploding the poison to the whole value, so this cannot be expressed via load+freeze in 2 separate instructions), but that doesn’t sound great for performance either. People usually use these unsafe things to get better performance, after all. Is that essentially what the automatic migration you mentioned would do?
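
To illustrate why a plain load followed by freeze is not enough, here is a minimal sketch (assuming byte 0 of %p was initialized to 0 and bytes 1–3 were not, under the proposed bytewise-poison semantics):

```llvm
define i8 @first_byte(ptr %p) {
  %v = load i32, ptr %p     ; any uninitialized byte makes the whole i32 poison
  %f = freeze i32 %v        ; freeze picks one arbitrary i32; the fact that
                            ; byte 0 was 0 is already lost at this point
  %b = trunc i32 %f to i8   ; may be any value, not necessarily 0
  ret i8 %b
}
```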

Yes!

Do you have some Rust benchmarks we could use to test different strategies and to make sure we don’t regress on those?
Thanks for bringing up this issue nevertheless!


Ah, good question. As a macrobenchmark we do have the Rust compiler itself, which has some pretty detailed perf tracking. So if you have an LLVM patch, ideally one that applies against our LLVM fork, then we can ask our perf bot to compare it against regular rustc. @nikic can hopefully help with that; they often do the LLVM bumps in rustc. :slight_smile:

However, I am not sure how much that will actually hit critical codepaths with MaybeUninit. And I am not aware of a good set of microbenchmarks, sadly.

@RalfJung we are considering three load instruction forms, one of which already exists. We would like to know whether they meet the needs of Rust, and if so, which ones in particular.

1. Load with Automatic Freeze (replacing today’s undef):

load <ty>, <ty>* <pointer> 

In this semantics, each byte has an initialization bit. If the bit is false, the loaded byte is non-deterministic but fixed, as if it had been passed through a freeze.

2. Load without Automatic Freeze (Uninitialized is Poison):

load <ty>, <ty>* <pointer>, !uninit_is_poison 

Like the previous model, but without the automatic per-byte conditional freeze. If any byte being loaded is poison, the whole return value is poison.

3. Load with noundef:

load <ty>, <ty>* <pointer>, !noundef 

We already have this, but it is mentioned for completeness. Through deduction we know that the value is always concretely defined (or so we hope). If the memory was not initialized, the load results in UB.

Comparison of semantics:

| Load form | Can be duplicated? | Can be hoisted? |
| --- | --- | --- |
| 1. Load with Automatic Freeze | No | Yes |
| 2. Load without Automatic Freeze | Yes | Yes |
| 3. Load with noundef | Yes | No** |

** A load with noundef (3) can be downgraded into a load without automatic freeze (2) to allow hoisting.

All of these have the advantage of removing undef!

Currently Clang generally emits 3 (load with noundef), and we are working toward 1 (load with automatic freeze) for bit-fields.

“Meet the needs” in the sense of “we can have a sound compilation scheme” or “reflects the semantics we want without loss”? :wink:

Of these three, we need option 1 to be available in order to express our semantics at all in LLVM IR. Making every load freeze would clearly be a correct compilation strategy for all our types. Some of our types could have noundef annotations (which of course can be weakened as LLVM sees fit, e.g. to perform hoisting). The discussion for which types can or cannot have noundef is still ongoing on our side, but that should not affect you much I think.

However, this still loses some precision compared to Rust semantics: we have a type MaybeUninit<T> that can hold arbitrary data (including undef/poison) of the same size as T; e.g., MaybeUninit<i64> is a 64-bit value that can contain undef/poison. As that value gets copied around, which bytes are and are not initialized is preserved perfectly.
We compile this type to i64 in LLVM, since we need the performance of passing this around in registers, so there is no other type we could use. (If there were a “byte” type, we’d use it, but alas.) We could of course use a freezing load to load such a type, but then the uninit parts of the type would unnecessarily be considered fixed and initialized by LLVM. I don’t know if this is actually a problem, but it does represent some irreversible information loss.

Yeah, I have seen that, and I don’t think it is correct for C/C++, but that is a different discussion.