[RFC] LLVM: new `initialized` parameter attribute for improved interprocedural DSE

Haopeng_Liu · March 1, 2024, 4:39pm

1. Motivation
DSE is a function pass and only removes redundant stores intra-procedurally. It cannot cross function boundaries and eliminate inter-procedural dead stores as shown in this example:

class MyType {
 public:
  MyType(int foo, int bar, int baz) : foo_(foo), bar_(bar), baz_(baz) {}

 private:
  int foo_, bar_, baz_;
};

define void @myState()(ptr noalias sret(%class.MyType) %0) {
  store i32 10, ptr %0, align 4, !dbg !11
  %2 = getelementptr inbounds %class.MyType, ptr %0, i64 0, i32 1, !dbg !14
  store i32 20, ptr %2, align 4, !dbg !14
  %3 = getelementptr inbounds %class.MyType, ptr %0, i64 0, i32 2, !dbg !15
  store i32 100, ptr %3, align 4, !dbg !15
  ret void, !dbg !16
}

define void @forFun()() {
  %x = alloca %class.MyType, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr nonnull %x) #5
  // A redundant memset call as "%x" is fully initialized in myState().
  call void @llvm.memset.p0.i64(ptr align 4 %x, i8 0, i64 12, i1 false)
  call void @myState()(ptr writable sret(%class.MyType) align 4 %x)
  ...
}

It’s highly desirable to solve this issue especially when “-ftrivial-auto-var-init” is set, which emits initialization conservatively at each auto var declaration and relies on the backend optimizers to eliminate redundant initializations as much as possible.

2. Proposal
LangRef draft: https://github.com/llvm/llvm-project/commit/69b09b…

We propose adding a new LLVM attribute, initialized((Lo1,Hi1),(Lo2,Hi2),...), which expresses the notion of memory space (i.e., intervals, in bytes) that the argument pointing to is initialized in the function. Our current dereferenceable(n), writable, or writeonly/readonly/readnone attribute doesn’t quite fit the bill.

2.1 Attribute syntax
The initialized((Lo1,Hi1),(Lo2,Hi2),...) attribute is a ParamAttr, allowing it to be attached to each parameter’s attribute list. The attribute value provided in parentheses is a non-empty list of const ranges concatenated with commas: (Lo1,Hi1),(Lo2,Hi2),... This attribute may only be applied to pointer typed parameters.

initialized attribute syntax: initialized((Lo1,Hi1),(Lo2,Hi2),...)
Parameter syntax: <type> [parameter Attrs] [name]
Example: ptr initialized((0,4),(8,16)) %ptr

Note that a). LoN/HiN are 64-bit ints; Negative values are allowed in case a pointer to something partway through the allocation is passed to. b). continuous or overlapping intervals are not allowed, and the ranges will be stored in ascending order. In the motivation example, %0 is initialized with 3 store instructions in 3 ranges [0,4), [4,8), and [8,12). The initialized attribute for %0 would be initialized((0,12)).

2.2 Attribute semantics
initialized((Lo1,Hi1),(Lo2,Hi2),...) implies that the memory (in bytes) pointed by the parameter and each range, [%p+LoN, %p+HiN), are initialized in the function and are not read before initialization within the function.

Initializations are non-volatile writes (StoreInst or CallInst to library APIs like memset) to a memory location pointed to by the parameter, [%p+LoN, %p+HiN), that satisfy: a). Dominate any other memory access to the pointer in the function; b). Post-dominate the function entry.

Note that this attribute does not apply to the unwind edge. In the motivating example, assuming @myState unwinds, DSE can still apply the initialized attribute and eliminate the redundant store in @forFun only if the %0 argument is marked dead_on_unwind. This won’t affect the optimization for -ftrivial-auto-var-init as we are typically trying to eliminate dead stores to an alloca, and allocas are essentially implicitly dead_on_unwind.

This attribute implies that the function initializes and does not read before initialization through this pointer argument, even though it may read the memory before initialization that the pointer points to, such as through other arguments.

The writable or dereferenceable attribute doesn’t imply the initialized attribute. Note that initialized does not imply writeonly since cases that read from the pointer after write, can be initialized but not writeonly.

2.3 Attribute representation
To properly represent const range list attributes, we propose adding a new kind of attribute, ConstRangeListAttribute.

2.4 Follow-ups
With this attribute, we plan to:

Infer in FunctionAttrs.cpp: https://github.com/llvm/llvm-project/commit/0bd68a…
Apply it in DSE: https://github.com/llvm/llvm-project/commit/b3866d…

@jvoung @aeubanks @rnk @alina @mingmingl-llvm

vtjnash · March 1, 2024, 5:16pm

How would this compare to inserting a second llvm.lifetime.start.p0 to express the information that no previous write will be read back from it during the call? Is the proposal just making this more efficient in the IR? I am asking this as a front-end developer, to be aware of how the intended usage should be applied in other cases.

Haopeng_Liu:

define void @forFun()() {
  %x = alloca %class.MyType, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr nonnull %x) #5
  // A redundant memset call as "%x" is fully initialized in myState().
  call void @llvm.memset.p0.i64(ptr align 4 %x, i8 0, i64 12, i1 false)
  // ptr is over-written in `myState`, so notify LLVM that we won't read it and will overwrite it subsequently
  call void @llvm.lifetime.end.p0(i64 12, ptr nonnull %x) #5
  call void @llvm.lifetime.start.p0(i64 12, ptr nonnull %x) #5
  call void @myState()(ptr writable sret(%class.MyType) align 4 %x)
  ...

mehdi_amini · March 2, 2024, 5:46am

This only seems to represent constant ranges?
So we can’t model API such as:

void foo(int *p, int from, int to); // initialize &p[from]->&p[to]

jdoerfert · March 4, 2024, 4:20am

Like with generic __builtin_assume encoding, I would very much recommend using the IR as a way to encode arbitrary properties.

Your

ptr initialized((0,4),(8,16)) %ptr

would become (as an example):

define void @__ptr_initialized_helper(ptr %p) { 
   %v4 = call i32 @__opaque4()
   %v8 = call i32 @__opaque8()
   store i32 %v4, ptr %p
   store i32 %v8, ptr (getelementptr ptr %p, i32 8) 
   ret void
}
ptr initialized_fn(ptr @__ptr_initialized_helper) %ptr

And @mehdi_amini’s example (in C for simplicity):

void __foo_initialized_helper(int *p, int from, int to) {
  for (int i = from; i < to; ++i) {
    p[i] = __opaque(); 
  }
}
void foo(int *p, int from, int to); // attribute((initialized_fn(__foo_initialized_helper)))

aeubanks · March 4, 2024, 5:06pm

vtjnash:

How would this compare to inserting a second llvm.lifetime.start.p0 to express the information that no previous write will be read back from it during the call? Is the proposal just making this more efficient in the IR? I am asking this as a front-end developer, to be aware of how the intended usage should be applied in other cases.
Haopeng_Liu:
define void @forFun()() {
  %x = alloca %class.MyType, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr nonnull %x) #5
  // A redundant memset call as "%x" is fully initialized in myState().
  call void @llvm.memset.p0.i64(ptr align 4 %x, i8 0, i64 12, i1 false)
  // ptr is over-written in `myState`, so notify LLVM that we won't read it and will overwrite it subsequently
  call void @llvm.lifetime.end.p0(i64 12, ptr nonnull %x) #5
  call void @llvm.lifetime.start.p0(i64 12, ptr nonnull %x) #5
  call void @myState()(ptr writable sret(%class.MyType) align 4 %x)
  ...

LangRef: After ‘llvm.lifetime.start’, the stack object that ptr points is marked as alive and has an uninitialized value.

In your case the memset can always be eliminated regardless of myState since after the second lifetime begin, the alloca is a essentially completely new object and the lifetime begin intrinsic implicitly resets the memory to uninitialized.

This is supposed to model the “same object” being redundantly initialized.

aeubanks · March 4, 2024, 5:45pm

So

define void @__ptr_initialized_helper(ptr %p) { 
   %v4 = call i32 @__opaque4()
   %v8 = call i32 @__opaque8()
   store i32 %v4, ptr %p
   store i32 %v8, ptr (getelementptr ptr %p, i32 8) 
   ret void
}

is essentially a “summary” of what the function does to the pointer argument? And to infer this some pass would analyze myState and generate __ptr_initialized_helper?

Having this information in a separate function makes it very unwieldy to use. For example, to actually use this in DSE, you basically have to rerun an analysis over __ptr_initialized_helper in a separate context from the original myState. Not to mention that we shouldn’t be looking in other functions during a function pass.

All in all, I think while theoretically this approach could work with a ton of work (like the proposed assume improvements), it seems overkill and hard to use for this use case.

I’d much rather go with the much more contained solution in the RFC that has had measurable impact. If we ever get outlined assumes working well, we can always autoupgrade a bunch of these types of attributes to use that, but I’m doubtful we’ll see that anytime soon.

vtjnash · March 4, 2024, 6:12pm

Can you explain what you meant by “regardless of myState” here? The frontend/middle-end was only legally allowed to put the llvm.lifetime.start there because it had the information that those exact bytes would be initialized by myState and thus can permit redundant stores to those bytes to be eliminated, if they are not observed elsewhere first. That makes this encoding dependent on myState in the same way as the initialized attribute, since both need to be inferred from examining the behavior of myState, and thus I think they should encode the equivalent information, if I am understanding correctly what property you are proposing representing.

aeubanks · March 4, 2024, 6:24pm

The two pairs of lifetime begins/ends indicate that the alloca being memset and the alloca getting passed to myState are unrelated. You could think of them as being two separate allocas. The memset initializes the alloca, then the lifetime end/begin throw away the memset by uninitializing the alloca, then myState receives an uninitialized alloca. We could remove the memset regardless of what myState does since it’s essentially initializing an unrelated alloca according to the lifetime intrinsics.

So the frontend wouldn’t be able to put the lifetime intrinsics on the alloca if the memset and myState were called on the same variable.

Also, we’d really like this to be inferrable from optimizations on IR as opposed to making frontends figure out which functions initialize parameters.

vtjnash · March 4, 2024, 6:46pm

For sure. I am asking if this existing lifetime end/start pair is exactly equivalent to the new proposed annotation, and proposing that it is the same information. But I think I see what you are saying now: that it is much less efficient to need to update all of the callers to have this lifetime info as an instruction, rather than being able to update the attributes of the callee with “initialized” for this common case, and then be able to have AA use that info quickly.

The langref states that they are the same alloca, if it is possible to detect a difference:

After llvm.lifetime.end is called, ‘llvm.lifetime.start’ on the stack object can be called again. The second ‘llvm.lifetime.start’ call marks the object as alive, but it does not change the address of the object.

The same could be said of the initialize attribute: if the behavior does not match the annotations, then it is not well-defined what will happen.

Why not? If the front end knows the old value cannot be read ever again in a data-race-free program, because it will be overwritten first, it could annotate that.

efriedma-quic · March 4, 2024, 7:35pm

The semantics of lifetime intrinsics are ambiguous; see incorrect transformation around icmp: unclear semantics of "lifetime" intrinsics leads to miscompilation · Issue #45725 · llvm/llvm-project · GitHub . We could invent something simpler that just means “store poison to the given address” if we wanted it here… but if we plan to infer the property based on the body of the function, an attribute is more convenient.

I’m not sure “this attribute does not apply to the unwind edge” actually gives you the semantics you want. Suppose you have a pointer that’s “initialized” and “dead_on_unwind”. On the non-unwind path, initialization happens, and on the unwind path, the value is dead after unwinding. But the value could still be read before unwinding, so this set of attributes isn’t sufficient to actually allow DSE.

Really, the semantics you want is just “on entry to the function, the marked intervals are overwritten with poison”. That way, you don’t need to explicitly specify what stores happen when inside the function, or the interaction with unwinding.

aeubanks · March 4, 2024, 9:57pm

I think the intention was that the initializing function doesn’t read the pointer before writing to it regardless of unwinding or not, but the memory may not actually be written to when unwinding. The wording should be fixed.

With these semantics we can’t infer initialized if a random call in the initializing function may unwind. e.g. we can’t infer initialized below because we don’t know if %p will be read on the unwind edge by a caller.

define void @f(ptr %p) {
  call void @maythrow()
  store i32 0, ptr %p
  ret void
}

This may come up if you have a helper function to initialize some subset of a larger object.

With the RFC’s proposed semantics, we can still infer initialized.

Ultimately, we’re trying to perform interprocedural DSE, so the caller of the initializing function needs to know if the object is an alloca or dead_on_unwind to handle unwind paths. If %p above were marked dead_on_unwind we could infer initialized, but it’s hard for an initializing function to know if a pointer parameter is dead_on_unwind or not, that’s mostly dependent on the caller who may see that an object is an alloca.

aeubanks · March 4, 2024, 10:13pm

I think what you’re saying is possible in some cases, but at that point the frontend is doing a lot of inter-procedural analysis that’s better meant for the middle-end (I think this is basically your first comment in the post). There are always cases where the frontend can’t know that some function initializes its parameter.

efriedma-quic · March 4, 2024, 10:15pm

Okay, makes sense.

efriedma-quic · March 4, 2024, 10:23pm

Actually thinking about it a bit more, both forms of semantics might be useful in different contexts. Consider the following example:

define void @foo(ptr %x) {
  store i32 0, ptr %x
  call void @bar(ptr initialized((0, 4) %x)
  ret void
}

Under the “overwritten with poison” semantics, the store is dead. Under the “memory may not be written to” semantics, it isn’t.

aeubanks · March 4, 2024, 10:32pm

Yes, there are different pros and cons to various semantics around unwinding. It’s mainly a “how much can you infer the attribute” vs “how much can you DSE given the attribute” tradeoff. We chose this one because it allows for maximum inference, while being composable with dead_on_unwind (and allocas are common and implicitly dead_on_unwind) on caller side to allow more DSE. Other variations, like your proposal, make it harder to infer initialized and it’s hard to work around that in the initializing function.

efriedma-quic · March 4, 2024, 10:42pm

We could implement both. (Of course, we don’t need to do that in the initial implementation.)

jdoerfert · March 5, 2024, 5:11am

Yes to the former, and this is an example. For the latter, it could also be the frontend/user. At the end of the day, I’m trying to argue we should encode the information in base IR rather than in some specialized attribute that encodes only a special case anyway.

Your argument boils down to:
Let’s keep the passes “simple”(/the same) and make the (universe of) annotations more complex (e.g., elaborate, wordy, or some other form of complexity).
I’m trying to suggest that this doesn’t work, long term but the opposite has better chances to age well.

We used to add more and more specific attributes while we always hard-code trade-offs.
New use cases are then either going to be expensive, or impossible.

If we were open to adjusting our model, we could approach it from a different angle, unlimited encodings and the user decides how much they want to invest in interpreting the information.
Allowing more complex encodings is then simple. And we could have cache (passes) for the information to keep the cost down. Only if you’d add information (=update the attribute), you’d need to invalidate the cache.

Wrt. “function passes”, I am not aware of anyone actually running them in parallel (or planning to). If so, they could just as well be SCC passes which are allowed to look at the encoding functions.

All that said, we can make the encoding smarter, I was just suggesting some easy syntax.

aeubanks · March 5, 2024, 5:24pm

I agree that your approach has benefits, but it also has downsides and hasn’t been proven to work in practice. I wouldn’t want to sign up to try prototyping this, blocking DSE improvements on it. As I said, if somebody ever does get around to adding this sort of capability to LLVM and it’s proven to be useful, we can autoupgrade many existing attributes to use this, including the proposed attribute here.

jdoerfert · March 6, 2024, 4:26pm

I will see if I can find someone to do this. No promises.

Topic		Replies	Views
Interprocedural DSE for -ftrivial-auto-var-init LLVM Dev List Archives	10	117	May 14, 2019
Dead store elimination in the backend for -ftrivial-auto-var-init LLVM Dev List Archives	8	92	August 26, 2019
Dead Store Elimination LLVM Dev List Archives	3	90	September 11, 2012
Why doesn't this LLVM IR code optimize away? Beginners llvm , llvm-ir	2	208	November 13, 2023
Zero init of structs and optimizations LLVM Dev List Archives	0	133	September 2, 2021

[RFC] LLVM: new `initialized` parameter attribute for improved interprocedural DSE

Related topics