Global removal pass - potential for improvement?

Karl_Rehm · January 28, 2020, 5:08pm

Hey everyone,
I was looking into how the global optimization pass fares against things like what’s reported in https://bugs.llvm.org/show_bug.cgi?id=44676
Looking at this, I think it would be pretty trivial to optimize that down given that there are already threading assumptions made: https://godbolt.org/z/u6ZqoB

Is this something I can look into? Another thing is that currently all external calls break this optimization, including calls to intrinsics that probably shouldn’t: https://godbolt.org/z/pK7Cew
Maybe add some sort of way (i.e. an attribute) to mark that a call can be optimized through in this way?

LebedevRI · January 28, 2020, 5:22pm

Hey everyone,
I was looking into how the global optimization pass fares against things like what's reported in 44676 – Missed optimization: Reverted modification of a global/thread-local that need not be visible to any external calls not optimized out
Looking at this, I think it would be pretty trivial to optimize that down given that there are already threading assumptions made: Compiler Explorer
Is this something I can look into?

Another thing is that currently *all* external calls break this optimization, including calls to intrinsics that probably shouldn't: Compiler Explorer

I think during load propagation, there is a legality check "here's a
load, and here's a store.
Is there anything in between that may have clobbered that memory location?".
For calls, there are some attributes that are helpful here:
https://llvm.org/docs/LangRef.html#function-attributes
So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can
be annotated with readonly attribute,
thus signalling that it won't clobber that memory location?

Maybe add some sort of way (i.e. an attribute) to mark that a call can be optimized through in this way?

Roman

Karl_Rehm · January 28, 2020, 5:38pm

Looks like I’m dumb, seems like this only happens with the one intrinsic I chose which happens to not have the attribute (and probably can’t have it because of its behavior) :^)

_Johannes_Doerfert · January 28, 2020, 5:44pm

Hi Karl, Roman,

> I was looking into how the global optimization pass fares against
> things like what's reported in
> 44676 – Missed optimization: Reverted modification of a global/thread-local that need not be visible to any external calls not optimized out

I need to take a closer look but I would have expected BasicAA to be
able to determine that `do_log` and `R` cannot alias. In the -Os version
(lower right here Compiler Explorer), the write to `R`
clobbers the read from `do_log` which prevents us from removing the
load/store pair. My reasoning would have been that we know the size of
`do_log` to be less than the size accessed via `R`. What exactly goes
wrong or if my logic is flawed needs to be examined. I would start
looking at the debug generated by the code parts touched here:
⚙ D66157 [BasicAA] Use dereferenceability to reason about aliasing

> Looking at this, I think it would be pretty trivial to optimize that
> down given that there are already threading assumptions made:
> Compiler Explorer

Optimizing more aggressively based on forward process guarantees will
get us in more trouble than we are already in. I don't have the link
handy but as far as I remember the proposed solution was to have a
forward process guarantee function attribute. I would recommend we look
into that first before we start more aggressive optimizations which will
cause problems for a lot of (non C/C++) folks.

> Is this something I can look into?

Sure

> Another thing is that currently *all* external calls break this
> optimization, including calls to intrinsics that probably shouldn't:
> Compiler Explorer
I think during load propagation, there is a legality check "here's a
load, and here's a store.
Is there anything in between that may have clobbered that memory location?".

Right now we only have `__attribute__((pure/const))` but we want to
expose all LLVM-IR attributes to the user soon [0] which will allow way
more fine-grained control. Intrinsics are a different story again.

For calls, there are some attributes that are helpful here:
LLVM Language Reference Manual — LLVM 18.0.0git documentation
So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can
be annotated with readonly attribute,
thus signalling that it won't clobber that memory location?

While target specific intrinsics are a bit more complicated we see the
problem often with generic intrinsic already. We proposed the other day
[1] to change the default semantics of non-target specific intrinsics
such that you have to opt-in for certain effects.

For the above example you want `llvm.x86.flags.write` to be `writeonly` and
`inaccesiblememonly`. Also `nosync`, `willreturn`, ...

Cheers,
Johannes

[0] https://www.youtube.com/watch?v=elmio6AoyK0
[1] [llvm-dev] `opt-out` attributes for intrinsics

Karl_Rehm · January 28, 2020, 6:06pm

I need to take a closer look but I would have expected BasicAA to be
able to determine that do_log and R cannot alias. In the -Os version
(lower right here https://gcc.godbolt.org/z/KLaeiH), the write to R
clobbers the read from do_log which prevents us from removing the
load/store pair. My reasoning would have been that we know the size of
do_log to be less than the size accessed via R. What exactly goes
wrong or if my logic is flawed needs to be examined. I would start
looking at the debug generated by the code parts touched here:

https://reviews.llvm.org/D66157

Well, if I recall correctly there’s actually a problem with AA being too aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw
There’s a bug report about this somewhere, don’t remember where though.

_Johannes_Doerfert · January 28, 2020, 6:23pm

>
> I need to take a closer look but I would have expected BasicAA to be
> able to determine that `do_log` and `R` cannot alias. In the -Os version
> (lower right here Compiler Explorer), the write to `R`
> clobbers the read from `do_log` which prevents us from removing the
> load/store pair. My reasoning would have been that we know the size of
> `do_log` to be less than the size accessed via `R`. What exactly goes
> wrong or if my logic is flawed needs to be examined. I would start
> looking at the debug generated by the code parts touched here:
> ⚙ D66157 [BasicAA] Use dereferenceability to reason about aliasing
>

Well, if I recall correctly there's actually a problem with AA being *too*
aggressive for this reason: Compiler Explorer
There's a bug report about this somewhere, don't remember where though.

That seems to be related purely to TBAA: Compiler Explorer

What I described should fire if you see the allocation. Alternatively,
we can add a new "max object size" attribute [0] which gives a size
upper bound for the pointed to memory. In either case, if you know the
object pointed to by A is maximal N bytes of size, and the object
pointed to by B is at most M bytes of size, and N < M, A and B do not
alias. As of now, we use `dereferenceable` to determine the minimum size
of the underlying object* and require to see the definition (I think) to
obtain a maximum size.

* It's actually a lower bound on the minimum size based on the pointer
but that is conservatively fine too.

[0] https://www.youtube.com/watch?v=HVvvCSSLiTw

Karl_Rehm · January 28, 2020, 6:42pm

I’m thinking about this a bit more, and I don’t know if it’s something that can be used in a generic backend like LLVM. Should accessing a smaller object from a bigger pointer be considered UB?

In contrast, sing the assumption that bigger types can not be dereferenced from smaller types is a good idea. If this is already added to BasicAA, I wonder why it does not optimize this out?

_Johannes_Doerfert · January 28, 2020, 6:51pm

It’s not about the pointer per se but the object it points to. If we have 2 objects of which we know the minimal/maximal size respectively we can sometimes conclude the objects must be different. Accesses to different objects do not alias.

We already use that logic but we don’t have the upper bound size as an attribute. Hence, it only applies if we know the exact size, basically is we see the definition. I might have explained it in the YouTube link I included before.

I don’t know what you mean by llvm being a generic backend but what I describe above and in the tutorial should make it in as soon as the code is cleaned up. It’s on my GitHub already.

Karl_Rehm · January 28, 2020, 6:54pm

Ok, thanks for the clarification. By generic I meant language-agnostic; I was thinking that if there are languages that allow for accesses to smaller objects from pointers to bigger objects this might be an issue. Then again, I’m new to this so I don’t know exactly how it works. Thanks again!

_Johannes_Doerfert · January 28, 2020, 6:59pm

I see. TBH, I think the object assumption is backed into everything. You can have a pool allocation in your language to make it a single object if you need to. If you use malloc, globals and stack allocations this kind of reasoning about objects happens in llvm already.

Topic		Replies	Views
optimizing access to global LLVM Dev List Archives	1	58	May 18, 2011
Optimization pass questions LLVM Dev List Archives	5	153	August 13, 2010
Optimization passes organization and tradeoffs LLVM Dev List Archives	10	114	May 22, 2008
optimization passes through c++ LLVM Dev List Archives	1	89	May 12, 2009
global question LLVM Dev List Archives	11	63	March 8, 2009

Global removal pass - potential for improvement?

Related topics