Hi Karl, Roman,
> I was looking into how the global optimization pass fares against
> things like what's reported in
> 44676 – Missed optimization: Reverted modification of a global/thread-local that need not be visible to any external calls not optimized out
I need to take a closer look, but I would have expected BasicAA to be
able to determine that `do_log` and `R` cannot alias. In the -Os version
(lower-right pane at the Compiler Explorer link), the write to `R`
clobbers the read from `do_log`, which prevents us from removing the
load/store pair. My reasoning would have been that we know the size of
`do_log` to be less than the size accessed via `R`. What exactly goes
wrong, or whether my logic is flawed, needs to be examined. I would start
by looking at the debug output generated by the code touched here:
D66157 [BasicAA] Use dereferenceability to reason about aliasing
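To make the size argument concrete, here is a minimal C sketch. It is not the code from the bug report. `do_log`, `R`, `example` and `may_alias_by_size` are hypothetical stand-ins, and the helper only models the reasoning. It is not BasicAA's actual interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 1-byte global standing in for `do_log`. */
bool do_log;

/* The load/store pair on do_log is removable only if the store
   through R cannot modify do_log. */
void example(int *R) {
    bool saved = do_log;  /* load */
    *R = 0;               /* store of sizeof(int) bytes (typically 4) */
    do_log = saved;       /* store-back: redundant iff *R cannot alias do_log */
}

/* Model of the size-based argument: a well-defined (in-bounds) access
   of access_size bytes cannot fit inside an object of object_size
   bytes, so it cannot alias that object when access_size > object_size.
   If R pointed at do_log, the int store above would be out of bounds. */
bool may_alias_by_size(size_t object_size, size_t access_size) {
    return access_size <= object_size;
}
```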
> Looking at this, I think it would be pretty trivial to optimize that
> down given that there are already threading assumptions made:
> Compiler Explorer
Optimizing more aggressively based on forward progress guarantees will
get us in more trouble than we are already in. I don't have the link
handy, but as far as I remember the proposed solution was to have a
forward progress guarantee function attribute. I would recommend we look
into that first, before we start more aggressive optimizations that will
cause problems for a lot of (non-C/C++) folks.
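As a hedged illustration of what such a guarantee permits (the example is mine, not from the thread): C11 6.8.5p6 lets an implementation assume termination for a loop whose controlling expression is not a constant expression and whose body performs no I/O, no volatile accesses, and no synchronization or atomic operations. `collatz_steps` and `discard` are hypothetical names. If the result is unused, a C compiler may delete the loop without proving that it terminates. A language without this rule has to keep it.

```c
/* Number of Collatz steps from n down to 1. Nobody has proven this
   terminates for every n (and unsigned wrap-around makes it murkier),
   but the loop has no I/O, volatile or atomic accesses, and a
   non-constant controlling expression, so C11 6.8.5p6 allows the
   compiler to assume it terminates. */
unsigned collatz_steps(unsigned n) {
    unsigned steps = 0;
    while (n > 1) {
        n = (n % 2) ? 3 * n + 1 : n / 2;
        steps++;
    }
    return steps;
}

/* The result is discarded, so a compiler relying on the C11 rule may
   remove the loop entirely. */
void discard(unsigned n) {
    (void)collatz_steps(n);
}
```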
> Is this something I can look into?
Sure 
> Another thing is that currently *all* external calls break this
> optimization, including calls to intrinsics that probably shouldn't:
> Compiler Explorer
> I think during load propagation, there is a legality check: "here's a
> load, and here's a store. Is there anything in between that may have
> clobbered that memory location?"
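The legality check described above can be modeled as a scan over the instructions between the load and the store-back. The sketch below is a deliberately simplified, hypothetical model. `InstKind`, `Inst` and `pair_is_removable` are invented for illustration and are not LLVM APIs. It uses concrete byte addresses where real alias analysis would answer may-alias queries. An unknown call clobbers everything, which is why any external call currently blocks the optimization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model. */
typedef enum { INST_OTHER, INST_STORE, INST_UNKNOWN_CALL } InstKind;

typedef struct {
    InstKind kind;
    size_t addr;  /* start of the stored range (INST_STORE only) */
    size_t size;  /* number of bytes stored (INST_STORE only) */
} Inst;

/* Does [a, a + asz) overlap [b, b + bsz)? */
static bool overlaps(size_t a, size_t asz, size_t b, size_t bsz) {
    return a < b + bsz && b < a + asz;
}

/* True if nothing between the load of [loc, loc + size) and the
   store of the loaded value back to it may have clobbered the
   location, i.e. the store-back is redundant. */
bool pair_is_removable(const Inst *between, size_t n, size_t loc, size_t size) {
    for (size_t i = 0; i < n; i++) {
        const Inst *in = &between[i];
        if (in->kind == INST_UNKNOWN_CALL)
            return false;  /* opaque call: may write any memory */
        if (in->kind == INST_STORE && overlaps(in->addr, in->size, loc, size))
            return false;  /* intervening store may modify the location */
    }
    return true;
}
```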
Right now we only have `__attribute__((pure))` and `__attribute__((const))`,
but we want to expose all LLVM-IR attributes to the user soon [0], which will
allow much more fine-grained control. Intrinsics are a different story again.
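For reference, here is a small sketch of the two existing source-level attributes in GCC/Clang syntax. `const` promises that the result depends only on the arguments and that no memory is read or written. `pure` additionally allows reading global memory but still forbids writing it. Either way, the call cannot write `counter`, so the load/store pair around it is removable in principle. `counter`, `table`, `square`, `lookup` and `roundtrip` are hypothetical names.

```c
/* Hypothetical global whose value is saved and restored. */
int counter;
static int table[4] = {1, 2, 3, 4};

/* const: result depends only on the argument; no memory access. */
__attribute__((const)) int square(int x) { return x * x; }

/* pure: may read global memory (table) but must not write any. */
__attribute__((pure)) int lookup(int i) { return table[i & 3]; }

/* Neither call can write counter, so the store-back below is
   redundant and the compiler is free to drop the load/store pair. */
int roundtrip(int x) {
    int saved = counter;
    int r = square(x) + lookup(x);
    counter = saved;
    return r;
}
```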
> For calls, there are some attributes that are helpful here:
> LLVM Language Reference Manual
> So in this case, I guess the `@llvm.x86.flags.write` intrinsic could
> be annotated with the readonly attribute,
> thus signalling that it won't clobber that memory location?
While target-specific intrinsics are a bit more complicated, we already see
the problem often with generic intrinsics. We proposed the other day
[1] to change the default semantics of non-target-specific intrinsics
such that you have to opt in to certain effects.
For the above example you want `llvm.x86.flags.write` to be `writeonly` and
`inaccessiblememonly`. Also `nosync`, `willreturn`, ...
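Here is a rough model of why `writeonly` plus `inaccessiblememonly` rules out clobbering an ordinary global while `writeonly` alone does not. `readonly` would also rule out the clobber. However, judging by its name, it would misdescribe an intrinsic whose job is to write the flags. The flag names mirror the LLVM IR attributes. The bitmask and `call_may_clobber_global` are a hypothetical simplification, not how LLVM represents memory effects.

```c
#include <stdbool.h>

/* Hypothetical bitmask mirroring three LLVM IR function attributes. */
enum {
    EFFECT_READONLY = 1u << 0,            /* reads memory, writes none */
    EFFECT_WRITEONLY = 1u << 1,           /* writes memory, reads none */
    EFFECT_INACCESSIBLEMEMONLY = 1u << 2  /* only memory the module cannot access */
};

/* May a call with these attributes modify a global that the
   surrounding code can access? */
bool call_may_clobber_global(unsigned effects) {
    if (effects & EFFECT_READONLY)
        return false;  /* the call writes nothing at all */
    if (effects & EFFECT_INACCESSIBLEMEMONLY)
        return false;  /* any write goes to memory the global is not part of */
    return true;       /* writeonly alone, or no attribute: may write it */
}
```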
Cheers,
Johannes
[0] https://www.youtube.com/watch?v=elmio6AoyK0
[1] [llvm-dev] `opt-out` attributes for intrinsics