Smart Pointer Lifetime Optimizations

Hello all,

I’m planning to do some work to add lifetime optimization passes for smart pointers and reference-counted objects. I’ll use this email as a sort of proposal for what I hope to do.

Scope

As I’m developing the pass, I’m trying to keep it general and to create utilities that could work across multiple smart pointers. Right now, though, I’m focusing on unique_ptr and applying ownership optimizations specific to unique_ptr only.

unique_ptr Optimizations

The pass I’m currently developing adds a single, simple optimization: constant folding the destructor based on ownership information. unique_ptr communicates a lot of ownership information through move semantics. When a unique_ptr is moved into another function, that function takes over ownership, and subsequent destructor calls on the moved-from object can be eliminated (because they will be no-ops). Otherwise-branchless functions often become complicated after inlining unique_ptr’s destructor, so this optimization should be fairly beneficial.
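A minimal sketch of the pattern this targets (my reconstruction, not the pass’s actual test case):

```cpp
#include <memory>
#include <utility>

// Track live objects so the no-op destructor claim is observable.
static int live_objects = 0;
struct Widget {
  Widget() { ++live_objects; }
  ~Widget() { --live_objects; }
};

// Takes ownership by value; the Widget dies when the parameter is destroyed.
void sink(std::unique_ptr<Widget>) {}

void caller() {
  auto q = std::make_unique<Widget>();
  sink(std::move(q));
  // q now holds nullptr, so q's destructor is guaranteed to be a no-op:
  // exactly the destructor call the pass wants to constant-fold away.
}
```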

unique_ptr’s reset and release methods both complicate this optimization a bit. Because they can also transfer and remove ownership, the pass currently has to treat any unknown instruction conservatively. In the future, however, modeling those methods might make the pass more robust.

With unique_ptr, it’s difficult to prove liveness, so it is also hard to constant fold the destructor call in the other direction (proving the held pointer is always non-null and removing the null check). With sufficient analysis, though, that may become possible in the future.

Lastly, an optimization I hope to add is lowering a unique_ptr to a raw pointer when all of its lifetime paths are known. I think removing this layer of abstraction would make it easier for other optimization passes to succeed. Eventually, we may even be able to specialize functions that take a unique_ptr so they take a raw pointer instead, if the argument’s lifetime can also be fully analyzed.
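A hand-written sketch of what that lowering would amount to (the pass does not do this yet; both functions here are observably equivalent):

```cpp
#include <memory>

int with_abstraction() {
  std::unique_ptr<int> p(new int(41));
  return *p + 1;
}  // implicit destructor: if (ptr != nullptr) delete ptr

// Once every lifetime path of p is known, the unique_ptr layer can go away:
// no stack-allocated object, and liveness is proven so the null check is gone.
int lowered() {
  int* p = new int(41);
  int result = *p + 1;
  delete p;
  return result;
}
```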

Lifetime Annotations

Right now, the pass relies on (pre-inlining) function calls to gather ownership information. Another approach would be to add explicit ownership annotations, similar to the lifetime intrinsics (e.g., llvm.lifetime.start).

ARC Optimizations

There are a huge number of large and small ARC optimizations already in LLVM. For unique_ptr specifically, I’m not sure these are of any benefit, because unique_ptr doesn’t actually do any reference counting. But later on, when I start generalizing this pass to support more smart pointers (specifically shared_ptr), I think the ARC optimization pass, and especially the utilities it contains, could be very beneficial. If anyone has experience with ARC optimizations, I’d love to hear your thoughts on extending them to other reference-counted objects.

trivial_abi and Hidden References

Arthur O’Dwyer made a good point, which is that a lot of these optimizations already fall out when the trivial_abi attribute is applied. However, trivial_abi is not a standard attribute, and these optimizations only happen to work under it (i.e., in a more complicated program, they may stop applying). So I think lifetime utilities and dedicated lifetime optimization passes are still beneficial (especially if they can be applied to other smart pointers in the future).

Because all smart pointers have non-trivial destructors, they are always passed by hidden reference. With unique_ptr, undoing this is as simple as bit-casting the unique_ptr to its held pointer member, which would allow it to be lowered to a single raw pointer instead of a stack-allocated object. Even without the trivial_abi attribute, I think this is an optimization that could be done.
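The premise of that bit-cast can be checked directly: with the default deleter, unique_ptr’s only data member is the held pointer, so the object is layout-equivalent to a raw pointer (true on libc++ and libstdc++, thanks to the empty-deleter optimization):

```cpp
#include <memory>

// If this holds, a unique_ptr<int> stack slot carries no state beyond the
// pointer itself, so lowering it to a plain int* loses nothing.
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
              "unique_ptr<int> is layout-equivalent to int* here");
```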

Results

Here’s the unique_ptr pass I’ve been talking about: D81288 (Smart pointer lifetime optimizations pass).

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Best,

Zoe

Unfortunately, these are not legal optimizations for your test case:

  • guaranteed is permitted to escape a reference (or pointer) to the
    object it was passed. That reference or pointer remains valid
    until the object goes out of scope.

  • The object can be mutated through that reference because the underlying
    object is not const. Being passed a const reference is not a
    semantic contract in C++.

  • Through a combination of the above, the call to owner may change
    the value of p, and so the caller may not rely on it still being
    in a trivially-destructible state after that call.

  • owner may leave the value of its parameter object in a
    non-trivially-destructible state, and under the Itanium C++ ABI, cleaning
    up that object is the caller’s responsibility. I agree that this is a
    bad rule for optimization purposes, but it’s the rule. This can only be
    optimized with a more global, interprocedural optimization that shifts
    responsibility to owner to destroy its argument.
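The hazard those bullets describe can be made concrete. The following is a reconstruction (the names `guaranteed` and `owner` come from the thread; the actual Godbolt source may differ):

```cpp
#include <memory>
#include <utility>

static const std::unique_ptr<int>* escaped = nullptr;

void guaranteed(const std::unique_ptr<int>& p) {
  escaped = &p;  // legal: escape a reference to the caller's object
}

void owner(std::unique_ptr<int> q) {
  // The escaped reference is still valid, and the referent is not actually
  // const, so owner may legally give the caller's object a new value.
  const_cast<std::unique_ptr<int>&>(*escaped) = std::make_unique<int>(7);
}

int test(std::unique_ptr<int> p) {
  guaranteed(p);        // a reference to p escapes here...
  owner(std::move(p));  // ...and p is mutated during this call,
  return p ? *p : 0;    // so p is non-null again after the move!
}                       // p's destructor here cannot be folded away
```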

John.

John,

Thanks, those are good points. I think we can still remove one of the destructors (which could also be done by a more powerful DSE+load propagation) but, you’re right; one needs to stay.

> This can only be optimized with a more global, interprocedural optimization that shifts responsibility to owner to destroy its argument.

I’ll think about implementing something like this, but I suspect any possible optimizations will already happen with inlining and analysis.

Thanks for the response,
Zoe

John,

> Thanks, those are good points. I think we can still remove one of the
> destructors (which could also be done by a more powerful DSE+load
> propagation) but, you’re right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

> > This can only be optimized with a more global, interprocedural
> > optimization that shifts responsibility to owner to destroy its argument.
>
> I’ll think about implementing something like this, but I suspect any
> possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.
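To make the idea concrete, a sketch in hypothetical syntax (neither the attribute nor the trait exists in Clang today; both spellings are invented here purely for illustration):

```cpp
#include <memory>

// Hypothetical: [[clang::unstable_abi]] and __is_unstable_abi do not exist.
class [[clang::unstable_abi]] Unstable { /* ... */ };

static_assert(__is_unstable_abi(Unstable));                   // marked directly
static_assert(__is_unstable_abi(const Unstable*));            // composed from it
static_assert(__is_unstable_abi(std::unique_ptr<Unstable>));  // and so is this
static_assert(!__is_unstable_abi(std::unique_ptr<int>));      // stuck with the stable ABI
```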

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

John, sorry for my delayed response.

> Can you explain in more detail which destructor you think you can eliminate?

In the Godbolt link in my original post, there are two unique_ptrs, %3 and %4. They are both passed to the move constructor (as the source and destination). In the move constructor, the source’s held pointer is nulled out. If the next use of the pointer is the destructor (as is the case in the Godbolt example), then the destructor will be a no-op. In the optimized code, it’s a bit more difficult to see, but the dead store is still there: two instructions before the call to owner, null is stored into %8. Then, in the third block (%17) that same pointer (%8) is loaded and compared against null (which is always true).

> One idea I’ve personally been kicking around is some way to mark declarations as having an “unstable ABI”…

This is a super interesting idea. When I become more familiar with clang’s internals, I’d be interested in helping to implement it.

> Probably more importantly, though, we could allow unstable-ness to be detected with a type trait, and that would allow the standard library to take advantage of it.

We could actually do this for trivial_abi types too. If we added a builtin type trait to check if a type has the trivial_abi attribute, libc++ could conditionally give unique_ptr the trivial_abi attribute if its base type also had the attribute. Additionally, we could add a config macro that would do this globally when libc++ is in unstable ABI mode.
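A sketch of the config-macro half of that idea, using a stand-in class so it is self-contained (the macro names `_ABI_UNSTABLE` and `_SMART_PTR_ABI` are illustrative, not libc++’s real spellings):

```cpp
#include <utility>

// Illustrative only: these are invented macro names, not libc++'s.
#if defined(_ABI_UNSTABLE) && defined(__clang__)
#  define _SMART_PTR_ABI [[clang::trivial_abi]]
#else
#  define _SMART_PTR_ABI
#endif

template <class T>
class _SMART_PTR_ABI my_unique_ptr {
  T* ptr_ = nullptr;

public:
  explicit my_unique_ptr(T* p) : ptr_(p) {}
  my_unique_ptr(my_unique_ptr&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
  my_unique_ptr(const my_unique_ptr&) = delete;
  ~my_unique_ptr() { delete ptr_; }
  T* get() const { return ptr_; }
};
```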

Best,

Zoe

> John, sorry for my delayed response.
>
> > Can you explain in more detail which destructor you think you can
> > eliminate?
>
> In the Godbolt link in my original post, there are two unique_ptrs, %3 and
> %4. They are both passed to the move constructor (as the source and
> destination). In the move constructor, the source’s held pointer is nulled
> out. If the next use of the pointer is the destructor (as is the case in
> the Godbolt example), then the destructor will be a no-op. In the optimized
> code, it’s a bit more difficult to see, but the dead store is still there:
> two instructions before the call to owner, null is stored into %8. Then, in
> the third block (%17) that same pointer (%8) is loaded and compared against
> null (which is always true).

Yes, the destructor for the source (the local variable, as opposed to the
temporary) is the one that you can’t eliminate without proving that a
reference to it isn’t escaped by the first call and then mutated during the
second.

You wouldn’t be the first person to be surprised by the result of this sort
of analysis, but I’m afraid it’s what we’re working with.

Unfortunately, there’s really no way to eliminate this one without either
interprocedural information or language changes. trivial_abi eliminates
the other one because it changes the convention for passing by value, but to
pass an “immutably borrowed” value in C++ we have to pass by reference, which
allows the reference to be escaped and accessed (and even mutated, if the
original object wasn’t declared const) as long as those accesses happen
before destruction.

> > Probably more importantly, though, we could allow unstable-ness to be
> > detected with a type trait, and that would allow the standard library to
> > take advantage of it.
>
> We could actually do this for trivial_abi types too. If we added a builtin
> type trait to check if a type has the trivial_abi attribute, libc++ could
> conditionally give unique_ptr the trivial_abi attribute if its base type
> also had the attribute. Additionally, we could add a config macro that
> would do this globally when libc++ is in unstable ABI mode.

Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr
for types that are already trivial_abi is just as much of an ABI break
as changing it in general would be. You could try to justify it by saying
that there just aren’t very many trivial_abi types yet, or that trivial_abi
is a vendor-specific attribute that’s unlikely to be used on a type with a
stable ABI because non-Clang implementations wouldn’t be able to compile
it compatibly, but those aren’t terribly convincing arguments to me.

John.

> > John, sorry for my delayed response.
> >
> > > Can you explain in more detail which destructor you think you can
> > > eliminate?
> >
> > In the Godbolt link in my original post, there are two unique_ptrs, %3 and
> > %4. They are both passed to the move constructor (as the source and
> > destination). In the move constructor, the source’s held pointer is nulled
> > out. If the next use of the pointer is the destructor (as is the case in
> > the Godbolt example), then the destructor will be a no-op. In the optimized
> > code, it’s a bit more difficult to see, but the dead store is still there:
> > two instructions before the call to owner, null is stored into %8. Then, in
> > the third block (%17) that same pointer (%8) is loaded and compared against
> > null (which is always true).
>
> Yes, the destructor for the source (the local variable, as opposed to the
> temporary) is the one that you can’t eliminate without proving that a
> reference to it isn’t escaped by the first call and then mutated during the
> second.

Concretely: https://godbolt.org/z/4guY_m

Here, both destructors run after the call to ‘owner’ returns, and they destroy two different objects. You can’t eliminate either.

> You wouldn’t be the first person to be surprised by the result of this sort
> of analysis, but I’m afraid it’s what we’re working with.
>
> Unfortunately, there’s really no way to eliminate this one without either
> interprocedural information or language changes. trivial_abi eliminates
> the other one because it changes the convention for passing by value, but to
> pass an “immutably borrowed” value in C++ we have to pass by reference, which
> allows the reference to be escaped and accessed (and even mutated, if the
> original object wasn’t declared const) as long as those accesses happen
> before destruction.

Perhaps we should expose LLVM’s nocapture attribute to the source level?

> > > Probably more importantly, though, we could allow unstable-ness to be
> > > detected with a type trait, and that would allow the standard library to
> > > take advantage of it.
> >
> > We could actually do this for trivial_abi types too. If we added a builtin
> > type trait to check if a type has the trivial_abi attribute, libc++ could
> > conditionally give unique_ptr the trivial_abi attribute if its base type
> > also had the attribute. Additionally, we could add a config macro that
> > would do this globally when libc++ is in unstable ABI mode.
>
> Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
> is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr
> for types that are already trivial_abi is just as much of an ABI break
> as changing it in general would be. You could try to justify it by saying
> that there just aren’t very many trivial_abi types yet, or that trivial_abi
> is a vendor-specific attribute that’s unlikely to be used on a type with a
> stable ABI because non-Clang implementations wouldn’t be able to compile
> it compatibly, but those aren’t terribly convincing arguments to me.

I guess I should finish https://reviews.llvm.org/D63748 at some point. (Though I think we probably shouldn’t enable it in libc++ unstable ABI configurations by default, since it also changes observable program semantics due to altering destruction order, and is arguably non-conforming for the same reason.)

Richard and John, thanks for those thoughts and comments.

> Yes, the destructor for the source (the local variable, as opposed to the temporary) is the one that you can’t eliminate without proving that a reference to it isn’t escaped by the first call and then mutated during the second.

and

> Here, both destructors run after the call to ‘owner’ returns, and they destroy two different objects. You can’t eliminate either.

I get it now. The link really helped, thanks, Richard.

> Perhaps we should expose LLVM’s nocapture attribute to the source level?

This would be very interesting and probably not too difficult to do. I can add it to my backlog of things to investigate.

> trivial_abi is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr for types that are already trivial_abi is just as much of an ABI break as changing it in general would be.

We could make it unstable-ABI only. Or add another config flag to enable it. Looks like Richard’s patch takes this a step further anyway, though.

> > You wouldn’t be the first person to be surprised by the result of this sort
> > of analysis, but I’m afraid it’s what we’re working with.
> >
> > Unfortunately, there’s really no way to eliminate this one without either
> > interprocedural information or language changes. trivial_abi eliminates
> > the other one because it changes the convention for passing by value, but
> > to pass an “immutably borrowed” value in C++ we have to pass by reference,
> > which allows the reference to be escaped and accessed (and even mutated,
> > if the original object wasn’t declared const) as long as those accesses
> > happen before destruction.
>
> Perhaps we should expose LLVM’s nocapture attribute to the source level?

I think we have with __attribute__((noescape)). Of course, adopting it
systematically would be hugely invasive.

> > > > Probably more importantly, though, we could allow unstable-ness to be
> > > > detected with a type trait, and that would allow the standard library
> > > > to take advantage of it.
> > >
> > > We could actually do this for trivial_abi types too. If we added a
> > > builtin type trait to check if a type has the trivial_abi attribute,
> > > libc++ could conditionally give unique_ptr the trivial_abi attribute if
> > > its base type also had the attribute. Additionally, we could add a
> > > config macro that would do this globally when libc++ is in unstable ABI
> > > mode.
> >
> > Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
> > is an existing, ABI-stable attribute, so changing the ABI of
> > std::unique_ptr for types that are already trivial_abi is just as much of
> > an ABI break as changing it in general would be. You could try to justify
> > it by saying that there just aren’t very many trivial_abi types yet, or
> > that trivial_abi is a vendor-specific attribute that’s unlikely to be used
> > on a type with a stable ABI because non-Clang implementations wouldn’t be
> > able to compile it compatibly, but those aren’t terribly convincing
> > arguments to me.
>
> I guess I should finish https://reviews.llvm.org/D63748 at some point.
> (Though I think we probably shouldn’t enable it in libc++ unstable ABI
> configurations by default, since it also changes observable program
> semantics due to altering destruction order, and is arguably non-conforming
> for the same reason.)

It definitely changes observable semantics, but it’s not obviously
non-conforming; [expr.call]p7 gives us a lot of flexibility here:

It is implementation-defined whether the lifetime of a parameter
ends when the function in which it is defined returns or at the
end of the enclosing full-expression.

And note that MSVC has traditionally destroyed parameters in the callee.
IIRC the standard actually originally specified that parameters were
always destroyed at the end of the call and only changed it due to
Itanium doing otherwise.

Now, it’s possible that the copy-elision rules have an unfortunate
impact here. IIRC an object initialized with an elided copy is supposed
to take on the longer of the two natural lifetimes. Does that mean that
if you have a parameter initialized by an elided copy from a temporary,
the parameter needs to live until the end of the calling full-expression
like the temporary would have? If so, you either wouldn’t be able to
use a callee-destroy ABI or you wouldn’t be allowed to elide copies
into parameters, and the latter seems unacceptable.

Even if it’s conforming, I’m sure there are bugs about e.g. the proper
ordering with things like function-try-blocks and exception specifications.

John.

> > > > Probably more importantly, though, we could allow unstable-ness to be
> > > > detected with a type trait, and that would allow the standard library
> > > > to take advantage of it.
> > >
> > > We could actually do this for trivial_abi types too. If we added a
> > > builtin type trait to check if a type has the trivial_abi attribute,
> > > libc++ could conditionally give unique_ptr the trivial_abi attribute if
> > > its base type also had the attribute. Additionally, we could add a
> > > config macro that would do this globally when libc++ is in unstable ABI
> > > mode.
> >
> > Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
> > is an existing, ABI-stable attribute, so changing the ABI of
> > std::unique_ptr for types that are already trivial_abi is just as much of
> > an ABI break as changing it in general would be. You could try to justify
> > it by saying that there just aren’t very many trivial_abi types yet, or
> > that trivial_abi is a vendor-specific attribute that’s unlikely to be used
> > on a type with a stable ABI because non-Clang implementations wouldn’t be
> > able to compile it compatibly, but those aren’t terribly convincing
> > arguments to me.
>
> I guess I should finish D63748 (“Pass std::unique_ptr, std::shared_ptr and
> std::weak_ptr in registers when possible”) at some point. (Though I think
> we probably shouldn’t enable it in libc++ unstable ABI configurations by
> default, since it also changes observable program semantics due to altering
> destruction order, and is arguably non-conforming for the same reason.)


If you were just thinking about [class.temporary]’s strict order
rules for non-extended temporaries, though, I don’t think parameters
are actually temporaries for this purpose. There’s discussion of
temporaries for parameters and return values in [class.temporary],
but only for (a particular sense of) trivially-copyable types, and
it’s only a formalism for allowing the objects to be passed directly.

John.

> Does that mean that if you have a parameter initialized by an elided copy from a temporary, the parameter needs to live until the end of the calling full-expression like the temporary would have?

This is a very good point. Does the trivial_abi attribute force the compiler to pass by value, or just allow it to? If the former, I don’t think we actually have an issue: even if the copy is elided, the object is still passed through registers to the callee, so this shouldn’t have any effect on the elided copy’s source.

Yes please ;)

There was an RFC (I think) and a prototype patch to expose LLVM-IR
attributes via clang. We can limit the ones allowed if we want.

See https://c.wsmoses.com/presentations/HTO_Presentation.pdf for
another motivation :)

As someone (John?) has said in this thread, the nocapture attribute is already exposed to C/C++ via __attribute__((noescape)).

Here is an example of __attribute__((noescape)) doing its thing:
https://godbolt.org/z/rJbEmZ

In function one, the call to noargs is assumed to trash the value of i.
In function two, the call to noargs is not assumed to trash the value of i.

However, notice that the store corresponding to the assignment i = 42; cannot be removed in either case! We initialize i to 42, then pass it by const reference to dont_escape, and yet cannot assume that its value is unchanged by that call. This is required by C++'s loose rules around const. But it doesn’t match up with human intuition very well.
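A sketch of why that store can’t be removed: being passed by const reference is not a semantic contract when the referent itself is not const (`mutate_anyway` is an illustrative name, not from the Godbolt example):

```cpp
// Perfectly legal C++: the parameter is a reference-to-const, but the
// underlying object is non-const, so casting the const away and writing
// through it is well-defined.
void mutate_anyway(const int& r) {
  const_cast<int&>(r) = 43;
}

int demo() {
  int i = 42;        // this store must survive...
  mutate_anyway(i);  // ...because the callee may legally write through r
  return i;          // 43, not 42
}
```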

I mentioned to Zoe offline that I’d like to see Clang add an opt-in non-conforming optimization mode, on the lines of -ffast-math, which would assume things like “things passed by const reference don’t change” and “things passed by reference don’t escape,” and then measure to what degree codegen is improved on real codebases by that non-conforming optimization mode.

–Arthur

> > > > You wouldn’t be the first person to be surprised by the result of this
> > > > sort of analysis, but I’m afraid it’s what we’re working with.
> > > >
> > > > Unfortunately, there’s really no way to eliminate this one without
> > > > either interprocedural information or language changes. trivial_abi
> > > > eliminates the other one because it changes the convention for passing
> > > > by value, but to pass an “immutably borrowed” value in C++ we have to
> > > > pass by reference, which allows the reference to be escaped and
> > > > accessed (and even mutated, if the original object wasn’t declared
> > > > const) as long as those accesses happen before destruction.
> > >
> > > Perhaps we should expose LLVM's nocapture attribute to the source level?
> >
> > Yes please ;)
>
> As someone (John?) has said in this thread, the nocapture attribute is
> already exposed to C/C++ via __attribute__((noescape)).

Yeah, I saw that later, thanks. I actually want way more attributes exposed,
e.g., the ones I mention below. But also `dereferenceable`, `nosync`,
`willreturn`, ...

> Here is an example of __attribute__((noescape)) doing its thing:
> https://godbolt.org/z/rJbEmZ
>
> In function `one`, the call to `noargs` is assumed to trash the value of
> `i`. In function `two`, the call to `noargs` is *not* assumed to trash the
> value of `i`.
>
> However, notice that the store corresponding to the assignment `i = 42;`
> cannot be removed in either case! We initialize `i` to 42, then pass it by
> const reference to `dont_escape`, and yet *cannot assume that its value is
> unchanged by that call*. This is required by C++'s loose rules around
> `const`. But it doesn't match up with human intuition very well.
>
> I mentioned to Zoe offline that I'd like to see Clang add an opt-in
> non-conforming optimization mode, on the lines of `-ffast-math`, which
> would *assume* things like "things passed by const reference don't change"
> and "things passed by reference don't escape," and then measure to what
> degree codegen is improved on real codebases by that non-conforming
> optimization mode.

Interesting, especially to determine lost performance potential [0].

Should not even be hard: mark `const` as `nocapture`, `readonly`,
and `nofree` during IR-gen.

-- Johannes

[0] Performance Exploration Through Optimistic Static Program Annotations | SpringerLink

Exactly! :) And I see you’re the first author of that paper!
Fulltext PDF: “Performance Exploration Through Optimistic Static Program Annotations” (Doerfert, Homerding, Finkel; 2019)
If I understand correctly, what you did there was have the compiler add annotations to every possible site, and then iteratively use an “acceptance test” to find out which annotations broke the program and must be rolled back, until you arrived at a local minimum. (Similar to what C-Reduce does.)

My idea of a “-ffast-math–style” opt-in mode couldn’t depend on the existence of an “acceptance test” for the program it was compiling. It would blindly use heuristics invented by the Clang developer, such as “things marked const should always be annotated readonly” and “parameters of reference type should always be annotated noescape.”

If your PETOSPA framework hasn’t yet bit-rotted, you could use PETOSPA to sanity-check my heuristics! Suppose you did a special run of your compiler, applying just the readonly and noescape annotations, in exactly the cases outlined in my previous paragraph. And then you run your “acceptance tests” on the resulting binaries. My hypothesis is that the acceptance tests should all pass. That is: applying these heuristics should not break any of your programs. If applying one of these heuristics does break one of your programs, I would be very interested to learn why.

Thoughts? Is there any chance you could run a test like that in the next month or two?
–Arthur

> > > I mentioned to Zoe offline that I'd like to see Clang add an opt-in
> > > non-conforming optimization mode, on the lines of `-ffast-math`, which
> > > would *assume* things like "things passed by const reference don't
> > > change" and "things passed by reference don't escape," and then measure
> > > to what degree codegen is improved on real codebases by that
> > > non-conforming optimization mode.
> >
> > Interesting, especially to determine lost performance potential [0].
> >
> > [0] Performance Exploration Through Optimistic Static Program Annotations | SpringerLink
>
> Exactly! :) And I see you're the first author of that paper!
> Fulltext PDF: "Performance Exploration Through Optimistic Static Program
> Annotations" (Doerfert, Homerding, Finkel; 2019)
> <https://github.com/jdoerfert/PETOSPA/blob/master/ISC19.pdf>
> If I understand correctly, what you did there was have the compiler add
> annotations to every possible site, and then iteratively use an "acceptance
> test" to find out which annotations broke the program and must be rolled
> back, until you arrived at a local minimum. (Similar to what C-Reduce does.)

Pretty much, yes. (Slides and paper attached in case that helps.)

> My idea of a "-ffast-math–style" opt-in mode couldn't depend on the
> existence of an "acceptance test" for the program it was compiling. It
> would blindly use heuristics invented by the Clang developer, such as
> "things marked const should always be annotated readonly" and "parameters
> of reference type should always be annotated noescape."

Agreed. Totally. You would probably still run your tests (I mean the
user would). It is just a single manually triggered flag instead of
an automatic exploration. Not to say this is not actually more
useful in the short term ;)

If your PETOSPA framework hasn't yet bit-rotted, *you could use PETOSPA to
sanity-check my heuristics!* Suppose you did a special run of your
compiler, applying just the readonly and noescape annotations, in exactly
the cases outlined in my previous paragraph. And then you run your
"acceptance tests" on the resulting binaries. My hypothesis is that the
acceptance tests should all pass. That is: applying these heuristics should
not break any of your programs. If applying one of these heuristics does
break one of your programs, I would be very interested to learn why.

In Figure 6 we stated that only 3 of the pointer arguments we annotated with
readnone/readonly/writeonly, and none of the nocapture-annotated pointers,
caused our verification to fail. To be fair, pointer capturing is not really
what you expect many functions in an HPC application to do :wink:

Thoughts? Is there any chance you *could *run a test like that in the next
month or two?

I think we should "just" modify the clang codegen and run the test suite
and SPEC. That way we measure the actual thing with similar work overhead.
WDYT?

ISC19.pdf (441 KB)

talk.pdf (715 KB)

My idea of a “-ffast-math–style” opt-in mode couldn’t depend on the
existence of an “acceptance test” for the program it was compiling. […]

Agreed. Totally. You would probably still run your tests (I mean the
user would).

Heh, yes. :slight_smile: But if the user’s tests failed, they’d have no recourse except “don’t opt in anymore,” similar to what would happen if their tests failed with -ffast-math (or with -O3 or with -fstrict-aliasing, for that matter).

If your PETOSPA framework hasn’t yet bit-rotted, you could use PETOSPA to
sanity-check my heuristics!
Suppose you did a special run of your
compiler, applying just the readonly and noescape annotations, in exactly
the cases outlined in my previous paragraph. And then you run your
“acceptance tests” on the resulting binaries. My hypothesis is that the
acceptance tests should all pass. That is: applying these heuristics should
not break any of your programs. If applying one of these heuristics does
break one of your programs, I would be very interested to learn why.

In Figure 6 we stated that only 3 of the pointer arguments we annotated with
readnone/readonly/writeonly, and none of the nocapture-annotated pointers,
caused our verification to fail. To be fair, pointer capturing is not really
what you expect many functions in an HPC application to do :wink:

I see. The paper is unclear to me (on a quick reading) in three ways:
(1) When you say “pointer argument” do you also include “by-reference argument”?
(2) Do you include the implicit this parameter?
(3) IIUC, your statement in Figure 6 says that all-but-3 pointer parameters were successfully annotated with some annotation, but it doesn’t break them down into pass-by-pointer and pass-by-const-pointer, and it doesn’t break the annotations down into readnone/readonly/writeonly. It is theoretically possible that your all-but-3 number includes some const pointers annotated with writeonly(!), or that the 3 includes some const pointers. Either of those outcomes would falsify my hypothesis.

I think we should “just” modify the clang codegen and run the test suite
and SPEC. That way we measure the actual thing with similar work overhead.
WDYT?

“Just” modifying Clang would also be awesome. :slight_smile:
If you modify Clang, I expect Clang’s own test suite to fail at least twice, because you’d be deliberately introducing non-conforming compiler behavior, and the test suite should definitely treat that as a “correctness regression.”
Ah: you could use this mode to compile Clang itself and then use “time to run Clang’s test suite” as your benchmark. If that’s what you meant by “run the test suite,” then I agree!

–Arthur

This is quite an interesting idea. Having a mode that assumes that
const references don’t get cast back might be quite interesting. I
suspect that enforcing stricter escape rules would break some specific
idioms and types.
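One common idiom of the kind John mentions (sketched here with hypothetical names) is a registry or observer list that retains the address of an object passed by const reference, so the "parameter" genuinely escapes:

```cpp
#include <cassert>
#include <vector>

// A blanket noescape heuristic would break this pattern: the registry
// keeps the address of an object that was passed by const reference.
struct Widget { int id; };

std::vector<const Widget*> registry;

void register_widget(const Widget& w) {
    registry.push_back(&w);  // 'w' escapes; noescape would be a lie here
}
```

Intrusive containers and self-registering types do the same thing from a constructor, which is why the heuristic is likely to hit specific types rather than being uniformly safe or unsafe.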

John.

My idea of a "-ffast-math–style" opt-in mode couldn't depend on the
existence of an "acceptance test" for the program it was compiling. [...]

Agreed. Totally. You would probably still run your tests (I mean the
user would).

Heh, yes. :slight_smile: But if the user's tests failed, they'd have no recourse except
"don't opt in anymore," similar to what would happen if their tests
failed with `-ffast-math` (or with `-O3` or with `-fstrict-aliasing`, for
that matter).

Agreed :slight_smile:

If your PETOSPA framework hasn't yet bit-rotted, *you could use PETOSPA to
sanity-check my heuristics!* Suppose you did a special run of your
compiler, applying just the readonly and noescape annotations, in exactly
the cases outlined in my previous paragraph. And then you run your
"acceptance tests" on the resulting binaries. My hypothesis is that the
acceptance tests should all pass. That is: applying these heuristics should
not break any of your programs. If applying one of these heuristics does
break one of your programs, I would be very interested to learn why.

In Figure 6 we stated that only 3 of the pointer arguments we annotated with
readnone/readonly/writeonly, and none of the nocapture-annotated pointers,
caused our verification to fail. To be fair, pointer capturing is not really
what you expect many functions in an HPC application to do :wink:

I see. The paper is unclear to me (on a quick reading) in three ways:
(1) When you say "pointer argument" do you also include "by-reference
argument"?

Yes, it's implemented on IR, so there is no(t much) difference.

(2) Do you include the implicit `this` parameter?

Yes, it's implemented on IR. (Though the benchmarks were not C++ heavy, I think.)

(3) IIUC, your statement in Figure 6 says that all-but-3 pointer parameters
were successfully annotated with *some* annotation, but it doesn't break
them down into pass-by-pointer and pass-by-const-pointer, and it doesn't
break the annotations down into `readnone`/`readonly`/`writeonly`. It is
theoretically possible that your all-but-3 number includes some const
pointers annotated with `writeonly`(!), or that the 3 includes some const
pointers. Either of those outcomes would falsify my hypothesis.

It is possible, yes. Doubtful from what I remember, though I'd have to check the data to make sure.

I think we should "just" modify the clang codegen and run the test suite
and SPEC. That way we measure the actual thing with similar work overhead.
WDYT?

"Just" modifying Clang would also be awesome. :slight_smile:
If you modify Clang, I *expect* Clang's own test suite to fail at least
twice, because you'd be deliberately introducing non-conforming compiler
behavior, and the test suite should definitely treat that as a "correctness
regression."

I am not even sure we'd break many tests. I am not aware of tests that
ensure either (1) const_cast + modify works properly, or (2) let the pointer
escape and trigger some modification (store/free) to be moved prior to the
"hidden" read via the escaped pointer. That said, there are certainly situations
of (1) and (2) in the wild. I would add the annotation, run it on llvm-test-suite,
SPEC, ... and see what falls out. If there is no/little compile/runtime improvement,
there is less incentive to go forward with this. Otherwise, we can take a closer
look at the failure rate and determine if a user flag makes sense.
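For reference, minimal hypothetical versions of the two idioms above (the names are mine): (1) is legal C++ as long as the pointee object is not actually const, and (2) is the classic escaped-pointer ordering hazard:

```cpp
#include <cassert>

// (1) const_cast + modify: a readonly annotation on 'p' would miscompile this.
void sneaky_increment(const int* p) {
    ++*const_cast<int*>(p);  // OK if the underlying object is non-const
}

// (2) The argument escapes into a global; later writes through 'stash'
// must stay ordered with reads of the original object, so a noescape
// annotation on 'x' would also be wrong.
int* stash = nullptr;
void escape(int& x) { stash = &x; }
```

With `readonly` on (1), the compiler could keep the pointee cached across the call; with `noescape` on (2), it could reorder or eliminate the "hidden" accesses through `stash`. Either way the miscompile only shows up if the program actually exercises the idiom, which is why running llvm-test-suite and SPEC is the right first measurement.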

Ah: you could use this mode to *compile Clang itself* and then use "time to
run Clang's test suite" as your benchmark. If that's what you meant by
"run the test suite," then I agree!

Self hosting is certainly also a nice benchmark here.

Agreed. There are no doubt use cases though. I remember the one vendor compiler
that offered a flag to make *all* arguments restrict. It for sure breaks code,
but some seemed to really like it. There is also the option to only "apply" these
to user code, not system includes (assuming we can reasonably differentiate).
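The kind of code an all-arguments-restrict flag breaks is easy to exhibit (this is a hypothetical minimal example, not from any particular codebase): a function whose pointer parameters may legally alias at some call sites.

```cpp
#include <cassert>

// Under an all-restrict flag, 'dst' and 'src' would be assumed not to
// alias, and the loop could be vectorized on that assumption. The
// aliasing call below, which is fine today, would then be undefined.
void accumulate(int* dst, const int* src, int n) {
    for (int i = 0; i < n; ++i) dst[i] += src[i];
}
```

An aliasing call such as `accumulate(a, a, n)` doubles each element in conforming mode; under all-restrict it is undefined behavior, which is presumably why users of that vendor flag had to "really like it" with their eyes open.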

-- Johannes