Hi Philip,
I've been looking into the Capture Tracking Improvements and I was
wondering if there was any research/documentation that you know of that
I could use as background reading?
Many thanks,
Scott
Hey Scott,
There has been quite a lot of research on capture tracking (aka escape analysis) for Java and other dynamic languages.
See e.g.:
https://wiki.openjdk.java.net/display/HotSpot/EscapeAnalysis
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
Nuno
Hi Nuno,
This is great, thank you.
Scott
(+CC LLVM dev - I'd dropped it in my original reply unintentionally and just noticed.)
Thanks for the write-up, I found it very helpful, especially since, coming from a C/C++ background, the distinction between captured and escaped wasn't clear to me.
It seems that we do indeed (conservatively) assume in LLVM that "capture" implies "escape".
I've observed that this matters frequently for loads/stores from/to a global variable: you can't show that the pointer does not "escape" because it is "captured" somehow, so alias analysis has to conservatively treat these loads/stores as "may alias" with any other pointer load/store.
Great, thanks Philip!
Nuno
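A minimal C++ sketch of Nuno's point about globals. It assumes the compiler cannot see the body of `opaque_call`, a hypothetical stand-in for a call into another translation unit (it is defined here only so the sketch runs). Once `&x` is stored into the global `g_ptr`, the pointer is captured, and alias analysis has to assume the call may write `x`. When `&x` never leaves the frame, the optimizer is free to treat `x` as still being 1 after the call.

```cpp
#include <cassert>

int *g_ptr = nullptr;  // global: anything that can read g_ptr can reach *g_ptr

// Hypothetical stand-in for a call the optimizer cannot see into.
void opaque_call() {
  if (g_ptr) *g_ptr = 42;
}

int capture_into_global() {
  int x = 1;
  g_ptr = &x;       // &x is captured (and escapes) through the global
  opaque_call();    // may write x through g_ptr, so x cannot be assumed to be 1
  int result = x;
  g_ptr = nullptr;  // do not leave a dangling pointer behind
  return result;
}

int no_capture() {
  int x = 1;
  opaque_call();    // &x never left this frame, so the call cannot modify x
  return x;
}
```

Here `capture_into_global()` returns 42 and `no_capture()` returns 1.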
Thank you for this write-up; it is very useful.
Many thanks,
Scott
(+CC LLVM dev - I'd dropped it in my original reply unintentionally and just noticed.)

(This was written in a rush. There may be mistakes; if so, I'll try to correct them later.)

At the moment, most of LLVM is worried about capture. The only exceptions I know of are:
1) isAllocSiteRemovable in InstCombine/InstructionCombining.cpp
2) The thread-local logic used in LICM's store promotion

Let me phrase this informally:
- "capture" - can anyone inspect the bits of this pointer?
- "escape" - can anyone inspect the contents of this allocation?
- "thread escape" - can any other thread inspect the contents of this allocation?

Generally, "escape" and "thread local" are about the *contents* of an allocation, while "capture" is about the pointer value itself. In practice, we treat "capture" very conservatively. To have something which has escaped but isn't captured, you'd need a way to refer to an object without being able to determine its address. C++ doesn't have this (I think?). Java does (in very limited forms), but we haven't tried to be aggressive here in LLVM. We generally assume "capture" implies both "escape" and "thread escape".

Illustrative examples:
- A function which returns the alignment of a pointer captures the pointer, but does not cause it to escape or become non-thread-local.
- A function which compares a pointer against a known constant may capture, escape, and make non-thread-local all at once, if the constant is known to any other thread.
- A function which writes a newly allocated pointer into a thread-local buffer has captured and escaped it, but has not made it non-thread-local.

If I know something is thread local:
- I can demote atomic accesses to non-atomic ones.
Agreed you can make it non-atomic, but with LLVM's memory model, can you lose the ordering effect that the atomic had? I think in C++ you can (e.g. a stack-local atomic doesn't enforce ordering; IIRC majnemer had an example of this), but I don't think LLVM's model specifies this.
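The three illustrative examples from Philip's write-up, sketched in C++. All names (`alignment_of`, `g_shared`, `is_g_shared`, `tl_buffer`, `stash`) are hypothetical, and the comments follow the informal capture/escape/thread-escape definitions above rather than what any particular LLVM pass currently infers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// (1) Captures only: the pointer's bits are inspected, but nothing here
// lets anyone reach the pointed-to contents later.
std::size_t alignment_of(const void *p) {
  auto bits = reinterpret_cast<std::uintptr_t>(p);
  // Lowest set bit = largest power of two dividing the address (0 for null).
  return static_cast<std::size_t>(bits & (~bits + 1));
}

// (2) &g_shared is a constant that every thread can name.
int g_shared = 0;

// If this returns true, p *is* &g_shared, which every thread can already
// reach, so p must conservatively be treated as captured, escaped and
// non-thread-local all at once.
bool is_g_shared(const int *p) { return p == &g_shared; }

// (3) A per-thread buffer: only the current thread can see it.
thread_local int *tl_buffer[4] = {};

// Captures and escapes p (later code on this thread can read it back and
// reach *p), but no other thread can see tl_buffer, so p stays thread-local.
void stash(int *p) { tl_buffer[0] = p; }
```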
If I know something is unescaped:
IIRC, the example was something like:
#include <atomic>

// A seq_cst store to an atomic that never leaves this function.
void barrier() {
  std::atomic<int> z;
  z.store(1, std::memory_order_seq_cst);
}
Does the modification to 'z' participate in the total ordering?
LLVM doesn't think so.
ICC emits an mfence and no store.
GCC emits an mfence and a store.
I think LLVM's behavior here is defensible.
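David's `barrier()` is the seq_cst case, which the rest of this thread debates. The less controversial form of "demote atomic accesses to non-atomic ones" is sketched below with `memory_order_relaxed`, which imposes no ordering on other memory locations. The function names are hypothetical; the second function shows the kind of result a thread-local-aware optimizer could produce, not what LLVM is guaranteed to emit.

```cpp
#include <atomic>
#include <cassert>

// The atomic counter lives on this function's stack and its address is
// never passed anywhere, so no other thread can ever access it.
int count_with_local_atomic(int n) {
  std::atomic<int> c{0};
  for (int i = 0; i < n; ++i)
    c.fetch_add(1, std::memory_order_relaxed);
  return c.load(std::memory_order_relaxed);
}

// The demoted equivalent: plain, non-atomic accesses.
int count_with_plain_int(int n) {
  int c = 0;
  for (int i = 0; i < n; ++i)
    ++c;
  return c;
}
```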
I think there is a problem here. If you demote atomic stores to non-atomic stores, you remove reordering constraints: with an atomic store, later loads/stores cannot be moved before z.store(), but if the store is made non-atomic too early in the compiler, later passes may reorder loads/stores and the program may no longer be correct. Am I right?
For the record, I am not a C++ expert or a concurrency expert. Take everything I say with several grains of salt…
If I remember the key parts of the C++ spec correctly, an atomic store synchronizes with (establishes an ordering with) only other atomic operations.
For acquire and release, the spec is clearly defined in terms of other atomic memory operations on the same memory location. Given we’ve proven the location thread-local, by definition no other thread can contain an atomic memory operation which references it.
acq_rel would seem to follow from the above.
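For contrast, a minimal release/acquire message-passing sketch (all names hypothetical). Synchronization happens only because both threads operate on the *same* atomic object `ready`; if `ready` were a local that never left its thread, no acquire load in another thread could pair with the release store, which is the core of the argument above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};  // shared by both threads
int payload = 0;                 // plain data published through `ready`

void producer() {
  payload = 42;                                  // plain write
  ready.store(true, std::memory_order_release);  // publish
}

int consumer() {
  while (!ready.load(std::memory_order_acquire))
    std::this_thread::yield();  // spin until the release store is visible
  return payload;  // the acquire load makes the write of 42 visible here
}

int run_message_passing() {
  ready.store(false, std::memory_order_relaxed);  // reset before threads start
  payload = 0;
  int seen = 0;
  std::thread t1(producer);
  std::thread t2([&] { seen = consumer(); });
  t1.join();
  t2.join();
  return seen;
}
```

`run_message_passing()` always returns 42.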
As for seq_cst, I’m really not sure. The wording I can find defining it is a bit vague and would appear to imply an ordering involving all memory locations (including the non-thread-local ones). Given this, it is probably safest not to demote seq_cst to non-atomic without careful consideration. We could always lower the atomic seq_cst store to a non-atomic store and a set of fences, but that seems unlikely to be profitable.
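One possible shape of the "non-atomic store plus a set of fences" lowering, assuming a seq_cst fence on each side of the store (an assumption; the thread does not say which fences). The seq_cst fences still take part in the single total order of seq_cst operations, but whether this preserves every guarantee of the original store is exactly the open question, so treat it as a sketch rather than a proven-equivalent rewrite. It also hints at why the rewrite seems unlikely to pay off: both fences remain.

```cpp
#include <atomic>
#include <cassert>

// Original form: seq_cst store to an atomic that never leaves this function.
int store_seq_cst_atomic() {
  std::atomic<int> z{0};
  z.store(1, std::memory_order_seq_cst);
  return z.load(std::memory_order_relaxed);
}

// Hypothetical demoted form: a plain store bracketed by seq_cst fences.
int store_plain_with_fences() {
  int z = 0;
  std::atomic_thread_fence(std::memory_order_seq_cst);
  z = 1;
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return z;
}
```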
To be very clear, I’m not stating it is illegal to demote thread local seq_cst stores; I’m just stating that I don’t see an obvious argument why it is definitely legal.
Philip