Finding safe thread suspension points while JIT-ing (was: Add pass run listeners to the pass manager.)

So, I’m bringing a discussion that has thus far been on llvm-commits over to llvmdev because I finally understand what is really going on (thanks for helping me understand this, Andy), and I think lots of others need to be aware of it and involved in figuring out the right path forward.

You can find the full review thread and more context under the subject “[PATCH][PM] Add pass run listeners to the pass manager.”, but here is the important bit from Juergen’s initial email:

this patch provides the necessary C/C++ APIs and infrastructure to enable fine-grained progress reporting and safe suspension points after each pass in the pass manager.

Clients can provide a callback function to the pass manager to call after each pass. This can be used in a variety of ways (progress report, dumping of IR between passes, safe suspension of threads, etc).

I had wrongly (perhaps because of the implementation, but still, wrongly) focused on the progress report and IR dumping use cases. It sounds (from talking to Andy offline, sorry for that confusion) like the real use case is safe suspension of threads. The idea is that we would have a callback from the pass manager into the LLVMContext which would be used to recognize safe points at which the entire LLVMContext could be suspended, and to expose these callbacks through C API to JIT users.

Increasingly, I can’t fathom a way to get a good design for safe suspension of JIT-ing threads using callbacks tied to when passes run today. I think it is a huge mistake to bake this into the C API at this point. If you need this functionality in the C API, with a design we can use going forward, I’d like to see a really careful write up of exactly what the suspension point requirements are and a design for achieving them. I think it should be completely independent from any infrastructure for reporting or dumping IR in pass managers.

I think something much simpler than this might work outside of the C API, where we can essentially change how it works when we start designing how multiple threads will actually work within an LLVMContext. Would that work? Is there a way to make progress more rapidly there?

Ultimately, this is opening a huge can of worms if we put it into the C API, as I think it is going to fundamentally impact what options we actually have for parallelizing parts of LLVM in the future. If we want to go there, we need to be incredibly explicit about what assumptions are being made here.

> I had wrongly (perhaps because of the implementation, but still, wrongly) focused on the progress report and IR dumping use cases. It sounds (from talking to Andy offline, sorry for that confusion) like the real use case is safe suspension of threads. The idea is that we would have a callback from the pass manager into the LLVMContext which would be used to recognize safe points at which the entire LLVMContext could be suspended, and to expose these callbacks through C API to JIT users.

Good. Let’s table the discussion of how to report passes and just focus on the thread suspension API. It never occurred to me that a client using the new API for thread scheduling would not already be making an assumption about one thread per context. I believe anything else will break these clients regardless of the API. So I didn’t see this API as imposing a new restriction. The more explicit we can be about this, the better.

> Increasingly, I can’t fathom a way to get a good design for safe suspension of JIT-ing threads using callbacks tied to when passes run today. I think it is a huge mistake to bake this into the C API at this point. If you need this functionality in the C API, with a design we can use going forward, I’d like to see a really careful write up of exactly what the suspension point requirements are and a design for achieving them. I think it should be completely independent from any infrastructure for reporting or dumping IR in pass managers.

Yes, there absolutely needs to be a way to expose functionality within LLVM in its current form through the C API. We can say that the API works under some explicit set of rules. If some future LLVM can be configured in a way that breaks the rules, you don’t get the callback in that case.

> I think something much simpler than this might work outside of the C API, where we can essentially change how it works when we start designing how multiple threads will actually work within an LLVMContext. Would that work? Is there a way to make progress more rapidly there?

> Ultimately, this is opening a huge can of worms if we put it into the C API, as I think it is going to fundamentally impact what options we actually have for parallelizing parts of LLVM in the future. If we want to go there, we need to be incredibly explicit about what assumptions are being made here.

Let’s be explicit then.

We will always need to be able to configure LLVM with one thread per context. Always. So it’s not like we’re adding something that could become unusable in the future. Does anyone disagree?

Incidentally, I have no idea why the callback would not work with parallel contexts. If you suspend a thread within a thread group, it is totally expected that the other threads will also eventually block.

Tangentially, how many other places do we assume that an LLVMContext corresponds to a thread?

-Andy

> Good. Let’s table the discussion of how to report passes and just focus on the thread suspension API. It never occurred to me that a client using the new API for thread scheduling would not already be making an assumption about one thread per context. I believe anything else will break these clients regardless of the API. So I didn’t see this API as imposing a new restriction. The more explicit we can be about this, the better.

They not only have to make the assumption of one thread per context, they actually have to enforce it. According to the comments in LLVMContext, there is no locking guarantee, and the client has to be careful to use one context per thread. This is the current C API, and that is how clients are using it right now.

Any future extension to the LLVMContext and to the pass manager that changes this requirement - namely running in parallel - should be backwards compatible. Although I don’t see how this could or should be an issue to begin with, as long as we default to the current single-threaded execution model per LLVMContext. Anything that changes this behavior must be explicitly requested by the client, which means there has to be a new C API call to communicate this information. For now all the threads are created by the client, and I think this should stay so in the future.

> Yes, there absolutely needs to be a way to expose functionality within LLVM in its current form through the C API. We can say that the API works under some explicit set of rules. If some future LLVM can be configured in a way that breaks the rules, you don’t get the callback in that case.

It is already a conscious choice of the client if and how to use threads. This choice already affects how callbacks that we already have are implemented by the client. The same would apply for the proposed callback. The client knows exactly the conditions, because it is in full control of setting up the environment.

> Ultimately, this is opening a huge can of worms if we put it into the C API, as I think it is going to fundamentally impact what options we actually have for parallelizing parts of LLVM in the future. If we want to go there, we need to be incredibly explicit about what assumptions are being made here.

Yes, this will definitely impact the design, but only in a positive way :smiley: There is only one big requirement, and it is a given: the thread cannot hold a global mutex when making this call. That would deadlock everything - even other concurrently running contexts in today’s implementation.

When a thread group is running concurrently in the future pass manager, it is clear that the suspension of any thread in this thread group might deadlock the remaining threads in the group, and that is perfectly fine. Having this callback fire concurrently is fine too: the client created a parallel pass manager and has to make the callback thread-safe.

The important thing here is that LLVM is holding the thread hostage, and we need control back to safely suspend it. It is possible to suspend the thread from outside, but then it might be inside a system call or library call that holds a mutex, which could deadlock the whole application. By giving control back to the client via the callback we know that this cannot happen. We know that LLVM might hold some mutex local to the context, but that is fine and won’t deadlock the whole application.
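
To make this concrete, here is a minimal client-side sketch of that cooperative model (all names here are illustrative, not LLVM API): the callback parks the compiler thread on a client-owned condition variable, so the only lock held while the thread is stopped belongs to the client, and no system or LLVM-global mutex can be caught mid-operation.

```c
#include <pthread.h>
#include <stdbool.h>

/* Client-owned suspension state; nothing here is LLVM API. */
static pthread_mutex_t susp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  susp_cond = PTHREAD_COND_INITIALIZER;
static bool suspend_requested = false;

/* Invoked by the compiler thread at a safe point: park until resumed.
 * Only the client's own lock is held while waiting, so other contexts
 * and the rest of the application keep making progress. */
static void safe_point(void) {
  pthread_mutex_lock(&susp_lock);
  while (suspend_requested)
    pthread_cond_wait(&susp_cond, &susp_lock);
  pthread_mutex_unlock(&susp_lock);
}

/* Scheduler side: ask compiler threads to stop at their next safe point. */
static void request_suspend(void) {
  pthread_mutex_lock(&susp_lock);
  suspend_requested = true;
  pthread_mutex_unlock(&susp_lock);
}

/* Scheduler side: let suspended compiler threads continue. */
static void resume_threads(void) {
  pthread_mutex_lock(&susp_lock);
  suspend_requested = false;
  pthread_cond_broadcast(&susp_cond);
  pthread_mutex_unlock(&susp_lock);
}
```

The scheduler calls request_suspend(), each compiler thread stops at its next safe point, and resume_threads() wakes them all.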

-Juergen

I don’t have a strong opinion on this topic at the moment, but given that it is potentially GC related, I figured I’d speak up.

I see two unspoken assumptions in the thread so far:

  • The runtime needs a means to bring all threads to a stop to perform some action. In particular, no thread can be excepted.
  • A single LLVM library call takes longer to complete than the runtime is willing to wait for the thread stoppage to occur.

Both are entirely reasonable; I’m just spelling them out since they might not be obvious to everyone. The second is particularly noteworthy since it’s entirely new in LLVM.

I largely agree with Andy at the moment that the existing interface assumes one thread per context. I don’t see the issue with continuing with this assumption, particularly since no one, to my knowledge, has put forward any serious plans to change this. If this does happen, having it occur in an opt-in manner will be a matter of practical necessity.

I find myself somewhat uncertain of the choice to leverage the pass manager for this purpose though. It’s the entirely logical place to start, but is it enough? What if a single pass takes too long to complete? Do we start scattering these callbacks throughout large parts of LLVM? Also, how does this interact with something like the hoped-for parallel LTO?

Like Chandler, I believe there needs to be a fully thought-out proposal and discussion on this list. I would weakly oppose the change in its current form solely on this basis.

Andy, Juergen - Can you start by providing a bit more background information? What do you need the thread yield callback for? Garbage collection? Other VM tasks? User mode scheduling of threads? Something else entirely?

Yours,
Philip

FWIW, I have specific, concrete, and serious plans here. =D The new pass manager is in part motivated by these plans. I expect parallelism to be the next major infrastructure thing I work on after the pass manager.

> I don’t have a strong opinion on this topic at the moment, but given that it is potentially GC related, I figured I’d speak up.
>
> I see two unspoken assumptions in the thread so far:
>
> • The runtime needs a means to bring all threads to a stop to perform some action. In particular, no thread can be excepted.

The only assumption that we want clients to make of this API is that a compiler thread can be suspended at this point without blocking other non-compiler threads in the same process, or blocking other compiler threads associated with a different LLVM context.

There will be no assumption about the state of the suspended thread’s LLVM context and no assumption that other threads in the same context will continue executing (if there were such a thing).

> • A single LLVM library call takes longer to complete than the runtime is willing to wait for the thread stoppage to occur.

I’m not sure what you mean by this.

> I largely agree with Andy at the moment that the existing interface assumes one thread per context. I don’t see the issue with continuing with this assumption, particularly since no one, to my knowledge, has put forward any serious plans to change this. If this does happen, having it occur in an opt-in manner will be a matter of practical necessity.

I don’t think this API makes any assumption about LLVM’s threading model. I don’t even see how this API assumes one thread per context. However, I will concede that it does if it helps focus this discussion. Either way, the JIT should determine the threading model and be given a C API that allows configuring and scheduling those threads. As you correctly said, if we hypothetically implement a parallel pass manager mode that breaks some JITs, then clients will need to opt-in. We are not beholden to implement this callback in that mode. We just need to make that clear.

> I find myself somewhat uncertain of the choice to leverage the pass manager for this purpose though. It’s the entirely logical place to start, but is it enough? What if a single pass takes too long to complete? Do we start scattering these callbacks throughout large parts of LLVM? Also, how does this interact with something like the hoped-for parallel LTO?

> Like Chandler, I believe there needs to be a fully thought-out proposal and discussion on this list. I would weakly oppose the change in its current form solely on this basis.

Would you (or anyone) oppose a simple maySuspendContext() callback API? It would mean nothing more than the thread(s) for a given LLVM context can be suspended independent from other contexts.

> Andy, Juergen - Can you start by providing a bit more background information? What do you need the thread yield callback for? Garbage collection? Other VM tasks? User mode scheduling of threads? Something else entirely?

The thread yield API is primarily for user mode scheduling of threads.

A JIT could certainly use it for GC and other VM tasks. That isn’t necessary for us because those tasks can run concurrently with the LLVM thread until it completes. It would only be a problem if there are resource constraints.

-Andy

> Would you (or anyone) oppose a simple maySuspendContext() callback API? It would mean nothing more than the thread(s) for a given LLVM context can be suspended independent from other contexts.

I think this is the right approach. So a given thread hits a safe point, and it optionally calls a "suspend check" or "I am safe to suspend right now" callback if one is set. It doesn't stop other threads, and it doesn't continue until the function returns.

If you want to stop all threads, then the user callback may contain a barrier and count down how many threads have stopped until it sees all of them.

Nick

Sounds good. Let’s get started by nailing down the C API and semantics for this first.

I mirrored the C API for the LLVM context diagnostic handler and used Andy’s suggested name for the callback. The opaque handle was suggested by Duncan and can provide optional user-specified information that is provided back during the callback (i.e. barrier, etc).

Cheers,
Juergen

Core.h:
typedef void (*LLVMMaySuspendCallback)(LLVMContextRef, void *);

/**
 * Set the may-suspend callback function for this context.
 */

Given the use case (user mode scheduling), I'm not going to oppose this proposal. I would like to see a couple of things clarified documentation-wise:
- When is this interface valid? (i.e. the single-thread case)
- If a context does have multiple threads, is this called once per thread? Or once per thread group after internal coordination? (you can declare this out of scope if desired)
- If we later introduce multiple threads, and this mechanism doesn't support it, what will happen? Will the function just not be called?
- You hint at this already, but clarifying the state of the current context at a suspend point would be helpful.

Here's a possible draft that includes the above:
The may-suspend callback function may be called by LLVM to transfer control back to the client that invoked the LLVM compilation. This can be used to yield control of the thread, or perform periodic work needed by the client. There is no guaranteed frequency at which callbacks must occur; in fact, the client is not guaranteed to ever receive this callback. It is at the sole discretion of LLVM to do so and only if it can guarantee that suspending the thread won't block any forward progress in other LLVM contexts.

At a suspend point, the state of the current LLVM context is intentionally undefined. No assumptions about it can or should be made. In particular, callbacks into the context are not supported until the suspend function returns control to LLVM. Other LLVM contexts are unaffected.

Currently, LLVM assumes one thread per LLVM context. If, or when, we introduce multiple threads, this interface will not be available for contexts which opt-in to the thread pool model. We may extend this interface at a later time to support thread pools, but for the moment, that use case is explicitly unsupported.

P.S. Bikeshed-wise, might "yield" be a better term than "suspend" here?

Philip

Having the use case up front would have been very useful. I had my own set of assumptions about what you were doing - GC safepoints - which turned out to be inaccurate. Can I ask that you make an effort to spell out the intended use case in future proposals? This has happened a couple of times now and it’s wasting both my time and yours. I’ll try to ask more explicitly as well, rather than make my own assumptions about what you’re doing.

Thanks,
Philip

If we're going in this direction, you might want to back out the pass callback changes sooner rather than later...

I generally like the API being proposed, especially structurally.

Strongly agreed. I have no problem with a separate pass progress mechanism, btw, but it should be clearly separate.

Philip

Great.

Correct. We should avoid mentioning multi-threaded contexts in the API docs. It is very misleading to describe a feature that LLVM does not support. We can instead add a statement to the docs explaining that the callback is only synchronous with respect to the calling thread and makes no guarantee about the state of other threads, regardless of their context.

I personally like calling it “yield” because it is more intuitive and describes the use case. I proposed maySuspend because I wanted to be accurate. It is really the client deciding what to do with the callback. LLVM should make no assumption that it’s actually yielding.

Chandler likes “yield” too, so let’s go with that unless anyone else wants to weigh in.

On the commits list, Juergen introduced our current use case, along with a couple other future use cases for this API. I’m sorry I neglected to clearly reiterate our usage, but when it comes to documenting the C API I intentionally try not to limit its potential to a narrow use case.

-Andy

The addition about the undefined state of the current context is a very good point, although I wouldn’t like to shut the door completely: we should allow for future extensions that relax this constraint for APIs which explicitly state that they are safe to use during a yield callback.

I agree with Andy that we shouldn’t mention multi-threaded contexts at all in the API docs, because we are not even supporting them right now. I think this feature should be opt-in anyway, so extending the API documentation once we support it should be sufficient.

What I would like to see, of course, is that we would still be able to support this callback in a multi-threaded context/pass manager design in such a way that it is possible to suspend any thread at a suspension point without blocking any other thread, even in the same context. But I don’t want to set this in stone now. This gives us all the flexibility we need for developing a good concurrent design.

Having the ability to dynamically add and remove threads from an active compilation would be a nice feature that we should try to incorporate when possible and that would work nicely with the yield callback.

Anyway, I am digressing too much. As I said before, we should only specify the multi-threaded context constraints in the API docs once they are actually available. For now this is simply undefined behavior and not even supported.

I extended the API with Philip’s suggestions and updated the doc. Please ignore proper 80col formatting for now :wink:

Does this look good to everyone?

-Juergen

Core.h:
typedef void (*LLVMYieldCallback)(LLVMContextRef, void *);

/**
 * Set the yield callback function for this context.
 */

LGTM

Tiny nit: control of what?

Your comment should read something like “transfer control of the current thread” or something similar.

Otherwise, LGTM.

Philip