[RFC] Implement a way to inform the compiler that a function is a fiber suspend point

Problem

Userspace fiber libraries generally allow switching threads between the time a fiber suspends and resumes execution. This implies that a fiber may not resume on the same thread it was executing on before it was suspended.

This causes problems for TLS address computations: a TLS block address computed before a suspend point may no longer be valid after the fiber resumes, because the underlying thread may have changed, and using the stale address can lead to crashes. LLVM currently has no way to support programs that use stackful fiber libraries, so developers end up implementing their own workarounds in each program.
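
To make the failure mode concrete, here is a minimal sketch of the fact the problem rests on: every thread has its own instance of a thread_local variable at its own address, so an address computed on one thread does not refer to another thread's storage. The names `tls_foo`, `address_on_current_thread`, and `address_on_other_thread` are illustrative only. A real migration would happen inside a fiber library's context switch; `std::thread` is used here purely to observe a second thread's TLS address.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical TLS variable, standing in for any thread_local used by fiber code.
thread_local int tls_foo = 0;

// Address of the calling thread's instance of tls_foo.
std::uintptr_t address_on_current_thread() {
    return reinterpret_cast<std::uintptr_t>(&tls_foo);
}

// Address of tls_foo as seen by a second thread that runs while the
// calling thread is still alive.
std::uintptr_t address_on_other_thread() {
    std::uintptr_t result = 0;
    std::thread worker([&result] { result = address_on_current_thread(); });
    worker.join();
    return result;
}
```

If a fiber caches the value of `address_on_current_thread()` before suspending and then resumes on the other thread, the cached address still points at the first thread's instance.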

See:

There is precedent for such a feature. For example, MSVC has the /GT command-line option, which makes it reload the TLS address before each load of a TLS variable.

Proposed Solution

In order to support programs of this kind, we want to be able to inform the compiler that a given function is a fiber suspend point and therefore any calls to this function must be handled specially.

We propose a new attribute __attribute__((user_fiber_suspendpoint)) for function definitions which will let the compiler know that any calls to the function may result in the fiber being suspended.
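
As a rough sketch, a fiber library could adopt the attribute while still building on compilers that do not implement it. `user_fiber_suspendpoint` is the attribute proposed in this RFC and is not available in any released compiler. `FIBER_SUSPEND_POINT`, `yield_count`, and `count_yields` are hypothetical names, and the body of `fiber_yield` is only a stand-in for a real context switch.

```cpp
#include <cassert>

// Expand to the proposed attribute only when the compiler reports support
// for it. On compilers without the attribute this expands to nothing.
#if defined(__has_attribute)
#  if __has_attribute(user_fiber_suspendpoint)
#    define FIBER_SUSPEND_POINT __attribute__((user_fiber_suspendpoint))
#  endif
#endif
#ifndef FIBER_SUSPEND_POINT
#  define FIBER_SUSPEND_POINT
#endif

// Counter used only to observe calls in this sketch.
static int yield_count = 0;

// The suspend point is marked once, at its declaration.
FIBER_SUSPEND_POINT void fiber_yield();

// Stand-in body: a real library would save the fiber's context and
// switch to the scheduler here.
void fiber_yield() { ++yield_count; }

// Calls the suspend point n times and reports how many calls were made.
int count_yields(int n) {
    int before = yield_count;
    for (int i = 0; i < n; ++i) {
        fiber_yield();
    }
    return yield_count - before;
}
```

Callers need no annotation of their own in this sketch; whether the attribute should propagate to callers is the question discussed in the FAQ.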

We also propose an extension to the SimplifyCFG pass which can linearize control flow in functions that contain calls to the suspend point, in a way similar to how the CoroSplit pass works. This would ensure that fibers are treated in a manner similar to coroutines.

Example

Let’s assume we have the following function definition in C++:

__attribute__((user_fiber_suspendpoint)) void fiber_yield();

At the moment, the IR for a function that calls it looks as follows:

@tls_foo = hidden thread_local global ptr null, align 8

declare void @fiber_yield()
declare void @do_stuff(ptr noundef)

; Function Attrs: mustprogress nounwind sspstrong uwtable
define hidden noundef i32 @_Z4testv() {
entry:
  %0 = call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @tls_foo)
  %1 = load ptr, ptr %0, align 8
  call void @do_stuff(ptr noundef %1)
  call void @fiber_yield()
  %2 = load ptr, ptr %0, align 8
  call void @do_stuff(ptr noundef %2)
  ret i32 0
}

declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull)

With what we’re proposing the function after SimplifyCFG would look as follows:

@tls_foo = hidden thread_local global ptr null, align 8

declare void @fiber_yield()
declare void @do_stuff(ptr noundef)

define hidden noundef i32 @_Z4testv() {
entry:
  %0 = call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @tls_foo)
  %1 = load ptr, ptr %0, align 8
  call void @do_stuff(ptr noundef %1)
  call void @fiber_yield()  ; ----------------------- (2)
  %2 = call noundef i32 @_Z4testv_resume() ; -------- (1)
  ret i32 0
}

; Function Attrs: noinline
define hidden noundef i32 @_Z4testv_resume() noinline {
entry:
  %0 = call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @tls_foo)
  %1 = load ptr, ptr %0, align 8
  call void @do_stuff(ptr noundef %1)
  ret i32 0
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull)
  • (1) We need this call because, as others have previously pointed out, fibers aren’t visible to the compiler. One might notice that this is somewhat similar to Returned-Continuation lowering, but with an explicit call.
  • (2) Fiber libraries manage the stack on their own so we don’t need an ID for fibers like we do for coroutines.
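
The split shown in the IR above has a rough source-level analogue, sketched here by hand: the code after the suspend point is moved into a separate out-of-line function so that the TLS address is computed again after the fiber resumes. This is an illustration of the shape of the transformation, not the proposed implementation. `seen` and the empty `fiber_yield` body are stand-ins. `noinline` alone is not a guarantee that today's compilers will keep the two address computations separate, which is the gap this RFC aims to close.

```cpp
#include <cassert>
#include <vector>

// TLS variable mirroring @tls_foo in the IR above.
thread_local void* tls_foo = nullptr;

// Records every pointer passed to do_stuff so the sketch can be checked.
static std::vector<void*> seen;
void do_stuff(void* p) { seen.push_back(p); }

// Stand-in for a real suspend point; a fiber library would switch
// contexts here and might resume the fiber on another thread.
void fiber_yield() {}

// Post-suspend half of test(). It is kept out of line so that tls_foo is
// looked up inside this function, after the suspend point has returned.
__attribute__((noinline)) int test_resume() {
    do_stuff(tls_foo);
    return 0;
}

int test() {
    do_stuff(tls_foo);  // uses the TLS address computed before suspending
    fiber_yield();      // suspend point
    test_resume();      // explicit call into the continuation, as in (1)
    return 0;
}
```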

FAQ

  • What about the concerns with this attribute being transitive and polluting the call stack?
    I think the transitive nature of the attribute correctly models how these programs actually behave. The attribute may end up being propagated to every function in the call stack that makes use of TLS variables, but we can potentially leave that to the user.