Use of thread local storage in Timer

One of the next things I wanted to cleanup is the use of Thread local storage. LLVM already provides a ThreadLocal class, so simialr to how I cleaned up the DynamicLibrary stuff to use LLVM’s DynamicLibrary, I wanted to do the same with ThreadLocal.

There’s a catch though. Timer makes use of pthreads’ thread-local storage destructors, and this isn’t supported on all platforms. I’m wondering how critical is this really?

It looks like the thread local storage entry is created in Timer::Initialize(), and the only place in the code that Timer::Initialize() is called is from lldb_private::Initialize. Based on this, I actually think that maybe the fact that it’s using TLS at all is just a historical carry-over and not actually necessary anymore since it’s only being used from a single thread. If this is true, there’s no reason to even use TLS, I could just make a static member and cleanup with a call to Timer::Shutdown().

This would solve the portability issue since TLS destructors would no longer be needed, and making porting all of our TLS code to use llvm support libraries very easy.

Does this assessment seem correct? Or am I missing something?

One of the next things I wanted to cleanup is the use of Thread local storage. LLVM already provides a ThreadLocal class, so simialr to how I cleaned up the DynamicLibrary stuff to use LLVM's DynamicLibrary, I wanted to do the same with ThreadLocal.

There's a catch though. Timer makes use of pthreads' thread-local storage destructors, and this isn't supported on all platforms. I'm wondering how critical is this really?

It looks like the thread local storage entry is created in Timer::Initialize(), and the only place in the code that Timer::Initialize() is called is from lldb_private::Initialize.

Not correct. We create the pthread TLS key (call it X here) in Timer::Initialize() so that each thread can say "give me the TLS for key X" and get a TLS pointer so it can store a thread specific TimerStack pointer.

Based on this, I actually think that maybe the fact that it's using TLS at all is just a historical carry-over and not actually necessary anymore since it's only being used from a single thread.

The Timer class makes a "TimerStack" for each thread and stores the pointer to this in the TLS.

If this is true, there's no reason to even use TLS, I could just make a static member and cleanup with a call to Timer::Shutdown().

You will leak a TimerStack per thread if you don't somehow delete this per thread object.

This would solve the portability issue since TLS destructors would no longer be needed, and making porting all of our TLS code to use llvm support libraries very easy.

Does this assessment seem correct? Or am I missing something?

The idea is you can place a "Timer" object anywhere in code and it can some profiling by timing how long it takes. If another function makes another "Timer" object, it will push itself on the stack, stop the previous timer, wait until it destroys itself and pop itself back off the stack, keep a long running total time spent doing the task, and then reactivate the timer above it.

So the TLS destructor are currently required. If TLS destructors aren't supported on the platform, then this class should be disabled for that platform (hollow it out and have it do nothing for platforms that don't support it).

Greg

It sounds like you're talking about getting a function's exclusive time?
As in, if function A calls B, and you create a timer in both A and B, then
the timer in A will compute timeof(A) - timeof(B)? If not, and if it's
sufficient to just time functions from start to end along with all the
child functions that execute, then llvm also has its own timer class that
we could use. See llvm/Support/Timer.h.

As for the TLS destructors, it seems this is only necessary in the case
that a thread is cancelled? Because if a thread routine runs to
completion, we could just call Timer::CleanupThreadSpecificData() or
something. Could we use pthread_cleanup_cancel() for the same purpose, and
then have our thread trampoline cleanup any timer-specific TLS data before
it returns to catch the non-cancellation case?

My main concern is that I don't want to expose a generic thread-local class
that supports TLS destructors if TLS destructors aren't generic enough to
be available on all platforms. It makes it too easy to make use of them in
generic code, breaking other platforms.

This would solve the portability issue since TLS destructors would no longer be needed, and making porting all of our TLS code to use llvm support libraries very easy.

Does this assessment seem correct? Or am I missing something?

The idea is you can place a "Timer" object anywhere in code and it can some profiling by timing how long it takes. If another function makes another "Timer" object, it will push itself on the stack, stop the previous timer, wait until it destroys itself and pop itself back off the stack, keep a long running total time spent doing the task, and then reactivate the timer above it.

So the TLS destructor are currently required. If TLS destructors aren't supported on the platform, then this class should be disabled for that platform (hollow it out and have it do nothing for platforms that don't support it).

Greg

It sounds like you're talking about getting a function's exclusive time? As in, if function A calls B, and you create a timer in both A and B, then the timer in A will compute timeof(A) - timeof(B)?

It is timing from Timer construction time, to Timer destruction time, and keeping a global map of all timer times using the constructors first string value as a ConstString as the key. It does this on from all threads.

If not, and if it's sufficient to just time functions from start to end along with all the child functions that execute, then llvm also has its own timer class that we could use. See llvm/Support/Timer.h.

That just implements timing, not aggregation and history of total times.

As for the TLS destructors, it seems this is only necessary in the case that a thread is cancelled?

Why? You create a thread, it call some function that invokes a timer:

int foo()
{
   {
       Timer a("parse something within foo", ...);
       bar();
   }

   {
       Timer a("parse something else within foo", ...);
       bar();
   }
}

Then in bar you have:

int bar()
{
   Timer a("::bar()", ...);
}

Any time a timer is constructed, it needs to get the TimerStack for the current thread. It then pushes itself onto that stack and stops any other timers already on that stack. When it destructs, it removes itself and restarts the timers above. Total time spent and time between timers is kept in a global repository where the key (the first string in the constructor) is associated with the total and partial times.

Because if a thread routine runs to completion, we could just call Timer::CleanupThreadSpecificData() or something.

Then we need to guarantee every thread is created using our Host::CreateThread() stuff so it can manage that. You can't guarantee that. On MacOSX, somehow libdispatch() might get used and we don't control the creation of those threads and they can call LLDB functions with contain Timer objects.

Could we use pthread_cleanup_cancel() for the same purpose, and then have our thread trampoline cleanup any timer-specific TLS data before it returns to catch the non-cancellation case?

I don't care how it happens, but the cleanup of deleting the TimerStack for each thread must happen for any thread regardless of how it was started (by us using Host::CreateThread(), or by the OS for any other reason).

My main concern is that I don't want to expose a generic thread-local class that supports TLS destructors if TLS destructors aren't generic enough to be available on all platforms. It makes it too easy to make use of them in generic code, breaking other platforms.

How can you have thread specific data on ANY platform without being able to clean it up? Are we expected to only store integers and POD types in the TLS data? That would surprise me very much if that is the case.

Greg

No, what I’m saying is that if the thread routine returns normally and is not cancelled, you can just explicitly call the “destructor” at the end of the thread routine. You don’t need the OS to do it for you.

Anyway, you mentioned that it needs to work for threads other than those created explicitly by us, so this approach doesn’t work regardless.

I can't say Im' familiar with things like libdispatch or many other
posix'isms, but I will say that on Windows, you just wouldn't create
TLS-data on a thread which you don't own the thread routine for. And
there's also no thread cancellation. So basically, at least on Windows,
you would just always clean up everything at the end of your thread routine.