Enable STATISTIC all the time again?

Right now, the LLVM Statistic class does not increment its counters in non-debug builds unless you define LLVM_ENABLE_STATS (which, sadly, is also not exposed as a CMake option).
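
For reference, the usage pattern in a pass looks roughly like this (a simplified sketch; the real macro and the LLVM_ENABLE_STATS guard live in llvm/ADT/Statistic.h, and the pass and counter names here are invented):

```cpp
// Simplified sketch of how a pass uses the statistics machinery; the pass
// name and the counter are invented for illustration.
#include "llvm/ADT/Statistic.h"
#include "llvm/IR/Instruction.h"

#define DEBUG_TYPE "my-pass"

STATISTIC(NumErased, "Number of instructions erased");

static void eraseAndCount(llvm::Instruction *I) {
  I->eraseFromParent();
  // Without LLVM_ENABLE_STATS (and without asserts) this increment is
  // effectively a no-op; with stats enabled it feeds the table that
  // -stats prints at exit.
  ++NumErased;
}
```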

Before I go and add it to CMake, I ran some numbers.

I can’t find a single testcase, large or small, where the cost of enabling statistics all the time isn’t completely in the noise.

It looks like it was disabled in March 2013:
https://reviews.llvm.org/D486

The only discussion I can find from the same time period is around FastISel using a lot of per-instruction stats and that causing some slowdown.

However, I can’t find any actual data or testcases in that discussion, and disabling stats globally is fairly annoying for performance work. :(

Does anyone have any testcases where it is actually slow that I can look at?

+Jan, the original contributor (from Chromium), in case he’s got any context that might be useful.

I think adding an option to cmake is fine, but it should definitely be off by default in release builds. If we make STATISTIC useful, then people will eventually put it on some hot path, and that will kill multi-threaded performance for users like ThinLTO, because multiple threads doing read-modify-write on the same memory is just slow.
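
To make that concern concrete, here is a small standalone sketch (not LLVM code; the thread and iteration counts are arbitrary) contrasting one shared atomic counter with padded per-thread counters:

```cpp
// Standalone sketch of the contention concern: many threads doing a
// read-modify-write on one counter serialize on its cache line, whereas
// per-thread counters do not. Absolute numbers will vary by machine.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

struct alignas(64) PaddedCounter {
  std::atomic<unsigned long> Value{0}; // padded to its own cache line
};

static std::atomic<unsigned long> SharedCounter{0};

static double runSeconds(unsigned NumThreads, bool UseShared) {
  std::vector<PaddedCounter> PerThread(NumThreads);
  auto Start = std::chrono::steady_clock::now();
  std::vector<std::thread> Threads;
  for (unsigned T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&, T] {
      std::atomic<unsigned long> &C =
          UseShared ? SharedCounter : PerThread[T].Value;
      for (unsigned long I = 0; I < 10000000; ++I)
        C.fetch_add(1, std::memory_order_relaxed);
    });
  for (auto &Th : Threads)
    Th.join();
  return std::chrono::duration<double>(std::chrono::steady_clock::now() -
                                       Start).count();
}

int main() {
  // The shared-counter run is typically much slower with several threads,
  // because every increment bounces the same cache line between cores.
  std::printf("shared atomic: %.2fs\n", runSeconds(8, true));
  std::printf("per-thread:    %.2fs\n", runSeconds(8, false));
}
```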

> I think adding an option to cmake is fine, but it should definitely be off by default in release builds.

Why?

> If we make STATISTIC useful, then people will eventually put it on some hot path, and that will kill multi-threaded performance for users like ThinLTO, because multiple threads doing read-modify-write on the same memory is just slow.

With no offense meant: Can we please be driven by data?

This argument is "people will eventually do something silly and it will make things slow for others". That is something that can be said about just about anything in LLVM :). This is why we have code review, etc.

If your concern is the atomic increments, then is there a reason not to have it do the counting as a runtime option instead of a compile-time one?
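
Something along these lines is what I mean by a runtime option; the flag and helper names here are made up, not existing LLVM APIs:

```cpp
// Hypothetical sketch of runtime-gated counting; StatsEnabled and bumpStat
// are invented names, not existing LLVM APIs.
#include <atomic>

// Imagine this being set once at startup from a -stats-style flag.
static std::atomic<bool> StatsEnabled{false};

static std::atomic<unsigned> NumWidgetsFolded{0};

static inline void bumpStat(std::atomic<unsigned> &Counter) {
  // Cheap relaxed load when stats are off; the read-modify-write only
  // happens when somebody actually asked for statistics.
  if (StatsEnabled.load(std::memory_order_relaxed))
    Counter.fetch_add(1, std::memory_order_relaxed);
}
```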

The original thread w.r.t. FastISel: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130225/166934.html. Nadav cited a 5% slowdown in a specific part of isel, but I don’t know on which benchmark. On my side, we were testing build times for PNaCl apps. The setup would have been an LTO-like build, timing just the backend – an odd mix of getting optimized bitcode and then running LLC under -O0. As a minor point, it also made the toolchain binaries a bit smaller (maybe 500KB out of 12MB?). But that’s not a use case you’d need to worry about.

Or increment into TLS variables and sum those at the end. Looking at a statistic before the end of execution is dodgy territory anyway, but I can't think of a case where it would be saner if other threads are poking at it simultaneously.

Tim.
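
A minimal sketch of what that could look like (all names invented): each thread bumps a plain thread_local counter, and the per-thread totals are folded into a global only when a thread exits, so the sum is only meaningful at the end of execution:

```cpp
// Sketch of thread-local counting with a fold at thread exit; names are
// invented for the example.
#include <atomic>
#include <cstdint>

static std::atomic<uint64_t> GlobalNumFolded{0};

struct LocalStat {
  uint64_t Count = 0;
  // Publish this thread's contribution exactly once, when the thread exits
  // (or at program exit for the main thread).
  ~LocalStat() { GlobalNumFolded.fetch_add(Count, std::memory_order_relaxed); }
};

static thread_local LocalStat NumFoldedTL;

// Hot path: a plain, non-atomic increment.
static inline void noteFolded() { ++NumFoldedTL.Count; }
```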

Last time I measured, enabling statistics in a release build cost a 0.5%–1% slowdown. Also see my last thread about this: https://groups.google.com/forum/#!topic/llvm-dev/xZVBNg5bsSk
We also had plans to push statistics (probably newly introduced ones rather than retrofitting the existing ones) through the optimization remark system to allow some context, like per-function statistics. But there is nothing concrete yet.

- Matthias

> Last time I measured, enabling statistics in a release build cost a 0.5%–1% slowdown. Also see my last thread about this: https://groups.google.com/forum/#!topic/llvm-dev/xZVBNg5bsSk
> We also had plans to push statistics (probably newly introduced ones rather than retrofitting the existing ones) through the optimization remark system to allow some context, like per-function statistics. But there is nothing concrete yet.

And just to elaborate, this originated from the observation that there are many cases where we issue a debug message, emit an optimization remark, and also increment a statistic within a few lines of code. Having a single way of feeding all these use cases would be a nice code-readability and usability improvement.

Adam
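
For illustration, the pattern being described looks roughly like this in today's tree (the pass, remark name, and message are invented; only the overall shape matters):

```cpp
// Rough illustration of a debug message, an optimization remark, and a
// statistic all reporting the same event within a few lines of code.
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "my-hoist"

STATISTIC(NumHoisted, "Number of loads hoisted");

static void reportHoist(llvm::OptimizationRemarkEmitter &ORE,
                        llvm::Instruction *LI) {
  LLVM_DEBUG(llvm::dbgs() << "hoisting: " << *LI << "\n");
  ORE.emit([&] {
    return llvm::OptimizationRemark(DEBUG_TYPE, "LoadHoisted", LI)
           << "hoisted load out of loop";
  });
  ++NumHoisted;
}
```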

Access to TLS variables can easily be more expensive than any atomics...

Joerg

Hi,

Isn’t stats essentially a fire-and-forget type of data flow?
You are saying “increase this counter” but you don’t actually care what the value is at the time of the update.
Why can’t you have thread local stats, thus not requiring any locking, and then add all the thread local stats up at the end, once the threads have finished?

Lockless stats collection would be fast.

If you happened to need to collect stats during the run, pausing the threads would be needed, i.e. like when sitting on a breakpoint.

Kind regards

James

Depending on the architecture, getting the thread base register involves a special trap or other expensive magic. At least for a single-threaded clang, that's much more expensive than an atomic if it can't be moved early enough and amortized aggressively by the compiler.

Joerg
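
To spell out the two flavours being compared (names invented; which one wins depends on the target and on how LLVM is built):

```cpp
// The two increment flavours under discussion; which is cheaper depends on
// the target and TLS model. Names are invented for the example.
#include <atomic>
#include <cstdint>

static std::atomic<uint64_t> SharedCounter{0};
static thread_local uint64_t LocalCounter = 0;

static inline void bumpShared() {
  // On x86-64 this is a single locked add on the counter's cache line.
  SharedCounter.fetch_add(1, std::memory_order_relaxed);
}

static inline void bumpLocal() {
  // A plain add, but locating the TLS slot may require a __tls_get_addr
  // call (e.g. general-dynamic TLS in a shared libLLVM), which the compiler
  // can only sometimes hoist and amortize.
  ++LocalCounter;
}
```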