Profiling full builds with -ftime-trace and without Ninja

Hi all, first-time poster here. I hope I put this in the right place; if not, please tell me where it goes and I'll gladly move it.

This probably pertains to developers with LARGE, slow-to-compile code bases, as well as to people working on optimizing LLVM.

I have seen that the -ftime-trace feature generates flame graphs that can be consumed in Chromium trace viewers. In the blogs I have read, there seems to be an aggregator for Ninja (search for "SN Systems clang profiling"), so if you build with Ninja you can profile the entire project. What about the people using bootstrap.sh/cmake/other build systems?
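For anyone in the same boat, here is a rough Python sketch (not a polished tool) of how the per-TU JSON files that clang drops next to the object files could be stitched together by hand. The build directory, the output file name, and the one-pid-per-TU layout are just placeholder choices:

```python
#!/usr/bin/env python3
"""Merge the per-translation-unit JSON files written by clang -ftime-trace
into one Chrome-trace file.  Paths below are placeholders."""
import json
import pathlib

build_dir = pathlib.Path("build")                 # adapt to your tree
merged = {"traceEvents": []}

for pid, trace_file in enumerate(sorted(build_dir.rglob("*.json"))):
    with open(trace_file) as f:
        data = json.load(f)
    for event in data.get("traceEvents", []):
        event["pid"] = pid                        # one lane per TU
        merged["traceEvents"].append(event)
    # Label the lane with the source name so the viewer stays readable.
    merged["traceEvents"].append(
        {"ph": "M", "pid": pid, "tid": 0, "name": "process_name",
         "args": {"name": trace_file.stem}})

with open("full-build-trace.json", "w") as f:
    json.dump(merged, f)
```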

I tried importing the whole project into Trace Compass (I am a developer on that project, and a big fan of all things LLVM). It went very well.

I have put a 3-minute video on YouTube showing how to get it working. I really hope this is of interest to the group, and I would love to see if we could collaborate on this.

I have a couple of asks, though.
1- LLVM's tracing is "cheating" by pre-aggregating the events. This is good for looking at a single file, but it does not show concurrency issues, and across a project it hides why some files are faster than others (precompilation/caches).
2- It would be nice to add a metadata event to each trace identifying the build. That way we can easily compare two builds; a minimal sketch of what I mean follows.
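To make ask 2 concrete, here is a tiny post-processing sketch that stamps a trace file with a build identifier. The "build_id" metadata name is my own invention, not something clang emits today, and viewers may simply ignore unknown metadata names:

```python
import json
import sys

# Usage: python tag_build.py <trace.json> <build-id>
# Appends a metadata ("M") event carrying a build identifier so traces
# from different builds stay distinguishable once they are merged.
# "build_id" is a made-up metadata name, not part of -ftime-trace.
trace_path, build_id = sys.argv[1], sys.argv[2]

with open(trace_path) as f:
    trace = json.load(f)

trace.setdefault("traceEvents", []).append(
    {"ph": "M", "pid": 0, "tid": 0, "name": "build_id",
     "args": {"name": build_id}})

with open(trace_path, "w") as f:
    json.dump(trace, f)
```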

Hope to hear from you!

/Matthew


Welcome!
I'm glad the compilation-speed topic is getting more attention.

Can you elaborate a little more on this point? I don't quite understand what "pre-aggregate" means here, and I don't quite follow the concurrency issues either.

True, though intuitively I would just put the build ID as part of the trace file name. I guess your motivation here is that after all the trace files are loaded into, say, Trace Compass, we can still distinguish traces from different builds.

Hi, I noticed a 12-hour difference, so I suppose we're in very different time zones. I apologize for any latency on my side.

A trace produced by -ftime-trace is stored as a 'flame graph' in trace events. Basically, each entry starts at time 0 and carries the total duration a function took. This means that if I call foo 1000 times at 1 ms per call, -ftime-trace would store foo() time = 1 s. There could be many reasons this was done; honestly, it's novel, and I bet the original author's needs were met by this. Also, I am not certain Chrome tracing has a flame graph built in, only a flame chart.

If, instead of aggregating the data, LLVM stored every relevant function entry and exit timestamp, one could aggregate them post-mortem to get the flame graph back, while also seeing the detailed root cause more easily.
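To illustrate what I mean by post-mortem aggregation, here is a rough sketch (assuming Chrome-trace B/E events from a single thread, sorted by timestamp) that rebuilds per-function totals while keeping the call counts that pre-aggregation throws away, such as the duplicated Foo() call in the first example below:

```python
from collections import defaultdict

def aggregate(events):
    """Fold raw B (begin) / E (end) events back into flame-graph-style
    totals, keeping the call counts that pre-aggregation hides.
    Assumes a single thread and events sorted by timestamp."""
    totals = defaultdict(float)   # name -> summed duration (us)
    counts = defaultdict(int)     # name -> number of calls
    stack = []                    # open (name, begin-timestamp) pairs

    for ev in events:
        if ev["ph"] == "B":
            stack.append((ev["name"], ev["ts"]))
        elif ev["ph"] == "E" and stack:
            name, begin = stack.pop()
            totals[name] += ev["ts"] - begin
            counts[name] += 1
    return totals, counts

# foo called twice for 1000 us each: the total matches the flame graph
# (2000 us), but the count makes the duplicate call visible.
events = [
    {"ph": "B", "name": "foo", "ts": 0},
    {"ph": "E", "name": "foo", "ts": 1000},
    {"ph": "B", "name": "foo", "ts": 1500},
    {"ph": "E", "name": "foo", "ts": 2500},
]
print(aggregate(events))
```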

Some examples:

  • Foo() takes too long: is it called twice by mistake? The flame graph hides it.
  • Foo() is at the boundary size of a data structure and needs to randomly resize.
  • Foo() iterates O(n^2) times instead of O(n) times
  • Foo() is behaving normally, it just takes time.

Flame graphs cannot explain that; flame charts can give more insight.
I would recommend using B and E events instead of X ones, even though an X event does carry a start timestamp along with the duration. It costs a bit more, but you can seek to either the beginning or the end of a state. This gives more power to the troubleshooter.
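To be concrete, here is roughly the same 1 ms of work written both ways in the Chrome trace-event format (timestamps in microseconds); the field values are made up for illustration:

```python
# One X ("complete") event: a single record that carries a duration.
as_x = {"ph": "X", "name": "foo", "ts": 1500, "dur": 1000,
        "pid": 1, "tid": 1}

# A B/E ("begin"/"end") pair: the start and the end are separate
# records, so a viewer can seek to either edge of the state.
as_b_e = [
    {"ph": "B", "name": "foo", "ts": 1500, "pid": 1, "tid": 1},
    {"ph": "E", "name": "foo", "ts": 2500, "pid": 1, "tid": 1},
]
```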

Those are the advantages for a SINGLE trace. Let’s get into multiple traces.
With flame graphs:
they can be appended, and that's it. They have lost the data showing the sequence of events and the dependencies you could infer from it.

With flame charts:

  • you can visually rebuild the execution graph, since you know the order in which files were built. This is very interesting (for me as a viz nerd) because you can really see whether your make -j1273 worked well or not (see the sketch further down).
  • It can show when in the build you hit your dominator.
  • If it’s at the beginning, maybe it’s compiling something that will be re-used.
  • If it’s at the end… maybe something slowed it down on the IO side?

This won’t answer the question, but it will bring you closer to an answer.

Basically more pertinent information is available to mine.
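As one example of that mining, and of the parallelism point above: given raw begin/end timestamps for each translation unit (say, the top-level event of every merged per-TU trace, an assumption on my part), a small script can reconstruct how many compile jobs were in flight at any moment:

```python
def concurrency_profile(tu_events):
    """Turn per-translation-unit begin/end events into a step function
    of how many compile jobs were running at once during the build."""
    edges = []
    for ev in tu_events:
        if ev["ph"] == "B":
            edges.append((ev["ts"], +1))
        elif ev["ph"] == "E":
            edges.append((ev["ts"], -1))
    edges.sort()

    profile, running = [], 0
    for ts, delta in edges:
        running += delta
        profile.append((ts, running))   # (time, jobs in flight)
    return profile
```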

The goal is to run two builds and then compare them to see whether a change in the code or the compiler had any impact on timing. I can see that being an excellent tool for proving performance improvements in the review cycle, and for communicating performance regressions too.
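A rough sketch of that comparison, assuming each build has already been reduced to a {name: total duration} map (for example with an aggregator like the one sketched earlier); the threshold is an arbitrary noise filter:

```python
def compare_builds(before, after, threshold_us=1000.0):
    """Print per-name timing deltas between two builds, biggest
    regressions first; `threshold_us` hides the noise."""
    names = set(before) | set(after)
    deltas = sorted(
        ((after.get(n, 0.0) - before.get(n, 0.0), n) for n in names),
        reverse=True)
    for delta, name in deltas:
        if abs(delta) >= threshold_us:
            verdict = "regressed" if delta > 0 else "improved"
            print(f"{name}: {verdict} by {abs(delta):.0f} us")
```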

Thanks!
Matthew

PS: I've been doing performance analysis for a long time; this stuff is often not obvious and we have terribly ambiguous terminology. Please feel free to ask for clarification wherever needed; if you don't get something I said, it's my fault, not yours.


Thank you :-) Your explanation was really clear.
