Profiling full builds with -ftime-trace and without Ninja

Hi, I noticed a 12h difference, so I suppose we’re in very different time zones; apologies for any latency on my side.

A trace produced by -ftime-trace is stored as a ‘flame graph’ in trace events: everything starts at time 0, and each entry records the total duration a function took. That means if foo() is called 1000 times at 1 ms per call, -ftime-trace stores foo() time = 1 s. There could be many reasons this was done; honestly, it’s a novel approach and I bet it met the original author’s needs. Also, I am not certain Chrome tracing has a flame graph view built in, only a flame chart.

If, instead of aggregating the data, LLVM stored every relevant function entry and exit timestamp, one could aggregate it post-mortem to get the flame graph back, while also being able to see the detailed root cause more easily.
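To make that concrete, here is a minimal Python sketch. It assumes a hypothetical trace that stores one named B/E pair per call, which is what I’m proposing, not what -ftime-trace writes today: the flame-graph style per-name totals fall out of the detailed events with a few lines of post-processing, while the reverse direction is impossible.

```python
import json
from collections import defaultdict

def rebuild_totals(trace_path):
    """Recover flame-graph style per-name totals from a flame-chart style
    trace that stores one B (begin) / E (end) pair per call."""
    with open(trace_path) as f:
        events = json.load(f)["traceEvents"]

    begin_stack = defaultdict(list)  # name -> stack of open begin timestamps
    totals = defaultdict(float)      # name -> summed duration (microseconds)
    calls = defaultdict(int)         # name -> call count

    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["ph"] == "B":
            begin_stack[ev["name"]].append(ev["ts"])
        elif ev["ph"] == "E" and begin_stack[ev["name"]]:
            # Simplification: nested calls of the same name get counted into
            # both the inner and the outer frame.
            totals[ev["name"]] += ev["ts"] - begin_stack[ev["name"]].pop()
            calls[ev["name"]] += 1

    # e.g. "foo: 1000.0 ms over 1000 call(s)" -- the aggregate view, rebuilt
    # after the fact, with the individual calls still available for drill-down.
    for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {total / 1000:.1f} ms over {calls[name]} call(s)")

# rebuild_totals("detailed_trace.json")
```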

Some examples:

  • Foo() takes too long; is it called twice by mistake? The flame graph hides that.
  • Foo() sits at the boundary size of a data structure and has to resize at unpredictable points.
  • Foo() iterates O(n^2) times instead of O(n) times.
  • Foo() is behaving normally; it just takes time.

Flame graphs cannot tell these cases apart; flame charts can give more insight.
I would recommend using B and E (begin/end) events instead of X (complete) events, even though an X event already carries the start timestamp along with the duration. It costs a bit more, but you can seek to the beginning or the end of a state, which gives the troubleshooter more power.
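For reference, here is roughly what the two encodings look like in the Chrome Trace Event format (the names and numbers below are made up): an X record packs the start timestamp and the duration into one event, while a B/E pair records each edge of the span as its own timestamped event, so a viewer can seek straight to either one.

```python
import json

# One "complete" event: start timestamp + duration in a single record.
x_form = [
    {"name": "Frontend", "ph": "X", "ts": 1000, "dur": 2500, "pid": 1, "tid": 1},
]

# The same span as a begin/end pair: two records, each individually seekable.
be_form = [
    {"name": "Frontend", "ph": "B", "ts": 1000, "pid": 1, "tid": 2},
    {"name": "Frontend", "ph": "E", "ts": 3500, "pid": 1, "tid": 2},
]

# Either list loads in chrome://tracing or Perfetto when wrapped like this.
with open("example_trace.json", "w") as f:
    json.dump({"traceEvents": x_form + be_form}, f, indent=2)
```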

Those are the advantages for a SINGLE trace. Now let’s get into multiple traces.
With flame graphs:
they can be appended, and that’s it. They have lost the data showing the sequence of events, and with it the inferred dependencies.

With flame charts:

  • you can visually rebuild the execution graph, since you know the sequence in which files were built. This is very interesting (for me as a viz nerd) because you can really see whether your make -j1273 worked well or not; see the merge sketch below.
  • they can show when in the build you hit your dominator:
      • if it’s near the beginning, maybe it’s compiling something that will be re-used;
      • if it’s near the end… maybe something slowed it down on the IO side?

This won’t answer the question, but it will bring you closer to an answer.

Basically more pertinent information is available to mine.
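As a rough illustration of the “rebuild the execution graph” idea, here is a sketch that merges one trace per translation unit onto a shared timeline, one row per TU, so the parallelism of the build becomes visible in a single view. The file layout and the start-time handling are assumptions on my part: it leans on the beginningOfTime field that newer Clang versions write, and falls back to the file’s mtime otherwise.

```python
import glob
import json
import os

def merge_traces(pattern, out_path):
    """Merge per-TU -ftime-trace files onto one shared timeline,
    giving each translation unit its own row in the trace viewer."""
    merged, starts = [], []

    for tid, path in enumerate(sorted(glob.glob(pattern, recursive=True)), 1):
        with open(path) as f:
            trace = json.load(f)
        # Assumption: newer Clang writes a beginningOfTime field (microseconds
        # since epoch); if it is missing, fall back to the file's mtime as a
        # rough stand-in for when this compile ran.
        start_us = trace.get("beginningOfTime", os.path.getmtime(path) * 1e6)
        starts.append(start_us)
        for ev in trace["traceEvents"]:
            # Shift every event onto the shared timeline and give the whole
            # file its own tid so it gets its own row.
            merged.append(dict(ev, pid=1, tid=tid, ts=ev.get("ts", 0) + start_us))
        # Metadata event so the viewer labels the row with the file name.
        merged.append({"ph": "M", "name": "thread_name", "pid": 1, "tid": tid,
                       "args": {"name": os.path.basename(path)}})

    # Re-base so the earliest compile starts at t = 0 in the merged view.
    t0 = min(starts, default=0)
    for ev in merged:
        if "ts" in ev:
            ev["ts"] -= t0

    with open(out_path, "w") as f:
        json.dump({"traceEvents": merged}, f)

# merge_traces("build/**/*.json", "merged_trace.json")
```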

The goal is to run two builds and then compare them, to see whether a change in the code or the compiler had any impact on timing. I can see that being an excellent tool for proving performance improvements in the review cycle, and for communicating performance regressions too.
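As a sketch of the kind of comparison I have in mind (just an illustration; totals_by_name and compare are names I made up, and it assumes the complete “X” events -ftime-trace emits today): aggregate each build’s trace per event name and print the largest deltas, so a review comment can point at exactly which phases got faster or slower.

```python
import json
from collections import defaultdict

def totals_by_name(trace_path):
    """Sum the duration of every event name in a -ftime-trace JSON file."""
    with open(trace_path) as f:
        events = json.load(f)["traceEvents"]
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X":          # complete events carry a duration
            totals[ev["name"]] += ev.get("dur", 0)
    return totals

def compare(before_path, after_path, top=20):
    """Print the event names whose total time changed the most."""
    before = totals_by_name(before_path)
    after = totals_by_name(after_path)
    names = set(before) | set(after)
    deltas = sorted(names,
                    key=lambda n: abs(after.get(n, 0) - before.get(n, 0)),
                    reverse=True)
    for name in deltas[:top]:
        b, a = before.get(name, 0), after.get(name, 0)
        print(f"{name}: {b/1000:.1f} ms -> {a/1000:.1f} ms ({(a-b)/1000:+.1f} ms)")

# compare("baseline/foo.json", "patched/foo.json")
```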

Thanks!
Matthew

PS: I’ve been doing performance analysis for a long time; this stuff is often not obvious and we have terribly ambiguous terminology. Please feel free to ask for clarification wherever needed. If you don’t get something I said, it’s my fault, not yours.
