[RFC]? time-trace and offloading

The -ftime-trace option in clang writes a trace of the compilation as a JSON file for each compilation unit it processes (e.g. for each object file it produces).

This is a simple model, and tools like ClangBuildAnalyzer or ninjatracing build on it. However, when using an offloading model (OpenMP, HIP, CUDA, SYCL…), this promise of one produced object file → one trace file is broken.

In offloading mode, the clang driver transparently invokes the compiler (clang -cc1) multiple times (typically once for each offload target, i.e. GPU architecture) when compiling a single source file to an object file. -ftime-trace is currently implemented in the compiler; the driver only provides the path where the object will be stored, so each sub-compilation would get its own trace. (I say "would" because, since D150282 "[Driver] -ftime-trace: derive trace file names from -o and -dumpdir", -ftime-trace is disabled for offloading.)

Now for the RFC part: ideally using -ftime-trace with offloading would be as easy for users as without offloading, i.e. a single trace should be produced that contains the compilation timings for all targets, but minimally the trace files should contain enough information to map back to object files and targets.

Possible approaches

  1. Use an IPC mechanism to share the time trace context between the compilation jobs and the driver.

  2. Merge the traces in the driver.

  3. Don’t merge, just extend the files with information that maps it to object files.

Approach 1 is complicated, but Perfetto could make it easier, as it already has its own serialization format. (D82994 "[RFC] Instrumenting Clang/LLVM with Perfetto" attempted to add Perfetto, but it was abandoned.)
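
To make approach 2 a bit more concrete, here is a rough sketch of what merging could look like, written in Python for brevity rather than as driver code. It assumes each per-target trace is a Chrome trace-event style JSON object with a `traceEvents` array and an optional `beginningOfTime` field in microseconds. The function name `merge_time_traces` and the target labels are hypothetical.

```python
def merge_time_traces(traces):
    """Merge per-target trace dicts into one trace-event dict.

    `traces` maps a target label (e.g. "host" or a GPU arch) to a parsed
    trace. Each target gets its own pid so that viewers show it as a
    separate process row. Input traces are not modified.
    """
    # Assumption: "beginningOfTime" is each trace's start time in
    # microseconds; the earliest one becomes the common origin.
    origin = min((t.get("beginningOfTime", 0) for t in traces.values()),
                 default=0)
    merged = []
    for pid, (target, trace) in enumerate(sorted(traces.items()), start=1):
        shift = trace.get("beginningOfTime", 0) - origin
        # Metadata event that names the process row after the target.
        merged.append({"ph": "M", "name": "process_name", "pid": pid,
                       "tid": 0, "args": {"name": target}})
        for event in trace.get("traceEvents", []):
            # Drop the per-file process name; the target label replaces it.
            if event.get("ph") == "M" and event.get("name") == "process_name":
                continue
            event = dict(event, pid=pid)  # shallow copy with the new pid
            if "ts" in event:
                event["ts"] += shift  # align to the common origin
            merged.append(event)
    return {"traceEvents": merged, "beginningOfTime": origin}
```

Approach 3 would need only the labelling part of this: the mapping from trace to target and object file would be stored in each file instead of being used to combine them.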


I would suggest emitting one trace per target, similar to our -save-temps output. For all but the default target, add .<target> to the filename.
Users can merge trace files themselves if they want to, but often it is interesting to look at them in isolation anyway.
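
A minimal sketch of that naming rule, in Python purely for illustration. Placing the target before the `.json` extension is an assumption, and the function name `trace_path` and the example target string are hypothetical.

```python
from pathlib import PurePosixPath


def trace_path(output, target=None):
    """Return the trace file name for one sub-compilation.

    `output` is the object file given to -o. `target` is None for the
    default target, otherwise a label such as an offload architecture.
    """
    # Strip the object file's extension, e.g. "build/foo.o" -> "build/foo".
    stem = PurePosixPath(output).with_suffix("")
    # Assumption: the target label goes in front of ".json".
    suffix = ".json" if target is None else f".{target}.json"
    return str(stem) + suffix
```

With this scheme `trace_path("build/foo.o")` gives `build/foo.json`, and `trace_path("build/foo.o", "gfx90a")` gives `build/foo.gfx90a.json`, so existing tools that look for the default name keep working.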


Is this a driver or Clang option? How much effort does it take to enable time tracing for Flang offload?