Difficulties with implementing JSON-formatted -ftime-report

I was looking at modifying clang to support outputting -ftime-report in JSON, and these are some of the findings I made that made this task considerably more complicated. If anyone has any input, please let me know.

When Clang outputs -ftime-report, there are two “Pass execution timing report” sections - apparently one for the frontend, and one for the backend:

$ bin/clang++ -o /dev/null -ftime-report ~/hello.cpp
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0000 ( 40.9%)   0.0000 ( 40.9%)   0.0000 ( 38.9%)  AnnotationRemarksPass
   0.0000 ( 27.3%)   0.0000 ( 27.3%)   0.0000 ( 28.9%)  AlwaysInlinerPass
   0.0000 ( 18.2%)   0.0000 ( 18.2%)   0.0000 ( 20.0%)  CoroConditionalWrapper
   0.0000 ( 13.6%)   0.0000 ( 13.6%)   0.0000 ( 12.2%)  EntryExitInstrumenterPass
   0.0000 (100.0%)   0.0000 (100.0%)   0.0000 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Analysis execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0000 ( 42.9%)   0.0000 ( 42.9%)   0.0000 ( 41.7%)  TargetLibraryAnalysis
   0.0000 ( 28.6%)   0.0000 ( 28.6%)   0.0000 ( 33.3%)  ProfileSummaryAnalysis
   0.0000 ( 28.6%)   0.0000 ( 28.6%)   0.0000 ( 25.0%)  InnerAnalysisManagerProxy<FunctionAnalysisManager, Module>
   0.0000 (100.0%)   0.0000 (100.0%)   0.0000 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0006 seconds (0.0006 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0002 ( 34.0%)   0.0000 (  0.0%)   0.0002 ( 31.9%)   0.0002 ( 32.1%)  Expand large div/rem
   0.0001 ( 21.0%)   0.0000 (  0.0%)   0.0001 ( 19.7%)   0.0001 ( 19.7%)  X86 DAG->DAG Instruction Selection
   0.0000 (  4.9%)   0.0000 ( 14.3%)   0.0000 (  5.5%)   0.0000 (  5.2%)  X86 Assembly Printer
   0.0000 (  4.2%)   0.0000 (  8.6%)   0.0000 (  4.4%)   0.0000 (  4.7%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0000 (  2.3%)   0.0000 (  2.9%)   0.0000 (  2.3%)   0.0000 (  2.5%)  Fast Register Allocator
   0.0000 (  1.3%)   0.0000 (  2.9%)   0.0000 (  1.4%)   0.0000 (  1.4%)  Post-RA pseudo instruction expansion pass
   0.0000 (  1.3%)   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  1.3%)  Exception handling preparation
   0.0000 (  1.3%)   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  1.2%)  Assignment Tracking Analysis
   0.0000 (  1.1%)   0.0000 (  5.7%)   0.0000 (  1.4%)   0.0000 (  1.2%)  Live DEBUG_VALUE analysis
   0.0000 (  0.9%)   0.0000 (  2.9%)   0.0000 (  1.1%)   0.0000 (  1.1%)  Two-Address instruction pass
   0.0000 (  0.9%)   0.0000 (  0.0%)   0.0000 (  0.9%)   0.0000 (  0.9%)  Finalize ISel and expand pseudo-instructions
   0.0000 (  0.8%)   0.0000 (  2.9%)   0.0000 (  0.9%)   0.0000 (  0.8%)  Check CFA info and insert CFI instructions if needed
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.7%)   0.0000 (  0.8%)  Scalarize Masked Memory Intrinsics
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.6%)  Free MachineFunction
   0.0000 (  0.6%)   0.0000 (  2.9%)   0.0000 (  0.7%)   0.0000 (  0.6%)  X86 pseudo instruction expansion pass
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.6%)  Expand large fp convert
   0.0000 (  0.8%)   0.0000 (  2.9%)   0.0000 (  0.9%)   0.0000 (  0.6%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.6%)  Remove unreachable blocks from the CFG
   0.0000 (  0.6%)   0.0000 (  2.9%)   0.0000 (  0.7%)   0.0000 (  0.5%)  Stack Frame Layout Analysis
   0.0000 (  0.6%)   0.0000 (  2.9%)   0.0000 (  0.7%)   0.0000 (  0.5%)  Local Stack Slot Allocation
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Expand Atomic instructions
   0.0000 (  0.6%)   0.0000 (  2.9%)   0.0000 (  0.7%)   0.0000 (  0.5%)  Insert stack protectors
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Expand reduction intrinsics
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.5%)  Lazy Machine Block Frequency Analysis #2
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Contiguously Lay Out Funclets
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.5%)  X86 Indirect Branch Tracking
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Insert XRay ops
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.5%)  Bundle Machine CFG Edges
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Argument Stack Rebase
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lower Garbage Collection Instructions
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Machine Sanitizer Binary Metadata
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Remove Loads Into Fake Uses
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 speculative load hardening
   0.0000 (  0.6%)   0.0000 (  2.9%)   0.0000 (  0.7%)   0.0000 (  0.4%)  X86 Insert Cache Prefetches
   0.0000 (  0.2%)   0.0000 (  2.9%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 EFLAGS copy lowering
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Compressing EVEX instrs when possible
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.4%)  X86 Speculative Execution Side Effect Suppression
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 Indirect Thunks
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Fast Tile Register Preconfigure
   0.0000 (  0.2%)   0.0000 (  2.9%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Shadow Stack GC Lowering
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Prepare callbr
   0.0000 (  0.4%)   0.0000 (  5.7%)   0.0000 (  0.7%)   0.0000 (  0.4%)  Machine Optimization Remark Emitter
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Fixup Statepoint Caller Saved
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Remove Redundant DEBUG_VALUE analysis
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Pseudo Probe Inserter
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Expand indirectbr instructions
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Lower Tile Copy
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Unpack machine instruction bundles
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Return Thunks
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lower AMX intrinsics
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lazy Machine Block Frequency Analysis #3
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Machine Optimization Remark Emitter #3
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lower AMX type for load/store
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 DynAlloca Expander
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Safe Stack instrumentation pass
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Fast Tile Register Configure
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 FP Stackifier
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lazy Machine Block Frequency Analysis
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Insert KCFI indirect call checks
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Insert fentry calls
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Implement the 'patchable-function' attribute
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 vzeroupper inserter
   0.0000 (  0.4%)   0.0000 (  2.9%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 Discriminate Memory Operands
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  StackMap Liveness Analysis
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.3%)  X86 insert wait instruction
   0.0000 (  0.4%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.3%)  Machine Optimization Remark Emitter #2
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Machine Module Information
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Machine Branch Probability Analysis
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Assumption Cache Tracker
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Target Transform Information
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Create Garbage Collector Module Metadata
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Target Library Information
   0.0000 (  0.2%)   0.0000 (  2.9%)   0.0000 (  0.4%)   0.0000 (  0.2%)  Target Pass Configuration
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Profile summary info
   0.0005 (100.0%)   0.0000 (100.0%)   0.0006 (100.0%)   0.0006 (100.0%)  Total

===-------------------------------------------------------------------------===
                               Clang time report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.2529 seconds (0.2528 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2155 ( 97.5%)   0.0304 ( 95.7%)   0.2459 ( 97.2%)   0.2458 ( 97.2%)  Front end
   0.0047 (  2.1%)   0.0013 (  4.1%)   0.0060 (  2.4%)   0.0060 (  2.4%)  LLVM IR generation
   0.0009 (  0.4%)   0.0001 (  0.2%)   0.0010 (  0.4%)   0.0010 (  0.4%)  Machine code generation
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Optimizer
   0.2212 (100.0%)   0.0318 (100.0%)   0.2529 (100.0%)   0.2528 (100.0%)  Total

@MaskRay wrote an excellent blog post about the implementation, but I’ll summarize it here. The first two sections (the first pass execution timing report and analysis execution timing report) are printed in EmitAssemblyHelper::RunOptimizationPipeline(...) when the created StandardInstrumentations object gets destroyed, while the last two sections are printed much higher up in the call stack in cc1_main:

  // If any timers were active but haven't been destroyed yet, print their
  // results now.  This happens in -disable-free mode.
  llvm::TimerGroup::printAll(llvm::errs());
  llvm::TimerGroup::clearAll();

This causes a few problems:

  1. I don’t know if there being two “Pass execution timing report” sections is intentional. The comment in the code block above suggests that it was meant to be cleanup code, so pass execution timing report was either unintentionally split or intentionally split (albeit via a somewhat hacky way).
  2. Implementing JSON output for this is difficult because the code printing first two report sections are deep in the call stack while the code printing the remaining sections are shallow. To output JSON, it’s necessary to print all counters at the same time to manage commas, curly braces, etc. Also, observe that StandardInstrumentations deletes all the counters after it prints them, so modifications will be needed there.
  3. LLVM has the flag -info-output-file which is supposed to contain the output of all the timers. However, that file only contains the timers for the optimization passes. Again, I’m not sure whether this is a bug or intended behavior.
bin/clang++ -o /dev/null -ftime-report ~/hello.cpp  -mllvm -info-output-file=/tmp/out.txt
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0006 seconds (0.0006 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0002 ( 47.0%)   0.0000 (  0.0%)   0.0002 ( 33.0%)   0.0002 ( 33.0%)  Expand large div/rem
   0.0001 ( 26.9%)   0.0000 (  0.0%)   0.0001 ( 18.9%)   0.0001 ( 19.0%)  X86 DAG->DAG Instruction Selection
   0.0000 (  0.0%)   0.0000 ( 17.9%)   0.0000 (  5.3%)   0.0000 (  5.3%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0000 (  0.0%)   0.0000 ( 17.9%)   0.0000 (  5.3%)   0.0000 (  5.2%)  X86 Assembly Printer
   0.0000 (  3.5%)   0.0000 (  0.0%)   0.0000 (  2.5%)   0.0000 (  2.6%)  Fast Register Allocator
   0.0000 (  0.0%)   0.0000 (  4.2%)   0.0000 (  1.2%)   0.0000 (  1.3%)  Post-RA pseudo instruction expansion pass
   0.0000 (  1.8%)   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  1.3%)  Assignment Tracking Analysis
   0.0000 (  1.5%)   0.0000 (  0.0%)   0.0000 (  1.1%)   0.0000 (  1.1%)  Two-Address instruction pass
   0.0000 (  0.0%)   0.0000 (  4.2%)   0.0000 (  1.2%)   0.0000 (  1.1%)  Live DEBUG_VALUE analysis
   0.0000 (  1.5%)   0.0000 (  0.0%)   0.0000 (  1.1%)   0.0000 (  0.9%)  Finalize ISel and expand pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  3.0%)   0.0000 (  0.9%)   0.0000 (  0.8%)  Check CFA info and insert CFI instructions if needed
   0.0000 (  1.0%)   0.0000 (  0.0%)   0.0000 (  0.7%)   0.0000 (  0.7%)  Eliminate PHI nodes for register allocation
   0.0000 (  1.0%)   0.0000 (  0.0%)   0.0000 (  0.7%)   0.0000 (  0.7%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  2.4%)   0.0000 (  0.7%)   0.0000 (  0.7%)  Free MachineFunction
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.6%)  Expand Atomic instructions
   0.0000 (  1.0%)   0.0000 (  0.0%)   0.0000 (  0.7%)   0.0000 (  0.6%)  Scalarize Masked Memory Intrinsics
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Expand large fp convert
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Remove Redundant DEBUG_VALUE analysis
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Expand indirectbr instructions
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Remove unreachable blocks from the CFG
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Argument Stack Rebase
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Contiguously Lay Out Funclets
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Stack Frame Layout Analysis
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.5%)  X86 Indirect Thunks
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Fixup Statepoint Caller Saved
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.5%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.5%)  X86 pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.5%)  Insert KCFI indirect call checks
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.5%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.5%)  X86 speculative load hardening
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.5%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lower AMX type for load/store
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lazy Machine Block Frequency Analysis #3
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Speculative Execution Side Effect Suppression
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lazy Machine Block Frequency Analysis #2
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Machine Sanitizer Binary Metadata
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Expand reduction intrinsics
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Insert Cache Prefetches
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Discriminate Memory Operands
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Compressing EVEX instrs when possible
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Insert XRay ops
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Pseudo Probe Inserter
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 DynAlloca Expander
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Safe Stack instrumentation pass
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Machine Optimization Remark Emitter
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 Lower Tile Copy
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Return Thunks
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Fast Tile Register Preconfigure
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 EFLAGS copy lowering
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Machine Optimization Remark Emitter #2
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lazy Machine Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  StackMap Liveness Analysis
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Remove Loads Into Fake Uses
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 insert wait instruction
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Lower AMX intrinsics
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Machine Optimization Remark Emitter #3
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  0.5%)   0.0000 (  0.4%)  X86 vzeroupper inserter
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  X86 Indirect Branch Tracking
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Implement the 'patchable-function' attribute
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Unpack machine instruction bundles
   0.0000 (  0.5%)   0.0000 (  0.0%)   0.0000 (  0.4%)   0.0000 (  0.3%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  1.2%)   0.0000 (  0.4%)   0.0000 (  0.3%)  Insert fentry calls
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.3%)  Prepare callbr
   0.0000 (  0.8%)   0.0000 (  0.0%)   0.0000 (  0.5%)   0.0000 (  0.3%)  Fast Tile Register Configure
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.6%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Create Garbage Collector Module Metadata
   0.0004 (100.0%)   0.0002 (100.0%)   0.0006 (100.0%)   0.0006 (100.0%)  Total

===-------------------------------------------------------------------------===
                               Clang time report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.2496 seconds (0.2493 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2072 ( 97.2%)   0.0354 ( 97.3%)   0.2426 ( 97.2%)   0.2424 ( 97.2%)  Front end
   0.0054 (  2.5%)   0.0006 (  1.7%)   0.0060 (  2.4%)   0.0059 (  2.4%)  LLVM IR generation
   0.0006 (  0.3%)   0.0004 (  1.0%)   0.0010 (  0.4%)   0.0010 (  0.4%)  Machine code generation
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Optimizer
   0.2132 (100.0%)   0.0364 (100.0%)   0.2496 (100.0%)   0.2493 (100.0%)  Total

$ cat /tmp/out.txt
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0000 ( 36.4%)   0.0000 ( 36.4%)   0.0000 ( 35.2%)  AnnotationRemarksPass
   0.0000 ( 31.8%)   0.0000 ( 31.8%)   0.0000 ( 31.8%)  AlwaysInlinerPass
   0.0000 ( 22.7%)   0.0000 ( 22.7%)   0.0000 ( 22.7%)  CoroConditionalWrapper
   0.0000 (  9.1%)   0.0000 (  9.1%)   0.0000 ( 10.2%)  EntryExitInstrumenterPass
   0.0000 (100.0%)   0.0000 (100.0%)   0.0000 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Analysis execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0000 ( 40.0%)   0.0000 ( 40.0%)   0.0000 ( 37.5%)  ProfileSummaryAnalysis
   0.0000 ( 40.0%)   0.0000 ( 40.0%)   0.0000 ( 33.3%)  TargetLibraryAnalysis
   0.0000 ( 20.0%)   0.0000 ( 20.0%)   0.0000 ( 29.2%)  InnerAnalysisManagerProxy<FunctionAnalysisManager, Module>
   0.0000 (100.0%)   0.0000 (100.0%)   0.0000 (100.0%)  Total

One “Pass execution timing report” section is for the legacy pass manager (part of codegen pipeline) while the other is for the new pass manager (optimization pipeline).

// for legacy pass manager
PassTimingInfo::PassTimingInfo() : TG("pass", "Pass execution timing report") {}

// for new pass manager
TimePassesHandler::TimePassesHandler(bool Enabled, bool PerRun)
    : PassTG("pass", "Pass execution timing report"),
      AnalysisTG("analysis", "Analysis execution timing report"),
      Enabled(Enabled), PerRun(PerRun) {}

I agree that the current output is confusing. I actually wanted to rename the old one to something like “Pass execution for codegen” but realized that the legacy pass manager has not been entirely removed for optimization purposes
. The new pass manager doc specifies:

The legacy PM somewhat works with the optimization pipeline, but this is deprecated and there are ongoing efforts to remove its usage.

This acknowledges that the legacy pass manager hasn’t been entirely disallowed for the optimization pipeline.
The code structure doesn’t make it easy to rename the description only for Clang usage.


The textual output of -ftime-report is convenient.
However, a large limitation of the -ftime-report system is that it doesn’t support nested timers.
While adding more timer groups could be a workaround, the output does not indicate any hierarchy, making the interpretation difficult.

Ideally, -ftime-report and -ftime-trace should share the same underlying measurement code. Perhaps -ftime-trace could be enhanced to include pass information in its output.

Looking at other timers in the LLVM optimization/codegen pipelines, we should make the new PM’s timers use llvm::TimerGlobals via NamedRegionTimer rather than own its timers. That should take care of this.

(I think opt will need to manually call llvm::TimerGroup::printAll() to keep some opt -time-passes tests passing)

I don’t think NamedRegionTimer is the right class to use - NamedRegionTimer works on the RAII idiom; it starts the timer when it’s created and ends when it’s destroyed. This isn’t compatible with the new pass manager, which registers callbacks that start and stop the appropriate timer. However, replacing TimePassesHandler’s two TimerGroup. with TimerGlobals looks like the right thing to do. I’ll also need to do this for the legacy pass manager, as it uses PassTimingInfo which itself has a TimerGroup