Seeking Guidance on Profiling Clang for Performance Bottlenecks

Hello,

I am reaching out for assistance in profiling Clang to identify and address potential performance bottlenecks. I have explored several profiling options, but each comes with its own set of challenges.

  1. -ftime-report and -ftime-trace options:
  • The -ftime-report option provided some insights, but its output is not tied to source locations or to call paths in the compiled program, which made it less useful for my purposes.
  • I experimented with the -ftime-trace option, which offered better visualization, but it still didn’t provide the detailed information I was looking for.
  2. perf:
  • With perf, no function appears to take enough time to be considered ‘slow.’ The “Self” column in the perf report output consistently shows 0.0x%, even for significant functions. Is this what it’s supposed to look like?
    Part of the perf report output:
Children      Self  Command   Shared Object                            Symbol
+   92,39%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::ParseAST(clang::Sema&, bool, bool)                                                                           ◆
+   92,37%     0,00%  clang-18  libclangFrontend.so.18git                [.] clang::ASTFrontendAction::ExecuteAction()                                                                           ▒
+   92,30%     0,00%  clang-18  libclangCodeGen.so.18git                 [.] clang::CodeGenAction::ExecuteAction()                                                                               ▒
+   92,27%     0,00%  clang-18  libclangFrontend.so.18git                [.] clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)                                                      ▒
+   92,27%     0,00%  clang-18  libclangFrontend.so.18git                [.] clang::FrontendAction::Execute()                                                                                    ▒
+   92,22%     0,00%  clang-18  clang-18                                 [.] cc1_main(llvm::ArrayRef<char const*>, char const*, void*)                                                           ▒
+   92,22%     0,00%  clang-18  libclangFrontendTool.so.18git            [.] 0x00007fed7ee90bb5                                                                                                  ▒
+   92,18%     0,00%  clang-18  clang-18                                 [.] ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&)                                       ▒
+   92,12%     0,00%  clang-18  clang-18                                 [.] clang_main(int, char**, llvm::ToolContext const&)                                                                   ▒
+   92,08%     0,00%  clang-18  clang-18                                 [.] main                                                                                                                ▒
+   92,03%     0,00%  clang-18  libc.so.6                                [.] __libc_start_call_main                                                                                              ▒
+   90,77%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&)           ▒
+   82,01%     0,01%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*)▒
+   75,72%     0,01%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedAttributes&, clang::P▒
+   74,97%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation)              ▒
+   74,76%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, cl▒
+   58,36%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseSingleDeclarationAfterTemplate(clang::DeclaratorContext, clang::Parser::ParsedTemplateInfo const▒
+   56,63%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseTemplateDeclarationOrSpecialization(clang::DeclaratorContext, clang::SourceLocation&, clang::Par▒
+   56,17%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseDeclarationStartingWithTemplate(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedA▒
+   41,12%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&)                                 ▒
+   40,24%     0,00%  clang-18  libclangParse.so.18git                   [.] clang::Parser::ParseCompoundStatementBody(bool)                                            
  3. Intel VTune:
    VTune seemed promising, especially after I resolved the compile-time issues by switching to a Release build with debug information. However, when I use VTune to profile Clang, it consumes all system resources and crashes. The problem is specific to this analysis: VTune works well with simple programs compiled with Clang or GCC.

I am currently running Ubuntu 22.04.3 LTS on a machine with 24GB of RAM. Compiling a basic “Hello World” program with a Debug build of Clang took excessively long (around 8 seconds), which prompted the move to a Release build. However, the problems with VTune persist.

I am seeking advice and suggestions from the community on the following:

  • Any alternative profiling tools or techniques that might provide more detailed insights into Clang’s performance.
  • Tips or recommendations on resolving the resource consumption and crashes when using Intel VTune with Clang.

Thanks in advance.

Nowadays, I use perf to collect data, and then I create flame graphs.
Check out Brendan Gregg’s talks on YouTube, like this one:

He also has a lot of useful “perf oneliners”:

Don’t forget to keep the frame pointers when compiling, though, by using -fno-omit-frame-pointer.
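
As for the Self column sitting near 0% in a report like the one above: as far as I understand it, a function is charged “self” time only for samples where it is the innermost frame, while “children” time counts every sample in which it appears anywhere on the stack. Here is a minimal Python sketch of that accounting, assuming folded stacks in the format that stackcollapse-perf.pl from the FlameGraph repository emits (semicolon-separated frames from root to leaf, then a sample count). The function name self_and_inclusive and the sample stacks are made up for illustration; they are not real Clang measurements.

```python
from collections import Counter


def self_and_inclusive(folded_lines):
    """Return (self_counts, inclusive_counts) per function from folded stacks.

    Each input line looks like "main;ParseAST;Lex 60": frames from root to
    leaf separated by ';', then a space and the number of samples.
    """
    self_counts = Counter()
    inclusive_counts = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST space, since C++ symbol names may contain spaces.
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        frames = stack.split(";")
        # Only the leaf frame was executing when the sample was taken.
        self_counts[frames[-1]] += count
        # Every distinct frame on the stack is charged once per sample,
        # so recursive calls are not double counted.
        for frame in set(frames):
            inclusive_counts[frame] += count
    return self_counts, inclusive_counts


# Hypothetical folded stacks, for illustration only.
SAMPLE = [
    "main;ParseAST;ParseTopLevelDecl;Lex 60",
    "main;ParseAST;ParseTopLevelDecl;ActOnDecl 30",
    "main;ParseAST;EmitCode 10",
]

if __name__ == "__main__":
    self_counts, inclusive_counts = self_and_inclusive(SAMPLE)
    total = sum(self_counts.values())
    print("Children    Self  Symbol")
    for name, inclusive in inclusive_counts.most_common():
        children_pct = 100 * inclusive / total
        self_pct = 100 * self_counts[name] / total
        print(f"{children_pct:7.1f}% {self_pct:6.1f}%  {name}")
```

In this toy data, main and ParseAST are on every stack but are never the leaf, so they get 100% children and 0% self, which is the same shape as the top rows of the report above. If you want the report ordered by self time instead, perf report --no-children should do that.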
