Hello,
I am reaching out for assistance in profiling Clang to identify and address potential performance bottlenecks. I have explored several profiling options, but each comes with its own set of challenges.
- -ftime-report and -ftime-trace options:
  - Using the -ftime-report option provided some insights, but its output is not linked to the source code or to the call paths in the program, which made it less effective for my purposes.
  - I experimented with the -ftime-trace option, which offered better visualization, but it still didn’t provide the detailed information I was looking for.
- Perf
  - While using perf, it seems that no function takes enough time to be considered “slow.” The “Self” column in the perf output consistently showed 0.0x%, even for significant functions. Is this what it’s supposed to look like?
Part of the output of perf report:
Children Self Command Shared Object Symbol
+ 92,39% 0,00% clang-18 libclangParse.so.18git [.] clang::ParseAST(clang::Sema&, bool, bool)
+ 92,37% 0,00% clang-18 libclangFrontend.so.18git [.] clang::ASTFrontendAction::ExecuteAction()
+ 92,30% 0,00% clang-18 libclangCodeGen.so.18git [.] clang::CodeGenAction::ExecuteAction()
+ 92,27% 0,00% clang-18 libclangFrontend.so.18git [.] clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)
+ 92,27% 0,00% clang-18 libclangFrontend.so.18git [.] clang::FrontendAction::Execute()
+ 92,22% 0,00% clang-18 clang-18 [.] cc1_main(llvm::ArrayRef<char const*>, char const*, void*)
+ 92,22% 0,00% clang-18 libclangFrontendTool.so.18git [.] 0x00007fed7ee90bb5
+ 92,18% 0,00% clang-18 clang-18 [.] ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&)
+ 92,12% 0,00% clang-18 clang-18 [.] clang_main(int, char**, llvm::ToolContext const&)
+ 92,08% 0,00% clang-18 clang-18 [.] main
+ 92,03% 0,00% clang-18 libc.so.6 [.] __libc_start_call_main
+ 90,77% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&)
+ 82,01% 0,01% clang-18 libclangParse.so.18git [.] clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*)
+ 75,72% 0,01% clang-18 libclangParse.so.18git [.] clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedAttributes&, clang::P
+ 74,97% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation)
+ 74,76% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, cl
+ 58,36% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseSingleDeclarationAfterTemplate(clang::DeclaratorContext, clang::Parser::ParsedTemplateInfo const
+ 56,63% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseTemplateDeclarationOrSpecialization(clang::DeclaratorContext, clang::SourceLocation&, clang::Par
+ 56,17% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseDeclarationStartingWithTemplate(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedA
+ 41,12% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&)
+ 40,24% 0,00% clang-18 libclangParse.so.18git [.] clang::Parser::ParseCompoundStatementBody(bool)
- Intel VTune:
  VTune seemed promising, especially after I resolved the compilation-time issues by switching to a Release build with debug information. However, when I try to profile Clang with VTune, it consumes all system resources and crashes. The issue is specific to this analysis, as VTune works well with simple programs compiled with Clang/gcc.
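To get more out of -ftime-trace than the visualization, the JSON file it writes can also be aggregated directly. Below is a minimal Python sketch, assuming the file follows the Chrome trace-event layout (a top-level traceEvents array whose complete events have ph set to "X", a name, a dur in microseconds, and an optional args.detail). The function name summarize_time_trace and the sample data are hypothetical and only there for illustration.

```python
import json
from collections import defaultdict


def summarize_time_trace(trace, top=10):
    """Sum the durations of complete ("X") events, grouped by (name, detail).

    `trace` is the parsed JSON object of a -ftime-trace file.
    Returns up to `top` entries as ((name, detail), total_microseconds),
    largest total first.
    """
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":
            continue  # ignore metadata and other non-duration events
        name = event.get("name", "")
        if name.startswith("Total "):
            continue  # skip the per-category summary events
        detail = event.get("args", {}).get("detail", "")
        totals[(name, detail)] += event.get("dur", 0)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical sample data (not taken from a real trace).
SAMPLE_TRACE = {
    "traceEvents": [
        {"ph": "X", "name": "Source", "dur": 1200,
         "args": {"detail": "/usr/include/c++/12/vector"}},
        {"ph": "X", "name": "InstantiateFunction", "dur": 300,
         "args": {"detail": "foo<int>"}},
        {"ph": "X", "name": "InstantiateFunction", "dur": 500,
         "args": {"detail": "foo<int>"}},
        {"ph": "X", "name": "Total Source", "dur": 1200, "args": {}},
        {"ph": "M", "name": "process_name", "args": {"name": "clang"}},
    ]
}

if __name__ == "__main__":
    # For a real trace: with open(path) as f: trace = json.load(f)
    for (name, detail), micros in summarize_time_trace(SAMPLE_TRACE):
        print(f"{micros / 1000:10.3f} ms  {name}  {detail}")
```

Because trace events nest, these sums are inclusive times: a Source entry also contains the time of everything parsed inside that file, so the totals should be read as a ranking rather than as figures that add up to the wall-clock time.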
I am currently running Ubuntu 22.04.3 LTS on a machine with 24GB of RAM. The compilation time for a basic “Hello World” program with a Debug build of Clang was excessively high (around 8 seconds), which prompted the move to a Release build. However, the challenges with VTune persist.
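On the perf question above, my understanding is that Children is inclusive time (the function plus everything it calls), while Self counts only the samples in which the function itself was executing. Under that reading, driver functions such as main or clang::ParseAST would be expected to show a high Children value and a Self value near zero. The following Python sketch illustrates the relationship on made-up call stacks; the function self_and_children and the stack contents are hypothetical, not real perf data.

```python
from collections import Counter


def self_and_children(stacks):
    """Return {function: (children_pct, self_pct)} for sampled call stacks.

    Each stack lists functions from the outermost caller to the function
    that was executing when the sample was taken (the last element).
    """
    self_hits = Counter()
    inclusive_hits = Counter()
    for stack in stacks:
        self_hits[stack[-1]] += 1      # only the innermost frame gets "Self"
        for func in set(stack):        # count a recursive function once per sample
            inclusive_hits[func] += 1
    total = len(stacks)
    return {
        func: (100.0 * inclusive_hits[func] / total,
               100.0 * self_hits[func] / total)
        for func in inclusive_hits
    }


# Hypothetical samples: four stacks, all rooted in main.
SAMPLE_STACKS = [
    ["main", "ParseAST", "ParseTopLevelDecl", "leaf_a"],
    ["main", "ParseAST", "ParseTopLevelDecl", "leaf_b"],
    ["main", "ParseAST", "leaf_a"],
    ["main", "other"],
]

if __name__ == "__main__":
    result = self_and_children(SAMPLE_STACKS)
    for func, (children, self_pct) in sorted(
            result.items(), key=lambda item: item[1][0], reverse=True):
        print(f"{children:6.2f}%  {self_pct:6.2f}%  {func}")
```

If that reading is right, the output I posted is expected, and sorting by Self (for example with perf report --no-children) should surface the functions where the time is actually spent.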
I am seeking advice and suggestions from the community on the following:
- Any alternative profiling tools or techniques that might provide more detailed insights into Clang’s performance.
- Tips or recommendations on resolving the resource consumption and crashing issues when using Intel VTune with Clang.
Thanks in advance.