Issues in llvm-tblgen -- Highly parallelized build

Let me introduce some issues around building LLVM on a many-core system.
First, let’s discuss llvm-tblgen.

Abstract

Data were measured on an AWS c6a.48xlarge with 192 vCPUs.

I use nico/ninjatracing (on GitHub), which converts .ninja_log files to Chrome’s about:tracing format.
(Perfetto UI can accept .ninja_log directly, but I don’t like its view.)
The converted traces can be viewed with Perfetto UI or Chrome tracing.
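For readers who want numbers without opening a trace viewer, here is a minimal sketch that reads a .ninja_log directly and lists the slowest steps. It assumes the tab-separated v5-style row layout (start ms, end ms, mtime, output path, command hash). The sample rows are made up for illustration, reusing only the durations quoted in this post, and the function names are hypothetical.

```python
def parse_ninja_log(text):
    """Return (output, start_ms, end_ms) tuples from .ninja_log text."""
    steps = []
    for line in text.splitlines():
        # Skip blank lines and the "# ninja log vN" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a row in the assumed layout
        start_ms, end_ms, _mtime, output = fields[:4]
        steps.append((output, int(start_ms), int(end_ms)))
    return steps


def slowest_steps(steps, n=5):
    """Return the n steps with the largest (end - start) duration."""
    return sorted(steps, key=lambda s: s[2] - s[1], reverse=True)[:n]


# Made-up sample rows; start times, mtimes and hashes are placeholders.
SAMPLE_LOG = (
    "# ninja log v5\n"
    "0\t19000\t0\tGlobalISelEmitter.cpp.o\taaaa\n"
    "0\t7300\t0\tRecord.cpp.o\tbbbb\n"
    "0\t11000\t0\tItaniumManglingCanonicalizer.cpp.o\tcccc\n"
)

for output, start_ms, end_ms in slowest_steps(parse_ninja_log(SAMPLE_LOG)):
    print(f"{(end_ms - start_ms) / 1000:6.1f}s  {output}")
```

On a real build, the text would come from the .ninja_log in the build directory. An incremental log can contain several rows for the same output, which this sketch does not deduplicate.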

Latency by llvm-tblgen

As you know, many files depend on llvm-tblgen. In this case, they wait for llvm-tblgen for 20 seconds, and Ninja starts starving for runnable jobs from +3.5s.

Let’s look into a simple case, clangBasic.

Let me summarize.

  • GlobalISelEmitter.cpp (1st row) forms the critical path. It takes 19s.
  • LLVMTableGen:Record.cpp can be regarded as an essential file for tblgen. It takes 7.3s to build.
    • LLVMSupport:ItaniumManglingCanonicalizer.cpp takes 11s, longer than Record.cpp.
  • RISCVTargetParserTableGen (seen in 1st row) takes 0.7s to generate. Many modules also depend on it, but is it really essential to them?
    • A few files in LLVMSupport (e.g. VirtualFileSystem.cpp) take longer, but I don’t think they are a big burden.

What to improve?

I experimented with some of these in 2021. See also https://twitter.com/chapuni/status/1401519362058555393

Turn TableGen’s emitters into plugins

Most emitters are not needed to generate the files on the critical path, intrinsics_gen and RISCVTargetParserTableGen. If those emitters could be separated out, the critical path might be shortened.

  • Move each cl::opt definition to the corresponding Emitter.cpp.
  • Implement the capability to build each emitter as a loadable module.
  • Make llvm-tblgen recognize an unknown (unlinked) generator and load it dynamically.
    • -gen-unknown-foo will load libtblgen-unknown-foo.so
  • Make add_tablegen append a dependency on the emitter module.
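The naming rule in the third bullet can be sketched as a small lookup. This only illustrates the flag-to-filename mapping proposed above and is not how llvm-tblgen works today. The function name and the set of built-in generators are hypothetical, and the actual loading step (dlopen or similar) is left out.

```python
def plugin_library_for(flag, builtin_generators):
    """Map a -gen-* flag to the plugin file that would provide it.

    Returns None when the generator is already linked into the tool,
    otherwise the shared-library name following the proposed
    "libtblgen-<name>.so" convention.
    """
    prefix = "-gen-"
    if not flag.startswith(prefix):
        raise ValueError(f"not a generator flag: {flag!r}")
    name = flag[len(prefix):]
    if name in builtin_generators:
        return None  # linked in; nothing to load
    return f"libtblgen-{name}.so"


# Hypothetical set of generators linked into the base executable.
BUILTIN = {"builtin-foo"}

print(plugin_library_for("-gen-unknown-foo", BUILTIN))
print(plugin_library_for("-gen-builtin-foo", BUILTIN))
```

With such a rule, the build system would only need to add a dependency on the one plugin module that a given invocation names.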

I don’t intend to force a move to plugins. This could be optional.

Once that is done, LLVMSupport:ItaniumManglingCanonicalizer.cpp will become the next critical path.

Split out some files in LLVMSupport

The most effective way is to split the files required by tblgen into a dedicated module, like “LLVMSupportLite”. But I am afraid that it would make LLVM less maintainable.

Could we move ItaniumManglingCanonicalizer.cpp out of LLVMSupport?

Move dependency on LLVMTargetParser

I don’t have any ideas yet, since it is new to me. :slight_smile:

Random notes

  • I expect ninja-build to schedule along the critical path. Maybe depth-based, at least.
    • In test-depends (check-clang), SemaExpr-clang-ast-dump-ASTNodeAPI.json-ToolingTests:SourceCodeTest.cpp is the longest critical path.
    • FYI, the attached test-depends.json is generated by my experimental ninja-build. It implements a duration-based scheduler using the previous build log.
  • Even if we could optimize the scheduling of llvm-tblgen, we would still see idle time due to starvation. Could we fill the gap with other tasks?
    • Some files could be freed from their dependency on llvm-tblgen. This will require a lot of work on the CMake side.
      • Loosen dependencies on add_custom_command. I did this in the past with target_link_libraries(INTERFACE).
      • Discover the real dependencies on generated headers with clang-scan-deps and add them with ninja’s dyndep.
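To make “schedule along the critical path” concrete, here is a minimal sketch of a duration-based priority. Each task is ranked by the longest chain of work that still hangs off it, using durations from a previous build. This is not the code of the experimental ninja-build mentioned above. The graph is a toy, and every duration except the 19s and 7.3s quoted in this thread is made up.

```python
from functools import lru_cache


def critical_path_priorities(durations, deps):
    """Return {task: longest time from starting the task to the end of the build}.

    durations: {task: seconds}, e.g. taken from a previous build log.
    deps: {task: [tasks it must wait for]}.
    """
    # Invert the edges: for each task, who is waiting on it?
    dependents = {task: [] for task in durations}
    for task, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(task)

    @lru_cache(maxsize=None)
    def remaining(task):
        # Own duration plus the longest chain among the tasks waiting on it.
        tail = max((remaining(d) for d in dependents[task]), default=0.0)
        return durations[task] + tail

    return {task: remaining(task) for task in durations}


# Toy graph; the link, generate and Foo.o durations are invented.
durations = {
    "GlobalISelEmitter.o": 19.0,
    "Record.o": 7.3,
    "llvm-tblgen": 1.0,
    "intrinsics_gen": 0.5,
    "Foo.o": 5.0,
}
deps = {
    "llvm-tblgen": ["GlobalISelEmitter.o", "Record.o"],
    "intrinsics_gen": ["llvm-tblgen"],
    "Foo.o": ["intrinsics_gen"],
}

priorities = critical_path_priorities(durations, deps)
# Ready tasks with the largest remaining chain should be started first.
order = sorted(priorities, key=priorities.get, reverse=True)
print(order)
```

A depth-based scheduler is the same idea with every duration set to 1. The recursion is fine for a toy graph, but a real build graph would want an iterative topological pass.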

Not sure how much of a saving this is, but ItaniumManglingCanonicalizer/SymbolRemappingReader look like they’d be better off in ProfileData. I think that would only affect llvm-cxxmap?

I’ve long wanted to be able to unify MachineValueType.h and ValueTypes.td to avoid the duplication and the hard-coding of constants that makes adding MVTs in a fork annoying for merge conflicts. Currently llvm-tblgen depends on MachineValueType.h, as some of the GISel backends need to know about MVTs, but moving to a plugin-based approach could allow that to be decoupled and break the circular dependency. I don’t know how you would propose loading plugins for a statically-linked llvm-tblgen though, given that dlopen only works in dynamically-linked binaries on some OSes (e.g. FreeBSD), as attractive as it seems for resolving this problem, and I doubt we want to have one full binary per backend.

I would go the other direction. Instead of the llvm-tblgen bottleneck, I would turn tblgen into a library and each subproject can create its own tblgen tool.

We already do that anyway; llvm-tblgen is for llvm/, clang-tblgen is for clang/, lldb-tblgen is for lldb/, and they each provide their backends that link against libLLVMTableGen. Unless you mean one binary per backend, which would mean going from 1 to 30-40 executables for llvm/ alone; though perhaps statically linking libLLVMTableGen and its dependencies doesn’t need all that much disk space. On my Mac it seems to be ~5M per binary, so 200M in total wouldn’t be absurd in the grand scheme of LLVM build requirements.

My bad. I wanted to say each library and not each subproject. Nonetheless, a c6a is an excellent tool to find bottlenecks in the build system.

+1 for turning TG backends (emitters) into plugins that will be built on-demand

Actually I think it’s a good time to do so since plugin-ization shares lots of infrastructure with the solution you described. Plus, it helps TG as a language in general for out-of-tree applications since creating a custom TG backend becomes a lot easier (you don’t have to create your own driver tool or modify llvm-tblgen).

If GlobalISelEmitter.cpp is on the critical path, why can’t GISel have its own binary? I would prefer many TG binaries over plugins.

I wonder how we could make it smarter.
Do you think MVT::SimpleValueType could also be generated?

We also have to consider hosts where plugins are unavailable. Then…

  • Basic llvm-tblgen may have MVTEmitter.
  • Introduce another tblgen, llvm-cg-tblgen. It may depend on the artifact of ValueTypes.td.

If llvm-tblgen had plugin capability, it could invoke almost all emitters, even Clang’s emitters.

By “TG binaries”, do you mean “plugin modules of tblgen”?

I think we could split out specific emitters, but I would like to introduce a more generic way.
Then “llvm-tblgen” (and LLVMTableGen+LLVMSupportLite.so) would handle almost all emitters. I expect it would make it easier for third parties to implement emitters outside the LLVM tree.

dlopen is not fun. I was thinking of an llvm-globalisel-tblgen executable and an llvm-tblgen executable. Maybe the TG infrastructure can be exported as a library to make it easier to write custom TGs, both out of tree and in tree.

I didn’t have out-of-tree projects in mind at first, but I find your suggestion impressive.
Thank you.

(Discourse has complained about my third consecutive post. I’ll post less next time.)

+1 to this. It’s at least worth a try.

A lot of it already is exported, though a bunch of CodeGen-specific stuff is missing for that plan. Basically, add a libLLVMTableGenCodeGen in addition to the already existing libLLVMTableGen, where the new library contains CodeGenTarget.cpp and potentially others.

I’ve thought about the opposite of having multiple tablegen tools: have one tablegen invocation run all of the generators at once and in parallel. A lot of time is spent in common code paths that build target information, only some of which gets used, depending on the output type.
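The shape of that idea, parse once and then fan out to every generator, can be sketched as follows. This is only a structural illustration with hypothetical names, not an existing tblgen feature. Python threads also do not speed up CPU-bound work in CPython, so a real implementation would use native threads inside tblgen itself.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_generators(parse, generators, max_workers=4):
    """Parse once, then run every generator against the shared result.

    parse: callable returning the (read-only) parsed records.
    generators: {output name: callable(records) -> generated text}.
    """
    records = parse()  # the expensive common step runs exactly once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(emit, records)
                   for name, emit in generators.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the .td parser and two emitters.
parse_calls = []


def fake_parse():
    parse_calls.append(1)
    return ("ADD", "SUB", "MUL")


generators = {
    "names": lambda records: "\n".join(records),
    "count": lambda records: str(len(records)),
}

outputs = run_all_generators(fake_parse, generators)
print(len(parse_calls), sorted(outputs))
```

The trade-off is that every output of such an invocation becomes available only when the whole invocation finishes, unless outputs are written as each generator completes.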

Also there’s probably low hanging fruit to speed up GlobalISelEmitter. I’ve never looked at the profile for it.

The problem here is that GlobalISelEmitter.cpp is slow to compile, not that llvm-tblgen -gen-global-isel is slow to run.

According to the graph GlobalISelEmitter.cpp is on the critical path. Having two or more llvm-foo-tblgen executables should add some parallelism.

An easy solution might be splitting GlobalISelEmitter.cpp

If we add more tablegen-* binaries, do we also need to increase the default number of parallel link jobs? Will adding more binaries make the build slower on systems with fewer cores?

That is another challenge with the LLVM build system. Do you tune for 1 or 96 cores?

Thank you everyone who gave me comments.

In 2021, I chose to plugin-ize tblgen, since I wanted to avoid intrusive changes in the tree. At that time I didn’t think I could add other tblgen executables.
Having read the comments, I think adding specific tblgen executable(s) would be another option, if that is acceptable.

Although decoupling GISelEmitter would be effective, I think introducing a “CodeGen tblgen” would be better. I will work on it when I have time.

@jrtc27 I have looked around llvm-tblgen and MVT.
I expected that the CodeGen stuff could be split out easily, but I found that intrinsics_gen also depends on MVT.
Could we use ValueTypes.td directly from other td(s) to avoid using MachineValueType.h, at least for IntrinsicsEmitter?
I guess it would not be easy, since we would have to rewrite many emitters.
For now, I will set ValueTypes.td aside.

@tschuett @nhaehnle Let me know why you would rather not use plugins.
(As I said before, it should be optional.)

@tstellar I would expect each tblgen executable to be smaller, with only a few libraries to link into it.
If we built plugins, each plugin module would be a few object files linked against a few dynamic libraries.
I think it would not be a problem unless we tried to link hundreds of tblgen executables.