Issues in llvm-tblgen -- Highly-parallelized build

chapuni · January 29, 2023, 12:09pm

Let me introduce some issues around building LLVM on a manycore system.
First, let’s discuss llvm-tblgen.

Abstract

Data have been measured on AWS c6a.48xlarge, 192 vcpus.

LLVMCore: LLVMCore (c6a.48xlarge) · GitHub
clangBasic: clangBasic (c6a.48xlarge) · GitHub
test-depends (large): https://gist.github.com/chapuni/aae5c61e4ba0eb1b457ab8a9680b4eac

I use nico/ninjatracing (GitHub), which converts .ninja_log files to Chrome’s about:tracing format.
(Perfetto UI can accept .ninja_log directly, but I don’t like its view.)
The converted traces can be viewed with Perfetto UI or Chrome tracing.
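
For readers who prefer numbers to a trace view, per-step durations can be read straight from .ninja_log. The sketch below assumes the v5-style log layout (tab-separated start ms, end ms, mtime, output path, command hash). The sample entries and the helper names `parse_ninja_log` and `slowest` are invented for illustration and are not taken from the measured builds.

```python
def parse_ninja_log(text):
    """Return {output: (start_ms, end_ms)} from the text of a .ninja_log file."""
    steps = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the "# ninja log vN" header and blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # ignore malformed lines
        start_ms, end_ms, output = int(fields[0]), int(fields[1]), fields[3]
        steps[output] = (start_ms, end_ms)  # a later entry for the same output wins
    return steps


def slowest(steps, n=3):
    """Return the n longest steps as (duration_ms, output), longest first."""
    durations = [(end - start, out) for out, (start, end) in steps.items()]
    return sorted(durations, reverse=True)[:n]


# Invented sample log, only to show the shape of the data.
SAMPLE_LOG = (
    "# ninja log v5\n"
    "0\t700\t0\ta.o\t1\n"
    "100\t2000\t0\tb.o\t2\n"
    "700\t1100\t0\tc.o\t3\n"
)

print(slowest(parse_ninja_log(SAMPLE_LOG), n=2))
```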

Latency by llvm-tblgen

As you know, many files depend on llvm-tblgen. In this case, they wait 20 seconds for llvm-tblgen. Ninja starts starving for runnable jobs at +3.5s.

Let’s look into a simple case, clangBasic.

Let me summarize.

GlobalISelEmitter.cpp (1st row) forms the critical path. It takes 19s.
LLVMTableGen:Record.cpp can be regarded as an essential file for tblgen. It takes 7.3s to build.
- LLVMSupport:ItaniumManglingCanonicalizer.cpp takes 11s, longer than Record.cpp.
RISCVTargetParserTableGen (seen in 1st row) takes 0.7s to generate. Many modules also depend on it, but is it really essential to them?
- A few files in LLVMSupport (e.g. VirtualFileSystem.cpp) take longer, but I don’t think they are a big burden.
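
To make the summary concrete, here is a minimal sketch that computes the longest dependency chain over a tiny graph. The durations are the ones quoted above, in milliseconds. The dependency edges are a simplification I am assuming for illustration, and the link time of llvm-tblgen is assumed to be 0 because it was not stated. The helper names `critical_path` and `without` are hypothetical.

```python
from functools import lru_cache


def critical_path(durations, deps):
    """Return (total_ms, [nodes]) for the longest chain in an acyclic graph."""
    @lru_cache(maxsize=None)
    def longest(node):
        best = (0, ())
        for dep in deps.get(node, ()):
            cand = longest(dep)
            if cand[0] > best[0]:
                best = cand
        return (best[0] + durations[node], best[1] + (node,))

    total, path = max((longest(n) for n in durations), key=lambda r: r[0])
    return total, list(path)


def without(node, durations, deps):
    """Return copies of the graph with one node removed."""
    d = {k: v for k, v in durations.items() if k != node}
    e = {k: [x for x in v if x != node] for k, v in deps.items() if k != node}
    return d, e


DURATIONS_MS = {
    "GlobalISelEmitter.cpp": 19000,
    "Record.cpp": 7300,
    "ItaniumManglingCanonicalizer.cpp": 11000,
    "llvm-tblgen": 0,  # assumption: link time was not given
    "RISCVTargetParserTableGen": 700,
}
DEPS = {  # assumption: simplified edges, not the real build graph
    "llvm-tblgen": ["GlobalISelEmitter.cpp", "Record.cpp",
                    "ItaniumManglingCanonicalizer.cpp"],
    "RISCVTargetParserTableGen": ["llvm-tblgen"],
}

print(critical_path(DURATIONS_MS, DEPS))
print(critical_path(*without("GlobalISelEmitter.cpp", DURATIONS_MS, DEPS)))
```

In this toy graph, taking GlobalISelEmitter.cpp out of the chain leaves ItaniumManglingCanonicalizer.cpp as the longest remaining prerequisite.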

What to improve?

I experimented with some of these in 2021. See also https://twitter.com/chapuni/status/1401519362058555393

Turn TableGen’s emitters into plugins

Most emitters are irrelevant to generating the files on the critical path, intrinsics_gen and RISCVTargetParserTableGen. If those emitters could be separated, the critical path might be shortened.

Move each cl::opt definition to the corresponding Emitter.cpp.
Implement the capability to build each emitter as a loadable module.
Make llvm-tblgen recognize an unknown (unlinked) generator and load it dynamically.
- -gen-unknown-foo will load libtblgen-unknown-foo.so
Make add_tablegen append a dependency on the emitter module.
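
The lookup step in the list above can be sketched as follows. This is only an illustration of the naming rule (`-gen-unknown-foo` maps to `libtblgen-unknown-foo.so`), not llvm-tblgen code. The function name `resolve_generator` and the contents of `BUILTIN` are hypothetical, and the actual loading (for example via dlopen) is left out.

```python
# Hypothetical set of generators linked into the driver itself.
BUILTIN = {"global-isel"}


def resolve_generator(flag, builtin):
    """Map a -gen-* flag to ("builtin", name) or ("plugin", library file name)."""
    prefix = "-gen-"
    if not flag.startswith(prefix):
        raise ValueError(f"not a generator flag: {flag}")
    name = flag[len(prefix):]
    if name in builtin:
        return ("builtin", name)
    # Unknown (unlinked) generator: derive the module name to load dynamically.
    return ("plugin", f"libtblgen-{name}.so")


print(resolve_generator("-gen-unknown-foo", BUILTIN))
print(resolve_generator("-gen-global-isel", BUILTIN))
```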

I don’t intend to force a move to plugins. This could be optional.

Once these are done, LLVMSupport:ItaniumManglingCanonicalizer.cpp will be the next critical path.

Split out some files in LLVMSupport

The most effective way is to split the files required by tblgen into a dedicated module, like “LLVMSupportLite”. But I am afraid that it would make LLVM less maintainable.

Could we move ItaniumManglingCanonicalizer.cpp out of LLVMSupport?

Move dependency on LLVMTargetParser

I don’t have any ideas yet, since it is new to me.

Random notes

I would expect Ninja to schedule along the critical path. It could be depth-based, at least.
- In test-depends (check-clang), SemaExpr-clang-ast-dump-ASTNodeAPI.json-ToolingTests:SourceCodeTest.cpp is the longest critical path.
- FYI, the attached test-depends.json is generated by my experimental Ninja build. It implements a duration-based scheduler that uses the previous build log.
Even if we could optimize the scheduling of llvm-tblgen, we would still see idle time due to starvation. Could we fill the gap with other tasks?
- Some files could be freed from their dependency on llvm-tblgen. This would require a lot of work on the CMake side.
  - Loosen the dependency on add_custom_command. I did this in the past with target_link_libraries(INTERFACE).
  - Discover the real dependency on generated headers with clang-scan-deps and add deps with Ninja’s dyndep.
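
As a toy illustration of why scheduling order matters, the sketch below simulates greedy list scheduling on a fixed number of workers and compares "first come, first served" with "longest remaining chain first". It is not Ninja's scheduler and not the experimental one mentioned above. The task names, durations, and function names are all invented.

```python
import heapq


def _users(durations, deps):
    """Invert the dependency map: task -> tasks that wait for it."""
    users = {t: [] for t in durations}
    for task, prereqs in deps.items():
        for p in prereqs:
            users[p].append(task)
    return users


def bottom_levels(durations, deps):
    """Length of the longest chain starting at each task (including itself)."""
    users = _users(durations, deps)
    memo = {}

    def level(t):
        if t not in memo:
            memo[t] = durations[t] + max((level(u) for u in users[t]), default=0)
        return memo[t]

    return {t: level(t) for t in durations}


def simulate(durations, deps, workers, priority):
    """Return the makespan when ready tasks start in descending priority."""
    users = _users(durations, deps)
    remaining = {t: len(deps.get(t, ())) for t in durations}
    ready = [t for t in durations if remaining[t] == 0]
    running = []  # heap of (finish_time, task)
    now = 0
    while ready or running:
        ready.sort(key=priority, reverse=True)  # stable: ties keep arrival order
        while ready and len(running) < workers:
            task = ready.pop(0)
            heapq.heappush(running, (now + durations[task], task))
        now, done = heapq.heappop(running)
        for u in users[done]:
            remaining[u] -= 1
            if remaining[u] == 0:
                ready.append(u)
    return now


# Invented example: two short independent tasks and one two-step chain.
DURATIONS = {"s1": 2, "s2": 2, "head": 3, "tail": 3}
DEPS = {"tail": ["head"]}

levels = bottom_levels(DURATIONS, DEPS)
fifo = simulate(DURATIONS, DEPS, workers=2, priority=lambda t: 0)
cp_first = simulate(DURATIONS, DEPS, workers=2, priority=lambda t: levels[t])
print(fifo, cp_first)  # prints: 8 6
```

With two workers, starting the head of the chain first lets the short tasks fill the remaining slot, which is the kind of gap filling asked about above.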

RKSimon · January 29, 2023, 12:31pm

Not sure how much of a saving this is, but ItaniumManglingCanonicalizer/SymbolRemappingReader look like they’d be better off in ProfileData - I think that’d only affect llvm-cxxmap?

jrtc27 · January 29, 2023, 5:38pm

I’ve long wanted to be able to unify MachineValueType.h and ValueTypes.td to avoid the duplication and the hard-coding of constants that makes adding MVTs in a fork annoying for merge conflicts. Currently llvm-tblgen depends on MachineValueType.h as some of the GISel backends need to know about MVTs, but moving to a plugin-based approach could allow that to be decoupled and break the circular dependency. I don’t know how you would propose loading plugins for a statically-linked llvm-tblgen though, given dlopen only works in dynamically-linked binaries on some OSes (e.g. FreeBSD), as attractive as it seems for resolving this problem, and I doubt we want to have one full binary per backend.

tschuett · January 29, 2023, 5:49pm

I would go the other direction. Instead of the llvm-tblgen bottleneck, I would turn tblgen into a library and each subproject can create its own tblgen tool.

jrtc27 · January 29, 2023, 5:57pm

We already do that anyway; llvm-tblgen is for llvm/, clang-tblgen is for clang/, lldb-tblgen is for lldb/, and they each provide their backends that link against libLLVMTableGen. Unless you mean one binary per backend, which would mean going from 1 to 30-40 executables for llvm/ alone; though perhaps statically linking libLLVMTableGen and its dependencies doesn’t need all that much disk space, on my Mac it seems to be ~5M, so 200M wouldn’t be absurd in the grand scheme of LLVM build requirements.

tschuett · January 29, 2023, 5:58pm

My bad. I wanted to say each library and not each subproject. Nonetheless, a c6a is an excellent tool to find bottlenecks in the build system.

mshockwave · January 29, 2023, 7:37pm

+1 for turning TG backends (emitters) into plugins that will be built on-demand

Actually I think it’s a good time to do so since plugin-ization shares lots of infrastructure with the solution you described. Plus, it helps TG as a language in general for out-of-tree applications since creating a custom TG backend becomes a lot easier (you don’t have to create your own driver tool or modify llvm-tblgen).

tschuett · January 29, 2023, 9:30pm

If GlobalISelEmitter.cpp is on the critical path, why can’t GIsel have its own binary? I would prefer many TG binaries over plugins.

chapuni · January 30, 2023, 2:34pm

I wonder how we would make it smarter.
Do you think MVT::SimpleValueType could also be generated?

We also have to consider hosts where plugins are unavailable. Then…

The basic llvm-tblgen may have MVTEmitter.
Introduce another tblgen, llvm-cg-tblgen. It may depend on the artifact of ValueTypes.td.

If llvm-tblgen had plugin capability, it could invoke almost all emitters, even Clang’s emitters.

chapuni · January 30, 2023, 2:40pm

By “TG binaries”, do you mean “plugin modules of tblgen”?

I think we could split out specific emitters, but I would like to introduce a more generic way.
Then, “llvm-tblgen” (and LLVMTableGen+LLVMSupportLite.so) will handle almost all emitters. I expect it would make it easier for 3rd parties to implement emitters outside the LLVM tree.

tschuett · January 30, 2023, 2:43pm

dlopen is not fun. I was thinking of an llvm-globalisel-tblgen executable and an llvm-tblgen executable. Maybe the TG infrastructure can be exported as a library to make it easier to write custom TGs, both out-of-tree and in-tree.

chapuni · January 30, 2023, 2:43pm

I didn’t have out-of-tree projects in mind at first, but I am impressed by your suggestion.
Thank you.

(Discourse complained about my 3rd consecutive post. I’ll post less often next time.)

nhaehnle · January 30, 2023, 3:07pm

+1 to this. It’s at least worth a try.

A lot of it already is exported, though a bunch of CodeGen-specific stuff is missing for that plan. Basically, add a libLLVMTableGenCodeGen in addition to the already existing libLLVMTableGen, where the new library contains CodeGenTarget.cpp and potentially others.

arsenm · January 30, 2023, 3:37pm

I’ve thought of the opposite of having multiple tablegen tools: have one tablegen invocation run all of the generators at once and in parallel. There’s a lot of time spent in common paths that build target information, only some of which gets used depending on the output type.

Also there’s probably low hanging fruit to speed up GlobalISelEmitter. I’ve never looked at the profile for it.

jayfoad · January 30, 2023, 4:06pm

“Also there’s probably low hanging fruit to speed up GlobalISelEmitter. I’ve never looked at the profile for it.”

The problem here is that GlobalISelEmitter.cpp is slow to compile, not that llvm-tblgen -gen-global-isel is slow to run.

tschuett · January 30, 2023, 4:41pm

According to the graph GlobalISelEmitter.cpp is on the critical path. Having two or more llvm-foo-tblgen executables should add some parallelism.

mshockwave · January 30, 2023, 5:25pm

An easy solution might be splitting GlobalISelEmitter.cpp

tstellar · January 30, 2023, 5:45pm

If we add more tablegen-* binaries, do we also need to increase the default number of parallel link jobs? Will adding more binaries make the build slower on systems with fewer cores?

tschuett · January 30, 2023, 6:06pm

That is another challenge with the LLVM build system. Do you tune for 1 or 96 cores?

chapuni · January 31, 2023, 2:29pm

Thank you to everyone who commented.

In 2021, I chose to plugin-ize tblgen, since I wanted to avoid intrusive changes in the tree. At that time I didn’t think I could add other tblgen executables.
Having read the comments, I think adding specific tblgen(s) would be another option, if that is acceptable.

Although decoupling GISelEmitter would be effective, I think introducing “CodeGen’s tblgen” would be better. I will work on it when I have time.

@jrtc27 I have looked around llvm-tblgen and MVT.
I expected the CodeGen stuff could be split out easily, but I found that intrinsics_gen also depends on MVT.
Could we use ValueTypes.td directly from other td(s) to avoid using MachineValueType.h, at least for IntrinsicsEmitter?
I guess it would not be easy, since we would have to rewrite many emitters.
For now, I will set ValueTypes.td aside.

@tschuett @nhaehnle Let me know why you don’t prefer plugins.
(As I said before, it should be an option)

@tstellar I would expect each tblgen executable to be small,
with only a few libraries to link into it.
If we built plugins, each plugin module would consist of a few object files linked against a few dynamic libraries.
I think it would not be a problem unless we tried linking hundreds of tblgen(s).
