RFC: F18 build time memory requirements are too high

Hi F18 community

We've been developing on F18 for a few months now at Arm and are finding the codebase nicely accessible and well designed. The one thing that has really stood out, though, is build-time memory usage. We are really concerned about this and think it could have serious implications for the viability of F18 as an LLVM project in future. We want to surface the problem and talk it through before sinking a lot of time into investigating mitigations.

The peak memory usage for compiling one of the files in F18, parsing.cc, is around 5GB for a debug build. This is the worst single file for peak compile-time memory usage, but many other files that use the parse tree need 2GB+ at compile time. Any link that includes the resulting objects (so the f18 binary and any test binaries that need the parse tree) then uses around 8GB. Touching check-omp-structure.cc, which is a user of the parse tree, and doing an incremental build of F18 takes well over two minutes. Doing a similar exercise in Clang (touching SemaOpenMP.cpp) takes 50 seconds. Comparing -fsyntax-only builds of both files (a common usage for text editor syntax highlighting, for example) shows the difference in compile time even more starkly: 90s for F18 versus 10s for Clang.

These files all include parse_tree.h, which makes heavy use of std::variant and so causes a very large number of template instantiations. If you open chrome://tracing in Google Chrome and import the attached file, you can see graphs of where the time goes when building parsing.cc. Zooming in (select the zoom tool, then click, hold, and drag up and down) shows that front-end time is spent almost exclusively instantiating std::variant templates, and back-end time almost exclusively in dead code elimination of the huge amount of dead code generated. Because of this heavy use of std::variant in the parse tree, building this file takes 2+ minutes, and incremental builds are around 2.5x slower than for Clang.
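
To make the mechanism concrete, here is a minimal sketch, assuming a heavily simplified node layout. The types Name, IntLiteral, RealLiteral and Expr and the visitors CountNames and IsLiteral are hypothetical and are not taken from parse_tree.h. The point is that every std::visit call with a new callable type instantiates dispatch code for the variant, and a generic lambda is instantiated once per alternative, so the instantiation count grows roughly with (alternatives x visitors) across the whole tree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical, heavily simplified node types for illustration only;
// they are not taken from F18's parse_tree.h.
struct Name { std::string id; };
struct IntLiteral { long value; };
struct RealLiteral { double value; };

// A node that can be any one of several alternatives.
struct Expr {
  std::variant<Name, IntLiteral, RealLiteral> u;
};
static_assert(std::variant_size_v<decltype(Expr::u)> == 3);

// Each std::visit call with a new callable type instantiates dispatch code
// for this variant, and the generic lambda body below is instantiated once
// per alternative (three times here).
inline std::size_t CountNames(const Expr &e) {
  return std::visit(
      [](const auto &x) -> std::size_t {
        using T = std::decay_t<decltype(x)>;
        return std::is_same_v<T, Name> ? 1 : 0;
      },
      e.u);
}

// A second visitor over the same variant: a different lambda type, so
// another full set of per-alternative instantiations.
inline bool IsLiteral(const Expr &e) {
  return std::visit(
      [](const auto &x) {
        using T = std::decay_t<decltype(x)>;
        return !std::is_same_v<T, Name>;
      },
      e.u);
}
```

With three alternatives and two visitors this is already six lambda bodies plus two sets of dispatch tables; a real parse tree multiplies that by hundreds of node types and many tree walkers.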

In addition, a single-threaded build of F18 consumes up to 5GB when compiling files and then 8GB when linking. On most systems this does not leave much scope for parallel building, which makes builds of F18 slower than they could be. A parallel build of F18 on Taishan (a 64-core AArch64 server node) using all 64 cores takes around the same wall-clock time (about 5 minutes) as building clang+llvm in the same way. A large proportion of Clang's build time is spent linking LLVM, which F18 does not do yet. F18 will need to link against LLVM and also MLIR for codegen, so we can expect its build time to increase significantly in future. There will also be more usage of the parse tree in the code generator, meaning more individual files with high compile-time memory usage and more high-memory links. I think we can predict that when F18(+LLVM) is complete, its build will take significantly longer than the build of Clang(+LLVM).

As an aside, the parallel F18 build on Taishan uses 36GB of RAM! Its link stage ends up as two parallel link processes consuming about 14GB of RAM between them.
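
The back-of-the-envelope arithmetic behind "not much scope for parallel building" can be sketched as follows, assuming the worst case where every concurrent job reaches the peak figures above at the same moment. MaxJobsThatFit and RamNeededGB are hypothetical helpers for illustration, not part of any build tool.

```cpp
#include <cassert>

// Upper bound on how many jobs fit in RAM if every job hits its peak at
// once. Ignores the OS and page cache, and the fact that most files peak
// well below the worst case, so real builds can usually run more jobs.
inline unsigned MaxJobsThatFit(unsigned ramGB, unsigned peakPerJobGB) {
  assert(peakPerJobGB > 0 && "peak per job must be non-zero");
  return ramGB / peakPerJobGB;
}

// RAM needed to run `jobs` worst-case jobs concurrently.
inline unsigned RamNeededGB(unsigned jobs, unsigned peakPerJobGB) {
  return jobs * peakPerJobGB;
}
```

With the 5GB compile peak, a 16GB machine fits only 3 worst-case compile jobs (and only 2 worst-case 8GB links), while keeping all 64 Taishan cores busy with worst-case compiles would need 320GB. The observed 36GB is far lower, presumably because most files peak well below parsing.cc, but the bound shows how a few heavy files and links throttle -j.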

So what could this all mean?

I'm sure we all know the general rule that slow incremental and debug builds hurt developer productivity and reduce the appeal of a codebase. Slow parallel builds also increase turnaround time for CI testing, again hurting developer productivity. David Truby's recent mail about CI shows that the large build resources F18 needs are already limiting our choice of free CI services.

Specifically for F18, we want it to be a fully supported LLVM subproject on a par with Clang. If enabling F18 hugely increases build time, then LLVM developers in the wider community who don't care specifically about Fortran simply won't build it. The same goes for buildbot maintainers: if it hikes up the turnaround time for a buildbot or reduces the frequency of runs, they will not build it for us.

The memory usage is perhaps even more challenging. Compilations and links that each take multiple GB will greatly reduce the level of parallelism the LLVM project can use for a build with F18 enabled, slowing down the whole project build, not just the F18 part. We are also setting a really high bar for developers who want to try F18: you basically need a very solid dev machine or a cluster node to build it sensibly. You can't hack on it offline on your laptop on a plane or train. And some folks will simply be left behind; we are turning away potential contributors (for example [1]).

If most developers are not going to build F18 as a matter of course, this has a number of potential downsides:
  * Most significantly, if LLVM's APIs, utilities or other behaviour change, F18 is less likely to be fixed up at the same time by the same developer. F18 will be a second-class citizen in this respect: this increases the burden on the F18 community to do that work and weakens F18's credentials as a fully supported LLVM project.
  * It reduces the diversity of platforms and toolchains on which F18 is routinely built, which reduces the robustness of top of master on those platforms.

As well as the ongoing impact, we think this will have a big negative impact when F18 code initially lands in LLVM. When the code lands, we can expect random folks in the community to just try to build it. What does it do for the reputation of F18 when the build crashes on their machines because they only have 8GB or 12GB of memory available?

We believe that build time scales roughly linearly with every node added to the parse tree, so this problem is likely to get slowly worse as F18's language support is built out and as it expands to cover new standards. As an empirical data point, David has been setting up CI on his fork of F18, which he created a few weeks back. Today he rebased onto the latest top of tree and had to drop from -j24 to -j16, and even then the build occasionally runs out of memory. The drone.io node has 96 cores, so that is a lot of potential parallelism going to waste!
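
One way to check how compile cost scales with the number of variant alternatives, independently of F18, is a synthetic translation unit like the sketch below. It assumes that std::variant plus std::visit is the dominant cost, which the trace suggests but this does not prove; FakeNode, MakeVariant, Which and the NUM_NODES macro are all hypothetical. Compiling it with increasing -DNUM_NODES values and comparing time and peak memory (for example with Clang's -ftime-trace) would show whether the cost grows roughly linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>

#ifndef NUM_NODES
#define NUM_NODES 64  // hypothetical knob: rebuild with -DNUM_NODES=32, 128, ...
#endif

// N distinct, empty "node" types standing in for parse tree nodes.
template <std::size_t I>
struct FakeNode {};

template <std::size_t I>
constexpr std::size_t Index(const FakeNode<I> &) { return I; }

// Builds std::variant<FakeNode<0>, ..., FakeNode<NUM_NODES - 1>>.
template <typename Seq>
struct MakeVariant;
template <std::size_t... Is>
struct MakeVariant<std::index_sequence<Is...>> {
  using type = std::variant<FakeNode<Is>...>;
};

using BigVariant = MakeVariant<std::make_index_sequence<NUM_NODES>>::type;
static_assert(std::variant_size_v<BigVariant> == NUM_NODES);

// One visitor: the generic lambda is instantiated once per alternative,
// so the compiler's work here should grow with NUM_NODES.
inline std::size_t Which(const BigVariant &v) {
  return std::visit([](const auto &n) { return Index(n); }, v);
}
```

This isolates a single variant and a single visitor, so it will understate the real cost, where many node types each carry their own variant and are walked by many visitors.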

We are aware that this is a difficult point to make, given that the use of std::variant is key to F18's design and that prioritising the design over the resources needed to build it was a deliberate choice [2]. But if we can't find a way to mitigate this build-time memory usage, it could be a serious threat to the success of F18 in LLVM. Is anyone else as concerned about this as we are?

Thanks
Rich

[1] A bug reported by a nobody. · Issue #431 · flang-compiler/f18 · GitHub
[2] A bug reported by a nobody. · Issue #431 · flang-compiler/f18 · GitHub

parsing.json.gz (6.31 MB)

+1 to all of this. We do need to address this problem.

Several people have approached me offline with concerns about this over the last month.

Would it help if we split the problematic source files? This is a very common workaround, for similar reasons, in non-trivial uses of Boost.Spirit and similar libraries.

-Hal

Rich,

Perhaps there are techniques that you can experiment with to speed up compilation and reduce the peak memory usage and the amount of dead code. For example, external template declarations or C++20 Modules come to mind.

I don’t know if std::variant is heavyweight or not, but perhaps f18 doesn’t use all of its features. Chandler suggested a while back that someone might write a lightweight llvm::variant that provides the same type safety and features that f18 is using today.

Were you able to profile clang to see what it was doing during those instantiations? Perhaps there’s something in clang that could be more efficient.

- Steve

Hi Steve,

> Perhaps there are techniques that you can experiment with to speed up
> compilation and reduce the peak memory usage and the amount of dead
> code. For example, external template declarations or C++20 Modules
> come to mind.

I played around with extern templates to try to reduce this issue.
Unfortunately they don't help here, as we use the ".v" and ".t"
members of (parts of) the parse tree in a lot of places, and knowing
the types of these requires instantiation. So although you could
usually use extern template to make instantiation happen in only one
place, other files still need to instantiate in this case.
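
For readers unfamiliar with the technique, here is a minimal sketch of extern templates using a hypothetical Node template rather than real parse tree code. The explicit instantiation declaration would normally live in a header and the explicit instantiation definition in exactly one .cc file; both are in one file here only so the sketch is self-contained. It also shows the limitation described above: the declaration stops other translation units from instantiating the out-of-line member index(), but any code that touches .u still needs the full class definition of Node<int, std::string>, and therefore std::variant<int, std::string>, to be instantiated.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical node template for illustration; not F18's real code.
template <typename... Alts>
struct Node {
  std::variant<Alts...> u;
  int index() const;  // out-of-line and non-inline: covered by extern template
};

template <typename... Alts>
int Node<Alts...>::index() const {
  return static_cast<int>(u.index());
}

// Normally in the header: an explicit instantiation declaration. Other
// translation units will not implicitly instantiate Node's non-inline
// members (here, index()) for these arguments. It does NOT stop the class
// definition of Node<int, std::string>, and so std::variant<int,
// std::string>, from being instantiated wherever .u is used. Members
// defined inside a class body are implicitly inline and are not covered.
extern template struct Node<int, std::string>;

// Normally in exactly one .cc file: the explicit instantiation definition,
// which emits index() for these arguments.
template struct Node<int, std::string>;

using IntOrString = Node<int, std::string>;
```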

I haven't tried modules, but given that C++20 isn't published yet I
feel a dependence on a feature from it would hinder our acceptance
into the LLVM repo. I also suspect modules would hit the same issue as
precompiled headers, which I did try: they only solve the problem of
having to parse the header multiple times. They won't help with
instantiation, which still needs to do the same amount of work.

> I don't know if std::variant is heavyweight or not, but perhaps f18
> doesn't use all of its features. Chandler suggested a while back that
> someone might write a lightweight llvm::variant that provides the
> same type safety and features that f18 is using today.

Unfortunately the problem isn't really with the implementation of
std::variant but rather the template instantiation depth. Any
implementation of something similar to variant would require an
equivalent number of templates to be instantiated.

It's possible we could improve template instantiation performance in
clang (and gcc, as we get similar build time and memory usage results
there). I suspect the low-hanging fruit has already been picked here,
though, as templates have been around in both compilers for a long
time.

I saw Hal gave a talk at the LLVM meeting last month about template
instantiation performance, perhaps he has some insight here?

David Truby

Hello everyone,

At Appentra, we have also been working with F18 and can only echo Richard Barton’s words. Compilation times and memory requirements are already a barrier to F18 adoption, and we don’t see any signs of improvement in the short or medium term.

Therefore, we believe that it is a priority for the community to re-evaluate the impact of this problem and find solutions as soon as possible. Do you see any value in asking for input on the LLVM-dev mailing list? Folks on cfe-dev or cfe-users might also be able to give us some hints from the trace file that Richard has shared.

We have discussed this and other issues with Hal Finkel and have agreed to present them in the bi-weekly technical call on December 2nd. We want to share our impressions of trying to use F18 for tooling, to promote discussion and help find paths for improvement.

We look forward to meeting you there.

Hi everyone,

I just want to say that I'm actively looking into this and
investigating possible solutions. I will post more information here
when I have a better idea of what these solutions, or any short-term
fixes, might look like. If anyone else has any ideas, please let me
know.

Thanks
David Truby