Building LLVM through Bazel

Hi llvm-dev,

I’m not sure if this is the right place for this, but I posted a question on Stack Overflow a week ago with no response, so feel free to point me elsewhere if there is a more appropriate place.

Basically, I have a project using LLVM as a dependency, currently building with CMake, which I would like to switch over to Bazel. However, I am not able to get LLVM to build because header files are always missing. Bazel is supposed to do the actual C++ compilation, but without the headers it can’t do that. I’ve tried looking through LLVM’s build system, but it’s pretty complex and I’m not that familiar with CMake.

Are there certain targets which will generate all these headers, which I can run before Bazel starts compiling C++ source? The closest I’ve found is ModuleMaker, which seems to run tblgen to create some of these headers, but I still find I’m missing some. Are there other targets which build header files, or are there any other generated source files that need to be created before compilation?

One example error I’m getting at the moment is:

In file included from external/llvm/llvm-master/include/llvm/MC/MCStreamer.h:30:0,
from external/llvm/llvm-master/lib/Object/RecordStreamer.h:16,
from external/llvm/llvm-master/lib/Object/ModuleSymbolTable.cpp:17:
external/llvm/llvm-master/include/llvm/Support/TargetParser.h:61:31: fatal error: ARMTargetParser.def: No such file or directory

Any direction would be greatly appreciated.

Doug

Yeah - not sure we’re quite at the point where LLVM wants to start supporting two build systems again (it used to be Configure+Make plus the CMake system; now it’s just the CMake system), but if you want to make it work out-of-tree it shouldn’t be too difficult (Google does this internally with the internal version of Bazel). Writing a short BUILD extension for running tblgen should be possible without too much complexity - not sure what TensorFlow is doing that makes its solution so complicated; it doesn’t seem like it should be terribly hard, just a genrule to run tblgen and generate the appropriate files from the td files.
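
For one of the generated headers, that might look very roughly like this (a completely untested sketch - it assumes an llvm-tblgen cc_binary target already exists in the same package, and it only shows one of the many tblgen invocations):

# Sketch: generate Attributes.inc from Attributes.td with tblgen.
# Backend .td files would additionally need all the .td files they
# include listed in srcs, plus extra -I flags.
genrule(
    name = "attributes_gen",
    srcs = ["include/llvm/IR/Attributes.td"],
    outs = ["include/llvm/IR/Attributes.inc"],
    cmd = "$(location :llvm-tblgen) -gen-attrs " +
          "-I $$(dirname $(location include/llvm/IR/Attributes.td)) " +
          "$(location include/llvm/IR/Attributes.td) -o $@",
    tools = [":llvm-tblgen"],
)

The generated .inc then just gets listed alongside the sources of whatever library includes it.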

- Dave

Would it be technically possible for CMake to generate Bazel files? If so, it seems that could be a great bridge, and useful to other projects as well.

-Chris

> Yeah - not sure we’re quite at the point where LLVM wants to start supporting two build systems again (it used to be Configure+Make plus the CMake system; now it’s just the CMake system), but if you want to make it work out-of-tree it shouldn’t be too difficult (Google does this internally with the internal version of Bazel). Writing a short BUILD extension for running tblgen should be possible without too much complexity - not sure what TensorFlow is doing that makes its solution so complicated; it doesn’t seem like it should be terribly hard, just a genrule to run tblgen and generate the appropriate files from the td files.
>
> Would it be technically possible for CMake to generate Bazel files? If so, it seems that could be a great bridge, and useful to other projects as well.

/maybe/? It’s an interesting thought - not sure I know enough about either build system to have a very informed opinion here, though.

- Dave

There have been discussions about adding a Bazel generator before:
https://cmake.org/pipermail/cmake-developers/2017-July/030144.html

There does seem to be interest in having that support in CMake, and I can’t imagine any insurmountable reason why it couldn’t be done. The real issue is that nobody has put in the time to do it.

-Chris

I believe it would be possible to run a cmake command to generate a BUILD file, though I don’t know if that would be easier to maintain on the LLVM side. Would definitely be happy to see direct support, though I was just trying to figure out what’s needed to hack this together on my end.

I guess my real question is: what underlying commands are necessary to generate all the source files (without actually compiling the C/C++)? If I can understand how to do this manually with terminal/cmake/tblgen, then I could probably get it to work with Bazel.

Looking at TensorFlow’s setup, it looks like tblgen has dependencies on Support, which goes down to zlib and picks up a lot of complexity there. I get the impression that tblgen isn’t so complicated as to require all that. What would be the minimum steps to build tblgen from source?

Doug

> I believe it would be possible to run a cmake command to generate a BUILD file, though I don’t know if that would be easier to maintain on the LLVM side. Would definitely be happy to see direct support, though I was just trying to figure out what’s needed to hack this together on my end.
>
> I guess my real question is: what underlying commands are necessary to generate all the source files (without actually compiling the C/C++)? If I can understand how to do this manually with terminal/cmake/tblgen, then I could probably get it to work with Bazel.
>
> Looking at TensorFlow’s setup, it looks like tblgen has dependencies on Support, which goes down to zlib and picks up a lot of complexity there. I get the impression that tblgen isn’t so complicated as to require all that. What would be the minimum steps to build tblgen from source?

You could look at the cmake+ninja (or other build system) build and dump the commands it executes (I think ninja produces a log, or can do so), which should show you all the commands needed to build any part of LLVM.

But yeah, I think you do have to build Support to build tblgen, to run tblgen to generate the rest of the files to continue the build.
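
In Bazel terms I’d expect that bootstrap chain to look very roughly like the following - every glob and name here is an invented placeholder, and the real file lists are whatever the CMake files encode:

cc_library(
    name = "support",
    srcs = glob([
        "lib/Support/*.cpp",
        "lib/Support/*.c",
        "lib/Support/*.h",
        "lib/Support/Unix/*.inc",
    ]),
    hdrs = glob([
        "include/llvm/ADT/*.h",
        "include/llvm/Support/**/*.h",
        "include/llvm-c/*.h",
    ]),
    textual_hdrs = glob([
        "include/llvm/Support/**/*.def",
        "include/llvm/Support/**/*.inc",
    ]),
    includes = ["include"],
    # The configure-generated headers (llvm/Config/llvm-config.h, config.h,
    # abi-breaking.h) also have to come from somewhere - CMake normally
    # expands them from templates at configure time.
)

cc_library(
    name = "tablegen",
    srcs = glob(["lib/TableGen/*.cpp"]),
    hdrs = glob(["include/llvm/TableGen/*.h"]),
    deps = [":support"],
)

cc_binary(
    name = "llvm-tblgen",
    srcs = glob(["utils/TableGen/*.cpp", "utils/TableGen/*.h"]),
    deps = [":support", ":tablegen"],
)

with genrules like the one sketched earlier sitting between :llvm-tblgen and the rest of the libraries.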

- Dave

> You could look at the cmake+ninja (or other build system) build and dump the commands it executes (I think ninja produces a log, or can do so), which should show you all the commands needed to build any part of LLVM.

There’s a switch to dump all compiler commands as a JSON file:

https://cmake.org/cmake/help/v3.12/variable/CMAKE_EXPORT_COMPILE_COMMANDS.html

TensorFlow uses Bazel to build LLVM:
https://github.com/tensorflow/tensorflow/tree/master/third_party/llvm
The script that generates llvm.autogenerated.BUILD is unfortunately not open sourced yet, but skimming through the generated file should give you a rough idea of what's involved.

-- Sanjoy

I tried running all the commands in compile_commands.json (thanks for pointing that out), but it runs into the same “No such file or directory: *.inc” errors. I don’t see those files built anywhere in that list. Does it take tblgen into account?

Doug

Yeah, compile_commands.json would be insufficient - it’s just the compile commands, not the tblgen invocations. (compile_commands.json is meant for clang tools to consume, so they know what command-line arguments are used for a given source file and can reproduce that build to show clang-tidy tips, etc.)

A log output from something like ninja would likely be complete, I think.

You can run ninja -t commands <target> to get all the commands that are needed to build a particular target.

The compile_commands.json file only contains compile commands, not table-gen invocations, because those are “utility” commands to CMake.

-Chris

Just to close the thread on this (thanks everyone for the help!): I was able to get this working, but cheated a little bit. I looked into using Ninja to output the build commands but found that to be pretty complex. If you want to run Ninja at build time, then you need to compile Ninja from source in Bazel, which is just deferring the “build LLVM in Bazel” problem to another project. You could instead run Ninja manually to generate all the commands and then check them into source control for Bazel to use during the build. This could work, but it makes maintainability harder because you have a hidden dependency on Ninja, and changing LLVM versions becomes more complicated.

It’s also tricky because you’d need to read the Ninja commands as input in order to deduce the output files which will be generated. Bazel wants to know its output file names before it actually begins compiling anything, in order to perform proper dependency management and optimizations. I think you’d need a tool to read the Ninja commands and generate a Skylark file which gets checked into source control for Bazel to use (maybe this is where TensorFlow’s file comes from).

Regardless, I wanted to avoid all this complexity for my particular project, so I cheated a bit and just downloaded the pre-built binaries from releases.llvm.org. This included all the necessary headers, libraries, and tooling binaries (like llc, which I also needed). I was able to link against this and build a simple “Hello World” language, and I was also able to make some nice Bazel tooling pretty easily. It is a little counter to the way Bazel is supposed to work, as the intention is to compile everything from source, usually at head, but since LLVM has proper releases anyway this didn’t seem too bad. Downloading the pre-built binaries is also much faster than building the entire codebase from source. The full example is here. I was able to get this to work with LLVM 3.9.1, though trying a couple of other versions had some missing definitions. I may just be missing a dependency somewhere.
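
The rough shape of it looks like this (a trimmed-down sketch rather than the exact files from the example; the tarball name, library glob, and linkopts all depend on your platform and LLVM version):

# WORKSPACE - pull a prebuilt release instead of building from source.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "llvm",
    urls = ["http://releases.llvm.org/3.9.1/clang+llvm-3.9.1-x86_64-linux-gnu-ubuntu-16.04.tar.xz"],
    strip_prefix = "clang+llvm-3.9.1-x86_64-linux-gnu-ubuntu-16.04",
    # sha256 = "...",  # strongly recommended
    build_file = "@//:llvm.BUILD",
)

# llvm.BUILD - wrap the unpacked release as one big cc_library.
package(default_visibility = ["//visibility:public"])

cc_library(
    name = "llvm",
    srcs = glob(["lib/libLLVM*.a"]),
    hdrs = glob([
        "include/llvm/**/*.h",
        "include/llvm-c/**/*.h",
    ]),
    textual_hdrs = glob([
        "include/llvm/**/*.def",
        "include/llvm/**/*.inc",
    ]),
    includes = ["include"],
    linkopts = [
        # system libraries LLVM typically needs; varies by platform
        "-lpthread",
        "-ldl",
        "-lz",
        "-lncurses",
    ],
)

Targets then just depend on @llvm//:llvm. (One caveat: link order of the static archives can matter, which might be what’s behind the missing definitions I hit on other versions.)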

Unfortunately, I was not able to find a good way to turn my Bazel code into a proper library. I’d love to set this up so that any project could just pull in my repo as a repository rule and get LLVM as a cc_library(…) to depend upon, but I couldn’t find a clean way to do that. Maybe I’ll reach out to the Bazel team about it if I have some time. I was also able to integrate this with my original project, so the custom toolchain is a little more fleshed out there, with external symbol linking, end-to-end tests, etc.

Hopefully this is helpful to some future LLVM-Bazel-er.

Doug