Clang development environment

Hi,
I’m a master’s student, and I’m starting on my thesis now. I’ll write about safety within C++, and would like to develop on Clang.
Currently, I’ve not been able to create a good environment in CLion where I can use its debugger with Clang (it simply skips over my breakpoints). I am placing breakpoints in the AST and CFG builders, none of which are hit (within CLion). I am currently building the ‘clang’ target in the llvm sub-project, and when I modify Clang, the change is compiled, but not executed when I use Clang (a simple print when building the CFG). I’m experiencing this both when compiling C++ and C code.
I was wondering if anyone has experience with debugging Clang through CLion, or if I should use GDB instead? And more generally, whether anyone has good experiences/advice regarding developing on Clang.
I’m sorry if this has been asked before, I’ve not been able to find any posts or anything.

EDIT:
I’m using Ninja and a small hello world program to test.

UPDATE:
I’ve successfully reached some breakpoints under tools/driver/driver.cpp, but am still not reaching any in the CFG or AST code.

If you’re saying the print doesn’t happen at all even outside a debugger, then you need to double check that what you’re modifying corresponds to what you’re building. For example, a command line like clang ... might be picking up an apt-installed clang, not the clang you just built. Try ./build/bin/clang just to be sure.
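One quick way to rule this out is to compare what the shell resolves clang to against the binary in your build tree. This is only a minimal sketch, assuming bash and a build directory at ./build; the BUILD_CLANG variable and the same_binary helper are names made up for this example.

```shell
#!/usr/bin/env bash
# BUILD_CLANG is a hypothetical default; point it at your own build tree.
BUILD_CLANG="${BUILD_CLANG:-./build/bin/clang}"

# Succeeds when both paths exist and refer to the same file (symlinks followed).
same_binary() {
    [ -n "$1" ] && [ -e "$1" ] && [ -e "$2" ] && [ "$1" -ef "$2" ]
}

# What a bare `clang` would run; empty if nothing is found on PATH.
resolved="$(command -v clang || true)"
if same_binary "$resolved" "$BUILD_CLANG"; then
    echo "clang on PATH is your build: $BUILD_CLANG"
else
    echo "clang on PATH is '${resolved:-not found}', not $BUILD_CLANG"
fi
```

If the second message is printed, call the build's binary by its full path instead of relying on PATH.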

If it prints outside a debugger but not inside one, this often happens because clang ... actually starts a new process after the “driver” part is done, clang -cc1 .... So your debugger is sticking with the first process. The second one is where the compilation happens.

The usual way around that is to add -### to your clang command line. This prints the commands the driver would run, including the cc1 command. Then debug that command instead.

You might be able to set options like follow-fork in gdb, but I’ve never used those myself.

This code is part of the first part clang ..., then your breakpoints are in clang -cc1 .... So you are already on the right track it seems.

Also be sure that your print should be executed at all. Maybe it’s in a place that’s genuinely not used with your examples.

I think clang/tools/driver/cc1_main.cpp is where the -cc1 process starts, so this is a good place to try also.

Controls this. CLANG_SPAWN_CC1 appears to default off, so perhaps this isn’t the issue at all but you can try the -fintegrated... options mentioned there just in case.

This is what I’m doing, I’m using my own build (llvm/build/bin/clang++)

The print does not occur any time, and -cc1 does nothing. I’ve only managed to get a print from within the driver-environment, not the CFG or AST. Maybe my understanding of the structure of the project is flawed?

Shouldn’t there always be a CFG? or

Uuhh, thank you! I’ll have a look next time I have the time.

Thank you so much for taking time to help me, truly means a lot! I’ll be sure to write an update when I’ve figured it out!

Ok, so now I’ve managed to get the print when building the CFG (llvm-project/clang/lib/Analysis/CFG.cpp at 1199e5b9ce5a001445463ba8da1f70fa4558fbcc · llvm/llvm-project · GitHub), but when placing breakpoints in CLion, it will not stop. I’m unsure if Clang is inaccessible because it’s compiled and linked to llvm, and therefore not debuggable in the “normal” sense. Do you know of a standard way to develop on Clang, where I can debug the CFG, AST, and liveness analysis?

I’ll show you the terminal version first, then how I think you can do the same in CLion.

Not sure specifically how you’re getting the print to happen, but I added -Wall, shouldn’t matter as long as we both reach that function.

$ cat /tmp/test.c
int main() { return 0; }

$ ./bin/clang /tmp/test.c -o /tmp/test.o -g -Wall
CFG::buildCFG

Note that this build is without debug symbols, but generally you can at least break on most functions despite that. For anything more than “did the code run”, I do recommend you use the Debug build type.

Here I’m loading it into lldb but the same principle applies to other tools. In fact CLion might be using lldb anyway, gdb will work the same way too.

(the -- separates the program file I’m telling lldb to debug, from the options I want to pass to that program file)

$ ./bin/lldb ./bin/clang -- /tmp/test.c -o /tmp/test.o -g -Wall
(lldb) target create "./bin/clang"
Current executable set to '/home/david.spickett/build-llvm-aarch64/bin/clang' (aarch64).
(lldb) settings set -- target.run-args  "/tmp/test.c" "-o" "/tmp/test.o" "-g" "-Wall"
(lldb) b CFG::buildCFG
Breakpoint 1: where = clang`clang::CFG::buildCFG(clang::Decl const*, clang::Stmt*, clang::ASTContext*, clang::CFG::BuildOptions const&), address = 0x000000000676dec0
(lldb) run
Process 1334109 launched: '/home/david.spickett/build-llvm-aarch64/bin/clang' (aarch64)
CFG::buildCFG
Process 1334109 stopped and restarted: thread 1 received signal: SIGCHLD
Process 1334109 stopped and restarted: thread 1 received signal: SIGCHLD
Process 1334109 exited with status = 0 (0x00000000)
(lldb) q

Note that the print did happen, but apparently not inside the process we were debugging, so we didn’t stop. It did let us place the breakpoint, because all this code is in the same binary.

Now I’ll add -### to the original command line to get the cc1 command. Here is where I should have given you an example, because it prints a wall of text that’s not easy to parse.

$ ./bin/clang /tmp/test.c -o /tmp/test.o -g -Wall -###
clang version 20.0.0git (https://github.com/llvm/llvm-project.git ea9204505cf1099b98b1fdcb898f0bd35e463984)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/david.spickett/build-llvm-aarch64/bin
Build config: +assertions
 "/home/david.spickett/build-llvm-aarch64/bin/clang-20" "-cc1" "-triple" "aarch64-unknown-linux-gnu" <...>
 "/usr/bin/ld" "-EL" "-z" "relro" "--hash-style=gnu" <....>

The line you want is the one that begins "/home/.../clang-20" "-cc1". Debug that command instead:

$ ./bin/lldb "/home/david.spickett/build-llvm-aarch64/bin/clang-20" -- "-cc1" "-triple" <...>
(lldb) target create "/home/david.spickett/build-llvm-aarch64/bin/clang-20"
Current executable set to '/home/david.spickett/build-llvm-aarch64/bin/clang-20' (aarch64).
(lldb) settings set -- target.run-args  "-cc1" "-triple" <...>
(lldb) b CFG::buildCFG
Breakpoint 1: where = clang-20`clang::CFG::buildCFG(clang::Decl const*, clang::Stmt*, clang::ASTContext*, clang::CFG::BuildOptions const&), address = 0x000000000676dec0
(lldb) run
Process 1334225 launched: '/home/david.spickett/build-llvm-aarch64/bin/clang-20' (aarch64)
Process 1334225 stopped
* thread #1, name = 'clang-20', stop reason = breakpoint 1.1
    frame #0: 0x0000aaaab1217ec0 clang-20`clang::CFG::buildCFG(clang::Decl const*, clang::Stmt*, clang::ASTContext*, clang::CFG::BuildOptions const&)
clang-20`clang::CFG::buildCFG:
->  0xaaaab1217ec0 <+0>:  str    d8, [sp, #-0x70]!
    0xaaaab1217ec4 <+4>:  stp    x29, x30, [sp, #0x10]
    0xaaaab1217ec8 <+8>:  stp    x28, x27, [sp, #0x20]
    0xaaaab1217ecc <+12>: stp    x26, x25, [sp, #0x30]

Now we’re debugging the process where the print happens: cc1, the “compiler”, as opposed to the “driver” that runs first.
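Since the -### output is a wall of text, a small filter can pick out the cc1 line for you. This is a rough sketch rather than anything official: cc1_lines is a made-up helper name, and the sample input is an abbreviated, invented copy of the output with placeholder paths.

```shell
#!/usr/bin/env bash
# Print only the job lines that run the compiler proper (those containing "-cc1").
cc1_lines() {
    grep -F -- '"-cc1"'
}

# Demo on an abbreviated, made-up copy of -### output (paths are placeholders).
cc1_lines <<'EOF'
clang version 20.0.0git
Target: aarch64-unknown-linux-gnu
 "/path/to/build/bin/clang-20" "-cc1" "-triple" "aarch64-unknown-linux-gnu"
 "/usr/bin/ld" "-EL" "-z" "relro"
EOF
```

With a real build you would pipe clang into it, for example ./bin/clang /tmp/test.c -o /tmp/test.o -g -Wall -### 2>&1 | cc1_lines. The 2>&1 is there because clang writes the -### output to stderr.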

Note that I’ve got no source display here because this is a release build I happened to have already. Again, I recommend using the cmake option -DCMAKE_BUILD_TYPE=Debug if you’re doing any serious debugging.

I tried the -fintegrated-cc1 options but it made no difference here. I suspect because this just controls whether a new process is used, and even with integrated cc1 it’s replacing itself with the cc1 process? I’m not an expert here though.

What does work is lldb’s follow fork child mode using the original clang command:

$ ./bin/lldb ./bin/clang -- /tmp/test.c -o /tmp/test.o -g -Wall
(lldb) target create "./bin/clang"
Current executable set to '/home/david.spickett/build-llvm-aarch64/bin/clang' (aarch64).
(lldb) settings set -- target.run-args  "/tmp/test.c" "-o" "/tmp/test.o" "-g" "-Wall"
(lldb) settings set target.process.follow-fork-mode child
(lldb) b CFG::buildCFG
Breakpoint 1: where = clang`clang::CFG::buildCFG(clang::Decl const*, clang::Stmt*, clang::ASTContext*, clang::CFG::BuildOptions const&), address = 0x000000000676dec0
(lldb) run
Process 1334571 launched: '/home/david.spickett/build-llvm-aarch64/bin/clang' (aarch64)
Process 1334666 stopped
* thread #2, name = 'clang-20', stop reason = exec
    frame #0: 0x0000fffff7fd9c40 ld-linux-aarch64.so.1`_start
ld-linux-aarch64.so.1`_start:
->  0xfffff7fd9c40 <+0>:  bti    c
    0xfffff7fd9c44 <+4>:  mov    x0, sp
    0xfffff7fd9c48 <+8>:  bl     0xfffff7fda610 ; _dl_start at rtld.c:527:1
    0xfffff7fd9c4c <+12>: mov    x21, x0
1 location added to breakpoint 1

lldb will stop on exec which is clang starting the cc1 process. There’s probably a way to continue automatically here but I couldn’t find it. Just continue to then hit the breakpoint as expected:

(lldb) c
Process 1334666 resuming
Process 1334666 stopped
* thread #2, name = 'clang-20', stop reason = breakpoint 1.2
    frame #0: 0x0000aaaab1217ec0 clang-20`clang::CFG::buildCFG(clang::Decl const*, clang::Stmt*, clang::ASTContext*, clang::CFG::BuildOptions const&)
clang-20`clang::CFG::buildCFG:
->  0xaaaab1217ec0 <+0>:  str    d8, [sp, #-0x70]!
    0xaaaab1217ec4 <+4>:  stp    x29, x30, [sp, #0x10]
    0xaaaab1217ec8 <+8>:  stp    x28, x27, [sp, #0x20]
    0xaaaab1217ecc <+12>: stp    x26, x25, [sp, #0x30]
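The manual continue may be avoidable. The sketch below writes the same commands into a file so they can be replayed, plus one extra setting, target.process.stop-on-exec, which as far as I know tells lldb not to stop when the process execs. I have not checked how it interacts with follow-fork-mode, so treat that line as an assumption. The lldb_cmds variable and the temporary file are just for this example.

```shell
#!/usr/bin/env bash
# Write an lldb command file; the temporary file name is arbitrary.
lldb_cmds="$(mktemp)"
cat > "$lldb_cmds" <<'EOF'
settings set target.process.follow-fork-mode child
settings set target.process.stop-on-exec false
b CFG::buildCFG
run
EOF

# Show what was written so it can be checked before use.
cat "$lldb_cmds"

# Replay it against the normal driver command, for example:
#   ./bin/lldb -s "$lldb_cmds" ./bin/clang -- /tmp/test.c -o /tmp/test.o -g -Wall
```

If the stop-on-exec line turns out not to help, delete it and continue by hand as shown above.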

If you are using GDB, its follow-fork settings are documented here: Forks (Debugging with GDB)

I am not a CLion user but I found Debug forked processes | CLion Documentation. I think you will want the “Follow Child on Fork” option.

Otherwise you may be able to make CLion run settings set ... or set follow-fork-mode... for you, depending on whether it’s launching LLDB or GDB.
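For GDB, the equivalent would look roughly like the sketch below. I have only shown the lldb route above, so this is untested here; set follow-fork-mode child is the setting from the GDB documentation on forks, and the gdb_cmds variable and temporary file are made up for the example.

```shell
#!/usr/bin/env bash
# Write a GDB command file; the temporary file name is arbitrary.
gdb_cmds="$(mktemp)"
cat > "$gdb_cmds" <<'EOF'
set follow-fork-mode child
break clang::CFG::buildCFG
run
EOF

# Show what was written so it can be checked before use.
cat "$gdb_cmds"

# Replay it against the normal driver command, for example:
#   gdb -x "$gdb_cmds" --args ./bin/clang /tmp/test.c -o /tmp/test.o -g -Wall
```

In CLion the same two settings would go wherever it lets you supply debugger startup commands, if it does.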

Using this follow fork setting will mean you don’t have to go get the cc1 command every time.

I found that it wasn’t always used. Adding -Wall got me some output so I think the CFG is only built if it’s needed by some pass or warning.

So I had the time this morning, and for your purposes you can ignore them. See my longer answer above.

I actually already have the Debug build type; the error still occurred there.

I see your lldb is within your local build, however I don’t have a lldb in my build. Could this be because I have targeted ‘clang’ only?

I actually also managed to get it to print, but I did that by making a more “complex” test program:

#include <iostream>

std::string& get_string(std::string& str) {
  if (str.empty()) {
    str = "empty";
  }
  return str;
}

int main() {
  std::string str = "string";
  std::cout << get_string(str) << std::endl;

  return 0;
}

When I run ./bin/clang++ -cc1 test.cpp -o test.exe, I get this error:

test.cpp:1:10: fatal error: 'iostream' file not found
    1 | #include <iostream>
      |          ^~~~~~~~~~
1 error generated.

And the output with -### added:

clang version 20.0.0git (git@github.com:llvm/llvm-project.git bdcbfa7fb4ac6f23262095c401d28309d689225e)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/lohse/Documents/School/9_Sem/project/llvm-project/llvm/build/bin
Build config: +unoptimized, +assertions
 "/home/lohse/Documents/School/9_Sem/project/llvm-project/llvm/build/bin/clang-20" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" "-dumpdir" "test.exe-" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "test.cpp" <....>

Which is a bit different from yours, maybe because of the Debug build type?

I just tried this, and it did actually hit! However, do you know whether there is a “better” way of building clang? When doing it through the llvm sub-project, it takes 5+ minutes, no matter the change…

Again, thank you so much for helping me, it truly means a lot!

Make sure you are using Ninja instead of Make.

Yes the problem of debugging the wrong process will occur regardless but when you do hit the breakpoint a debug build should show you source lines not assembly.

If you want to build lldb add lldb to the -DLLVM_ENABLE_PROJECTS= option like =clang;lldb.

However you don’t have to build lldb you can install it from apt and various other places. I think it’s also in the Windows llvm releases. From what you’ve described of your work, there’s no reason for you to be building your own lldb.

I work on lldb itself, that’s why I build it.

Ok so the cc1 “compiler” command line is generally not something you should be building yourself. You may do that later if you start writing test cases.

For now I would only get the cc1 command by adding -### to the “normal” clang command. This way you’ll have all the required options like target features, include paths, language modes, etc.

In this case the lack of include paths is what causes the error you got. By generating the cc1 command using clang, all those are generated for you.

So if I wanted the cc1 command for: ./bin/clang++ test.cpp -o test.exe

I would add -### on the end to get ./bin/clang++ test.cpp -o test.exe -###, run that, and copy the cc1 line from the output like I showed above.

Specifically, CMake defaults to generating make files. You can add -G Ninja to the CMake command to generate ninja files. See Getting Started with the LLVM System — LLVM 20.0.0git documentation

make builds one job at a time by default; you have to add -j<number of jobs> to make it parallel. ninja parallelises as much as possible by default, and -j<...> can be used to reduce it if you need to.

I already am, and my compile times are still through the roof, even when I only add a comment.

Thank you!

Aaahhh, makes sense. I misunderstood what it was then.

I already use Ninja, and it’s still super slow. I think CLion defaults to -j 14; should I try to increase this?

That sounds like the number of cores/threads (the number the nproc command shows), maybe minus a couple so the system stays responsive.

People do commonly build with all cores at once, you just have to put up with a bit of interface lag during the build.
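If you want to work the number out yourself rather than trust the IDE default, something like this sketch does "all cores minus a couple". The build_jobs helper and the reserve of 2 are illustrative choices, not a recommendation from any tool.

```shell
#!/usr/bin/env bash
# Pick a parallel job count: all cores minus a reserve, but never less than 1.
build_jobs() {
    local cores="$1" reserve="$2"
    local n=$(( cores - reserve ))
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# nproc may be missing on some systems; fall back to 1 core in that case.
cores="$(nproc 2>/dev/null || echo 1)"

# Print the command rather than running it, so it can be checked first.
echo "ninja -j $(build_jobs "$cores" 2) clang"
```

On a 16-thread machine this prints ninja -j 14 clang, which matches the default you are seeing.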

Regarding the use of Ninja, make sure that you actually have enough RAM. You might be going into swap. Ninja will happily link everything at once, using more memory than you have. If that happens, you should limit the number of link jobs with a cmake option (LLVM_PARALLEL_LINK_JOBS). You should also use lld as the linker, as it is faster and uses less memory. Don’t use LTO for debug builds. Also use a compiler cache like ccache or sccache, though if you aren’t removing your build dir, cmake/ninja should build only what is needed.
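As a rough illustration of how those options fit together, the sketch below derives a link job cap from memory and prints a configure command. The link_jobs helper is made up, and the 32 GB and 8 GB per link figures are placeholders rather than measurements, so substitute what you observe on your machine. LLVM_USE_LINKER and LLVM_CCACHE_BUILD are the LLVM CMake options for choosing the linker and enabling ccache.

```shell
#!/usr/bin/env bash
# Rough cap on concurrent links: total memory divided by memory per link job.
link_jobs() {
    local total_gb="$1" per_link_gb="$2"
    local n=$(( total_gb / per_link_gb ))
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# Both numbers are placeholders for illustration, not measurements.
n="$(link_jobs 32 8)"

# Print the configure command rather than running it.
echo "cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS=clang" \
     "-DLLVM_USE_LINKER=lld -DLLVM_CCACHE_BUILD=ON -DLLVM_PARALLEL_LINK_JOBS=$n"
```

Watching memory use during a link is a better guide than any fixed per-link figure.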

I’ll keep that in mind, thank you!

I’ve added (or tried) lld and ccache, and now my compile times are better, but still bad (3.5 minutes for a print). Maybe I’ve done it wrong?
cmake -DCMAKE_CXX_FLAGS="-fuse-ld=lld" -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_RUNTIMES=libcxx -G Ninja ../llvm

The settings look fine. You could check some things:

  • Is the ccache cache folder full? It defaults to 5GB I think? Which is sometimes not enough. ccache -s will show you a bunch of useful stats.
  • Is the build spending most of its time in linking or compiling? If it’s mostly linking after you make a change to a .cpp file, this is expected. If it was rebuilding a lot of cpp files, that might be a sign that something is off. If you change header files, lots of things will rebuild.
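To answer the second question without waiting for a full build, you can count the steps ninja plans to run. This is a sketch only: count_steps is a made-up helper, and the sample lines are invented to match the shape of ninja's status output.

```shell
#!/usr/bin/env bash
# Count compile steps vs link steps in ninja status lines read from stdin.
count_steps() {
    awk '/Building (C|CXX) object/ { c++ }
         /Linking /                { l++ }
         END { printf "compile=%d link=%d\n", c, l }'
}

# Demo on made-up status lines in the shape ninja prints.
count_steps <<'EOF'
[1/3] Building CXX object tools/clang/lib/Analysis/CMakeFiles/obj.clangAnalysis.dir/CFG.cpp.o
[2/3] Linking CXX static library lib/libclangAnalysis.a
[3/3] Linking CXX executable bin/clang-20
EOF
```

Against a real build directory, ninja -n clang | count_steps does a dry run after your edit, and ninja -d explain clang prints why each step is considered out of date. A single compile followed by a few links is what you would expect after touching one .cpp file.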

Also a debug build does take longer to link. Given that you are now using ccache you could switch to a release build while you’re print debugging - then back to debug for source stepping debug. Compiling should just pull the object files back out of the ccache.

Or you can keep two builds around, and you won’t have to redo all the cache lookups either.

I think the issue here is that you’re using LLVM_ENABLE_RUNTIMES. This variable builds the runtime using the just built compiler. If the compiler you just built is unoptimized, it will be slower. And as the compiler changed cmake might not cache it anymore. You should probably use enable projects instead, if you even need to build libcxx. Or linking on your machine just takes this long.