Beginning developer questions

Hello everyone,
I have been studying the LLVM IR and now want to get into LLVM development. I have a few questions and would be really grateful for answers:

  1. The LangRef is an excellent guide/reference to the IR. Is there something similar for the codebase (the core llvm to be specific)? Or do I have to generate that from the source, in which case how do I do that?
  2. I tried building just the llvm sub-project, but it fills up my RAM completely during the linking stages and sends my laptop into thrashing. I am using Ninja. Is there a way to mitigate this? (I am on Ubuntu 20.04 Linux, with 8 GB RAM and 8 GB swap on an HDD.)
  3. VSCode, at least on my laptop, is very sluggish with such a large project. Is there a recommended development environment for Linux (or at least something that has been found to work well)?

Thank you for your time!
Regards,
Deep

Hi Deep,

  1. Kind of. There’s Doxygen documentation generated automatically from the source, which shows you many things, e.g., the members of a type along with some short documentation (taken from the code). It also shows you the inheritance tree of the type.
    Here’s an example: https://llvm.org/doxygen/classllvm_1_1LoopInfo.html
    It doesn’t really matter what this class is for now, but you can see, e.g., that LoopInfo inherits from LoopInfoBase. If you scroll down, you can click on different members and jump to a more detailed description further down. You can open the dropdown menus (e.g., public functions inherited). And finally, at the top, you can see the file it is declared in. In general, I think that if you start clicking around, it will make sense; it’s relatively intuitive.

  2. Try reducing the number of parallel jobs. I think by default Ninja uses all the available threads, which on most machines will fill up the RAM. To limit them, use the -j argument like this: ninja -j8
    Another thing that will probably be useful in general is that you can build a specific target instead of building the whole thing, like this: ninja -j8 opt

  3. OK, first of all, if you only care about editing and not debugging LLVM (i.e., launching it under a debugger like gdb), then editors like Vim, Emacs, 4coder, or maybe Sublime Text should do the job. I think most people developing LLVM on Linux use something like this.

Now, if you’re interested in IDEs and/or debuggers, well, the news on Linux is bad IMHO. For example, on my machine, GDB takes 30 seconds to launch the debug build of opt.
So, I couldn’t use any IDE, because virtually all of them use GDB under the hood. Personally, I switched to Windows + Visual Studio just for this reason. That was an insane productivity boost for me.
But if you need something that works on Linux, you can maybe try LLDB. Hopefully it will be faster. If so, you can try hooking it into an IDE, which I guess won’t be trivial.

That said, as I don’t develop LLVM on Linux, other people might have better suggestions.

Best,
Stefanos

On Tue, 12 Jan 2021 at 5:43 AM, Deep Majumder via llvm-dev <llvm-dev@lists.llvm.org> wrote:

You can also use -DLLVM_PARALLEL_LINK_JOBS= on your cmake command line to limit just the number of link jobs that can run in parallel. -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON can be a useful build configuration that gets you debug logging and assertions, but you won’t have debug symbols for gdb. There’s also -DLLVM_USE_SPLIT_DWARF. All of these options are covered here: https://llvm.org/docs/GettingStarted.html#common-problems
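To make the suggestions so far concrete, here is a minimal sketch that wraps them in two shell functions. The function names, the directory layout (an llvm-project checkout with an llvm/ subdirectory) and the default of one link job are my own assumptions, not something prescribed in this thread; pick job counts that fit your RAM.

```shell
# Hypothetical helper: configure_llvm SRC_DIR BUILD_DIR [LINK_JOBS]
# SRC_DIR is assumed to be an llvm-project checkout containing llvm/.
configure_llvm() {
  # Release + assertions: debug logging and asserts, but no debug symbols.
  # LLVM_PARALLEL_LINK_JOBS caps concurrent link steps (defaults to 1 here,
  # which is an assumption for a low-memory machine).
  cmake -G Ninja -S "$1/llvm" -B "$2" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_PARALLEL_LINK_JOBS="${3:-1}"
}

# Hypothetical helper: build_target BUILD_DIR JOBS TARGET
# Builds a single target (e.g. opt) with a capped number of parallel jobs.
build_target() {
  ninja -C "$1" -j "$2" "$3"
}

# Example usage (paths are placeholders):
#   configure_llvm "$HOME/llvm-project" "$HOME/llvm-build" 1
#   build_target "$HOME/llvm-build" 2 opt
```

Limiting link jobs separately from compile jobs lets compilation stay parallel while the memory-hungry link steps run one at a time.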

On top of these, if you’re on Linux, I found that using LLD (or Gold) as the linker instead of the default bfd helps a lot with memory consumption (for LLD: -DLLVM_ENABLE_LLD=ON).

Thanks everyone for the advice! I am now able to build LLVM without causing my laptop to thrash. I also understand that, for auto-complete in LLVM, Linux is not the best place to be. And thanks for the link to the Doxygen-generated docs.
Warm regards,
Deep

Hi Deep,

Glad you are able to compile :)

> Also as I understand that for auto-complete in LLVM

Well, apart from the fact that you can tweak, e.g., Vim and Emacs to do auto-complete, there are things like CLion. IMO, CLion is really great. In fact, if it had a debugger that started as fast as Visual Studio’s, I’d use that. But if you only need to use it as an IDE without the debugger (e.g., for intellisense etc.), it should be more than enough.

The downside is that you have to pay for it, unless you’re a student, in which case you can probably get it for free.

Best,
Stefanos

On Tue, 12 Jan 2021 at 5:35 PM, Deep Majumder <deep.majumder2019@gmail.com> wrote:

If you’re more vim/emacs than IDE: I use YouCompleteMe (https://wiki.archlinux.org/index.php/Vim/YouCompleteMe) with the compile_commands.json generated from the ninja build, I think (maybe it’s generated by cmake? I forget).
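On the question of where compile_commands.json comes from: CMake itself writes it into the build directory when the CMAKE_EXPORT_COMPILE_COMMANDS option is on (it works with the Ninja and Makefile generators). Completion tools usually look for the file near the sources, so a common trick is to symlink it into the source tree. The helper below is a sketch of that; its name and arguments are hypothetical, and where your tool looks for the file is an assumption you should check against its documentation.

```shell
# Hypothetical helper: link_compile_commands BUILD_DIR SRC_DIR
# Symlinks BUILD_DIR/compile_commands.json into SRC_DIR.
link_compile_commands() {
  if [ ! -f "$1/compile_commands.json" ]; then
    echo "no compile_commands.json in $1;" \
         "try reconfiguring with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON" >&2
    return 1
  fi
  # Resolve BUILD_DIR to an absolute path so the symlink stays valid
  # regardless of the directory the helper was called from.
  ln -sf "$(cd "$1" && pwd)/compile_commands.json" "$2/compile_commands.json"
}

# Example usage (paths are placeholders):
#   link_compile_commands "$HOME/llvm-build" "$HOME/llvm-project"
```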

Hi Stefanos,
Speaking of CLion, their page says open source projects can qualify for free licenses. I am not sure whether the LLVM community qualifies under the requirements below:

  - Do not pay their core project developers.
  - Meet the Open Source definition.

Hi Stefanos and Madhur,
Of course, it would be great if JetBrains gave LLVM licenses, as it would (I guess) benefit many people working on this project who are not students.
I am a student and so have a free license anyway.
Also, thank you David for the link.

Warm Regards,
Deep

Hi Madhur,

I’m not sure either… The thing is, LLVM devs are paid, and there are versions of LLVM developed by companies that require payment. However, these are not initiated by the LLVM Foundation AFAIK. Maybe the last point is, though.

I think this is an interesting issue, I CC’d Tanya Lattner who may be able to help.

Best,
Stefanos

On Tue, 12 Jan 2021 at 7:52 PM, Deep Majumder <deep.majumder2019@gmail.com> wrote:

I’ve had good luck using Qt Creator for large C++ projects in the past. Unlike CLion, Qt Creator is actually free. It may be worth taking a look.

Re CLion: The LLVM Project (presumably meaning the Foundation) does not pay core developers. It does pay for some infrastructure staff IIRC.

However, the project is primarily funded by commercial companies (you should be able to find documentation of the contributors on the Foundation website), so I think on that count it would not qualify for the free CLion.

--paulr

Hi all,
As Stefanos had pointed out, GDB takes an awful lot of time to even start on a debug build of an executable (say clang). LLDB works better but still isn’t quite smooth to work with (takes a long time to set breakpoints). So what do LLVM devs who use Linux use for debugging, or is Windows the predominant platform of development?
Warm regards,
Deep

gdb startup time can be reduced significantly by using a linker-generated index. Compile with -ggnu-pubnames and link with -Wl,--gdb-index. I use this configuration (plus Split DWARF, fwiw) and gdb startup time is only a few seconds, which is quite usable.

On Linux I still use GDB, but I build LLVM components as shared libraries (the BUILD_SHARED_LIBS cmake flag) to reduce GDB loading time (it’s basically instant if you build as shared libraries). But be aware that this only reduces the loading time; in my experience the initialization time (the time between typing run at the (gdb) prompt and executing the first line of code) is still pretty long.
I guess this period is dominated by loading debug info from a subset of the shared libraries, which is still slightly better than loading all the debug info, as happens when LLVM components are built as static libraries.

-Min

Hi David,
Do you use ld, gold or lld as the linker? I am getting an unknown flag error with lld for --gdb-index.
Warm regards,
Deep

You'd need gold or lld - ld.bfd doesn't support gdb-index.

Hi David,
Sorry to annoy you. I have lld enabled via LLVM_ENABLE_LLD=ON. To use -WI and --gdb-index, I set CMAKE_EXE_LINKER_FLAGS_DEBUG=-WI --gdb-index in CMakeCache.txt. But that doesn’t work out and I get the following error:
c++: error: unrecognized command line option ‘-WI’; did you mean ‘-I’?
c++: error: unrecognized command line option ‘--gdb-index’; did you mean ‘--no-index’?

Why does this happen?

Warm Regards,
Deep

Ah, it's "-Wl,--gdb-index": the character after 'W' is a lower-case L,
not an upper-case i (L for Linker). And it's a comma, not a space,
between "-Wl" and "--gdb-index".
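For reference, here is a sketch of a Debug configuration that combines the suggestions from this sub-thread (-ggnu-pubnames, -Wl,--gdb-index, Split DWARF, and LLD as the linker). The function name and directory layout are hypothetical, and passing the flags through the generic CMAKE_*_FLAGS variables is my own assumption about how to wire them in; I have not checked it against a particular LLVM release. Each -D option is a single shell word, so the comma form survives intact.

```shell
# Hypothetical helper: configure_llvm_debug SRC_DIR BUILD_DIR
# SRC_DIR is assumed to be an llvm-project checkout containing llvm/.
configure_llvm_debug() {
  # -ggnu-pubnames at compile time plus -Wl,--gdb-index at link time lets
  # the linker emit a .gdb_index section, so gdb can skip indexing at startup.
  # --gdb-index needs gold or lld (ld.bfd does not support it); this assumes
  # lld is installed so that LLVM_ENABLE_LLD=ON works.
  cmake -G Ninja -S "$1/llvm" -B "$2" \
    -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_LLD=ON \
    -DLLVM_USE_SPLIT_DWARF=ON \
    -DCMAKE_C_FLAGS=-ggnu-pubnames \
    -DCMAKE_CXX_FLAGS=-ggnu-pubnames \
    -DCMAKE_EXE_LINKER_FLAGS=-Wl,--gdb-index \
    -DCMAKE_SHARED_LINKER_FLAGS=-Wl,--gdb-index
}

# Example usage (paths are placeholders):
#   configure_llvm_debug "$HOME/llvm-project" "$HOME/llvm-debug"
```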

Hi all,
I have got a lot of useful advice from many different people in this thread. Is it possible for me to make a page in the wiki or a section perhaps summarising all of these points?
Warm regards,
Deep