For many years, we have had tutorials at the LLVM Developers’ Meeting directed at newcomers and beginners with LLVM and LLVM subprojects. Many of these covered topics that we saw a need for and recruited volunteers to present. As we are seeing an increase in newcomers at our events, I would love to have more tutorials at the upcoming LLVM Developers’ Meeting.
I’m curious which tutorials the community would like to see and thinks we need.
Things we have had in the past (that could be repeated with an update too):
This is not an exhaustive list and thank you to all that have done a tutorial!
Not sure if it fits, but I often see newcomers running into build issues, namely:
- LLVM takes hours to build on my laptop, what can I do?
- I just ran `git pull` and the project is going to be rebuilt from scratch. Really?
- The build is killed (meaning it ran out of RAM, but apparently few people know that).
- A debug build takes 200GB of disk space, I don’t have that much.
I think one of the tutorials could explain why this happens and possible ways to fix it (e.g. use shared libraries, lld, an SSD, ccache, limit the number of parallel compile/link jobs, etc.).
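To make the suggestions above concrete, here is a minimal sketch that assembles a CMake configure command using some of those remedies. The helper name `llvm_configure_command` and the default values are assumptions for illustration; the CMake options themselves (`LLVM_ENABLE_LLD`, `LLVM_CCACHE_BUILD`, `BUILD_SHARED_LIBS`, `LLVM_PARALLEL_LINK_JOBS`, `LLVM_TARGETS_TO_BUILD`) are documented LLVM build options, but the right values depend on your machine.

```python
import shlex


def llvm_configure_command(source_dir="llvm", build_dir="build",
                           link_jobs=2, use_ccache=True,
                           use_lld=True, shared_libs=True):
    """Return a cmake configure command (as an argument list) for a lighter LLVM build.

    Hypothetical helper: the defaults are illustrative, not recommendations.
    """
    def on_off(flag):
        return "ON" if flag else "OFF"

    return [
        "cmake", "-G", "Ninja", "-S", source_dir, "-B", build_dir,
        # A Release build is far smaller than a Debug build.
        "-DCMAKE_BUILD_TYPE=Release",
        # Build only the backend for the host machine.
        "-DLLVM_TARGETS_TO_BUILD=host",
        # Linking is the most memory-hungry step; cap concurrent link jobs
        # (this option only takes effect with the Ninja generator).
        f"-DLLVM_PARALLEL_LINK_JOBS={link_jobs}",
        # Link with lld (must already be installed on the host).
        f"-DLLVM_ENABLE_LLD={on_off(use_lld)}",
        # Reuse object files across rebuilds (ccache must be installed).
        f"-DLLVM_CCACHE_BUILD={on_off(use_ccache)}",
        # Shared libraries reduce link time and disk usage for development builds.
        f"-DBUILD_SHARED_LIBS={on_off(shared_libs)}",
    ]


if __name__ == "__main__":
    # Print a command line that can be pasted into a shell from the llvm-project root.
    print(shlex.join(llvm_configure_command()))
```

On a machine with little RAM, calling `llvm_configure_command(link_jobs=1)` is probably the first thing to try, since a killed build is usually the linker running out of memory.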
Yeah, a tutorial covering Building LLVM with CMake seems useful.
One of the questions I see pop up a lot is “How do I build and use custom passes on Windows?” Andrzej Warzyński briefly covered this in his talk in 2019, but it’s easier now and it’s possible to do it from inside Visual Studio where you can actively debug the pass while you’re writing it. Additionally, it’s possible to build them as DLLs that can be used with `--load-pass-plugin` instead of having to force LLVM to statically link against it. I just don’t think many beginners realize this because it seems like all the other information elsewhere (my own YouTube tutorial included) is based on doing it on Linux.
I see this a lot in the beginners channel on the Discord. It has led me to create a CMake wrapper script… written in CMake, with the sole purpose of getting a minimal toolchain built with a single command. But that doesn’t solve the problem of learning how to build it manually.
With the planned move to GitHub pull requests, would a tutorial about new best practices, and how to move from the Phabricator approach, be in order?
Part of the confusion for me is not knowing whether the time taken is down to my build choices or to be expected with that hardware.
I think someone presented a build benchmarking script at EuroLLVM; perhaps that could be used to provide a snapshot of some common machines and their build times. It would never match everyone’s machine but it’s a start at least.
Baremetal toolchains are becoming more popular in the LLVM world, so I also think this topic could be addressed.
E.g. “How to build a (cross-)toolchain / distribution for a baremetal target; features and limitations”.
The current documentation mainly focuses on “hosted” architectures and multistage builds, which is quite far from the baremetal case.
The only page that comes close is “How to Cross Compile Compiler-rt Builtins For Arm” in the LLVM 17.0.0git documentation.
I’m hesitant to derail this further, but on the build topic: CMake supports the notion of “CMake presets” via a CMakePresets.json file. Perhaps somebody interested could propose a small set of “typical” presets?
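As a rough illustration of what such a set could look like, the sketch below generates a CMakePresets.json with one hidden base preset and two presets that inherit from it. The preset names, the build directory layout, and the chosen cache variables are all assumptions made for this example, not an agreed proposal; the file format fields (`version`, `cmakeMinimumRequired`, `configurePresets`, `inherits`, `hidden`, `binaryDir`, `cacheVariables`) come from the CMake presets format, version 3, which needs CMake 3.21 or newer.

```python
import json


def make_presets():
    """Build a CMakePresets.json document as a Python dict (illustrative values only)."""
    base = {
        "name": "base",
        "hidden": True,  # hidden presets can only be inherited, not selected
        "generator": "Ninja",
        # ${sourceDir} and ${presetName} are macros expanded by CMake.
        "binaryDir": "${sourceDir}/../build/${presetName}",
        "cacheVariables": {
            "LLVM_TARGETS_TO_BUILD": "host",
            "LLVM_PARALLEL_LINK_JOBS": "2",
        },
    }
    release = {
        "name": "release",
        "displayName": "Release with assertions",
        "inherits": "base",
        "cacheVariables": {
            "CMAKE_BUILD_TYPE": "Release",
            "LLVM_ENABLE_ASSERTIONS": "ON",
        },
    }
    debug = {
        "name": "debug",
        "displayName": "Debug with shared libraries",
        "inherits": "base",
        "cacheVariables": {
            "CMAKE_BUILD_TYPE": "Debug",
            "BUILD_SHARED_LIBS": "ON",
        },
    }
    return {
        "version": 3,
        "cmakeMinimumRequired": {"major": 3, "minor": 21, "patch": 0},
        "configurePresets": [base, release, debug],
    }


if __name__ == "__main__":
    # Redirect the output to CMakePresets.json next to the top-level CMakeLists.txt.
    print(json.dumps(make_presets(), indent=2))
```

With the generated file placed next to the CMakeLists.txt being configured, `cmake --preset release` would then configure that build.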
Yes! This is for sure something that we want and plan to have. Perhaps @beanz and @kbarton want to update their talk? Or even @tstellar.
Definitely looking for volunteers!
Slightly derailing this further: I feel like CMake presets would be perfect for llvm-test-suite, replacing its predefined CMake caches, which do roughly the same thing.
I’d be happy to give a tutorial on building LLVM with CMake, including more advanced topics like building a (cross-)toolchain/distribution, given the interest expressed in this topic.
We’re also considering a tutorial on source-based code coverage since this feature can be challenging to use, and we have a lot of experience in this area we could share. Would there be an interest in such a tutorial?
I would. Especially if we get it on YT.
Given that people are asking for interest here, I am happy to find some co-conspirators and prepare something about IPO in LLVM, or (the internals of) (GPU) offloading, or both.
As a beginner, I would like to see some experts teach the algorithms implemented in things like the loop vectorizer and straight-line code vectorizers like SLP, and not just at a very high level, but touching on each optimization in some depth: doing a code walkthrough, telling people why it is done the way it is done, and explaining the underlying hypotheses. Other modules like the inliner and some scalar optimizations would be really helpful too. Thanks.
This is probably an elephant in the room but having some beginner-friendly LLVM backend tutorials would be nice.
IMHO, having a full-fledged tutorial on writing a new backend (from scratch) in a single session is not realistic. So I’m thinking about covering only part of the backend development.
Open question: In your opinion, which parts of backend development make beginners struggle the most?
By backend, do you mean the codegen part or all of the optimization part? By the way, I would like to learn about instruction selection algorithms and register allocation as well.
TableGen. First you have to understand the difference between the fixed language (syntax, types) and the standard definitions, for example “set” in a dag (set …), and those definitions’ hard-coded behaviour in the TableGen backends. Then if you want to do something that doesn’t seem exactly covered by an existing backend, you have to work out how to tweak the TD code so that the TableGen backends don’t reject it, and the generated code selects instructions as you want.
BTW I remember your TableGen backend talk helped me understand this interplay between backend and defs, thank you.
While we are on the topic of custom backend development, as a beginner I would be immensely interested in the following tutorials:
- How to create a custom LLVM backend
a. Capture architecture in
b. Write codegen algorithms for the backend
c. Register the backend in the general LLVM flow
d. Insert target-aware optimization passes
e. What to consider for CPU type architectures and non-CPU type architectures
- How to move towards heterogeneous compiler development
a. From LLVM IR, perform codegen that targets more than one distinct architecture
b. What should be the role of the compiler and the runtime if information needs to be exchanged between the architectures during execution?
- Can codegen be performed directly from MLIR? If yes, is it feasible now? If not, can it be done in the near future?
I have recently started working on the MLIR-LLVM part. It would be helpful for beginners to have the following tutorial:
- How to create a new MLIR dialect?
- Its integration and functionality testing with the MLIR pipeline.