LLVM Education Initiative

Authors:

  • Kit Barton (LLVM Foundation, IBM Canada, @kbarton)
  • Andreas Bergen (University of Toronto, Mississauga, @AndiB)
  • Chris Bieneman (LLVM Foundation, Microsoft, @beanz)
  • Mike Edwards (LLVM Foundation, Modular, @sqlbyme)

Overview

This proposal is the work of four authors (named above) who spent time
discussing challenges faced by sponsors of the LLVM Foundation, contributors to
LLVM, and educators. This captures collected ideas from hours of brainstorming
and discussion. As the proposal moves toward action, community feedback and
participation are crucial to making this effort successful.

The next section of the document describes our foundational understanding of the
challenges this proposal attempts to address, and the later sections contain
concrete proposals for action. The last section contains action items for the
authors and readers who wish to support these efforts.

Background

As the applications of compiler technology expand and compilers themselves grow
increasingly complex, the demand for skilled compiler engineers continues to
grow. Employers are struggling to fill roles and are increasingly hiring
inexperienced engineers and investing in on-the-job training.

Many factors limit employers’ abilities to bring on inexperienced engineers.
These factors include:

  • Sourcing candidates with interest and relevant background.
  • Balancing training candidates with achieving organizational goals.

Some amount of on-the-job training is expected and even desirable. However,
candidates with some background in compilers or related technology cost less
to onboard, and the savings are greater still if their experience is with LLVM
itself. Hiring new employees with even basic compiler or LLVM experience frees
up mentoring resources, which in turn supports hiring additional candidates.

Today, new candidates may be introduced to compiler engineering in university
computer science curricula. While many of those courses can be interesting and
valuable, they often focus on parsing and general text processing problems.
These aspects of a compiler are undoubtedly integral to its functioning, but
they are not the parts of compilers where most compiler engineers spend most of
their time.

A comprehensive compiler course focused on broader applications of compiler
technologies, like JITs, language productivity tools, IR optimizations, and code
analysis, as well as related topics like language runtimes, debuggers, and
linkers, would be immensely valuable for training high-quality candidates in
the broad array of skills useful to professional compiler engineers.

Enhancing Existing Resources

Our shortest-term goal is to engage community volunteers in improving existing
resources. This first phase aims to provide accessible and up-to-date
English-language documentation, tutorials, and other reference materials via
llvm.org.

Inventory Existing Resources

One of the challenges in this effort is the absence of a catalog of existing
resources. Knowing what resources exist, the topics those resources cover and
whether or not they are accurate and up-to-date is a foundational step. This
inventory will be scoped to only include resources hosted or distributed via
LLVM Foundation-sponsored sources.

Those sources include, but are not limited to:

Creating this inventory is a large amount of work, but it can be distributed
across a team of volunteers with each individual contributing as much or as
little as their preference and availability allow.

Creating an inventory and surveying the resources will be driven using Google
Forms to allow community participation in reviewing and classifying resources.
Having an inventory of the existing resources is a key step to other community
goals.

Building an On-Ramp

Once the existing resources are inventoried, the community can begin building an
onboarding track of resources. These resources should guide an individual with
C++ programming experience through LLVM’s software architecture, its developer
and contribution processes, and on into advanced topics around LLVM components.

This onboarding resource will be hosted on llvm.org, separate from the
per-project documentation, and will serve as a launching point into the other
resources. The goal of creating an onboarding resource is not to produce new
documentation parallel to the existing documentation sources, but rather to
create a guided path through existing resources.

Creating New Resources

The second phase of the outreach initiative is targeted at expanding our base of
resources and bringing LLVM into universities to meet students where they are.
This work will require more significant volunteer effort or funding from the
LLVM Foundation or sponsor organizations.

Filling Gaps

As the onboarding track is filled out, there will be gaps in the topics
currently covered in the existing resources. Volunteer support will then be
solicited to fill the gaps by producing written and video documentation. Videos
can be filmed either at LLVM Foundation hosted events or with the support of the
LLVM Foundation at times and places convenient to the creator.

Filling gaps will require both time from individuals to generate new resources
or update existing ones, and potentially funding from the LLVM Foundation to
create additional resources.

University Speaker Series

To bring students into the project, it is important to meet them where they
are. The program will focus on building a presence for LLVM on campuses around
the world. The initial form of this program will use volunteer speakers who
will be matched with professors and student organizations at universities near
them.

Long-term Investments

Up to this point, the proposal has focused on efforts that primarily require
volunteer effort. Volunteer-driven progress will have limits. This section
captures ideas that will require sustained financial investment. The last
subsection in this section proposes avenues of funding that the LLVM Foundation
could pursue.

Localization of Resources

One of the challenges with localizing resources is the cost of high-quality
translation. One-time costs like subtitling recorded videos are easier to
manage than ongoing costs like maintaining translated versions of
llvm.org and the per-project documentation sites.

It is crucial to the community’s prioritization of inclusivity that some
localized resources are available. Starting with the on-ramp resources described
in the first section and branching out across all our documentation, this will
be a concerted long-term investment that requires substantial funding.

Developing an Open Source LLVM-based Curriculum

Reaching students where they are is key. Many prominent universities have
provided open source or public courses for computer science topics. Although
many of those courses are focused on beginner-level computer science topics, the
open source nature of compiler development makes it ideal for an open source
curriculum.

Building a full and comprehensive compiler curriculum complete with an open
source textbook, exercises, and lesson plans would be a huge step to bringing
LLVM into classrooms.

As with resource localization, this cannot be a one-time investment. LLVM
changes rapidly, as does the field of compiler engineering. The substantial
funding required to build the curriculum and resources must be coupled with a
long-term funding plan to keep the resources up-to-date.

Funding Big Investments

The ideas described in the subsections above all require dedicated paid
resources to complete. A separate proposal for funding large investments is in
progress. The authors will propose that the LLVM Foundation create grant and
scholarship programs. The structure of the programs will provide concrete goals
associated with the funding.

Taking Action

Authors’ Next Steps

  • Collecting lists of volunteers and educators.
  • Giving a talk at the LLVM Developer Meeting on this RFC.
  • Collecting community feedback.
    • Hosting a round table at the LLVM Developer Meeting to collect feedback.
    • Collecting feedback from community members not at the Dev meeting.
  • Working with the LLVM Foundation to secure funding.

Readers’ Next Steps

  • If you want to volunteer, fill out our Google Form!
  • Give us feedback here, at the Dev Meeting, via Discord, IRC, or email.

Great effort, thanks a lot!

I think I have it fixed. I’ve clearly never done this before.


Some random thoughts regarding hiring:
Both LLVM and companies may want to extend their geographic reach. For example, moving the EuroLLVM conference around would help raise local student interest.
Also, opening smaller offices in other countries will help raise interest. I have a lot of students who don’t want to move abroad (myself included). Some of them don’t want to work remotely either.

Then, if you want students trained, you need to support universities and master’s and PhD programs. An undergrad compilers course will never train students to the point that they are ready to work professionally on compilers. Directly sponsoring such programs throughout the world would make a big difference (especially in poorer countries where access to funding is limited).
Maybe create a map of universities offering courses and theses related to LLVM (“where can I study LLVM?”).

I teach compilers myself to about 350 undergrad students per year. Our course is not based on LLVM because there’s no time for everything. Learning parsers is still important. It’s also nice to implement a simple compiler from parsing to assembly generation. Students get a glowing face when they see their compiler generating assembly for a hello world program. Not sure I could give the same experience by having them implement a small thing in LLVM.
An LLVM-based course would make more sense for an advanced compilers (graduate) class I think.

Finally, having companies provide mentors for undergrad students would be great. I always have a few students who reach out to me after the compiler course asking for small tasks they can do in LLVM. I even get requests from other universities. But I don’t scale infinitely, nor am I knowledgeable in all parts of LLVM. Having a wider pool of mentors would be great, as would a getting started manual (how to check out the code, compile it, run tests, do a first bug fix, fix tests, submit a patch, find reviewers, social policies like how often to ping reviewers, etc.). We have most of this content, but not on a single page; having a page on “how to do your first contribution” would help a lot!


When is this being discussed at the dev meeting? Any chance for a dial in for those unable to attend in person?

Hi @jrheng99, unfortunately the dev meeting does not have any remote participation this year. This is particularly unfortunate because I got sick over the weekend and am not there myself.

This forum is the right place to have an online discussion. Maybe @sqlbyme or @kbarton can post a recap here of the discussion at the round table.


Yes, we will post a recap after the meeting. I will also check tomorrow whether it is feasible to make a laptop work with Zoom; if so, we will try to make that happen. No promises, but I’ll try my best.

I will post a Zoom link here if we can work it out.

Hi all,
We had an excellent roundtable on Wednesday afternoon, following up our quick talk introducing the LLVM Education Initiative. The following are some of the major points that were discussed. I suspect I have missed some things, so please feel free to add additional comments with anything that was missed. Also, I did a very bad job recording people’s names, so if any of the comments below were from you, please feel free to take credit for them (and/or fix them if I got anything wrong).

I would say the conversation had 4 distinct parts:

  1. Organizing existing education material
  2. Ways we can make it easier to improve our documentation
  3. Sources of content for additional education material
  4. Collecting Feedback

Organizing existing material
A great suggestion here is to collect and maintain a list of recommended websites for people new to LLVM. I believe something similar was started on Discourse, but I cannot find the post now.

It was also mentioned that speakers should tag/categorize their videos when submitting them for consideration at a dev conference. I believe this is already done, but that information does not transfer over to the talks themselves when the talks are posted on YouTube. This could be a very easy way for us to help provide metadata to talks as they are added to YouTube or the website.

Improving documentation
There was a long discussion about existing documentation, the fact that it is often out of date, and not easy to update. There were two concrete suggestions from this discussion:

  1. A blog post to explain how to update documentation, so people who encounter problems in the documentation know the process to follow to fix them.
  2. Longer term, a better format and review process for the documentation itself. A simple process where users could update the documentation in their browsers would be ideal. Alternatively, converting all docs to Markdown and using GitHub pull requests to review and commit documentation changes would be a significant improvement over what we have now, especially for people who are new to LLVM.

Sources of content for additional material
There were a lot of suggestions in this category, ranging from curricula for university courses to in-depth learning material that companies can give new hires to learn LLVM. One concrete suggestion was to take a patch, or series of patches, and work through them to build a tutorial for a specific piece of functionality. For example, work through a series of patches for Global ISel to understand the basic work needed to add Global ISel to a backend. There are likely several other examples where this could be applied.

People seemed particularly interested in hands-on video tutorials about debugging LLVM. If people could record their debugging sessions when working through a specific bug, and make this available, it allows people to observe the entire process they go through. Perhaps there are already groups that are doing a type of debug education session, where they walk through fixing a bug for others in their group to learn from. If so, could they record them and make them available for the community?

Collecting Feedback
After people who are new to LLVM have gone through some of our introductory material, we should ask them “What do you wish I had told you?”. This tells us what they feel they were missing. It would be a good question for GSoC or Outreachy students, either part-way through or at the end of their project, and would give us very valuable feedback on areas where we are lacking information.


We’re growing a compilers team and have prepared some onboarding material for LLVM work. It includes pointers to existing resources and hands-on tasks like removing an assembly instruction, adding an instruction, creating analysis passes, and creating simple transformation passes, going all the way to creating simple C intrinsics for custom instructions.

We will be happy to make this content open-source so it can be used more broadly and improved.

Thank you for doing this!!!

I believe having a dedicated docs repository with markdown files would be a great improvement over the current state.

  • GitHub renders the md files
  • Users can file issues and feature requests
  • There is a contributing guide
  • We can do long-term planning

I forgot: the Swift website, including the book, is built with Jekyll:

It is basically all Markdown files with a bit more flexibility, and I can build the website myself locally!