GSoC Opportunity

Dear LLVM Team,

I would like to participate in LLVM’s GSoC, because I would very much like to combine my knowledge of graph theory/algorithms with my interest in C++. Contributing to the LLVM code seems like a fantastic challenge and learning experience for these two interests of mine, as well as for computer science in general (for example, the use of a new syntactic category to disambiguate a grammar demonstrates 1) indirection and 2) the power of naming things).

But to be up front about this, I have not done any full-scale C++ project (although we had to modify the Linux kernel in my OS class, that was in C). However, I do believe my C++ skills are at an intermediate level, since C++, like Python, is a language I spend my free time learning more about. Like vim, there is always more to learn in C++, and to that end I watch CppCon videos or peruse blogs such as Fluent C++ (which is a treasure trove of material to nerd out on) in my free time. I also have a layman’s knowledge of CMake, from using it to configure ccls to lint C++ code with specific flags, and I am aware of Google’s Test framework. Finally, I am currently taking Professor Stroustrup’s C++ class and the compilers course here at Columbia.

Regarding the logistics:

  1. Do I need to submit a resume/screening/patches?

  2. Although I am interested in certain projects posted on your website (Implement missing tab completion, createLoopPass, and PostDominatorTree), I am uncertain whether I have enough expertise to decide what would be an appropriate project to contribute to, given my current knowledge and experience.

  3. The GCC GSoC website suggested checking out their source code, compiling and running their test suite. Can I do something similar for LLVM?

Anyway, thank you for taking the time to read this email, and I hope to hear back!

Best regards,

Benson Li

Hi Benson,

Welcome to the LLVM community!

I’ll try to help, but note that I’m in no formal position to talk about how LLVM decides on GSoC (I’m an LLVM newcomer anyway).
That said, the rest is my opinion, which is partially formed from my experience as a GSoC student.

> But to be up front about this, I have not done any full-scale C++ project
Depending on how you define “full-scale”, a lot of amazing LLVM contributors have not done a full-scale C++ project. So I think there’s no problem there; it’s just good to have relatively good knowledge of C++.
Speaking of C++ skills, I think they matter more if you want to contribute to Clang than to, say, the LLVM middle or back end, because for Clang you have to know a lot of details of the language
in order to parse it, type-check it and generate LLVM IR. In most other parts of LLVM, you’re only using the language.
As a matter of fact, if you have good knowledge of C++, I believe it’s more important to be able to understand and adapt to “nearby” code than to be an expert in C++.
The latter can even be problematic if you start applying C++ craziness while the first is pretty much always needed when working in a team project.

> 1. Do I need to submit a resume/screening/patches?

As for a resume in the sense of a job application, no. But I think every good GSoC proposal includes a biography-like section
where you basically tell your story in programming and how you fit into the project (in our case, LLVM).

I’m not sure what you mean by screening.

As for patches, I don’t think they’re required, but they’re super useful. Not because of some unrelated logistical rule (like “you have to have X patches to be considered”).
But because submitting good patches is one of the best indicators (if not the best) that you are able to do useful work in this project. :)
And they don’t only show your technical skills, but also your communication skills, independence, etc.

> 2. Although I am interested in certain projects posted on your website (Implement missing tab completion, createLoopPass, and PostDominatorTree), I am uncertain whether I have enough expertise to decide what would be an appropriate project to contribute to, given my current knowledge and experience.

This is kind of a generic question.
I’d say, start by finding a project that you’re truly interested in. Then, try to study it, understand the context and the problem.
You don’t need to get very far; that’s totally OK. You can then write a post (either here or on Discourse: https://llvm.discourse.group/c/community/gsoc/32)
about this specific project (you can write posts for multiple projects).
Hopefully, by discussing with people (and mentors) and better understanding what the project is asking for,
you can figure out whether you want to do it. The mentors of the project can certainly guide you through it.
> 3. The GCC GSoC website suggested checking out their source code, compiling and running their test suite. Can I do something similar for LLVM?

Yes, totally. I’m not familiar with GCC internals, but running the LLVM test suite is super easy (so easy that you don’t really learn anything by doing it :P).
The LLVM project has moved to a single repository: https://github.com/llvm/llvm-project
You can clone the project and then use CMake to build it. LLVM’s CMake configuration has a bunch of flags (https://llvm.org/docs/CMake.html),
and you may get lost, so I’d say start simple.
Go to the llvm-project directory (the one you cloned) and run:

    cmake ./llvm -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_TARGETS_TO_BUILD="X86"

In the link above you can read what each flag does. The LLVM middle and back end (i.e. opt and llc; ask if you don’t know what these are) are always built, but Clang
has to be enabled explicitly. We set the build type to Release because a Debug build takes a lot of time and a lot of disk space, and when starting out
you probably don’t need one. We enable assertions mostly because that lets you use the -debug option (e.g. in opt) to see debug prints.
Finally, we only build the X86 target because that’s probably the architecture you have, and you don’t need any others for now.

Press Enter, and once the configuration is complete you can run:

    make

or

    make -j<N>

The latter is faster, but choose N (the number of parallel jobs) depending on your system, since a bare make -j puts no limit on it.

When that’s finished, the llvm-project/bin/ directory will contain executables like clang, clang++, opt, llc, etc.,
which you can run (again, ask if you don’t know what to do with them; clang you probably already know, since invoking it is like invoking
most compilers, e.g. gcc, to compile .c/.cpp files).

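To run the test suite, go to llvm-project/llvm/test and run the llvm-lit script from the build’s bin/ directory (llvm-project/bin/ with the configuration above) on the current directory:

    ../../bin/llvm-lit .

That will run only LLVM’s test suite, but you’ll get an idea.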

Also, you can watch these videos:
https://www.youtube.com/watch?v=J5xExRGaIIY
https://www.youtube.com/watch?v=5kkMpJpIGYU

Hope this helped!

Kind regards,
Stefanos Baziotis

On Sat, 14 Mar 2020 at 2:04 a.m., Benson Bin Bin Li via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Hi Stefanos,

First, thanks a lot for the very detailed response! I watched both of the videos, and I now have a rough idea of how each piece of software maps onto the compilation process. For a more in-depth understanding, though, I found blogs such as these two more helpful: https://jonasdevlieghere.com/understanding-the-clang-ast/, https://releases.llvm.org/2.6/docs/tutorial/JITTutorial1.html. Anyway, in response to your answers:

> The latter can even be problematic if you start applying C++ craziness while the first is pretty much always needed when working in a team project.

OK, that makes sense, since you want the style to be consistent throughout.

> running the LLVM test suite is super easy

Yeah, everything went fine following your instructions. I do have a question, though: how do I diagnose failed tests? I found the files that correspond to them, and they seem to be one-line scripts rather than “code” per se.

> But I think every good GSoC proposal includes a biography-like section

> Then, try to study it, understand the context and the problem.

> But because submitting good patches is one of the best indicators

OK, so for the application process, I should basically try to get more info on the projects I am interested in and submit a proposal from there? Given the whole coronavirus situation and the time remaining for the application, I probably do not have the time to get a patch through. Regarding the projects I am interested in, I have narrowed it down to two (mostly because I don’t think I have the ability to tackle the PostDominatorTree project as of now), and I have the following questions about them:

LLVM Pass

  1. I am following this guide (https://llvm.org/docs/WritingAnLLVMPass.html) to create an LLVM pass, but it appears “add_llvm_library” is a macro and not a built-in CMake command. So I have two questions: 1) Comparing the online repo, where I found this macro, with my local copy, it appears I don’t have the file. Do I need to build it? 2) How do I tell CMake to look for this macro?

  2. Is there a specific section of the dragon book that I should read so that I can at least understand theoretically what it means to create a LoopNestPass?

LLDB Tab Completion

  1. Is there any resource I can read that explains how lldb is able to “pause” the executable and map it to a certain line in the source file, or more generally how lldb represents the state of the executable?

  2. Where in the source code can I go to see how existing tab completions are implemented?

  3. I built lldb and ran check-lldb, but it seems that the path to clang got messed up: it is trying to call “Example=Code/llvm-project” rather than my actual directory name, “Example-Code/llvm-project”. Should I just clone the repo into a parent directory whose name doesn’t contain a hyphen?

(Would it be better if I posted this on the forum?)

Best regards,
Benson

Stefanos can speak to this more, but from what I read about the project, creating a LoopNestPass requires information from either the call graph of a function or the loop hierarchy in LLVM IR. I’m not sure of the internal classes for this, so, Stefanos, is there currently a way to get information in the IR about the outer loop, or from the call graph? Getting the outer loop from the IR or the call graph seems to be the biggest problem. After that, you would basically check whether the loop is the outer loop and, if so, add it dynamically to the pipeline. Sorry if I’m not much help; I’m not sure whether the call graph API supports this, but I’m pretty sure LLVM IR doesn’t make it easy.

Nick

Hi everyone,

> I probably do not have the time to get a patch through.

IMHO, you do. :)

First of all, @Benson, sorry, but I’m not at all familiar with LLDB, so I can’t help there.

Other than that, I’ll probably disappoint you both too, because I’m not that familiar with creating passes or with the problem at hand. I’ll try to help as much as I can.

> Is there a specific section of the dragon book that I should read so that I can at least understand theoretically what it means to create a LoopNestPass?

As far as I understand, no, because it’s more of a structural, LLVM-specific problem than a generic compiler optimization problem.

> Stefanos can speak to this more, but from what I read about the project, creating a LoopNestPass requires information from either the call graph of a function or the loop hierarchy in LLVM IR. I’m not sure of the internal classes for this, so, Stefanos, is there currently a way to get information in the IR about the outer loop, or from the call graph? Getting the outer loop from the IR or the call graph seems to be the biggest problem. After that, you would basically check whether the loop is the outer loop and, if so, add it dynamically to the pipeline.

I’m not sure I followed you here. First of all, if you create a regular LoopPass, you’ll visit loops from the innermost to the outermost. In a loop nest pass, though,
you want the outermost one, so you’d have to visit them all until you get there. Now, if you do it in a function pass instead, you lose the ability to put loops
back into the pipeline, since that’s not how function passes work. So, the way I understand it, to solve that problem one would create something like a function
pass, figure out the loops there (e.g. with LoopInfo), and then convert it into a LoopPass so that you can run loop passes over the loops.
I think this can already happen, but right now loops are processed in reverse order: https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h#L230
So maybe if you modified that into something like a FunctionToLoopNestPassAdaptor, it would work? I don’t know, that’s just an idea; let me not confuse you more.
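To make the traversal-order point concrete, here is a toy C++ sketch. The ToyLoop type and the helper functions are hypothetical illustrations, not LLVM’s Loop/LoopInfo API. The sketch only contrasts a post-order walk of a loop forest (inner loops before their parents, as described above for loop passes) with visiting just the outermost loop of each nest, which is where a loop nest pass would start.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy loop forest (not LLVM's Loop class): each loop owns the
// loops nested directly inside it.
struct ToyLoop {
  std::string Name;
  std::vector<ToyLoop> SubLoops;
};

// Post-order walk: every subloop is recorded before its parent, so loops are
// listed from the innermost outwards.
static void appendInnermostFirst(const ToyLoop &L, std::vector<std::string> &Out) {
  for (const ToyLoop &Sub : L.SubLoops)
    appendInnermostFirst(Sub, Out);
  Out.push_back(L.Name);
}

std::vector<std::string> innermostFirst(const std::vector<ToyLoop> &TopLevel) {
  std::vector<std::string> Out;
  for (const ToyLoop &L : TopLevel)
    appendInnermostFirst(L, Out);
  return Out;
}

// A loop nest view only needs the outermost loop of each nest; the inner
// loops stay reachable through SubLoops.
std::vector<std::string> outermostOnly(const std::vector<ToyLoop> &TopLevel) {
  std::vector<std::string> Out;
  for (const ToyLoop &L : TopLevel)
    Out.push_back(L.Name);
  return Out;
}
```

For a function with one nest i containing j containing k, plus a separate loop m, innermostFirst yields k, j, i, m, while outermostOnly yields i, m.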

Best,
Stefanos

On Mon, 16 Mar 2020 at 5:53 a.m., Nicholas Krause <xerofoify@gmail.com> wrote:

Hi again,

I just remembered that another person had asked about the LoopNestPass project. I found the discussion here: https://groups.google.com/forum/#!msg/llvm-dev/M3XY7Gf87AI/f8N3W3wBBQAJ
I had forgotten that Whitney Tsang mentioned the LoopPassManager too. In this discussion, you can see that she mentions some code she has already written, which you can start from.
You can also email the mentors and they certainly will have more to say than me.

Kind regards,
Stefanos

On Tue, 17 Mar 2020 at 5:41 a.m., Stefanos Baziotis <stefanos.baziotis@gmail.com> wrote:

> I am following this guide (https://llvm.org/docs/WritingAnLLVMPass.html) to create an LLVM pass, but it appears “add_llvm_library” is a macro and not a built-in CMake command. So I have two questions: 1) Comparing the online repo, where I found this macro, with my local copy, it appears I don’t have the file. Do I need to build it? 2) How do I tell CMake to look for this macro?

That guide is written for the old pass manager.
In this GSoC project, we aim to create a LoopNestPass in the new pass manager (NPM).
In https://www.youtube.com/watch?v=3pRhvQi7Z10, you can follow along to create an LLVM loop pass in the NPM.

There are different kinds of passes in the NPM, e.g. ModulePass, FunctionPass, LoopPass.
One or more loop passes can be added to a LoopPassManager, which can then be added to a FunctionPassManager through createFunctionToLoopPassAdaptor.
Examples can be found in llvm/lib/Passes/PassBuilder.cpp.
Some passes, e.g. LoopInterchange, operate best on a loop nest. Currently, such passes can be written as either a FunctionPass or a LoopPass.
However, choosing one means sacrificing the abilities of the other.
The idea of a LoopNestPass is to combine the benefits of FunctionPass and LoopPass needed for a loop nest.
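The composition described above (loop passes in a loop pass manager, wrapped by an adaptor and added to a function pass manager) can be mimicked with a toy C++ sketch. Everything in it (ToyLoop, ToyFunction, ToyPassManager, createToyFunctionToLoopPassAdaptor) is hypothetical and much simpler than the real templates in LoopPassManager.h, which also deal with analysis managers and a loop worklist; here the loops are simply assumed to be listed in the order they should be visited.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IR units (not LLVM's Loop/Function classes).
struct ToyLoop { std::string Name; };
struct ToyFunction { std::vector<ToyLoop> Loops; };  // assumed in visit order

using Trace = std::vector<std::string>;  // records which pass ran on what

// A toy "pass manager": an ordered list of passes over one kind of unit.
template <typename UnitT>
class ToyPassManager {
public:
  using PassT = std::function<void(const UnitT &, Trace &)>;
  void addPass(PassT P) { Passes.push_back(std::move(P)); }
  void run(const UnitT &U, Trace &T) const {
    for (const PassT &P : Passes)
      P(U, T);
  }

private:
  std::vector<PassT> Passes;
};

using ToyLoopPassManager = ToyPassManager<ToyLoop>;
using ToyFunctionPassManager = ToyPassManager<ToyFunction>;

// Adaptor: wraps a whole loop pass manager into a single function pass that
// runs the complete loop pipeline on each loop of the function in turn.
ToyFunctionPassManager::PassT
createToyFunctionToLoopPassAdaptor(ToyLoopPassManager LPM) {
  return [LPM](const ToyFunction &F, Trace &T) {
    for (const ToyLoop &L : F.Loops)
      LPM.run(L, T);
  };
}
```

With two loop passes A and B in the loop pass manager and a function whose loops are listed as inner, then outer, running the function pass manager records A:inner, B:inner, A:outer, B:outer: in this toy, the whole loop pipeline finishes on one loop before the adaptor moves on to the next.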

On top of that, I would suggest also getting familiar with:

  1. The LoopNest class
     • llvm/include/llvm/Analysis/LoopNestAnalysis.h
     • llvm/lib/Analysis/LoopNestAnalysis.cpp
  2. The different PassAdaptor classes (e.g. FunctionToLoopPassAdaptor)
     • llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
Please feel free to email me or Ettore if you encounter any blockers, or have further questions.

Regards,
Whitney Tsang

Yes, that’s correct. My idea was similar, but using the call graph directly. The other problem is how to keep all the loops in LCSSA form as well, and I’m aware that function passes don’t care about that. So you can’t really convert it into a function pass itself, only into something similar.

Nick

> Yes, that’s correct.
Well, now that I’ve seen the LoopNestAnalysis* files, they try to do something similar. So, I hope it helped.

> My idea was similar, but using the call graph directly

Personally, I don’t see how the call graph can help you, since, well... it’s a call graph. :)
You care about loops in a specific function. What can help you is the control-flow graph (CFG), which is basically what LoopInfo uses to identify loops in a function.
But because LoopInfo already does that, loop identification is not your problem; loop traversal is, if I understand correctly.
That said, you do have to do things similar to loop identification (i.e. what LoopInfo does) when trying to
determine perfect nesting etc.
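For the loop-identification side, the classic textbook recipe on a CFG is: compute dominators, call an edge Tail -> Header a back edge when Header dominates Tail, and take as the natural loop Header plus every block that can reach Tail without passing through Header. Below is a toy C++ sketch of that recipe on an adjacency-list CFG. It is not LoopInfo’s implementation (LoopInfo is computed from LLVM’s DominatorTree and groups all back edges to the same header into a single loop, whereas this sketch reports one loop per back edge), and it assumes block 0 is the entry and every block is reachable from it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy CFG: Succs[B] lists the successors of block B; block 0 is the entry.
// Assumes every block is reachable from the entry.
using CFG = std::vector<std::vector<std::size_t>>;
using BlockSet = std::set<std::size_t>;

static std::vector<std::vector<std::size_t>> predecessors(const CFG &Succs) {
  std::vector<std::vector<std::size_t>> Preds(Succs.size());
  for (std::size_t B = 0; B < Succs.size(); ++B)
    for (std::size_t S : Succs[B])
      Preds[S].push_back(B);
  return Preds;
}

// Iterative data-flow: Dom(entry) = {entry};
// Dom(B) = {B} plus the intersection of Dom(P) over all predecessors P.
std::vector<BlockSet> computeDominators(const CFG &Succs) {
  const std::size_t N = Succs.size();
  assert(N > 0 && "CFG needs an entry block");
  const auto Preds = predecessors(Succs);
  BlockSet All;
  for (std::size_t B = 0; B < N; ++B)
    All.insert(B);
  std::vector<BlockSet> Dom(N, All);
  Dom[0] = {0};
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      BlockSet New = All;
      for (std::size_t P : Preds[B]) {
        BlockSet Meet;  // New intersected with Dom(P)
        for (std::size_t D : New)
          if (Dom[P].count(D))
            Meet.insert(D);
        New = std::move(Meet);
      }
      New.insert(B);
      if (New != Dom[B]) {
        Dom[B] = std::move(New);
        Changed = true;
      }
    }
  }
  return Dom;
}

// One natural loop per back edge Tail -> Header (Header dominates Tail):
// Header plus every block that reaches Tail without going through Header.
std::vector<std::pair<std::size_t, BlockSet>> findNaturalLoops(const CFG &Succs) {
  const auto Dom = computeDominators(Succs);
  const auto Preds = predecessors(Succs);
  std::vector<std::pair<std::size_t, BlockSet>> Loops;
  for (std::size_t Tail = 0; Tail < Succs.size(); ++Tail)
    for (std::size_t Header : Succs[Tail]) {
      if (!Dom[Tail].count(Header))
        continue;  // not a back edge
      BlockSet Body{Header};
      std::vector<std::size_t> Work{Tail};
      while (!Work.empty()) {
        std::size_t B = Work.back();
        Work.pop_back();
        if (!Body.insert(B).second)
          continue;  // already in the loop (this also stops the walk at Header)
        for (std::size_t P : Preds[B])
          Work.push_back(P);
      }
      Loops.emplace_back(Header, std::move(Body));
    }
  return Loops;
}
```

On the CFG {{1}, {2}, {3}, {2, 4}, {1, 5}, {}} (an outer loop headed by block 1 with back edge 4 -> 1, containing an inner loop headed by block 2 with back edge 3 -> 2), findNaturalLoops reports header 2 with body {2, 3} first and then header 1 with body {1, 2, 3, 4}, so the inner loop’s body is contained in the outer one’s.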

Best,
Stefanos

On Tue, 17 Mar 2020 at 3:08 p.m., Nicholas Krause <xerofoify@gmail.com> wrote:

Not directly, but you could implement a call graph for the loops inside a function and walk it backwards. In addition, you could make this call graph track how many loops are nested and pop out to the outermost one as a function. Basically, SCCs for the loops themselves rather than for functions. That’s probably beyond the scope of the project, so you’re right that it doesn’t matter for this.

Nick

Hi Nick,

What you said makes sense, but it’s not called a call graph. :)
You’re essentially describing what LoopInfo does, which makes sense, but as I mentioned earlier, this is already done
in LoopInfo. Now, how much of it one will be able to use in a LoopNestPass is another issue, which is certainly
something the mentors could help you with.

Best,
Stefanos

On Tue, 17 Mar 2020 at 3:35 p.m., Nicholas Krause <xerofoify@gmail.com> wrote:

Sure, I assumed so; that was just a term I used by mistake :). I’m not applying for GSoC, but that’s a hint to help other students who are applying get started.

Nick

> IMHO, you do. :)

Lol, you have too much faith in me. On a more serious note, how do I know which issues can be solved in a reasonable amount of time, and how do I search for one?

> Please feel free to email me or Ettore if you encounter any blockers, or have further questions.

Hi Whitney, thanks for the video link and the info. I was a bit busy today, but I will work on it tomorrow and get back to you!

Best,
Benson

Oh, also: Raphael Isemann got back to me about the LLDB tab completion project and said that another student has already written a proposal for it. He also suggested that I ask whether another student has already submitted a proposal for this one, and if so, to look for another LLVM project in the time remaining.

Hi both,

> I’m not applying for GSoC, but that’s a hint to help other students who are applying get started.
Yes, I agree; thanks for bringing up the topic.

> On a more serious note, how do I know which issues can be solved in a reasonable amount of time, and how do I search for one?
Well, usually you don’t. If you’re lucky, someone will be able to point you to some issues to get started with in the part of LLVM you’re interested in
(whether for GSoC or not). If you can’t get such an answer, one thing you can try is searching for TODOs and FIXMEs and seeing whether you can tackle any of those.

> He also suggested that I ask whether another student has already submitted a proposal for this one, and if so, to look for another LLVM project in the time remaining.
In the discussion I posted above, you can see that there’s probably another person interested.

But of course, multiple people can apply to the same project. Realistically speaking, you should apply where you think you have a chance, so that the time you
devote ends up being useful for both parties.

One important thing, though: if you’re interested in LLVM, start contributing whether or not your proposal gets accepted.
GSoC is good because it motivates people to start contributing to a project even if they don’t get into GSoC in the end. And in the long run, that’s
more important (and not only because next year it will be much easier to get into GSoC).

Best,
Stefanos Baziotis

On Wed, 18 Mar 2020 at 8:51 a.m., Benson Bin Bin Li <bbl2117@columbia.edu> wrote:

Just to clarify my point about asking whether another student is already applying:

In my situation, a student already applied 3 or 4 weeks ago. He already has a finished application, on which I gave multiple rounds of feedback, and he has already landed a few patches that convinced me he’s a good candidate. I just told Benson that he can (of course) apply, but that he might want to see whether there is still a project without a student that he finds interesting. The reason is simply that I don’t want Benson to spend the remaining time on an application where there is a high chance that another student gets selected, while at the same time we have useful GSoC projects without any applications that end up not happening.

My goal is to have at least one good application for every project we put up, as that benefits both LLVM and students.

Hi Raphael,

Thanks for the clarification. In this case, I also think that Benson should preferably find a different project, as that would probably be better for everybody.

Best,
Stefanos Baziotis

Hi everyone,

> If you’re interested in LLVM, start contributing whether or not your proposal gets accepted.
> GSoC is good because it motivates people to start contributing to a project even if they don’t get into GSoC in the end.

Yeah, I definitely am interested, and given this whole coronavirus situation, I will probably still want to contribute over the summer even if I don’t get selected. So if there is a project that still needs a student, it does work out better for both parties. But in any case, thanks for letting me know, Stefanos. I will probably try to find a different project, then.

Best,
Benson

Hi everyone,

Best of luck, Benson; we hope to see you around here again either way.

Best,
Stefanos Baziotis

On Wed, 18 Mar 2020 at 8:08 p.m., Benson Bin Bin Li <bbl2117@columbia.edu> wrote:

Thanks Stefanos!