Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations

I am a grad CS student at Stanford and wanted to engage with EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to further develop the Machine Learning and Compiler Optimization concept.

My background is in machine learning, cluster computing, distributed systems, etc. I am a good C/C++ developer and have a strong background in algorithms and data structures.

I am also taking an advanced compiler course this quarter at Stanford, so I would be studying several of these topics anyway; I thought I might as well co-engage on the LLVM compiler infra project.

I am currently studying the background information on SCC call graphs, dominator trees, and other global and inter-procedural analyses to lay some groundwork on how to tackle this optimization pass using ML models. I have run a couple of whole-program function passes and visualized call graphs to get familiarized with the LLVM optimization pass setup. I have also set up and learned to use GDB to debug function pass code.
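For reference, the SCC traversal described here is what LLVM's `scc_iterator` provides over the call graph; a minimal sketch (the helper name is illustrative, not an existing pass):

```cpp
// Walk the SCCs of a module's call graph bottom-up (post-order),
// i.e., callees before callers. Sketch only.
#include "llvm/ADT/SCCIterator.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

static void printSCCs(Module &M) {
  CallGraph CG(M);
  for (scc_iterator<CallGraph *> I = scc_begin(&CG); !I.isAtEnd(); ++I) {
    errs() << "SCC:";
    for (CallGraphNode *Node : *I)
      if (Function *F = Node->getFunction())
        errs() << " " << F->getName();
    errs() << "\n";
  }
}
```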

I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I have added an additional feature to enhance the ML optimization to include crossover code to GPU and to investigate how the function call graphs can be visualized as SCCs across CPU and GPU implementations. If the extension to GPU is too much for a summer project, we can potentially focus on developing a framework for studying SCCs across a unified CPU/GPU setup and leave the coding, if feasible, to next summer. All preliminary ideas.

Not sure how to proceed from here. Hence my email to this list. Please let me know.

Thank you
Shiva Badruswamy
shivastanford@gmail.com

Hi Shiva,

apologies for the delayed response.

> I am a grad CS student at Stanford and wanted to engage with EJ Park,
> Giorgis Georgakoudis, and Johannes Doerfert to further develop the Machine
> Learning and Compiler Optimization concept.

Cool!

> My background is in machine learning, cluster computing, distributed
> systems, etc. I am a good C/C++ developer and have a strong background in
> algorithms and data structures.

Sounds good.

> I am also taking an advanced compiler course this quarter at Stanford, so I
> would be studying several of these topics anyway; I thought I might as
> well co-engage on the LLVM compiler infra project.

Agreed :wink:

> I am currently studying the background information on SCC call graphs,
> dominator trees, and other global and inter-procedural analyses to lay some
> groundwork on how to tackle this optimization pass using ML models. I have
> run a couple of whole-program function passes and visualized call graphs to
> get familiarized with the LLVM optimization pass setup. I have also set up
> and learned to use GDB to debug function pass code.

Very nice.

> I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I
> have added an additional feature to enhance the ML optimization to include
> crossover code to GPU and to investigate how the function call graphs can be
> visualized as SCCs across CPU and GPU implementations. If the extension to
> GPU is too much for a summer project, we can potentially focus on
> developing a framework for studying SCCs across a unified CPU/GPU setup
> and leave the coding, if feasible, to next summer. All preliminary ideas.

I haven't looked at the proposals yet (I think we can only do so after the
deadline). TBH, I'm not sure I fully understand your extension. Also,
full disclosure, the project is pretty open-ended, from my side at least.
I do not necessarily believe we (=LLVM) are ready for an ML-driven pass or
even inference in practice. What I want is to explore the use of ML to
improve the code we have, especially heuristics. We build analyses and
transformations, but it is hard to combine them in a way that balances
compile time, code size, and performance.

Some high-level statements that might help to put my view into
perspective:

- I want to use ML to identify patterns and code features that we can
check for using common techniques, but where basing our decision making
on these patterns or features achieves better compile time, code size,
and/or performance.
- I want to use ML to identify shortcomings in our existing heuristics,
e.g., transformation cut-off values or pass schedules. This could also
mean identifying alternative (combinations of) values that perform
substantially better (on some inputs).
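To make the second point concrete, a hypothetical example of such a hard-coded cut-off (the names and the constant are invented, not actual LLVM code):

```cpp
// A hand-tuned threshold bakes one fixed trade-off between compile time,
// code size, and performance into the compiler for all inputs.
static const unsigned UnrollCostThreshold = 150; // hypothetical constant

static bool shouldUnroll(unsigned EstimatedLoopCost) {
  // An ML study could search for (combinations of) values that perform
  // substantially better on some inputs, or flag where this rule misfires.
  return EstimatedLoopCost < UnrollCostThreshold;
}
```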

> Not sure how to proceed from here. Hence my email to this list. Please let
> me know.

The email to the list was a great first step. The next one usually is to
set up an LLVM development and testing environment, that is, LLVM + Clang +
LLVM Test Suite, that you can use. It is also advisable to work on a small
task before GSoC to get used to LLVM development.

I don't have a really small ML "coding" task handy right now but the
project is more about experiments anyway. To get some LLVM development
experience we can just take a small task in the IPO Attributor pass.

One thing we need and don't have is data. The Attributor is a
fixpoint iteration framework, so the number of iterations is a pretty
integral part. We have a statistics counter to determine if the number
required was higher than the given threshold, but not one to determine
the maximum iteration count required during compilation. It would be
great if you could add that: a statistics counter that shows how
many iterations were required until a fixpoint was found, across all
invocations of the Attributor. Does this make sense? Let me know what
you think and feel free to ask questions via email or on IRC.
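A minimal sketch of such a counter, assuming LLVM's existing `STATISTIC` macro and `Statistic::updateMax`; the names and the exact hook point in the Attributor are illustrative:

```cpp
// Reported via -stats; keeps the largest iteration count any single
// Attributor invocation needed to reach a fixpoint.
#include "llvm/ADT/Statistic.h"
using namespace llvm;

#define DEBUG_TYPE "attributor"

STATISTIC(MaxFixpointIterations,
          "Maximum number of fixpoint iterations required by any "
          "Attributor invocation");

// Hypothetical hook, called once per run after the fixpoint loop with
// the number of iterations that run needed.
static void recordIterations(unsigned IterationsThisRun) {
  // updateMax atomically keeps the running maximum across invocations.
  MaxFixpointIterations.updateMax(IterationsThisRun);
}
```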

Cheers,
Johannes

P.S. Check out the coding style guide and the how to contribute guide!

Hi Johannes - great we are engaging on this.

Some responses now and some later.

  1. When you say set up LLVM dev environment + clang + tools etc., do you mean set up LLVM compiler code from the repo and build it locally? If so, yes, this is all done from my end - that is, I have built all this on my machine and compiled and run a couple of function passes. I have looked at some LLVM emits from clang tools but I will familiarize more. I have added some small code segments, modified CMakeLists files, and re-built code to get a feel for the packaging structure. Btw, is there a version of a Bazel build for this? Right now, I am using OS X as the SDK, as Apple is the one that has adopted LLVM the most. But I can switch to Linux containers to completely wall off the LLVM build against any OS X system builds, to prevent path obfuscation and truly have a separate address space. Is there a preferable environment? In any case, I am thinking of containerizing the build so OS X system paths don't interfere with include paths - have you received feedback from other developers on whether the include paths interfere with the OS X system LLVM build?

  2. The Attributor pass refactoring gives some specific direction as a startup project - so that's great. Let me study this pass and I will get back to you with more questions.

  3. Yes, I will stick to the style guide (Baaaah…Stanford is strict on code styling and so are you guys :)) for sure.

> Hi Johannes - great we are engaging on this.
>
> Some responses now and some later.
>
> 1. When you say set up LLVM dev environment + clang + tools etc., do you
> mean set up LLVM compiler code from the repo and build it locally? If so,
> yes, this is all done from my end - that is, I have built all this on my
> machine and compiled and run a couple of function passes. I have looked at
> some LLVM emits from clang tools but I will familiarize more. I have added
> some small code segments, modified CMakeLists files, and re-built code to
> get a feel for the packaging structure. Btw, is there a version of a Bazel
> build for this? Right now, I am using OS X as the SDK, as Apple is the one
> that has adopted LLVM the most. But I can switch to Linux containers to
> completely wall off the LLVM build against any OS X system builds, to
> prevent path obfuscation and truly have a separate address space. Is there
> a preferable environment? In any case, I am thinking of containerizing the
> build so OS X system paths don't interfere with include paths - have you
> received feedback from other developers on whether the include paths
> interfere with the OS X system LLVM build?

Setup sounds good.

I have never used OS X but people do and I would expect it to be OK.

I don't think you need to worry about this right now.

> 2. The Attributor pass refactoring gives some specific direction as a
> startup project - so that's great. Let me study this pass and I will get
> back to you with more questions.

Sure.

> 3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on
> code styling and so are you guys :)) for sure.

For better or worse.

Cheers,

Johannes

  1. Thanks for the clarifications. I will stick to non-containerized OS X for now.

  2. As an aside, I did try to build a Debian docker container by git cloning into it and using the Dockerfile in LLVM/utils/docker as a starting point:

  • some changes were needed to update packages (GCC in particular needs to be the latest) and the Debian image (Debian 9 instead of Debian 8); that pretty much sets up the docker container well. But for some reason, the Ninja build tool invoked by the CMake generator fails. I am looking into it. Maybe I can produce a working docker workflow for others who want to build and work with LLVM in a container environment.

  3. I have submitted the final proposal to GSoC 2020 today after incorporating some comments and thoughts. When you all get a chance to review, let me know your thoughts.

  4. On the GPU extension, my thoughts were around what an integrated compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted with LLVM, and whether that arrangement can be optimized for ML passes. But I am beginning to think that structuring this problem well and doing meaningful work over the summer might be a bit difficult. As mentors, do you have any thoughts on how LLVM might be integrated into a joint CPU-GPU compiler by the likes of Nvidia, Apple, etc.?

Best
Shiva

> 1. Thanks for the clarifications. I will stick to non-containerized OS X
> for now.

Sounds good. As long as you can build it and run lit and LLVM test-suite
tests :slight_smile:

> 2. As an aside, I did try to build a Debian docker container by git cloning
> into it and using the Dockerfile in LLVM/utils/docker as a starting point:
> - some changes were needed to update packages (GCC in particular needs to
> be the latest) and the Debian image (Debian 9 instead of Debian 8); that
> pretty much sets up the docker container well. But for some reason, the
> Ninja build tool invoked by the CMake generator fails. I am looking into
> it. Maybe I can produce a working docker workflow for others who want to
> build and work with LLVM in a container environment.

Feel free to propose a fix but I'm the wrong one to talk to :wink:

> 3. I have submitted the final proposal to GSoC 2020 today after
> incorporating some comments and thoughts. When you all get a chance to
> review, let me know your thoughts.

Good. Can you share the google doc with me
(johannesdoerfert@gmail.com)? [Or did you, and I misplaced the link? In
that case send it again ;)]

> 4. On the GPU extension, my thoughts were around what an integrated
> compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is
> substituted with LLVM, and whether that arrangement can be optimized for
> ML passes. But I am beginning to think that structuring this problem well
> and doing meaningful work over the summer might be a bit difficult.

As far as I know, neither GCC nor Clang will behave much differently if
they are used by nvcc than in their standalone mode.

Having an "ML-mode" is probably a generic thing to look at. Though, the
"high-level" optimizations are not necessarily performed in LLVM-IR.

> As mentors, do you have any thoughts on how LLVM might be integrated
> into a joint CPU-GPU compiler by the likes of Nvidia, Apple etc.?

I'm unsure what you are asking exactly. Clang can be used in CPU-GPU
compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?
I'm personally mostly interested in generic optimizations in this space.
Some ideas:
- transfer latency hiding (another GSoC project),
- kernel granularity optimizations (not being worked on yet, as it
requires some infrastructure changes that are as of now still in the
making),
- data "location" tracking so we can "move" computation to the right
device, e.g., for really dependence-free loops like `pragma omp loop`.

I can list more things but I'm unsure this is the direction you were
thinking of.
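For the last idea, a sketch of a "really dependence free" loop under the `loop` construct, assuming an OpenMP 5.0 toolchain (e.g., clang with -fopenmp and an offloading target); the implementation is free to choose where the iterations run:

```cpp
// SAXPY with independent iterations; `loop` asserts the iterations may
// execute in any order, leaving the scheduling to the implementation.
void saxpy(int N, float A, const float *X, float *Y) {
  #pragma omp target teams loop map(to : X[0:N]) map(tofrom : Y[0:N])
  for (int I = 0; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}
```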

Cheers,
Johannes

Hi Johannes:

  1. Attached is the submitted PDF.

  2. I have a notes section where I state: I am still unsure of the GPU extension I proposed, as I don't know how LLVM plays into the GPU crossover space the way nvcc (Nvidia's compiler, which integrates GCC and PTX) does. I don't know if there is a chance that function graphs in the CPU+GPU namespaces are seamless/continuous within nvcc, or if nvcc is just a wrapper that invokes GCC on the CPU sources and PTX on the GPU sources. So what I have said is: if there is time to investigate, we could look at this. But I am not sure I am even framing the problem statement correctly at this point.

  3. I have added a tentative tasks section and made a note that the project is open-ended and things are quite fluid and may change significantly.

Cheers
Shiva

Final_Proposal_ MachineLearningAndCompilerOptimization.pdf (39.9 KB)

Hi Johannes:

>
> 1. Attached is the submitted PDF.

I thought they make you submit via gdoc, and I also thought they wanted a
timeline and had other requirements. Please verify this so it's not a
problem (I base this on the proposals I've seen this year and not on the
information actually provided by GSoC).

> 2. I have a notes section where I state: I am still unsure of the GPU
> extension I proposed, as I don't know how LLVM plays into the GPU crossover
> space the way nvcc (Nvidia's compiler, which integrates GCC and PTX) does.

You can use clang as the "host compiler". As mentioned before, there is
clang-cuda, and OpenMP offloading also generates PTX for the GPU code.

> I don't know if there is a chance that function graphs in the CPU+GPU
> namespaces are seamless/continuous within nvcc, or if nvcc is just a
> wrapper that invokes GCC on the CPU sources and PTX on the GPU
> sources.

Something like that as far as I know.

> So what I have said is: if there is time to investigate, we could
> look at this. But I am not sure I am even framing the problem
> statement correctly at this point.

As I said, I'd be very happy for you to also work on GPU-related things;
what exactly can be defined over the next weeks.

GPU offloading is by nature inter-procedural (take CUDA kernels), so
creating the infrastructure to alter the granularity of kernels
(when/where to fuse/split them) could be a task. For this it is fairly
important (as far as I know now) to predict the register usage
accurately. Using learning here might be interesting as well.
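A sketch of the granularity choice in OpenMP offloading syntax (illustrative code, not an existing transformation): fusing saves a kernel launch and a pass over X, but the fused body may need more registers per thread, which is why predicting register usage matters.

```cpp
// Two fine-grained kernels: two launches, X is traversed twice.
void twoKernels(float *X, int N, float A, float B) {
  #pragma omp target teams loop map(tofrom : X[0:N])
  for (int I = 0; I < N; ++I)
    X[I] *= A;
  #pragma omp target teams loop map(tofrom : X[0:N])
  for (int I = 0; I < N; ++I)
    X[I] += B;
}

// The fused form: one launch, one traversal, but potentially higher
// register pressure per thread (and thus lower occupancy).
void fusedKernel(float *X, int N, float A, float B) {
  #pragma omp target teams loop map(tofrom : X[0:N])
  for (int I = 0; I < N; ++I)
    X[I] = X[I] * A + B;
}
```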

As you mention in the PDF, one can also split the index space to balance
computation. When we implement something like `pragma omp loop`, we can
also balance computations across multiple GPUs, as long as we get the
data movement right.
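A sketch of that manual split (the device numbers and the even split are assumptions) that such an implementation could eventually automate, data movement included:

```cpp
// Statically split the index space of the earlier SAXPY across two
// devices; array sections are written [start : length].
void saxpySplit(int N, float A, const float *X, float *Y) {
  int Half = N / 2;
  #pragma omp target teams loop device(0) \
      map(to : X[0:Half]) map(tofrom : Y[0:Half])
  for (int I = 0; I < Half; ++I)
    Y[I] = A * X[I] + Y[I];
  #pragma omp target teams loop device(1) \
      map(to : X[Half:N - Half]) map(tofrom : Y[Half:N - Half])
  for (int I = Half; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}
```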

> 3. I have added a tentative tasks section and made a note that the
> project is open-ended and things are quite fluid and may change
> significantly.

That is good. This is a moving target and an open-ended task; I expect
things to be determined more clearly as we go, based on the data we
gather.

Cheers,
Johannes

  1. Draft proposals via gdoc; final via PDF.
  2. I did not see any timeline requests from GSoC, but spring quarter ends around June 6, or maybe a week later due to coronavirus schedule delays. Summer begins then. I will look into it some more in the morning and see what I can add about timelines.

Thanks.

Hi Johannes et al.:

  1. I could not see anywhere that GSoC 2020 requires a timeline in the proposal. If anything, they have laid down their own timeline, which aligns with my summer break starting in June.

  2. Here’s their timeline link: https://developers.google.com/open-source/gsoc/timeline

Best
Shiva