Dynamic optimization passes in an LLVM-based compiler

Hi!
I’m new to LLVM, but I’ve read many articles. I want to implement my own compiler, and I have run into a big problem.
I have several questions that I cannot answer myself:

  1. If I’m writing a custom compiler, do I have to “hardcode” the passes it uses (as in the Kaleidoscope example: http://llvm.org/docs/tutorial/LangImpl4.html), or do I have to generate LLVM IR and then use the ‘opt’ tool to run selected passes on the generated code?
    I think the opt solution is not very good, because opt has to parse the LLVM IR (or BC) input file, which is unnecessary: we generated that IR ourselves, so we already had it in memory.
    Maybe there is a better solution that allows enabling and disabling passes in a custom compiler with command-line options, like in opt?

  2. I want to write a compiler that does NOT generate LLVM IR on its own; it should simply run one of the available module passes, and that pass will generate LLVM IR.
    The motivation behind this decision is that I want a graph (a serialized C++ structure) as the compiler input. I want to load this graph as a pass, run other passes (which will modify the graph), and then run a “conversion module pass” that converts the graph into LLVM IR. Additionally, I want to be able to read several formats, which is why I want to load the graph as a pass. (This pass will, of course, be grouped with other “load passes”.)

Could you please tell me what the best (most flexible and easiest) solution would be, keeping the first question in mind?

I have an idea for a solution (which does not work completely): create a compiler that initializes the base module and does nothing else. Then I can use the opt tool with my module passes, which will load and modify the graph and convert it to LLVM IR (with IRBuilder). The problem is whether opt can be run without an input file, and whether it will handle this situation correctly.

I have researched this for a very long time and have not found a good answer to these problems.
I would be very thankful for any help!

1) If I'm writing a custom compiler, do I have to "hardcode" the passes it
uses (as in the Kaleidoscope example:
http://llvm.org/docs/tutorial/LangImpl4.html), or do I have to generate LLVM IR
and then use the 'opt' tool to run selected passes on the generated code?
I think the opt solution is not very good, because opt has to
parse the LLVM IR (or BC) input file, which is unnecessary: we
generated that IR ourselves, so we already had it in memory.
Maybe there is a better solution that allows enabling and disabling
passes in a custom compiler with command-line options, like in opt?

I believe Clang just hardcodes passes. If a user wants to
experiment with different pass options, they can use the option to
generate LLVM bitcode from Clang, then pass that to opt themselves.
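To illustrate that this is not strictly a choice between "hardcode" and "use opt": a frontend can keep a hardcoded default pipeline and still let a command-line string override it, running directly on the module it already has in memory. This is a toy sketch with no LLVM dependency; `ToyModule`, the pass names `"dce"`/`"dedup"` and the comma-separated spec format are all hypothetical stand-ins. (Much newer LLVM releases provide `PassBuilder::parsePassPipeline`, which parses an opt-style pipeline string inside your own tool, but nothing below relies on it.)

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-memory module: a list of instruction strings.
struct ToyModule {
  std::vector<std::string> insts;
};

using ToyPass = std::function<void(ToyModule&)>;

// Registry of passes the frontend knows about, keyed by command-line name.
const std::map<std::string, ToyPass>& passRegistry() {
  static const std::map<std::string, ToyPass> registry = {
      // "dce": drop placeholder no-op instructions.
      {"dce", [](ToyModule& m) {
         std::vector<std::string> kept;
         for (const auto& i : m.insts)
           if (i != "nop") kept.push_back(i);
         m.insts = kept;
       }},
      // "dedup": collapse adjacent identical instructions.
      {"dedup", [](ToyModule& m) {
         std::vector<std::string> kept;
         for (const auto& i : m.insts)
           if (kept.empty() || kept.back() != i) kept.push_back(i);
         m.insts = kept;
       }},
  };
  return registry;
}

// An empty spec selects the hardcoded default pipeline; otherwise the
// comma-separated pass names from the command line are used instead.
std::vector<ToyPass> buildPipeline(const std::string& spec) {
  std::vector<std::string> names;
  if (spec.empty()) {
    names = {"dce", "dedup"};  // hardcoded default
  } else {
    std::stringstream ss(spec);
    std::string name;
    while (std::getline(ss, name, ',')) names.push_back(name);
  }
  std::vector<ToyPass> pipeline;
  for (const auto& n : names) {
    auto it = passRegistry().find(n);
    if (it == passRegistry().end())
      throw std::invalid_argument("unknown pass: " + n);
    pipeline.push_back(it->second);
  }
  return pipeline;
}

// Runs the passes on the in-memory module: no write-out/re-parse step.
void runPipeline(ToyModule& m, const std::vector<ToyPass>& pipeline) {
  for (const auto& p : pipeline) {
    assert(p && "registered passes must be callable");
    p(m);
  }
}
```

`buildPipeline("")` behaves like a frontend with hardcoded passes, while `buildPipeline("dce")` mimics picking passes on opt's command line.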

2) I want to write a compiler that does NOT generate LLVM IR on its own; it
should simply run one of the available module passes, and that pass will generate
LLVM IR.
The motivation behind this decision is that I want a graph (a serialized C++
structure) as the compiler input. I want to load this graph as a
pass, run other passes (which will modify the graph), and then run a
"conversion module pass" that converts the graph into LLVM IR.
Additionally, I want to be able to read several formats, which is why I
want to load the graph as a pass. (This pass will, of course, be grouped with
other "load passes".)

LLVM's pass system is for IR transformations only. Anything else you
want to do you'll have to build separately/in front of LLVM. Once your
other system generates IR, then you can pass it to LLVM.
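A minimal sketch of the "build it in front of LLVM" approach, assuming a hypothetical expression graph (`Graph`, `Node` and `Op` are made up for illustration): the graph lives in ordinary C++ data structures, and only the final step emits LLVM IR, here as text for brevity. A real frontend would more likely build the module in memory with `IRBuilder`, as the Kaleidoscope tutorial does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-house graph format, built and transformed outside LLVM.
enum class Op { Const, Add, Mul };

struct Node {
  Op op;
  long value = 0;  // used only when op == Op::Const
  int lhs = -1;    // operand node indices, used by Add/Mul
  int rhs = -1;
};

// Nodes are assumed to be in topological order (operands before users);
// the last node is the function's result.
struct Graph {
  std::vector<Node> nodes;
};

// Lowering step: runs once, after all graph-level work is finished, and
// produces textual LLVM IR that can then be handed to LLVM.
std::string lowerToLLVMIR(const Graph& g) {
  assert(!g.nodes.empty());
  std::vector<std::string> names(g.nodes.size());
  std::string body;
  for (size_t i = 0; i < g.nodes.size(); ++i) {
    const Node& n = g.nodes[i];
    if (n.op == Op::Const) {
      names[i] = std::to_string(n.value);  // constants are used inline
      continue;
    }
    names[i] = "%n" + std::to_string(i);
    const char* opcode = (n.op == Op::Add) ? "add" : "mul";
    body += "  " + names[i] + " = " + opcode + " i64 " + names[n.lhs] + ", " +
            names[n.rhs] + "\n";
  }
  return "define i64 @eval() {\nentry:\n" + body + "  ret i64 " +
         names.back() + "\n}\n";
}
```

For the graph `(2 + 3) * 4` this yields an `add` into `%n2`, a `mul` into `%n4`, and `ret i64 %n4`; everything before that call is the frontend's own business.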

Thank you for your response :)
I know that LLVM passes were designed to transform IR, but let's look at an example: an LLVM pass is a function that transforms some set of inputs into outputs. It can transform IR into, say, a graph of strongly connected components, and other passes can then use that data (not IR) to generate other data OR to manipulate the IR.

So why can't I create passes that need data generated by other passes (i.e. a graph loaded from disk) and then transform it into LLVM IR? I do not see any difference between these cases.
Am I wrong?

2012/11/17 David Blaikie <dblaikie@gmail.com>

A little. That would be stretching the concepts/machinery of LLVM a
little bit far, probably.

A few minor corrections:

Transformations in the LLVM sense are always IR to IR.
When you talk about SCCs & the like, those are analyses - an Analysis
never modifies the IR, it only computes values from the IR it's
given. Transformations then depend on (& invalidate) analyses to
decide what transformations to perform.
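The analysis/transformation split described above can be modelled in a few lines. This is a toy sketch, not LLVM's pass classes: `ToyIR`, `CountNonZeroAnalysis` and `removeZerosTransform` are hypothetical names, and the caching is a simplified version of "transformations depend on and invalidate analyses".

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy IR: a flat list of integer "instructions".
struct ToyIR {
  std::vector<int> insts;
};

// An analysis only reads the IR and computes a result from it; the result is
// cached until a transformation reports that the IR changed.
class CountNonZeroAnalysis {
 public:
  int get(const ToyIR& ir) {
    if (!cached_) {
      ++computations;
      int n = 0;
      for (int i : ir.insts) n += (i != 0);
      cached_ = n;
    }
    return *cached_;
  }
  void invalidate() { cached_.reset(); }
  int computations = 0;  // how often the result was (re)computed

 private:
  std::optional<int> cached_;
};

// A transformation rewrites the IR (IR -> IR). It consults the analysis to
// decide what to do, and invalidates it after changing the IR.
bool removeZerosTransform(ToyIR& ir, CountNonZeroAnalysis& analysis) {
  if (analysis.get(ir) == static_cast<int>(ir.insts.size()))
    return false;  // nothing to remove: IR untouched, analysis stays valid
  std::vector<int> kept;
  for (int i : ir.insts)
    if (i != 0) kept.push_back(i);
  ir.insts = kept;
  analysis.invalidate();  // IR changed, so the cached result is stale
  return true;
}
```

Running the transformation twice recomputes the analysis once after the first (modifying) run and reuses the cached value on the second, no-op run.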

What you're proposing is an analysis that doesn't analyze the IR at
all (because there is none) - it loads information from an external
source. There is one example (though I'm not sure if it's phrased as
an Analysis) of that that I can think of in the current IR: profile
guided optimization. The profile must be loaded from some external
source, references built up to the IR, and then Transformations can
depend on this information when choosing how to optimize.
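To make the profile-guided optimization comparison concrete: the external data is loaded independently of the IR, tied back to IR entities (here simply by function name), and then consulted by a transformation. Everything in this sketch is hypothetical: the `"<name> <count>"` profile text, `ToyFunction` and the hotness threshold are made up and do not reflect LLVM's actual profile formats or passes.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy IR: a list of functions carrying an optimization hint.
struct ToyFunction {
  std::string name;
  bool hot = false;
};

// "Analysis" whose input is external rather than the IR itself: per-function
// execution counts read from a made-up "<name> <count>" text format.
std::map<std::string, long> loadProfile(const std::string& text) {
  std::map<std::string, long> counts;
  std::istringstream in(text);
  std::string name;
  long count = 0;
  while (in >> name >> count) counts[name] = count;
  return counts;
}

// Transformation that consults the externally loaded data. Profile entries
// are matched to IR functions by name (the "references built up to the IR"
// step); functions missing from the profile are left alone.
void markHotFunctions(std::vector<ToyFunction>& fns,
                      const std::map<std::string, long>& profile,
                      long threshold) {
  for (auto& f : fns) {
    auto it = profile.find(f.name);
    if (it != profile.end() && it->second >= threshold) f.hot = true;
  }
}
```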

Effectively your graph transformations would exist purely as analyses,
transforming non-IR data from pass to pass until you reached some
transformation that would transform null IR into the actual IR
represented by the graph from the analyses.

It's not really going to give you a lot of value compared to just
building your own graph transformation pipeline & then producing IR at
the end of that.
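A sketch of what "your own graph transformation pipeline" might look like before any IR exists. `SrcGraph`, `foldConstants`, `markDeadNodes` and `runToFixpoint` are hypothetical; the repeat-until-nothing-changes loop is just one simple scheduling choice, not how LLVM's pass manager schedules passes. Once the pipeline settles, the graph would be lowered to IR in a single final step.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical source-level graph; each node is a constant or an addition.
struct GNode {
  bool isConst;
  long value;     // meaningful when isConst
  int lhs, rhs;   // operand indices when !isConst
  bool live = true;
};

struct SrcGraph {
  std::vector<GNode> nodes;
  int root;  // index of the node whose value the program produces
};

// A graph pass returns true if it changed the graph.
using GraphPass = std::function<bool(SrcGraph&)>;

// Replaces additions of two constants with a constant.
bool foldConstants(SrcGraph& g) {
  bool changed = false;
  for (auto& n : g.nodes) {
    if (n.isConst || !n.live) continue;
    const GNode& a = g.nodes[n.lhs];
    const GNode& b = g.nodes[n.rhs];
    if (a.isConst && b.isConst) {
      long sum = a.value + b.value;  // read operands before overwriting n
      n.isConst = true;
      n.value = sum;
      changed = true;
    }
  }
  return changed;
}

// Marks nodes that the root no longer depends on as dead.
bool markDeadNodes(SrcGraph& g) {
  std::vector<bool> reachable(g.nodes.size(), false);
  std::vector<int> work{g.root};
  while (!work.empty()) {
    int i = work.back();
    work.pop_back();
    if (reachable[i]) continue;
    reachable[i] = true;
    const GNode& n = g.nodes[i];
    if (!n.isConst) {
      work.push_back(n.lhs);
      work.push_back(n.rhs);
    }
  }
  bool changed = false;
  for (size_t i = 0; i < g.nodes.size(); ++i)
    if (g.nodes[i].live && !reachable[i]) {
      g.nodes[i].live = false;
      changed = true;
    }
  return changed;
}

// Runs the passes in order, repeating the whole list until no pass reports
// a change or the round limit is reached. Returns the number of rounds run.
int runToFixpoint(SrcGraph& g, const std::vector<GraphPass>& passes,
                  int maxRounds = 10) {
  int rounds = 0;
  bool changed = true;
  while (changed && rounds < maxRounds) {
    changed = false;
    for (const auto& p : passes) changed |= p(g);
    ++rounds;
  }
  return rounds;
}
```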

To come back to your original question: "I want to write a compiler
that does NOT generate LLVM IR by its own, it should simply run one of
available module passes and such pass will generate LLVM IR" - why do
you want to do this? You're just going to have to write the graph-IR
transformation sooner or later anyway? Why not do it as the first step
& then do IR level optimizations? (I'm not saying there's no reason to
do this, I'm just wondering what /your/ reasons are)

- David

A few minor corrections:

Transformations in the LLVM sense are always IR to IR.
When you talk about SCCs & the like, those are analyses - an Analysis
never modifies the IR, it only computes values from the IR it’s
given. Transformations then depend on (& invalidate) analyses to
decide what transformations to perform.

You are right, my nomenclature was wrong: I want to write analysis passes and one transformation pass generating LLVM IR.

What you’re proposing is an analysis that doesn’t analyze the IR at
all (because there is none) - it loads information from an external
source. There is one example (though I’m not sure if it’s phrased as
an Analysis) of that that I can think of in the current IR: profile
guided optimization. The profile must be loaded from some external
source, references built up to the IR, and then Transformations can
depend on this information when choosing how to optimize.

Effectively your graph transformations would exist purely as analyses,
transforming non-IR data from pass to pass until you reached some
transformation that would transform null IR into the actual IR
represented by the graph from the analyses.

It’s not really going to give you a lot of value compared to just
building your own graph transformation pipeline & then producing IR at
the end of that.

It allows me to use LLVM's pass manager and its dependency handling, with analysis groups etc. Otherwise, I think I would have to write exactly the same thing myself.

To come back to your original question: "I want to write a compiler
that does NOT generate LLVM IR by its own, it should simply run one of
available module passes and such pass will generate LLVM IR" - why do
you want to do this? You’re just going to have to write the graph-IR
transformation sooner or later anyway? Why not do it as the first step
& then do IR level optimizations? (I’m not saying there’s no reason to
do this, I’m just wondering what /your/ reasons are)

The answer is simple: the graph loaded from disk contains much more information than the generated IR, so I want to do some transformations at the beginning. (There are other reasons, but this is one of the biggest.)

I would like to ask you one more question. You wrote:

It’s not really going to give you a lot of value compared to just
building your own graph transformation pipeline & then producing IR at
the end of that.

Could you please tell me more about this? Why would my custom solution (a custom pass manager etc.) be better than writing LLVM (non-IR) passes?

2012/11/17 Wojciech Daniło <wojtek.danilo.ml@gmail.com>

Disclaimer: I don't generally work with the middle end optimization
pipeline, so my opinions aren't the most important/relevant/informed,
just a best guess (though I expect others will chime in if I'm too far
wrong).

The LLVM pass infrastructure is currently designed around IR analyses
& transformations - that's its purpose. What you're proposing would be
a rather esoteric use (I'd hesitate to say abuse, but I think it
borders on that) of that infrastructure & I (naively) expect that such
a difference of use from intent would result in a reasonable amount of
friction/problems. Short of redesigning the pass infrastructure to be
more general (& I doubt there's sufficient justification for such a
redesign that would be acceptable upstream) I suspect using the LLVM
pass infrastructure in this way would just be more pain than gain.

I could be wrong, though.

It allows me to use LLVM's pass manager and its dependency handling, with analysis groups etc. Otherwise, I think I would have to write exactly the same thing myself.

But you wouldn’t be able to reuse any of the analyses themselves, as they are all written for IR. You could certainly derive your own class from one of the PassManagers to operate over your graph representation. LLVM may not have something for your particular use out-of-the-box, but it might be easy enough for you to make your own pass manager given what’s already there.
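As a rough idea of what such a pass manager over a custom graph could look like, here is a small dependency-resolving manager written from scratch rather than derived from LLVM's PassManager classes (whose interfaces differ between LLVM versions). `Graph`, `AnalysisInfo` and `GraphPassManager` are hypothetical: each analysis declares the analyses it depends on, results are cached, dependency cycles are reported, and a graph transformation would call `invalidateAll()` after changing the graph.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph type standing in for the user's own representation.
struct Graph {
  std::vector<std::vector<int>> succ;  // successor lists per node
};

// An analysis names its dependencies and computes a numeric summary from the
// graph plus the (already computed) results of those dependencies.
struct AnalysisInfo {
  std::vector<std::string> deps;
  std::function<long(const Graph&, const std::map<std::string, long>&)> run;
};

class GraphPassManager {
 public:
  void registerAnalysis(const std::string& name, AnalysisInfo info) {
    analyses_[name] = std::move(info);
  }
  // Returns the cached result, computing dependencies first if needed.
  long get(const std::string& name, const Graph& g) {
    std::set<std::string> inProgress;
    return compute(name, g, inProgress);
  }
  // To be called by a transformation after it changes the graph.
  void invalidateAll() { results_.clear(); }

  std::vector<std::string> order;  // analyses in the order actually computed

 private:
  long compute(const std::string& name, const Graph& g,
               std::set<std::string>& inProgress) {
    auto cached = results_.find(name);
    if (cached != results_.end()) return cached->second;
    auto it = analyses_.find(name);
    if (it == analyses_.end())
      throw std::invalid_argument("unknown analysis: " + name);
    if (!inProgress.insert(name).second)
      throw std::runtime_error("dependency cycle at: " + name);
    for (const auto& dep : it->second.deps) compute(dep, g, inProgress);
    long value = it->second.run(g, results_);
    inProgress.erase(name);
    results_[name] = value;
    order.push_back(name);
    return value;
  }

  std::map<std::string, AnalysisInfo> analyses_;
  std::map<std::string, long> results_;
};
```

Asking for one analysis pulls in its dependencies first; a second request is served from the cache until the graph is invalidated.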

To come back to your original question: "I want to write a compiler
that does NOT generate LLVM IR by its own, it should simply run one of
available module passes and such pass will generate LLVM IR" - why do
you want to do this? You’re just going to have to write the graph-IR
transformation sooner or later anyway? Why not do it as the first step
& then do IR level optimizations? (I’m not saying there’s no reason to
do this, I’m just wondering what /your/ reasons are)

The answer is simple: the graph loaded from disk contains much more information than the generated IR, so I want to do some transformations at the beginning. (There are other reasons, but this is one of the biggest.)

For optimizations that use source-level information, do you need to perform them in conjunction with lots of analyses? If so, can you leverage the existing metadata infrastructure for your own purposes?
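One way to read the metadata suggestion: keep the extra information from the graph attached to the IR you generate, so IR-level passes can still find it. A sketch assuming textual IR output; the `srcnode` metadata kind, `LoweredInst` and `emitWithNodeMetadata` are made-up names. In an in-memory frontend one would more likely attach the metadata with `Instruction::setMetadata`. Note that metadata is advisory, so optimizations are allowed to drop it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: one emitted instruction plus the id of the source-graph node
// it was lowered from.
struct LoweredInst {
  std::string text;  // e.g. "%n2 = add i64 2, 3"
  int graphNodeId;   // id in the original, richer source graph
};

// Emits a function whose instructions each carry a custom "!srcnode"
// attachment pointing at a metadata node that records the source-graph id,
// so later passes could map IR back to the graph.
std::string emitWithNodeMetadata(const std::vector<LoweredInst>& insts,
                                 const std::string& retValue) {
  assert(!retValue.empty());
  std::string body, mdDefs;
  for (size_t i = 0; i < insts.size(); ++i) {
    std::string md = "!" + std::to_string(i);
    body += "  " + insts[i].text + ", !srcnode " + md + "\n";
    mdDefs += md + " = !{i64 " + std::to_string(insts[i].graphNodeId) + "}\n";
  }
  return "define i64 @eval() {\nentry:\n" + body + "  ret i64 " + retValue +
         "\n}\n\n" + mdDefs;
}
```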

Michael, sorry for the late answer; I had a subproject to finish, and now I’m back on this topic. I’m happy that I can do it, and I will try. After analyzing the graph, I want to produce LLVM IR (from one of my graph passes) and then use the built-in optimizations etc.

I didn’t quite understand your second question. I can change the metadata infrastructure, but how is it connected to this question? :)

2012/11/27 Michael Ilseman <milseman@apple.com>