Hello

Hello LLVM group,
I am very new to this project. I want to develop a few things on the LLVM platform, and for that I am wondering where I can learn about the Intermediate Representation used in the LLVM project. Is there any in-depth, instruction-level guide available besides online tutorials?

Please advise.

Thanks.
Have a great weekend.

There are some books, but if you’re looking for the best reference for LLVM IR it’s online here: https://llvm.org/docs/LangRef.html

Thanks David. Really appreciate this.

Would you please share the names of those books, so that I at least have an idea of what resources are available in case I need more explanation than the Language Reference Manual gives?

Once again thanks.

Published books probably get out of date pretty quickly, so you’ll need to keep the more up-to-date code/online docs in mind even if you’re reading printed stuff. At a quick google this seems like the sort of thing that might be useful: https://www.amazon.com/Getting-Started-LLVM-Core-Libraries/dp/1782166920 (googling ‘llvm book’ shows a few results)

I own this book and it doesn’t cover IR in depth. I am trying to write an additional feature for the C++ frontend, and for that I want to emit proper Intermediate Representation that doesn’t become a laughing stock, so I am looking for an in-depth explanation of IR. I am aware that it changes so fast that a book will become outdated. However, for my exercise I am willing to go back to the version the book covers and implement it there. If you have used a book personally and recommend it, please share it with me.

Thanks.

Yeah, sorry - I haven’t read any LLVM books, unfortunately. Perhaps some other folks will be able to chime in with tips.

Ok, thanks for the reply. Let’s see what other folks have to say.

Hi Mihir,

I’m not really sure what would be the best choice given that you mention that you want a source that covers LLVM IR “in depth” while previously it seemed you needed
a beginner-like source.

If you already know the basics, i.e.,:

  • IR Structure (Module → Function → Basic Block → Instruction)
  • Basic operations (arithmetic, branches, calls, loads/stores, conversions, all that)
  • Intrinsics / Metadata
  • PHIs
  • GEPs
  • What role types play in general

Bear in mind that when I mention these basics, I don’t mean just roughly knowing what they do. I also mean knowing how you can use them as
building blocks to implement high-level operations, e.g. (from simple to more complicated):

  • 1 + 2 + 3

  • function calls

  • if-else if-else

  • classes / structs and operations in them

  • virtual functions

Maybe type conversions, pointers, etc. Anyway, you get the point.
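As a rough sketch of how one might practice the list above: the small C++ functions below each target one of those constructs, so they can be pasted into Compiler Explorer and compiled with -emit-llvm -g0 to see what Clang emits. All function and type names here are hypothetical, chosen only for illustration, and the comments describe what Clang usually produces rather than a guarantee.

```cpp
// Paste into Compiler Explorer and compile with: -emit-llvm -g0
// All names are hypothetical; comments say what Clang usually emits.

// 1 + 2 + 3: the frontend usually folds this to a single constant, even at -O0.
int add_constants() { return 1 + 2 + 3; }

// Function calls: look for a `call` instruction in the caller.
int twice(int x) { return x + x; }
int call_twice(int x) { return twice(x) + 1; }

// if / else if / else: look for `icmp`, conditional `br`, and several basic blocks.
int sign(int x) {
    if (x > 0) return 1;
    else if (x < 0) return -1;
    else return 0;
}

// Structs and operations on them: field access usually appears as `getelementptr`.
struct Point {
    int x;
    int y;
    int sum() const { return x + y; }
};
int point_sum(const Point& p) { return p.sum(); }

// Array indexing in a loop: `getelementptr` plus `load`; with -O1 the loop
// variables typically become `phi` nodes instead of stack slots.
int sum_array(const int* a, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += a[i];
    return total;
}

// Virtual functions: the call site loads the vtable pointer, then the function
// pointer, and finishes with an indirect `call`.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};
struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};
int area_of(const Shape& s) { return s.area(); }
```

A reasonable exercise is to predict the IR for one function at a time, then compare the prediction with the compiler output and look up any unfamiliar instruction in the LangRef.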

If you do have the basics down, then I think it is time to start studying in depth. At this point, I don’t think a book or any such source would be useful to you. LangRef would be the way to go, along
with writing C++ in godbolt (https://godbolt.org/) and seeing what LLVM IR Clang emits (with -emit-llvm and preferably -g0 as command-line arguments, to avoid debug info).

But if you don’t have the basics down, then you probably don’t need something in-depth. In this case, I’d start with this video: https://www.youtube.com/watch?v=m8G_S5LwlTo,
writing (simple) C code in godbolt and inspecting the result (eventually trying to produce it myself) and maybe I’d watch a compilers course (meaning 1-2 lectures covering things
of interest, not the whole thing and without paying - there is plenty of free material from universities online).

I hope this helps. If not, maybe you can tell us what exactly your level is.

Best,
Stefanos

On Sun, Jan 10, 2021 at 2:39 AM, Mihir Sevak via llvm-dev <llvm-dev@lists.llvm.org> wrote:

This is great, thank you so much! I don’t have mastery over the basics yet. I have read them and conceptually I understand them but I am afraid I can’t write them on my own. I also appreciate your sharing godbolt.org. That is a great help. Do you know how I can learn about differences between IRs when optimizing is turned on vs. when it is not turned on? If there are many people on this list who might be interested in this topic then we can remove spamming everyone on this list by including llvm-dev. You guys decide.

Thanks.

This is great, thank you so much!

Np :)

I have read them and conceptually I understand them but I am afraid I can’t write them on my own.

I’d work a little bit more on those, and godbolt can help a lot :) You can always see what the compiler does.

Do you know how I can learn about differences between IRs when optimizing is turned on vs. when it is not turned on?

What do you mean? Like, why did something happen? Or why is one version faster than the other? Or something else?

If there are many people on this list who might be interested in this topic then we can remove spamming everyone on this list by including llvm-dev. You guys decide.

I didn’t get that… :) llvm-dev is “included”, it’s CC’d. In any case, it’s ok, I don’t think we’re spamming anyone in llvm-dev.

- Stefanos

On Sun, Jan 10, 2021 at 3:06 AM, Mihir Sevak <mihir.sevak@gmail.com> wrote:

Hi Stefanos,
I take your advice to heart. I will practice the basics. If you have a book recommendation on those basics which explains them in depth :D please share. I have watched the video you shared a few times, as I get lost partway through, but I still feel hungry for more.
Second part: let’s say this function is a C/C++ function. Ideally we should be inlining it, but let’s leave it as is for our discussion. Following are two different versions of the same code. I tend to write my code more like the optimized version. I am confused that if I do that without -O2 then what will be the implication?

int square(int num) {
    return num * num;
}

/// Without optimization turned on

square(int): # @square(int)
    push rbp
    mov rbp, rsp
    mov dword ptr [rbp - 4], edi
    mov eax, dword ptr [rbp - 4]
    imul eax, dword ptr [rbp - 4]
    pop rbp
    ret

/// With optimization turned on

square(int): # @square(int)
    mov eax, edi
    imul eax, edi
    ret

If you have a book recommendation on those basics which explains them in depth :D please share.

Unfortunately, I don’t. I learned LLVM IR basically by implementing a front-end, trying things on Godbolt and sometimes reading the LangRef. So,
I know these can work because of experience.

That said, this book: https://www.amazon.com/LLVM-Essentials-familiar-infrastructure-libraries-ebook/dp/B0166Y6Z34 seems to be more suited to your needs (but
again, I have not read it, just skimmed through it).

I am confused that if I do that without -O2 then what will be the implication?

Sorry, but this question is unclear to me. I guess I see two possible questions:

  1. What if I write the optimized version myself and give it to the compiler?
    If you write it in ASM, the compiler doesn’t have much of a say… You are responsible for it being correct, fast, etc.

  2. What is the difference between the two versions?
    Among other things, the second has no loads / stores, does not use the stack, etc.

I’m not sure if any of those was your question.

Best,
Stefanos

On Sun, Jan 10, 2021 at 3:35 AM, Mihir Sevak <mihir.sevak@gmail.com> wrote:

Mihir,

In case it helps - the “-emit-llvm” option works very well on Compiler Explorer. For instance:
https://godbolt.org/z/9brhdG
shows the LLVM IR results of compiling your C++ example with and without -O1. (-O2 gets the same results in this case.)
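For readers who cannot open the link, here is a rough sketch of what that comparison tends to look like for the square function from this thread. The IR in the comments is abridged and approximate: value names are simplified, function attributes are omitted, and the exact text depends on the Clang version (older releases print i32* where newer ones print ptr), so treat it as an illustration rather than verbatim compiler output.

```cpp
// The function from the thread.
// Compile with: clang++ -S -emit-llvm -g0 -O0   (or -O1)
int square(int num) {
    return num * num;
}

// Approximate shape of the function body at -O0:
//   %num.addr = alloca i32            ; stack slot for the parameter
//   store i32 %num, ptr %num.addr     ; spill the argument to the stack
//   %0 = load i32, ptr %num.addr      ; reload it once per use
//   %1 = load i32, ptr %num.addr
//   %mul = mul nsw i32 %0, %1
//   ret i32 %mul
//
// Approximate shape of the function body at -O1:
//   %mul = mul nsw i32 %num, %num     ; works on the argument directly
//   ret i32 %mul
```

The alloca, store, and load instructions at -O0 correspond to the stack traffic visible in the unoptimized x86 listing earlier in the thread, and their absence at -O1 corresponds to the three-instruction version.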

As a general rule, I’d think that producing better IR from your frontend will only help the overall pipeline, except in rare cases where it might reveal a weakness in an analysis pass further down the line… but it may not be necessary, if the optimizer is capable of producing good results from whatever you output.

- Eric

Thanks Stefanos. The second one was my question and I am trying to study that for all possible cases.

Thanks.

Thanks Eric. Really appreciate this.

Regarding Eric’s comment, as I said above, -emit-llvm works even better with -g0 because you usually don’t care about debug info.
You can also try the Graph Output, which is cool. You can get this for LLVM IR too, but you need to do it on your machine (with opt -dot-cfg, or opt -passes=dot-cfg in newer releases, which writes the CFG as Graphviz .dot files) because
in Godbolt it doesn’t work that well.

The second one was my question and I am trying to study that for all possible cases.

But is the question “how can a program recognize which of any two x86 programs is faster?”

This is a problem that is beyond hard - it’s undecidable! So, compilers use heuristics, they’re trying
to find local minima and lately they’re trying machine learning.

Now, if you try to optimize code yourself, this is a bit of an art and requires incredible expertise if you want to do a
good job. A good starting point is learning about architecture in general and then specifically about your target architecture.
Given that the architectures are opaque, there’s a limit to how much you can know.

A good starting point for architecture: https://www.youtube.com/playlist?list=PL5PHm2jkkXmgVhh8CHAu9N76TShJqfYDt

For optimizing x86, Agner Fog is a good source: https://www.agner.org/optimize/
I also like this article a lot: https://travisdowns.github.io/blog/2019/06/11/speed-limits.html

Best,
Stefanos

On Sun, Jan 10, 2021 at 5:20 AM, Mihir Sevak <mihir.sevak@gmail.com> wrote:

Thanks Stefanos.