I recently subscribed to the LLVM and Clang developer mailing lists, and earlier today read a message requesting advice on “Contributing to LLVM Projects”. I’m hoping to get involved eventually as well, but since my situation is very different from the one described in that thread, I thought I would solicit advice separately. If this is not the right place to ask, though, please let me know and ignore the remainder of this message.
Right now, I only ‘really’ know Java, which I use in my current job, apart from some Prolog, Haskell and Coq that I picked up during my linguistics studies (I specialized in categorial grammar, supplemented with CS and AI courses on logic and type theory). I thus have a long road ahead, even when it comes to bare theory, let alone implementation. The following is a detailed study plan for my spare time that I think will last me for the next 1.5 years or so (naturally supplemented with the LLVM documentation), and on which I would be very happy to receive any feedback.
Rather than beginning with C++, I have started studying C from K&R, to be followed (once finished) by Fraser and Hanson’s book on LCC. After that, I have a modest reading list of articles on SSA prepared (see below), which I hope to combine with a study of the LLVM IR. Based on these readings, and as an exercise, I plan to take what I learnt from LCC and write a small compiler in C that targets the LLVM IR for, say, a subset of Prolog. Ideally, I should reach this stage by the end of this year.
While far from ‘done’ with C, I suppose that at this point I can also start picking up some C++ (I have in mind at least Accelerated C++ and the Effective (Modern) C++ books, perhaps coupled with Modern C++ Design for template metaprogramming). By then, I think I should be prepared to start following some of the developments around Clang and to dig into its code. All this, in turn, should keep me busy for at least the first half of next year.
However, I also hope to come to understand the LLVM backend thoroughly, and I would suppose that, besides following the literature on compiler optimizations, working through a book or two on computer architecture should give me a starting point for digging deeper into its source (I currently have Computer Systems: A Programmer’s Perspective in mind).
There will surely never be any shortage of good books to read (in particular, I hope to squeeze in Sedgewick’s Algorithms in C and a book on operating system architecture somewhere), but I hope all this will at least give me a good basis from which to plan further, and eventually to get involved somehow. Even so, given my current state of knowledge and being limited to my spare time, I expect it will take at least five years or so to reach a lower-intermediate understanding of most of the important concepts involved.
As for the articles I had in mind for coming to grips with the basics of SSA, I have put together the following list, planning first to work my way up gradually to an understanding of control and data dependence.
- Prosser '59: Applications of Boolean matrices to the analysis of flow diagrams (introduces the concept of dominators, I believe)
- Lowry & Medlock '69: Object code optimization
- Allen '70: Control flow analysis
- Allen & Cocke '76: A program data flow analysis procedure
- Ferrante et al. '87: The program dependence graph and its use in optimization
- Rosen et al. '88: Global value numbers and redundant computations
- Alpern et al. '88: Detecting equality of variables in programs
- Cytron et al. '91: Efficiently computing static single assignment form and the control dependence graph
Many thanks for taking the time to read through all this. I would be very happy to receive any feedback on these plans, and again, if this is not the right place to ask, my apologies.