(Sorry about the wall of text, it ended up as a brain dump of a bunch of backend-related documentation that I know about/have bookmarked in the past. Hopefully there’s something useful in there.)
If you haven’t stumbled across them already, these might be helpful:
Another good source of documentation is simply to look at the commits that introduce new backends, e.g. the initial commits for SystemZ and AArch64 come to mind. If you use git, something like this might be handy:
git log -p -n1 $(git log --pretty='format:%H' lib/Target/AArch64/ | tail -n1) >initial.patch
Generally the most consistent source of good documentation is the devmeetings, so definitely browse what’s available at <http://llvm.org/devmtg/> (even quite old ones can still be really useful, such as Anton’s slides). Also, as I’m sure you’ve heard, a lot of that backend stuff is slated for change (getting rid of SelectionDAG, global-isel, separating the MI layer, etc.), although the process will probably take a long time.
There’s also a lot of backend-related knowledge encoded (trapped?) inside of the TableGen backends. AFAIK the only way to really puzzle out the deep details there is to read the source code of the TableGen backends in utils/TableGen/, correlate that with the generated .inc files (look in your build dir), and then see where those .inc files are included inside lib/Target/$ARCH/; see utils/TableGen/TableGenBackends.h for an overview of what’s there. There is some rough documentation about them in <http://llvm.org/docs/WritingAnLLVMBackend.html> and <http://llvm.org/docs/CodeGenerator.html>.
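To make that correlation step a bit more concrete, here is a rough shell sketch. The function names (list_generated_incs, find_inc_users) are made up for illustration, and it assumes the generated files follow the usual ${ARCH}Gen*.inc naming and that your source and build trees are laid out in the standard way; adjust to taste.

```shell
# Hypothetical helpers, not part of LLVM. Assumes generated files are
# named like AArch64GenInstrInfo.inc somewhere under the build dir.

# List the TableGen-generated .inc files for one target.
#   $1 = build directory, $2 = target name (e.g. AArch64)
list_generated_incs() {
  find "$1" -type f -name "${2}Gen*.inc" | sort
}

# Show which target source files include a given .inc file.
#   $1 = source tree root, $2 = target name, $3 = .inc file name
find_inc_users() {
  grep -rnF -- "#include \"$3\"" "$1/lib/Target/$2"
}

# Example usage (the paths here are placeholders):
#   list_generated_incs ~/llvm-build AArch64
#   find_inc_users ~/llvm AArch64 AArch64GenInstrInfo.inc
```

From there you can open the .inc file next to the include site and the TableGen backend that emitted it, and read the three side by side.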
The actual TableGen “language” is documented fairly poorly but actually has real dedicated documentation available (<http://llvm.org/docs/TableGenFundamentals.html>, <http://llvm.org/docs/TableGen/LangRef.html>). There’s also test/TableGen/ which might be useful. To learn about the in-code data structures that are produced by TableGen and manipulated by the TableGen backends, you basically need to look at include/llvm/TableGen/Record.h and lib/TableGen. Beware: the TableGen code is not quite up to par with the rest of the codebase. Also, don’t assume that there’s some coherent “underlying structure” to the TableGen language; it’s mostly a product of long periods of neglect punctuated by quick hacks when a developer gets frustrated by clunky existing ways to express something (or inability to express something).
The only documentation (besides the code itself) we have about the MI layer is the short section <http://llvm.org/docs/CodeGenerator.html#machine-code-description-classes>. You can get an idea of the scope of the MI layer from the plans laid out in <http://thread.gmane.org/gmane.comp.compilers.llvm.devel/65434>. Other than that you’ll probably have to go digging in the source or ask on the mailing lists.
There’s a blog post about MC <http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html> and an old dev meeting talk by Daniel Dunbar, along with a section here <http://llvm.org/docs/CodeGenerator.html#the-mc-layer>. There’s also a quite thorough tutorial specifically about implementing an integrated assembler <http://www.embecosm.com/appnotes/ean10/ean10-howto-llvmas-1.0.html>. Other than that, I don’t know of much material to recommend about MC.
If you gain any insight about a specific thing in the course of your backend project, please feel free to share that knowledge by documenting it (see <http://llvm.org/docs/SphinxQuickstartTemplate.html>).
– Sean Silva