Porting LLVM backend is no fun yet

As we’ve already seen, David Chisnall prefers hacking LLVM over GCC (see How the LLVM Compiler Infrastructure Works | What Is LLVM? | InformIT): “In contrast, every time I look at the GCC code, it takes two people to prevent me from clawing my eyeballs out.”

I’m sorry to report that so far I have had the opposite experience. Some years ago, I ported binutils (via CGEN) and GCC to an embedded RISC CPU and found the process straightforward and pleasant. CGEN was especially handy for describing a sometimes quirky RISC instruction set and offered great flexibility for factoring out commonalities. By contrast, I have found TableGen to be much more rigid and brittle. There are too many constructs that need to be special-cased, and the existing ports do them in gratuitously different ways. There also seem to be too many layers of classes and helper functions in proportion to what’s being specified. I guess that sums up my gripe: low signal/noise and gratuitous complexity. Sorry to be complaining rather than proposing solutions. When I better get the hang of all of this, I expect to have some ideas on how to improve TableGen. Is there a development plan or wishlist for TableGen? I see nothing on the wiki yet.

I must also say that the LLVM code is considerably “denser” because of the unfortunate choice of BiCapitalizedIdentifierNames. Underscores lend some horizontal whitespace to names and make their subtokens visually distinct. BiCapped code is kinda like German with its cumbersome compound nouns.

Enough complaining for now; back to banging skull on stone! 8^)

G

Hi Greg,

I understand your frustration. I’ve been on this mailing list for a little over a year hoping that by osmosis I could get a better handle on writing a back end for LLVM. Although I feel more comfortable with the nomenclature, I still do not have a clue as to how to begin (actually I do, but it sounds more dramatic saying it this way). I’ve read the documentation, but TableGen seems to just be a glorified front end for generating C++ records. I was hoping for something that would allow me to specify my target machine (more in line with what GCC does) and then just stand back and watch the target code be generated. I guess a deeper understanding of the Target classes is mandatory before proceeding to use TableGen.

I guess what would help would be a tutorial that shows how one goes about writing a back-end for a fictitious target machine - something similar to “Porting GCC for Dunces” (http://ftp.axis.se/pub/users/hp/pgccfd/pgccfd.pdf). A pre-made “Dummy” back-end that had some basic instructions would also go a long way in helping someone write a back-end.

Alex Karahalios

By the way, I'm looking for more detailed documentation about
instruction itineraries in LLVM, as used by the schedulers.
Can someone point me to it?

As we've already seen, David Chisnall prefers hacking LLVM over GCC (see How the LLVM Compiler Infrastructure Works | What Is LLVM? | InformIT): "In contrast, every time I look at the GCC code, it takes two people to
prevent me from clawing my eyeballs out."

I'm sorry to report that so far I have had the opposite experience. Some years ago, I ported binutils (via CGEN) and GCC to an embedded RISC CPU and found the process straightforward and pleasant. CGEN was especially handy for describing a sometimes quirky RISC instruction set and offered great flexibility for factoring out commonalities. By contrast, I have found TableGen to be much more rigid and brittle. There are too many constructs that need to be special-cased, and the existing ports do them in gratuitously different ways. There also seem to be too many layers of classes and helper functions in proportion to what's being specified. I guess that sums up my gripe: low signal/noise and gratuitous complexity. Sorry to be complaining rather than proposing solutions. When I better get the hang of all of this, I expect to have some ideas on how to improve TableGen. Is there a development plan or wishlist for TableGen? I see nothing on the wiki yet.

Your observations are accurate; LLVM's CodeGen is comparatively
less mature in this area. There are numerous examples of these
symptoms.

There certainly are wishlist items for TableGen and TableGen-based
instruction descriptions, though I don't know of an official list. Offhand,
a few things that come to mind are the ability to handle nodes with
multiple results, something analogous to GCC's multi-alternative
constraints, the ability to generate more of the Legalize tables
automatically, and the ability to generate more of the TargetInstrInfo
hooks automatically. There's no plan for things like this at the
moment though; they will get done only when someone steps up
and implements them.

I must also say that the LLVM code is considerably "denser" because of the unfortunate choice of BiCapitalizedIdentifierNames. Underscores lend some horizontal whitespace to names and make their subtokens visually distinct. BiCapped code is kinda like German with its cumbersome compound nouns.

I guess this is just a matter of familiarity, and perhaps of choosing
an advantageous font.

Dan

Hello, Alex

for generating C++ records. I was hoping for something that would allow me
to specify my target machine (more inline with what GCC does) and then just
stand back and watch the target code be generated. I guess a deeper
understanding of Target classes is mandatory before proceeding to use
TableGen.

That's true. TableGen can automate many important cases, but surely
not everything, since targets differ *a lot*. Extending the TableGen
language to handle all possible cases would yield another language, and
it's quite questionable whether it would be better than C++ :)

I guess what would help would be a tutorial that shows how one goes about
writing a back-end for a fictitious target machine - something similar to
"Porting GCC for Dunces"
(http://ftp.axis.se/pub/users/hp/pgccfd/pgccfd.pdf). Also
a pre-made "Dummy" back-end that had some basic instructions would also go a
long way in helping someone write a back-end.

Recently a group of enthusiasts willing to write an MSP430 backend
appeared on the horizon. I'm trying to make their learning curve less
steep, and so I started such a backend myself. You can monitor the git
repository at Public Git Hosting - llvm/msp430.git/summary. It contains a dummy
backend and also has several early steps already done.

There is also some description of what's going on at
LLVM — LiveJournal, but unfortunately, only in
Russian. Hopefully someday it will become a proper "Backend for Dummies"
article.

Hope this will help somehow.

Dan Gohman wrote:

There certainly are wishlist items for TableGen and TableGen-based
instruction descriptions, though I don't know of an official list.
Offhand,
a few things that come to mind are the ability to handle nodes with
multiple results,

Is there an official workaround, BTW?

- Volodya

Hi Anton,

Thanks for the MSP430 links. This will help because MSP430 has a simple enough instruction set to make following along in the code much easier.

Alex Karahalios

Dan Gohman wrote:

There certainly are wishlist items for TableGen and TableGen-based
instruction descriptions, though I don't know of an official list.
Offhand,
a few things that come to mind are the ability to handle nodes with
multiple results,

Is there an official workaround, BTW?

Currently you have to write C++ code. See how the X86 backend handles mul hi/lo instructions, for example:

     case ISD::SMUL_LOHI:
     case ISD::UMUL_LOHI: {
       SDValue N0 = Node->getOperand(0);
       SDValue N1 = Node->getOperand(1);

       bool isSigned = Opcode == ISD::SMUL_LOHI;
       if (!isSigned)
         switch (NVT.getSimpleVT()) {
         default: assert(0 && "Unsupported VT!");
         case MVT::i8: Opc = X86::MUL8r; MOpc = X86::MUL8m; break;
         case MVT::i16: Opc = X86::MUL16r; MOpc = X86::MUL16m; break;
         case MVT::i32: Opc = X86::MUL32r; MOpc = X86::MUL32m; break;
         case MVT::i64: Opc = X86::MUL64r; MOpc = X86::MUL64m; break;
         }
       else
         switch (NVT.getSimpleVT()) {
         default: assert(0 && "Unsupported VT!");
         case MVT::i8: Opc = X86::IMUL8r; MOpc = X86::IMUL8m; break;
         case MVT::i16: Opc = X86::IMUL16r; MOpc = X86::IMUL16m; break;
         case MVT::i32: Opc = X86::IMUL32r; MOpc = X86::IMUL32m; break;
         case MVT::i64: Opc = X86::IMUL64r; MOpc = X86::IMUL64m; break;
         }

       unsigned LoReg, HiReg;
       switch (NVT.getSimpleVT()) {
       default: assert(0 && "Unsupported VT!");
       case MVT::i8: LoReg = X86::AL; HiReg = X86::AH; break;
       case MVT::i16: LoReg = X86::AX; HiReg = X86::DX; break;
       case MVT::i32: LoReg = X86::EAX; HiReg = X86::EDX; break;
       case MVT::i64: LoReg = X86::RAX; HiReg = X86::RDX; break;
       }

       SDValue Tmp0, Tmp1, Tmp2, Tmp3, Tmp4;
       bool foldedLoad = TryFoldLoad(N, N1, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4);
       // Multiply is commutative, so try folding a load from either operand.
       if (!foldedLoad) {
         foldedLoad = TryFoldLoad(N, N0, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4);
         if (foldedLoad)
           std::swap(N0, N1);
       }

       SDValue InFlag = CurDAG->getCopyToReg(CurDAG->getEntryNode(), dl, LoReg,
                                               N0, SDValue()).getValue(1);

       if (foldedLoad) {
         SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, N1.getOperand(0),
                           InFlag };
         SDNode *CNode =
           CurDAG->getTargetNode(MOpc, dl, MVT::Other, MVT::Flag, Ops,
                                 array_lengthof(Ops));
         InFlag = SDValue(CNode, 1);
         // Update the chain.
         ReplaceUses(N1.getValue(1), SDValue(CNode, 0));
       } else {
         InFlag =
           SDValue(CurDAG->getTargetNode(Opc, dl, MVT::Flag, N1, InFlag), 0);
       }

       // Copy the low half of the result, if it is needed.
       if (!N.getValue(0).use_empty()) {
         SDValue Result = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), dl,
                                                 LoReg, NVT, InFlag);
         InFlag = Result.getValue(2);
         ReplaceUses(N.getValue(0), Result);
#ifndef NDEBUG
         DOUT << std::string(Indent-2, ' ') << "=> ";
         DEBUG(Result.getNode()->dump(CurDAG));
         DOUT << "\n";
#endif
       }
       // Copy the high half of the result, if it is needed.
       if (!N.getValue(1).use_empty()) {
         SDValue Result;
         if (HiReg == X86::AH && Subtarget->is64Bit()) {
           // Prevent use of AH in a REX instruction by referencing AX instead.
           // Shift it down 8 bits.
           Result = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), dl,
                                           X86::AX, MVT::i16, InFlag);
           InFlag = Result.getValue(2);
           Result = SDValue(CurDAG->getTargetNode(X86::SHR16ri, dl, MVT::i16,
                                                  Result,
                                      CurDAG->getTargetConstant(8, MVT::i8)), 0);
           // Then truncate it down to i8.
           SDValue SRIdx = CurDAG->getTargetConstant(X86::SUBREG_8BIT, MVT::i32);
           Result = SDValue(CurDAG->getTargetNode(X86::EXTRACT_SUBREG, dl,
                                                    MVT::i8, Result, SRIdx), 0);
         } else {
           Result = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), dl,
                                           HiReg, NVT, InFlag);
           InFlag = Result.getValue(2);
         }
         ReplaceUses(N.getValue(1), Result);
#ifndef NDEBUG
         DOUT << std::string(Indent-2, ' ') << "=> ";
         DEBUG(Result.getNode()->dump(CurDAG));
         DOUT << "\n";
#endif
       }

#ifndef NDEBUG
       Indent -= 2;
#endif

       return NULL;
     }
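
For readers following along: the first three switch statements in the excerpt above only choose, per value type and signedness, a register-form opcode, a memory-form opcode, and the fixed register pair that receives the low and high halves of the product. A minimal, self-contained sketch of that selection step as a lookup table might look like the following. It is an illustration only: `VT`, `MulSelection`, and `selectMulLoHi` are hypothetical names invented here, and the opcodes and registers are plain strings rather than the X86 backend's real enums.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the simple value types in the excerpt.
enum class VT { i8 = 0, i16 = 1, i32 = 2, i64 = 3 };

// What the three switches compute, gathered into one record.
struct MulSelection {
  std::string RegOpc; // opcode when the operand is in a register
  std::string MemOpc; // opcode when a load has been folded in
  std::string LoReg;  // register holding the low half of the product
  std::string HiReg;  // register holding the high half of the product
};

// Table-driven version of the selection logic (illustrative only).
inline MulSelection selectMulLoHi(VT Ty, bool IsSigned) {
  static const MulSelection Unsigned[4] = {
      {"MUL8r", "MUL8m", "AL", "AH"},
      {"MUL16r", "MUL16m", "AX", "DX"},
      {"MUL32r", "MUL32m", "EAX", "EDX"},
      {"MUL64r", "MUL64m", "RAX", "RDX"},
  };
  static const MulSelection Signed[4] = {
      {"IMUL8r", "IMUL8m", "AL", "AH"},
      {"IMUL16r", "IMUL16m", "AX", "DX"},
      {"IMUL32r", "IMUL32m", "EAX", "EDX"},
      {"IMUL64r", "IMUL64m", "RAX", "RDX"},
  };
  unsigned Idx = static_cast<unsigned>(Ty);
  assert(Idx < 4 && "Unsupported VT!");
  return IsSigned ? Signed[Idx] : Unsigned[Idx];
}
```

Calling `selectMulLoHi(VT::i32, false)` yields the `MUL32r`/`MUL32m` pair with `EAX` and `EDX`. The rest of the excerpt (copying the operand into the low register, emitting the multiply, and copying each half back out only if it has uses) is the part that a single-result pattern cannot express, which is why it is written by hand in C++.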

As we’ve already seen, David Chisnall prefers hacking LLVM over GCC (see How the LLVM Compiler Infrastructure Works | What Is LLVM? | InformIT): “In contrast, every time I look at the GCC code, it takes two people to prevent me from clawing my eyeballs out.” I’m sorry to report that so far I have had the opposite experience. Some years ago, I ported binutils (via CGEN) and GCC to an embedded RISC CPU and found the process straightforward and pleasant. CGEN was especially handy for describing a sometimes quirky RISC instruction set and offered great flexibility for factoring out commonalities. By contrast, I have found TableGen to be much more rigid and brittle. There are too many constructs that need to be special-cased, and the existing ports do them in gratuitously different ways. There also seem to be too many layers of classes and helper functions in proportion to what’s being specified. I guess that sums up my gripe: low signal/noise and gratuitous complexity. Sorry to be complaining rather than proposing solutions. When I better get the hang of all of this, I expect to have some ideas on how to improve TableGen. Is there a development plan or wishlist for TableGen? I see nothing on the wiki yet.

Surely these are two separate issues. TableGen being less capable than CGEN doesn’t have anything to do with the overall quality of the rest of LLVM. Yes, it’s true that it could be harder to port LLVM to certain architectures. But that’s probably not the case for every target. Can you do a good port of x86 using CGEN? :)

Evan

Dan Gohman wrote:

There certainly are wishlist items for TableGen and TableGen-based
instruction descriptions, though I don't know of an official list. Offhand, a few things that come to mind are the ability to handle nodes with
multiple results, something analogous to GCC's multi-alternative
constraints, the ability to generate more of the Legalize tables
automatically, and the ability to generate more of the TargetInstrInfo
hooks automatically. There's no plan for things like this at the
moment though; they will get done only when someone steps up
and implements them.
  
Hi Dan,

Please pass along any other wishlist items you find. I'll dig through the llvmdev archives and see what I can find. I'm willing to lend a hand and make some design proposals as things become clearer to me. One thing I could use is nested multiclasses: the ability to call defm within a multiclass.

G

Evan Cheng wrote:

Surely these are two separate issues. TableGen being less capable than CGEN doesn't have anything to do with the overall quality of the rest of LLVM. Yes, it's true that it could be harder to port LLVM to certain architectures. But that's probably not the case for every target. Can you do a good port of x86 using CGEN? :)

Hi Evan,

Forgive me for focusing so much on complaints and omitting praise. I enthusiastically applaud LLVM in every other respect! LLVM is the coolest, grooviest Swiss Army knife of compiler technology I have had the pleasure of working with. I'm certain that my own deficient C++ skills contribute to my frustrations, which are hardly the fault of LLVM. The thrust of my message is that I want to learn better what TableGen can already do, and contribute to making it a more flexible and complete target description tool. One reason I had an easier time with the CGEN+GCC port was that there were already several good-quality ports to targets that were very similar (m32r was one). LLVM just isn't as old and broadly ported as CGEN+GCC, so I'm not benefiting as much from others' work. Now I get to be something of a pioneer and will doubtlessly acquire some arrows in my back to prove it. 8^)

G

Here are a few on my list in addition to what you mentioned:

1. Multiclass inheritance. I've implemented that here and am working toward
   getting it upstream.

2. defm can inherit from multiple multiclasses. Also implemented here and
   working its way upstream.

3. Passing subclasses as arguments so one can re-use template code.
   This is really, really hard to implement the way things are right now. I
   tried and gave up. :(

4. !nameconcat - works like !strconcat but looks up a symbol when it's done.
   This is really useful when trying to reuse dag patterns. Also implemented
   here and working its way upstream.

5. !cast - Given a string, look up a symbol, expecting a particular type for
   that symbol.

6. Documentation! I hope to contribute something here as I can. I've learned A
   LOT over the last few weeks.

                                                      -Dave
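
As a rough mental model of items 4 and 5 above (a toy, not TableGen's implementation), the two operators can be pictured as string operations followed by a typed symbol lookup. In the sketch below, `Kind`, `Record`, and `SymbolTable` are all hypothetical names made up for illustration; the only behavior taken from the list is "concatenate, then look up a symbol" and "look up a symbol, expecting a particular type".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record kinds; real TableGen records have classes instead.
enum class Kind { Instruction, Register };

struct Record {
  std::string Name;
  Kind K;
};

class SymbolTable {
  std::map<std::string, Record> Records;

public:
  void define(const std::string &Name, Kind K) {
    assert(!Name.empty() && "records need a name");
    Records[Name] = Record{Name, K};
  }

  // Model of item 5: given a string, look up a symbol and insist on
  // its kind. Returns nullptr if the name is unknown or the kind differs.
  const Record *cast(Kind Expected, const std::string &Name) const {
    auto It = Records.find(Name);
    if (It == Records.end() || It->second.K != Expected)
      return nullptr;
    return &It->second;
  }

  // Model of item 4: concatenate like a string operator would,
  // then resolve the resulting name to a symbol.
  const Record *nameconcat(Kind Expected, const std::string &Prefix,
                           const std::string &Suffix) const {
    return cast(Expected, Prefix + Suffix);
  }
};
```

With `ADD32rr` defined as an instruction, `nameconcat(Kind::Instruction, "ADD", "32rr")` resolves to that record, while asking for the same name as a register fails. That name-building step is what makes it possible to reuse one pattern across a family of similarly named definitions.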