llvm-mc and endianness

Hi,

As a first step in porting the LLVM toolchain to an in-house big-endian processor, I’m integrating the native assembler as a new ‘-assemble -arch=’ in llvm-mc.

Everything works quite well: the output ELF file is correctly formatted, except that the generated code is little-endian.

I understand that the endianness of the LLVM toolchain is controlled by the DataLayout class, but it appears to me that llvm-mc does not use that class.

I’ve seen a backend (CPU0, http://jonathan2251.github.io/lbd/genobj.html) that defines two different targets and performs the byte swapping as part of ‘EmitInstruction’. Is that the right way to do it?

Could somebody confirm my understanding and give me some tips about endianness in llvm-mc?

Thanks, Dominique T.

I have the same problem with the big-endian Z8, and I’ve postponed the assembler for the moment because of it.

Cheers, Kuba

Dominique Torette <Dominique.Torette@spacebel.be> writes:

Could somebody confirm my understanding and give me some tips about
endianness in llvm-mc?

At the MC level you need to make your *AsmInfo constructor set:

  IsLittleEndian = false;

Also make sure you pass false as the third argument to createELFObjectWriter().

FWIW there are several in-tree targets that support big-endian,
such as ARM, MIPS and PowerPC. SystemZ is big-endian only.
It might help to compare with one of those.
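
As a rough illustration of the first point, here is a minimal, self-contained
sketch of a flag that is cleared in a target's constructor and later consulted
when integers are written out. AsmInfoModel, MyTargetAsmInfoModel and writeInt
are hypothetical stand-ins invented for this example; they are not LLVM classes
or functions, and a real port would set the flag in its own *AsmInfo subclass.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an AsmInfo-like base class (NOT the LLVM class).
struct AsmInfoModel {
  bool IsLittleEndian = true; // assumed default for this sketch
};

// Hypothetical big-endian target: the constructor clears the flag.
struct MyTargetAsmInfoModel : AsmInfoModel {
  MyTargetAsmInfoModel() { IsLittleEndian = false; }
};

// Hypothetical writer: emits the low Size bytes of Value in the byte order
// selected by the flag.
std::vector<uint8_t> writeInt(const AsmInfoModel &Info, uint64_t Value,
                              unsigned Size) {
  assert(Size >= 1 && Size <= 8 && "unsupported integer width");
  std::vector<uint8_t> Out(Size);
  for (unsigned I = 0; I != Size; ++I) {
    // Little-endian: least significant byte first; big-endian: the reverse.
    const unsigned Shift = (Info.IsLittleEndian ? I : Size - 1 - I) * 8;
    Out[I] = static_cast<uint8_t>(Value >> Shift);
  }
  return Out;
}
```

With these stand-ins, writeInt(MyTargetAsmInfoModel(), 0x11223344, 4) yields
the bytes 11 22 33 44, while the default AsmInfoModel yields 44 33 22 11.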

Thanks,
Richard

Thanks Richard, that’s exactly the information I needed too.

Kuba

As far as the ARM architecture is concerned, the endianness problem seems to be handled 'by hand' in 'ARMELFStreamer::emitInst()', not in 'EmitInstruction'.
As far as I can tell, the third argument to createELFObjectWriter() only sets a boolean attribute; in emitInst() the endianness is read from the AsmInfo into the local variable 'LittleEndian'.
This local variable is then used, together with the instruction length, to reorder the instruction's bytes.
My understanding, then, is that I have to define my own 'emitInst()' that explicitly performs the byte swapping.
Could someone confirm this analysis?

  virtual void emitInst(uint32_t Inst, char Suffix) {
    unsigned Size;
    char Buffer[4];
    const bool LittleEndian = getContext().getAsmInfo()->isLittleEndian();

    switch (Suffix) {
    case '\0':
      Size = 4;

      assert(!IsThumb);
      EmitARMMappingSymbol();
      for (unsigned II = 0, IE = Size; II != IE; II++) {
        const unsigned I = LittleEndian ? (Size - II - 1) : II;
        Buffer[Size - II - 1] = uint8_t(Inst >> I * CHAR_BIT);
      }

      break;
    case 'n':
    case 'w':
      Size = (Suffix == 'n' ? 2 : 4);

      assert(IsThumb);
      EmitThumbMappingSymbol();
      for (unsigned II = 0, IE = Size; II != IE; II = II + 2) {
        const unsigned I0 = LittleEndian ? II + 0 : (Size - II - 1);
        const unsigned I1 = LittleEndian ? II + 1 : (Size - II - 2);
        Buffer[Size - II - 2] = uint8_t(Inst >> I0 * CHAR_BIT);
        Buffer[Size - II - 1] = uint8_t(Inst >> I1 * CHAR_BIT);
      }

      break;
    default:
      llvm_unreachable("Invalid Suffix");
    }

    MCELFStreamer::EmitBytes(StringRef(Buffer, Size));
  }
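
For a target with plain fixed-width instructions (no Thumb-style halfword
pairing), the byte ordering done by the ARM case above can be reduced to the
standalone sketch below. encodeInst is a hypothetical helper written for this
message, not an LLVM function, and it only models the reordering step, not the
streamer around it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of LLVM): serialise the low Size bytes of
// Inst into a byte string in the requested byte order.
std::string encodeInst(uint32_t Inst, unsigned Size, bool LittleEndian) {
  assert(Size >= 1 && Size <= 4 && "instruction width out of range");
  std::string Buffer(Size, '\0');
  for (unsigned II = 0; II != Size; ++II) {
    // Output byte II is the least significant remaining byte on a
    // little-endian target and the most significant one otherwise.
    const unsigned Shift = (LittleEndian ? II : Size - II - 1) * 8;
    Buffer[II] = static_cast<char>((Inst >> Shift) & 0xFF);
  }
  return Buffer;
}
```

Calling encodeInst(0xE1A00000, 4, false) gives the bytes E1 A0 00 00, and the
same call with true gives 00 00 A0 E1, which is the order seen in the
little-endian output.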