Getting up to speed with llvm backends. Machine Instruction operands.

Welcome to all

Questions from veteran programmer with no LLVM backend experience evaluating

llvm for creating a Hitachi 6309 backend.

This post is about finding out more about machine instruction operands.

The documentation I have read so far includes:

  • the online manuals

  • Building an LLVM Backend. Fraser Cormack Pierre-André Saulais

  • The Design of a Custom 32-bit RISC CPU and LLVM Compiler Backend. Connor Jan Goldberg

  • Design and Implementation of a TriCore Backend for the LLVM Compiler Framework. Christoph Erhardt

I have also cloned llvm 9.0.1 and started looking at some of the targets. A little overwhelming!

At this point I’m at information overload!

From the “The LLVM Target-Independent Code Generator”

The MachineInstr class

The operands of a machine instruction can be of several different types: a register reference, a constant integer, a basic block reference, etc.

Where are these operand types defined or documented (especially the etcs)?

How do these operand types relate to the operands specified in the instruction selection and selection patterns?

A concern I have is raised in “Design and Implementation of a TriCore Backend for the LLVM Compiler” where

the instruction set is non orthogonal (contains special purpose address registers)

The strict distinction between pointers and integers is highly problematic because LLVM’s

code generator implicitly converts all pointers to integers of the same width … upon

construction of the SelectionDAG.

.

.

.

As mentioned above, LLVM’s agnosticism regarding pointers initially makes it impos-

sible to comply with the EABI as there is no way to tell whether an integer argument

should go into an address register or a data register.

However this document is dated circa 2008/2009 and I ask if this situation still remains the same

today.

I ask because the backend I would like to target the Hitachi/Motorola 6309/6809 which too

provides dedicated indexing (addressing) registers. In fact in all binary operations the second

operand is either immediate or some kind of a memory reference via a index/address register.

The syntax being:

{[}{OffsetReg | Disp{5,8,16}},{- | --}IndexReg{+ | ++ | ]}

OffsetReg can be 8bit or 16bit accumulator (so only certain regs allowed)

Displacment can be 5, 8 or 16 bit signed

IndexReg can only be special index registers or PC or stack

  • ++ is post increment by 1, 2 repsectively
  • – is pre decrement by 1, 2 respectively

the entire effective address is a pointer to pointer

and any incrementors/decrementors are mutally exclusive

So given the machine instruction :

add d ,x # to the d register add what the x register points at

further examples of the second arguement are:

,x+ # what register x points to and post inc x ie. *x++

10,y # what register y + 10 pointer to ie. *(y+10)

[20,u] # what register u + 20 pointer to pointer to ie. **(u+20)

w,y # what register y + register w points to ie. *(y+w)

Is there a way to pattern match these kinds of operands?

In MachineOperand.h I see this operand type. I assume I can match to it?!?!?

MO_TargetIndex, ///< Target-dependent index+offset operand.

At The LLVM Target-Independent Code Generator — LLVM 16.0.0git documentation

The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following

expression directly in integer instructions (which use ModR/M addressing):

SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32

In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means

that the “load” form of ‘mov’ has the following MachineOperands in this order:

Index: 0 | 1 2 3 4 5

Meaning: DestReg, | BaseReg, Scale, IndexReg, Displacement Segment

OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm PhysReg

Stores, and all other instructions, treat the four memory operands in the same way and in the same order. If the

segment register is unspecified (regno = 0), then no segment override is generated. “Lea” operations do not have

a segment register specified, so they only have 4 operands for their memory reference.

I then went and looked at the files in target/x86 and I have to admit I got lost trying to find where and

how this is implemented.

At this (learning) stage I would appreciate any input or pointers including any other documentation or

tutorials that might help in relation to how I can implement indexed memory addressing operands.

So appreciate comments.

Walter

Walter Zambotti via llvm-dev <llvm-dev@lists.llvm.org> writes:

I have also cloned llvm 9.0.1 and started looking at some of the targets. A
little overwhelming!

I would work off master if possible. LLVM APIs are a moving target.

The operands of a machine instruction can be of several different types: a
register reference, a constant integer, a basic block reference, etc.

Where are these operand types defined or documented (especially the etcs)?

Look in include/llvm/CodeGen/MachineOperand.h

How do these operand types relate to the operands specified in the
instruction selection and selection patterns?

That is entirely up to the target to define. You write a combination of
DAG patterns to match (using TableGen) and custom C++ lowering code to
map SelectionDAG nodes to MachineInstrs (which reference
MachineOperands). Most of the time these are one-to-one mappings (a
SelectionDAG node maps to an instruction defining a virtual register, a
SelectionDAG constant maps to a MachineOperand constant, SelectionDAG
operands map to virtual register references, etc.)

As mentioned above, LLVM’s agnosticism regarding pointers initially
makes it impossible to comply with the EABI as there is no way to tell
whether an integer argument should go into an address register or a
data register.

However this document is dated circa 2008/2009 and I ask if this situation
still remains the same today.

Yes. There is no machine-level "pointer type."
  

I ask because the backend I would like to target the Hitachi/Motorola
6309/6809 which too provides dedicated indexing (addressing)
registers. In fact in all binary operations the second operand is
either immediate or some kind of a memory reference via a
index/address register.

I'm not familiar with this architecture, but I will try to answer
question as best I can.

So given the machine instruction :

               add d ,x # to the d register add what the x register
points at

further examples of the second arguement are:

,x+ # what register x points to and post inc x ie. *x++

10,y # what register y + 10 pointer to ie. *(y+10)

[20,u] # what register u + 20 pointer to pointer to ie. **(u+20)

w,y # what register y + register w points to ie. *(y+w)

Is there a way to pattern match these kinds of operands?

Yes. The AArch64 backend might be a good guide as it supports pre- and
post-increment. I don't know if any existing target has
pointer-to-pointer operands (that's kind of a strange thing as it
requires two memory operands in one instruction) but I don't think it
would be super difficult to add. The X86 backend has special matching
code to construct its more complex addressing modes.

In MachineOperand.h I see this operand type. I assume I can match to
it?!?!?

    MO_TargetIndex, ///< Target-dependent index+offset operand.

I think the interpretation of this operand type is up to the target
(hence, "Target-dependent"). So yes, I think you could use it.

The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following

I then went and looked at the files in target/x86 and I have to admit
I got lost trying to find where and how this is implemented.

For X86, the magic happens with the "addr" TableGen class in
X86InstrInfo.td which is used to type address operands of Load/Store
nodes in the DAG patterns. This ends up calling info custom matching
code "selectAddr" which lives in X86ISelDAGToDAG.cpp. There are similar
variants of "addr" in the TableGen files each with different
requirements (for example alignment restrictions for vector
instructions).

Look at the instruction patterns in X86InstrArithmetic.td to see how
"addr" is used.

Hopefully that will get you started.

At this (learning) stage I would appreciate any input or pointers
including any other documentation or tutorials that might help in
relation to how I can implement indexed memory addressing operands.

For backend work it is absolutely necessary to understand TableGen.

http://llvm.org/docs/TableGen/index.html
http://llvm.org/docs/TableGen/LangIntro.html
http://llvm.org/docs/TableGen/LangRef.html
http://llvm.org/docs/TableGen/BackEnds.html

The way existing code generators work will make much more sense after
reading at least the first three above. The fourth is mostly brief
summaries of the different things TableGen is used to generate. It
started out mostly as a code generator generator but has become much
more over the years.

                       -David

Yes. The AArch64 backend might be a good guide as it supports pre- and
post-increment. I don't know if any existing target has
pointer-to-pointer operands (that's kind of a strange thing as it
requires two memory operands in one instruction) but I don't think it
would be super difficult to add. The X86 backend has special matching
code to construct its more complex addressing modes.

MSP430 has memory-memory instructions. As well as post-increments.
Probably the smallest backend to check :slight_smile:

Anton Korobeynikov <anton@korobeynikov.info> writes:

Yes. The AArch64 backend might be a good guide as it supports pre- and
post-increment. I don't know if any existing target has
pointer-to-pointer operands (that's kind of a strange thing as it
requires two memory operands in one instruction) but I don't think it
would be super difficult to add. The X86 backend has special matching
code to construct its more complex addressing modes.

MSP430 has memory-memory instructions. As well as post-increments.
Probably the smallest backend to check :slight_smile:

I mistyped :). My understanding is the with the pointer-to-pointer
operand the instruction would issues two memory *operations* (not have
two memory *operands* as I had written). That is, the DAG to match
would be a load from an address produced by another load.

                    -David

David Greene via llvm-dev <llvm-dev@lists.llvm.org> writes:

I ask because the backend I would like to target the Hitachi/Motorola
6309/6809 which too provides dedicated indexing (addressing)
registers. In fact in all binary operations the second operand is
either immediate or some kind of a memory reference via a
index/address register.

I'm not familiar with this architecture, but I will try to answer
question as best I can.

I missed this the first time around so I'll add it now.

Special index/address registers are really no problem. They're just a
different register class. There are target hooks you would redefine to
implement copying between registers of different classes when the code
generator realizes the value it needs is in the "wrong" register class.
It *should* mostly just fall out given correct register class
definitions and typing the TableGen pattern inputs correctly.

I think the concern raised by the thesis paper is very specific to
calling convention. The TriCore ABI apparently requires that
arguments/return values of pointer and integer type be passed in
different register classes. Does your target have a similar
requirement? If not, then I don't think you have anything to worry
about.

                     -David

David Greene

Thanks for the MSP430 pointer I will look into that.

In relation to my concerns regarding the TriCore ABI. It is good to know
this was specific to the calling convention.

The 6309 does not have a lot of registers to pass arguments. It is all be
stack based.

I believe the current C compiler will use the only 16 bit accumulator to
return a value regardless of type. So unless I extend the ABI that
shouldn't be a problem.

I will continue my self-education by study the MSP430 backend.

Thanks

Walter