Hello all,
These are questions from a veteran programmer with no LLVM backend experience who is evaluating
LLVM for creating a Hitachi 6309 backend.
This post is about finding out more about machine instruction operands.
The documentation I have read so far includes:
- the online manuals
- Building an LLVM Backend. Fraser Cormack, Pierre-André Saulais
- The Design of a Custom 32-bit RISC CPU and LLVM Compiler Backend. Connor Jan Goldberg
- Design and Implementation of a TriCore Backend for the LLVM Compiler Framework. Christoph Erhardt
I have also cloned LLVM 9.0.1 and started looking at some of the targets. A little overwhelming!
At this point I’m at information overload!
From the “The LLVM Target-Independent Code Generator”
The MachineInstr class
The operands of a machine instruction can be of several different types: a register reference, a constant integer, a basic block reference, etc.
Where are these operand types defined or documented (especially the etcs)?
How do these operand types relate to the operands specified in the instruction selection and selection patterns?
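To make the question concrete, this is roughly how I picture an operand at the moment: a tagged value. The sketch below is a self-contained toy, not LLVM code. The kind names are modeled on entries I found in the MachineOperandType enum in MachineOperand.h (MO_Register, MO_Immediate, MO_MachineBasicBlock, MO_FrameIndex, MO_TargetIndex); the `ToyOperand` struct and the `describe` helper are hypothetical names of my own.

```cpp
#include <cstdint>
#include <string>

// Toy model only: this is NOT the real llvm::MachineOperand class.
// The kinds mirror a few entries of the MachineOperandType enum.
enum class OperandKind {
  Register,          // cf. MO_Register
  Immediate,         // cf. MO_Immediate
  MachineBasicBlock, // cf. MO_MachineBasicBlock
  FrameIndex,        // cf. MO_FrameIndex
  TargetIndex        // cf. MO_TargetIndex
};

// Hypothetical tagged value: the meaning of `value` depends on `kind`
// (register number, constant, block id or index).
struct ToyOperand {
  OperandKind kind;
  std::int64_t value;
};

// Hypothetical helper: render an operand as text for debugging.
std::string describe(const ToyOperand &op) {
  const std::string v = std::to_string(op.value);
  switch (op.kind) {
  case OperandKind::Register:          return "reg " + v;
  case OperandKind::Immediate:         return "imm " + v;
  case OperandKind::MachineBasicBlock: return "bb " + v;
  case OperandKind::FrameIndex:        return "frame-index " + v;
  case OperandKind::TargetIndex:       return "target-index " + v;
  }
  return "unknown";
}
```

If this picture is right, an instruction is an opcode plus an ordered list of such operands, and my question is how the operands written in the selection patterns map onto these kinds.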
A concern I have is raised in “Design and Implementation of a TriCore Backend for the LLVM Compiler Framework”, where
the instruction set is non-orthogonal (it contains special-purpose address registers):
The strict distinction between pointers and integers is highly problematic because LLVM’s
code generator implicitly converts all pointers to integers of the same width … upon
construction of the SelectionDAG.
...
As mentioned above, LLVM’s agnosticism regarding pointers initially makes it impossible
to comply with the EABI as there is no way to tell whether an integer argument
should go into an address register or a data register.
However, this document dates from around 2008/2009, and I would like to know whether the situation
is still the same today.
I ask because the processor I would like to target, the Hitachi/Motorola 6309/6809, also
provides dedicated index (address) registers. In fact, in all binary operations the second
operand is either an immediate or some kind of memory reference via an index/address register.
The syntax being:
{[}{OffsetReg | Disp{5,8,16}},{- | --}IndexReg{+ | ++ | ]}
OffsetReg can be an 8-bit or 16-bit accumulator (so only certain registers are allowed)
Disp (displacement) can be a 5-, 8- or 16-bit signed value
IndexReg can only be one of the special index registers, the PC, or a stack pointer
+ and ++ are post-increment by 1 and 2 respectively
- and -- are pre-decrement by 1 and 2 respectively
[ ] means the entire effective address is a pointer to a pointer (indirect)
The incrementors/decrementors are mutually exclusive
So given the machine instruction:
add d ,x # to the d register add what the x register points at
further examples of the second argument are:
,x+ # what register x points to, then post-increment x, i.e. *x++
10,y # what register y + 10 points to, i.e. *(y+10)
[20,u] # what register u + 20 points to, as a pointer to pointer, i.e. **(u+20)
w,y # what register y + register w points to, i.e. *(y+w)
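To pin down the semantics I want an operand to carry, here is a small self-contained C++ sketch that evaluates the examples above. It is a toy under loud assumptions, not LLVM code and not a faithful 6309 emulator: memory is modeled as 16-bit words keyed by address, only the x, y, u and w registers exist, and the names `Machine`, `IndexedOperand`, `Step` and `load` are hypothetical.

```cpp
#include <cstdint>
#include <map>

// Toy machine state (assumption: memory holds 16-bit words keyed by address).
struct Machine {
  std::map<std::uint16_t, std::uint16_t> mem;
  std::uint16_t x = 0, y = 0, u = 0, w = 0;
};

// The increment/decrement forms are mutually exclusive, so one enum suffices.
enum class Step { None, PostInc1, PostInc2, PreDec1, PreDec2 };

// One indexed operand: the pieces a backend would have to keep together.
struct IndexedOperand {
  std::uint16_t Machine::*index;               // which index register
  std::int16_t disp = 0;                       // signed displacement
  std::uint16_t Machine::*offsetReg = nullptr; // optional offset register
  Step step = Step::None;                      // optional auto inc/dec
  bool indirect = false;                       // [ ... ] form
};

// Compute the effective address, apply side effects, and load the value.
std::uint16_t load(Machine &m, const IndexedOperand &op) {
  std::uint16_t &idx = m.*(op.index);
  if (op.step == Step::PreDec1) idx = static_cast<std::uint16_t>(idx - 1);
  if (op.step == Step::PreDec2) idx = static_cast<std::uint16_t>(idx - 2);

  std::uint16_t ea = static_cast<std::uint16_t>(idx + op.disp);
  if (op.offsetReg)
    ea = static_cast<std::uint16_t>(ea + m.*(op.offsetReg));

  if (op.step == Step::PostInc1) idx = static_cast<std::uint16_t>(idx + 1);
  if (op.step == Step::PostInc2) idx = static_cast<std::uint16_t>(idx + 2);

  if (op.indirect) ea = m.mem[ea]; // follow the pointer stored at ea
  return m.mem[ea];
}
```

For example, `,x+` would be `IndexedOperand{&Machine::x}` with `step = Step::PostInc1`, and `[20,u]` would be `IndexedOperand{&Machine::u, 20}` with `indirect = true`. The point of the sketch is that one source-level operand bundles several fields plus a side effect on the index register.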
Is there a way to pattern match these kinds of operands?
In MachineOperand.h I see this operand type. I assume I can match to it?!?!?
MO_TargetIndex, ///< Target-dependent index+offset operand.
At The LLVM Target-Independent Code Generator — LLVM 16.0.0git documentation
The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following
expression directly in integer instructions (which use ModR/M addressing):
SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means
that the “load” form of ‘mov’ has the following MachineOperands in this order:
Index:      0        | 1         2        3          4             5
Meaning:    DestReg, | BaseReg,  Scale,   IndexReg,  Displacement  Segment
OperandTy:  VirtReg, | VirtReg,  UnsImm,  VirtReg,   SignExtImm    PhysReg
Stores, and all other instructions, treat the four memory operands in the same way and in the same order. If the
segment register is unspecified (regno = 0), then no segment override is generated. “Lea” operations do not have
a segment register specified, so they only have 4 operands for their memory reference.
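As I read the quoted passage, a single x86 memory reference is simply carried as five consecutive operands. The following self-contained C++ sketch shows how I understand those five values combine into an address. It is a toy, not LLVM's data structure: `X86MemRef` and `effectiveAddress` are hypothetical names, the fields hold register contents rather than register numbers, and the segment is reduced to a plain base value that is 0 when no override is present.

```cpp
#include <cstdint>

// Toy model of the five memory sub-operands, in the documented order.
struct X86MemRef {
  std::uint64_t base;    // contents of BaseReg
  std::uint8_t scale;    // 1, 2, 4 or 8
  std::uint64_t index;   // contents of IndexReg
  std::int32_t disp;     // signed 32-bit displacement
  std::uint64_t segBase; // segment base; 0 models "no override" (assumption)
};

// SegmentReg: Base + Scale * IndexReg + Disp32
std::uint64_t effectiveAddress(const X86MemRef &m) {
  // Sign-extend the displacement before adding; unsigned arithmetic wraps.
  const std::uint64_t disp =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(m.disp));
  return m.segBase + m.base + m.scale * m.index + disp;
}
```

If that reading is right, I would expect a 6309 indexed operand to be representable in a similar way, as a fixed group of sub-operands (index register, offset register or displacement, and flags for increment/decrement and indirection).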
I then went and looked at the files in lib/Target/X86, and I have to admit I got lost trying to find where and
how this is implemented.
At this (learning) stage I would appreciate any input or pointers including any other documentation or
tutorials that might help in relation to how I can implement indexed memory addressing operands.
So appreciate comments.
Walter