LLVM Backend for RISC-V easy start

Hello,
I am trying to add one custom instruction to RV32IM or RV32I. The new instruction is the example instruction from the official website. The source I am following is this.
These are the steps, rewritten as I understand them.
1 - If your target is called “Dummy”, create the directory lib/Target/Dummy. >>(I think it is better to copy the RISCV folder and rename it. I did that and called it RISCVomer.)
2 - In this new directory, create a CMakeLists.txt. It is easiest to copy a CMakeLists.txt of another target and modify it. >>(I won’t need this step because the file was already copied along with the rest of the RISCV folder.)
3 - It should at least contain the LLVM_TARGET_DEFINITIONS variable. >>(I should change the value from RISCV.td to RISCVomer.td, or I could leave it the same, because I will use the modified RISCV.td from the new folder instead of the one in the actual RISCV folder.)
4 - The library can be named LLVMRISCVomer (for example, see the MIPS target). >>(I do not understand this step. Which file/folder name or which variable should I change? In my example, from LLVMRISCV to LLVMRISCVomer?)
5 - To make your target actually do something, you need to implement a subclass of TargetMachine. This implementation should typically be in the file lib/Target/DummyTargetMachine.cpp, but any file in the lib/Target directory will be built and should work. To use LLVM’s target independent code generator, you should do what all current machine backends do: create a subclass of LLVMTargetMachine. (To create a target from scratch, create a subclass of TargetMachine.) >> (I did not understand this step at all.)
6 - cmake with -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=RISCVomer >> (I should run this command, but in which folder?)

All of these steps are taken from the official page.

Several questions:
Should I build every time that I make a change in the RISCVomer folder?
Will I only change the content of files in the RISCVomer folder?
Since I will add one instruction and one selection DAG pattern to the existing target (RISCV), I should only need to change a few files. However, I could not work out where to continue after the preliminaries. Could anyone direct me?
Am I going the wrong way? Should I follow another source?

Is there any tutorial that matches my approach? There are tutorials, but they aim to create a backend for a new target, cpu0, which makes them hard to follow.

I don’t think duplicating the whole RISC-V target is the best way to do this (and will run into real problems as your doppelganger competes with the original to own the riscv32-linux-gnu triple etc). The guide you’re following is for when you have an entirely new CPU kind to implement, but that’s not what you’re doing at all.

You should be able to add your instruction to the existing llvm/lib/Target/RISCV/RISCVInstrInfo.td (and possibly RISCVInstrFormats.td in the same directory).

Should I build every time that I make a change in the RISCVomer folder?

Whenever you want to check how your change is working, certainly.

Will I only change the content of files in the RISCVomer folder?

Probably only the RISCV folder unless you’re doing something really exotic.

Thank you for the reply.
In that case, this may be a very simple question, but I want to ask.
For my goal, I should edit
llvm/lib/Target/RISCV/RISCVInstrInfo.td,
RISCVInstrFormats.td (in the same directory),
/llvm/include/llvm/Target/TargetSelectionDAG.td,
/llvm/lib/Target/RISCV/RISCVISelLowering.cpp,
/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
those files, right?
According to the suggestion, the best matching tutorial is the Alex Bradbury slide.

Those slides look like a pretty good overview

/llvm/include/llvm/Target/TargetSelectionDAG.td

You’re unlikely to need to modify this unless you’re doing something strange. Most things people need are already there.

/llvm/lib/Target/RISCV/RISCVISelLowering.cpp

This would be where you add code to actually use your new instruction if the input pattern is too complicated for a TableGen (.td file) description.

/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

Same as above, but usually for even more exotic cases.

As in the example on the official website, I will convert an add followed by a multiply instruction into a new add-multiply instruction. I should define at least a DAG pattern to select it. Am I wrong?
So, I should edit /llvm/lib/Target/RISCV/RISCVISelLowering.cpp because my aim is not an exotic case.

I don’t think you’ll need to edit RISCVISelLowering.cpp for that, no. TableGen can easily handle it.


Regarding step 6 in my original post, in which folder should I run the command?
(By the way, should I create a separate question for that, or is it better to keep everything on one page?)

I usually make a build directory under the top-level checkout, but really anything empty is fine. The important bit is that you point CMake at /path/to/llvm-project/llvm because that’s where the top-level CMakeLists.txt file lives.

So, for example:

$ git clone git@github.com:llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake ../llvm -DLLVM_ENABLE_PROJECTS=clang -G Ninja
$ ninja check-all

I think it’s fine to keep asking here.


I opened a terminal in the llvm-project/FirstTrial folder and ran these commands:
cmake -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=RISCV -G Ninja ../llvm
ninja check-all
It started to build 3689 targets, just as if I were building the whole of LLVM + clang 14.
Is that normal?

Yes. If you just want to build an individual tool you can specify that instead (ninja llc is the most likely one you’ll want). Still a couple of thousand files though.


I added the code below at the end of TargetSelectionDAG.td.

/// what I add
class RISCVReg<bits<5> Enc, string n, list<string> alt = []> : Register<n>
{
    let HWEncoding{4-0} = Enc;
    let AltNames = alt;
    let Namespace ="RISCV";
}

def GPR : RegisterClass<"RISCV", [i32], 32, (add (sequence "X%u_32",0,31))>;

def MLA : Instruction {
    bits<32> Inst;
    bits<32> SoftFail = 0;
    bits<5> rs2;
    bits<5> rs1;
    bits<5> rd;
    let Namespace = "RISCV";
    let hasSideEffects = 0;
    let mayLoad = 0;
    let mayStore = 0;
    let Size = 4;
    let Inst{31-25} = 0b0000000; /*funct7*/ //I will change this and funct3
    let Inst{24-20} = rs2; /*rs2*/
    let Inst{19-15} = rs1; /*rs1*/
    let Inst{14-12} = 0b000; /*funct3*/
    let Inst{11-7} = rd; 
    let Inst{6-0} = 0b0110011; /*opcode*/

    dag OutOperandList = (outs GPR:$rd);
    dag InOperandList = (ins GPR:$rs1 ,  GPR:$rs2);
    let AsmString = "mla\t$rd, $rs1, $rs2";
}

def : Pat<  (add (mul GPR:$src1, GPR:$src2), GPR:$src3),
            (MLA GPR:$src1, GPR:$src2, GPR:$src3)>;

However, it gives the error below:

/home/llvm/llvm-project/llvm/include/llvm/Target/TargetSelectionDAG.td:1692:1: error: No def named 'X0_32': (sequence "X%u_32", 0, 31)
def GPR : RegisterClass<"RISCV", [i32], 32, (add (sequence "X%u_32",0,31))>;
^

I also checked my work against this slide. The two are similar (I mean the MLA instruction extension).
The only thing I have done differently is where I defined the MLA instruction. According to the slides by Alex Bradbury, it goes in the RISCVInstrInfo.td file. However, I made this edit in TargetSelectionDAG.td.
Should I make it in RISCVInstrInfo.td? Or do I have another syntax error in the definition of GPR?

Yes, that’s the problem. Target specific code does not belong in the generic TargetSelectionDAG.td file.

You also shouldn’t try to duplicate the RISC-V definition of RISCVReg or GPR (which lives in RISCVRegisterInfo.td).


When I do that, it shows:

/home/llvm/llvm-project/llvm/include/llvm/Target/TargetSelectionDAG.td:1702:18: error: Variable not defined: 'GPR'
def : Pat<  (add GPR:$rs1, GPR:$rs2),
                 ^

I commented out my original code (as shown below) to be sure that the problem was not there. I then wrote exactly the same code as in the slide, but it gives the same error.

/*def : Pat<  (add (mul GPR:$src1, GPR:$src2), GPR:$src3),
            (MLA GPR:$src1, GPR:$src2, GPR:$src3)>;*/
def : Pat<  (add GPR:$rs1, GPR:$rs2),
            (MLA GPR:$rs1, GPR:$rs2)>;

Which part am I missing?
(By the way, I can’t thank you enough.)

You need to move your changes out of TargetSelectionDAG.td and into RISCVInstrInfo.td. Apart from being a layering violation, TargetSelectionDAG.td gets included too early in the RISC-V TableGen phase to see the RISC-V definitions (like GPR).
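
A rough sketch of the include order may make the "too early" part concrete. This is an abridged illustration rather than a copy of the real file: most includes are left out, and the exact list and order in RISCV.td differ between LLVM versions.

```tablegen
// Abridged sketch of llvm/lib/Target/RISCV/RISCV.td (most includes omitted;
// the exact list and order vary between LLVM versions).

// 1. Generic definitions come first. Target.td itself pulls in
//    llvm/Target/TargetSelectionDAG.td, so at that point no RISC-V
//    register class such as GPR exists yet.
include "llvm/Target/Target.td"

// 2. RISC-V registers and register classes (RISCVReg, GPR, ...).
include "RISCVRegisterInfo.td"

// 3. RISC-V instructions and patterns. GPR is visible from here on, so
//    a custom instruction and its Pat<> belong in this file (or in a
//    file that it includes).
include "RISCVInstrInfo.td"
```

Anything appended to TargetSelectionDAG.td is parsed during step 1, which is consistent with the "Variable not defined: 'GPR'" error above.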


It is building now. Thank you so much. I will take a break for two days, then update here on Monday with the result.


The build got as far as 2130/2375. However, it then gives this error message:

 Instruction 'MLA' was provided 3 operands but expected only 2!
def : Pat<  (add (mul GPR:$src1, GPR:$src2), GPR:$src3),
^
anonymous_40746: 	(MLA:{ *:[i32] m1:[i64] } GPR:{ *:[i32] m1:[i64] }:$src1, GPR:{ *:[i32] m1:[i64] }:$src2, GPR:{}:$src3)

My pattern was:

def : Pat<  (add (mul GPR:$src1, GPR:$src2), GPR:$src3),
            (MLA GPR:$src1, GPR:$src2, GPR:$src3)>;

Actually, I wasn’t sure I could write it like that, but unless I misunderstood, there was no example of it in the slides. I found this style in another set of slides.

Should I write it like this instead?

def : Pat<  (mul (add GPR:$src1, GPR:$src2)),
            (MLA GPR:$src1, GPR:$src2)>;

I want to take the two sources of the add as the sources of MLA, and the destination of the mul as the destination of MLA.

The problem is in the definition of the instruction:

An MLA operation really does have 3 inputs, it’s just that this is usually implemented by the destination register also being read as the accumulator. In LLVM this would be expressed as

    dag OutOperandList = (outs GPR:$rd);
    dag InOperandList = (ins GPR:$rs1 ,  GPR:$rs2, GPR:$rs3);
    let Constraints = "$rd = $rs3";
    let AsmString = "mla\t$rd, $rs1, $rs2";
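
Putting the pieces of this thread together, the whole change might look roughly like the sketch below when placed in RISCVInstrInfo.td. It assumes the RVInstR format class from RISCVInstrFormats.td takes its parameters in the order (funct7, funct3, opcode, outs, ins, mnemonic, operand string), which may differ in your LLVM version. The funct7/funct3 values are placeholders that have not been checked for conflicts, and newer trees may want explicit type annotations such as (XLenVT GPR:$rs1) in the pattern.

```tablegen
// Sketch only: goes in llvm/lib/Target/RISCV/RISCVInstrInfo.td, after the
// point where RISCVInstrFormats.td has been included.
// Assumption: RVInstR<funct7, funct3, opcode, outs, ins, opcodestr, argstr>
// exists with this parameter order in your checkout.
// The funct7/funct3 values below are placeholders; pick an encoding that
// does not collide with an existing instruction.
let hasSideEffects = 0, mayLoad = 0, mayStore = 0,
    Constraints = "$rd = $rs3" in
def MLA : RVInstR<0b1000000, 0b000, OPC_OP,
                  (outs GPR:$rd),
                  (ins GPR:$rs1, GPR:$rs2, GPR:$rs3),
                  "mla", "$rd, $rs1, $rs2">;

// rd = (rs1 * rs2) + rs3, with rs3 tied to rd by the constraint above.
// The operands of the output DAG follow the order of the `ins` list.
def : Pat<(add (mul GPR:$rs1, GPR:$rs2), GPR:$rs3),
          (MLA GPR:$rs1, GPR:$rs2, GPR:$rs3)>;
```

Reusing RVInstR and the existing GPR class avoids redefining the encoding fields and registers by hand, and with three entries in the ins list the pattern's operand count matches the instruction.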

Yes. Finally, it builds. Thank you so much.

Hi,
I am a beginner experimenting with instruction pattern matching.
I tried this code, but when I write C code and generate assembly, the mla instruction is not emitted.

  1. Are the pattern and the instruction definition all that is required to emit this mla instruction, or do we have to code any other mapping?

Hello,
As far as I understand, yes, for this mla example. However, other examples, such as built-in LLVM functions, require changes to some other files.

Could you post which instructions were generated in the assembly instead of the mla instruction?

By the way, I am also a beginner in this field, but I recorded my progress while asking these questions. It is in Turkish, but you may still be able to follow along.
This link is the GitHub page for the code and this link is the video list.
