Support Instructions through instrinsic function in both riscv32 and riscv64

Hi all,

we are currently in the work of supporting custom RISCV instructions through builtin and intrinsic functions in.
A patch for LLVM 17 could look something like that:

Addition in clang/include/clang/Basic/BuiltinsRISCV.def

TARGET_BUILTIN(__builtin_riscv_sbox_sbox, "ii", "nc", "experimental-xsbox")

Addition in llvm/include/llvm/IR/IntrinsicsRISCV.td

let TargetPrefix = "riscv" in {
  multiclass SBOXIntrinsic<list<LLVMType> ret_types,
                           list<LLVMType> param_types,
                           list<IntrinsicProperty> intr_properties> {
    def "int_riscv_sbox_" # NAME
      : ClangBuiltin<"__builtin_riscv_sbox_" # NAME>,
        Intrinsic<ret_types,param_types,intr_properties>;
  }
  defm sbox : SBOXIntrinsic<[llvm_i32_ty],
                            [llvm_i32_ty],
                            [IntrNoMem]>;
} // TargetPrefix = "riscv"

Addition in llvm/lib/Target/RISCV/RISCVInstrInfo.td

// let Predicates = [HasNonStdExtSBOX] in {
  def RV_SBOX_sbox_Inst
    : RVInst<(outs GPR:$rd), (ins GPR:$rs1),
             "SBOX.sbox", "$rd, $rs1",
             [(set GPR:$rd, (int_riscv_sbox_sbox GPR:$rs1))],
             InstFormatOther>,
      Sched<[]> {

    let hasSideEffects = 0;
    let mayLoad = 0;
    let mayStore = 0;
    let Constraints = "";

    bits<5> rd;
    bits<5> rs1;

    let Inst{6-0} = 0b0101011;
    let Inst{11-7} = rd;
    let Inst{14-12} = 0b000;
    let Inst{19-15} = rs1;
    let Inst{24-20} = 0b00000;
    let Inst{31-25} = 0b0000100;
  }
} // Predicates = [HasNonStdExtSBOX]

This works all fine when using resulting LLVM/Clang compiler with the riscv32 target.

Recently, we startet supporting riscv64 with our tooling and are now hitting an llvm_unreachable in llvm/lib/Target/RISCV/RISCVISelLowering.cpp:
“Don’t know how to custom type legalize this intrinsic!”
This line moves between LLVM 17 and 19, but the string is very unique. Yes, we tested it in LLVM 19, too. There you have to use TableGen instead of this Clang definition file for the builtin function and some things on how the subtarget registration works has changed.

But our problem here is more that we are interessted in supporting this instruction both in riscv32 and riscv64. And I am not sure, how I have to generate the intrinsic so that it is transparent for C/C++ user which uses the builtin function with an integer as well legal for the riscv64 backend which does not allow operations including a 32 bit values. It would be nice if the could include this in a single compiler

I tried a few, but so far I just moved the error to another assertions.

Has anyone done something like that? Thanks in advance!

Does your underlying custom instruction read a 32-bit or 64-bit value for RV64, and does it write 32 bits, 64 bits or a sign/zero-extended 64-bit value?

1 Like

One thing that’s slightly unclear to me is whether your SBOX.sbox instruction is operating over 32-bit values kept in GPRs, or is operating over xlen-bit values. I’m going to assume the latter but let me know if that assumption is wrong.

Edited to add: if this operates over 32-bit values kept in GPRs on rv64, follow Craig’s post, below.

Working backwards from the instruction definition towards the C builtin:

  • The instruction definition looks correct, including the patterns. The patterns currently don’t apply because you’re not making an intrinsic call with MVT::i64 types, which is what the pattern would be expecting.
  • Your LLVM IR intrinsic is “wrong” for rv64 (if this is working on xlen-bit data). I think the easiest way forwards is with an overloaded intrinsic, where you use llvm_anyint_ty and and LLVMMatchType<0> for the return and arguments, respectively. This will mean that in rv32 LLVM IR, you end up with a call to @llvm.riscv.sbox_sbox.i32 and in rv64 LLVM IR, you end up with a call to @llvm.riscv.sbox_sbox.i64, if all goes to plan. You should only find clang ever creates this intrinsic with i32 or i64, given the code to map the builtin to this intrinsic.
  • Then the simplest thing to do is to update your ClangBuiltin to take and return long (which is xlen-bits in both rv32 and rv64 with the current ABIs). I forget exactly the letter to designate this type, as the main branch now has a new way of specifying types for builtins.

I think this should help get you most of the way to where you want?

I have probably missed something as it’s been a while since I implemented something like this, and I might be hoping a lot more happens automatically than actually does.

This same issue came up for orc_b, brev8, sha256*, sm3p0, and sm3p1 intrinsics which have i32 types.

The way we solved this is

  • Add a RISCVISD opcode for each of these intrinsics.
  • Add code to the INTRINSIC_WO_CHAIN handler in RISCVTargetLowering::ReplaceNodeResults to convert the intrinsic to the RISCVISD opcode by extending the operands to i64 and truncating the result back to i32.
  • For RV32, add code to LowerINTRINSIC_WO_CHAIN to convert the intrinsic to RISCVISD without changing the type or size of the operands.
  • Add isel patterns in tablegen that select the RISCVISD opcode.
1 Like

Internally sbox actually only uses the last 8 bit of rs1. Uses these 8 bit to select a value from a custom hidden register. Performs an operation with it and saves it to the 32 bit part of the rd register. Neither 32 or 64 bit is really needed, but it would be nice if it works in both targets.

We have a input language which is used for hardware generation and patching LLVM to support these instruction. In where the RISCV target uses 32 bit GPR registers. The hardware tooling just ignores this. Thus, I have to work with this model of the instruction at the moment.

I tried to replicate similar ideas from other instruction like you mentioned with llvm_anyint_ty or the RISCVISD opcode extending and truncating, but you said some things that helped me understand this topics better. I will try this tomorrow again and maybe ask for further information on this.

Given the description (that the instruction, on rv64, only uses an 8-bit value and returns a 32-bit value), don’t change the intrinsic to llvm_anyint_ty, it seems correct (because you don’t need the type to be different on rv64). Follow Craig’s advice about RISCVISD nodes.