Generating a custom opcode from an LLVM intrinsic

Hello all. LLVM newbie here. If anything seems glaringly wrong with my use of LLVM, that’s probably why.

Here’s what I’m trying to do. I have modified the gem5 simulator to accept a “new” x86 instruction. I did this by reserving the opcode in gem5’s ISA specification, the same way all the other instructions are specified.

I’m trying to get an LLVM backend to generate this opcode during code generation. My current plan is:

  1. During an LLVM pass, I’ll detect a series of instructions which can be replaced with this new instruction. (The new instruction is a “cache compute” instruction – in my passes, I replace a series of loads, operations, and stores with this single instruction.) This step is complete.

  2. I replace the series of instructions with an intrinsic. I have added an intrinsic using the instructions here. This step is complete.

  3. During code generation, the intrinsic should be converted to this reserved opcode. This is where I’m stuck.
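Steps 1 and 2 amount to a peephole rewrite: find a load/load/operate/store window and replace it with a single intrinsic call. The sketch below shows that idea on a toy instruction list in Python, not on real LLVM IR. The tuple format and the intrinsic name `llvm.cache.add` are hypothetical stand-ins (the name mirrors the `int_cache_add` used later in this thread). A real pass would also have to check that the intermediate values have no other uses.

```python
# Toy instruction format (hypothetical, NOT LLVM IR):
#   ("load", result, ptr)       result = *ptr
#   ("add", result, lhs, rhs)   result = lhs + rhs
#   ("store", value, ptr)       *ptr = value
CACHE_ADD = "llvm.cache.add"  # hypothetical intrinsic name


def matches_cache_add(window):
    """True if window is: load a; load b; add c = a, b; store c."""
    if len(window) != 4:
        return False
    ld0, ld1, add, st = window
    if (ld0[0], ld1[0], add[0], st[0]) != ("load", "load", "add", "store"):
        return False
    # The add must consume exactly the two loaded values, and the store must
    # write the add's result. (A real pass would also check for other uses.)
    return set(add[2:]) == {ld0[1], ld1[1]} and st[1] == add[1]


def fuse_cache_ops(instrs):
    """Replace every matching 4-instruction window with one intrinsic call."""
    out, i = [], 0
    while i < len(instrs):
        window = instrs[i:i + 4]
        if matches_cache_add(window):
            ld0, ld1, _, st = window
            # ("call", intrinsic, dst_ptr, src_ptr0, src_ptr1)
            out.append(("call", CACHE_ADD, st[2], ld0[2], ld1[2]))
            i += 4
        else:
            out.append(instrs[i])
            i += 1
    return out
```

For example, `[("load", "a", "p"), ("load", "b", "q"), ("add", "c", "a", "b"), ("store", "c", "r"), ("ret",)]` becomes `[("call", "llvm.cache.add", "r", "p", "q"), ("ret",)]`.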
I have two main questions that should unblock me:

Question 1: where is the code that maps from intrinsics to instructions? The link above states:

“Add support to the .td file for the target(s) of your choice in lib/Target/*/*.td. This is usually a matter of adding a pattern to the .td file that matches the intrinsic, though it may obviously require adding the instructions you want to generate as well. There are lots of examples in the PowerPC and X86 backend to follow.”

However, looking through these examples hasn’t made anything clearer for me. Any more documentation or high-level explanation on this subject would be really helpful. I have also read something about “lowering” intrinsics, but I’m not sure whether that’s relevant.

Question 2: will I be able to generate this opcode directly from the intrinsic, or will I have to add the opcode as an LLVM IR instruction and specify how it gets compiled? I can imagine two options:
Option 1: I define a “translation” from the intrinsic straight to an x86 opcode.
Option 2: I define a “translation” (perhaps in a .td file? I think that’s what they’re used for) from my intrinsic to a new IR instruction, and then define another translation that maps the new instruction to my opcode during code generation. If this is the case, I’m not sure there’s any point in having an intrinsic; I should just add a new instruction instead.

Hoping someone can help! As you can tell, I’m a little lost. The documentation for LLVM is great, but it’s a little above my level right now :)

Gus Smith, PSU

Here are a couple of examples of mapping an intrinsic to an X86 instruction, from X86InstrInfo.td. If you search for int_x86_* in any X86Instr*.td file, you can find others.

let Predicates = [HasCLFLUSHOPT], SchedRW = [WriteLoad] in
def CLFLUSHOPT : I<0xAE, MRM7m, (outs), (ins i8mem:$src),
                   "clflushopt\t$src", [(int_x86_clflushopt addr:$src)],
                   IIC_SSE_PREFETCH>, PD;

let Predicates = [HasCLWB], SchedRW = [WriteLoad] in
def CLWB : I<0xAE, MRM6m, (outs), (ins i8mem:$src), "clwb\t$src",
             [(int_x86_clwb addr:$src)], IIC_SSE_PREFETCH>, PD;

The encoding information for the binary output is buried in these definitions too. If you tell me which opcode you’ve chosen, I can tell you what values to use to get the right binary output.
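To make the answer to Question 1 concrete: the pattern in square brackets, e.g. `[(int_x86_clwb addr:$src)]`, is what ties the intrinsic to the instruction. TableGen gathers these patterns into the X86 instruction selector, and `let Predicates = [...]` restricts each one to subtargets with that feature. Because the pattern names the intrinsic directly, no new IR instruction is needed (option 1 from Question 2). The Python below is only a toy model of that lookup, not how LLVM’s generated matcher is structured; the `Pattern` class and `select` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    intrinsic: str     # intrinsic named in the pattern, e.g. int_x86_clwb
    num_operands: int  # operands in the pattern: (int_x86_clwb addr:$src) -> 1
    instruction: str   # the TableGen def the pattern belongs to
    predicate: str     # feature from `let Predicates = [...] in`


# Patterns transcribed from the two definitions above.
PATTERNS = [
    Pattern("int_x86_clflushopt", 1, "CLFLUSHOPT", "HasCLFLUSHOPT"),
    Pattern("int_x86_clwb", 1, "CLWB", "HasCLWB"),
]


def select(intrinsic, operands, features):
    """Return (instruction, *operands) for the first pattern that matches."""
    for p in PATTERNS:
        if (p.intrinsic == intrinsic
                and p.num_operands == len(operands)
                and p.predicate in features):
            return (p.instruction, *operands)
    # Loosely analogous to llc stopping with a "Cannot select" error.
    raise LookupError(f"cannot select {intrinsic}")
```

With this model, `select("int_x86_clwb", ["%rdi"], {"HasCLWB"})` returns `("CLWB", "%rdi")`, while the same call on a target without `HasCLWB` fails.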

Craig, thanks for the quick response. That helps a lot. I had no clue they were buried in there, though I guess I should have looked harder – the hex should have given me a clue, perhaps!

For the sake of my own edification (and to avoid taking up too much of your time), I will try to generate it myself. I’ve found the definition of the “I” class at line 358 of llvm/lib/Target/X86/X86InstrFormats.td, which helps a lot.

Let’s assume I want to produce opcode 0x16 (which I’m using because it doesn’t seem to be implemented in gem5 otherwise, and would simply produce a warning). Then my guess is that I should use something like:

def CACHEADD : I<0x16, FORMAT, (outs), (ins),
                 ASM, [(int_cache_add)]>, PD;

where FORMAT comes from http://legup.eecg.utoronto.ca/doxygen/namespacellvm_1_1X86II.html and ASM = ???. I deleted IIC_SSE_PREFETCH because I’m not sure what this flag indicates, and I assume it’s not needed. I’m also not sure what PD is or whether it should stay.

Looking for input on this! Clearly it’s not correct as-is, but I feel like I’m at least understanding parts of it. Thanks!

For posterity, this page helped a lot, and probably should have been read first: https://llvm.org/docs/TableGen/index.html
To a lesser extent, this one helped too, but read the page above first: https://llvm.org/docs/TableGen/LangRef.html

ASM is the text you want printed for the instruction in a textual assembly listing. The curly braces you see in some strings, like “adcx{l}\t{$src, $dst|$dst, $src}”, provide different text for AT&T syntax versus Intel syntax (here, the “l” suffix and the operand order). Anything after a $ matches an operand name in the (outs)/(ins) part of the instruction.
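As a rough illustration of the rule just described, here is a Python model of how such an asm string turns into printed text. It mimics the variant selection (first alternative for AT&T, second for Intel, nothing when a group has too few alternatives) and the `$name` substitution, but it is not LLVM’s AsmWriter code. The function names are illustrative, and real operand printing (such as the `%` on AT&T registers) is left out.

```python
import re


def select_asm_variant(asm, variant):
    """Flatten "{a|b}" groups in an asm string for one syntax variant.

    variant 0 = AT&T, variant 1 = Intel (the X86 ordering). A group with
    fewer alternatives than `variant` needs contributes nothing, so "{l}"
    yields "l" for AT&T and "" for Intel. Nested groups are not handled.
    """
    out, i = [], 0
    while i < len(asm):
        if asm[i] == "{":
            end = asm.index("}", i)
            alternatives = asm[i + 1:end].split("|")
            if variant < len(alternatives):
                out.append(alternatives[variant])
            i = end + 1
        else:
            out.append(asm[i])
            i += 1
    return "".join(out)


def fill_operands(text, operands):
    """Replace each $name with the operand bound to that name in (outs)/(ins)."""
    return re.sub(r"\$(\w+)", lambda m: operands[m.group(1)], text)
```

Applied to "adcx{l}\t{$src, $dst|$dst, $src}", variant 0 gives "adcxl\t$src, $dst" and variant 1 gives "adcx\t$dst, $src".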

IIC_SSE_PREFETCH is part of the scheduling model; it provides latency/throughput information about the instruction.

PD indicates the instruction should be on the 0x0f two byte opcode map with a 0x66 prefix.

The most common other values in place of PD:
TB - 0x0f opcode map, no prefix (0x66, 0xf2, 0xf3); if one of those prefixes is present, the disassembler should ignore it.
PS - 0x0f opcode map, no prefix, but if the disassembler sees a prefix, it should not decode to this instruction. Use this when another instruction with the same opcode uses a prefix.
PD - 0x0f opcode map with 0x66 prefix
XS - 0x0f opcode map with 0xf3 prefix
XD - 0x0f opcode map with 0xf2 prefix

T8 - 0x0f 0x38 opcode map with no prefix
T8PS - 0x0f 0x38 opcode map version of PS from above
T8PD - 0x0f 0x38 opcode map version of PD from above
T8XS - 0x0f 0x38 opcode map version of XS from above
T8XD - 0x0f 0x38 opcode map version of XD from above

TA - 0x0f 0x3a opcode map with no prefix
TAPS - 0x0f 0x3a opcode map version of PS from above
TAPD - 0x0f 0x3a opcode map version of PD from above
TAXS - 0x0f 0x3a opcode map version of XS from above
TAXD - 0x0f 0x3a opcode map version of XD from above
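The list above can be turned into a small lookup table that computes the mandatory-prefix and opcode-map bytes for each class. This is a hand-written Python toy, not something derived from LLVM sources. The `encode` helper is hypothetical, and its ModRM handling is deliberately minimal: only `mod=00` with a base register that needs no SIB byte or REX prefix. The empty class stands for a definition with no map suffix, i.e. the one-byte opcode map.

```python
# Class suffix -> (mandatory prefix bytes, opcode-map escape bytes).
ENCODING_CLASSES = {
    "":     (b"", b""),            # no suffix: one-byte opcode map
    "TB":   (b"", b"\x0f"),
    "PS":   (b"", b"\x0f"),
    "PD":   (b"\x66", b"\x0f"),
    "XS":   (b"\xf3", b"\x0f"),
    "XD":   (b"\xf2", b"\x0f"),
    "T8":   (b"", b"\x0f\x38"),
    "T8PS": (b"", b"\x0f\x38"),
    "T8PD": (b"\x66", b"\x0f\x38"),
    "T8XS": (b"\xf3", b"\x0f\x38"),
    "T8XD": (b"\xf2", b"\x0f\x38"),
    "TA":   (b"", b"\x0f\x3a"),
    "TAPS": (b"", b"\x0f\x3a"),
    "TAPD": (b"\x66", b"\x0f\x3a"),
    "TAXS": (b"\xf3", b"\x0f\x3a"),
    "TAXD": (b"\xf2", b"\x0f\x3a"),
}

# Base registers usable as a plain (%reg) operand with mod=00 and no SIB/REX.
# rsp (4) would need a SIB byte and rbp (5) means RIP-relative, so both are excluded.
SIMPLE_BASES = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsi": 6, "rdi": 7}


def encode(opcode, enc_class="", modrm_digit=None, base=None):
    """Bytes for a legacy-encoded instruction: prefix, map escape, opcode, ModRM."""
    prefix, escape = ENCODING_CLASSES[enc_class]
    out = prefix + escape + bytes([opcode])
    if modrm_digit is not None:
        # MRM<digit>m form: ModRM = mod (2 bits) | reg = digit (3 bits) | rm = base (3 bits)
        if base not in SIMPLE_BASES:
            raise ValueError(f"unsupported base register: {base}")
        out += bytes([(0b00 << 6) | (modrm_digit << 3) | SIMPLE_BASES[base]])
    return out
```

For example, `encode(0xAE, "PD", modrm_digit=7, base="rax")` gives `66 0f ae 38`, which is `clflushopt (%rax)`. A `RawFrm` definition with no suffix, such as opcode 0x06, is just the single byte `06`.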

Great info – all of this has been incredibly useful. Do you have any links to documentation for this, or does it just come from your own experience?

FYI, I achieved what I set out to achieve when I wrote this email. I’m moving on to a more complex goal now, but the original question was answered completely, in my opinion. This was the key line:

def CACHEOP : I<0x06, RawFrm, (outs), (ins), "cache_op", [(int_cache_op)]>;

I added this definition to llvm/lib/Target/X86/X86InstrInfo.td. I also had to comment out an instruction (PUSHES) that also used opcode 0x06. This was OK in my case (as far as I know) because PUSHES isn’t implemented in gem5.
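An overlap like the PUSHES one can be spotted mechanically. The sketch below is a toy Python check, not the check LLVM’s TableGen backends actually perform. It keys each definition by encoding class, opcode byte and ModRM /digit, which is why CLFLUSHOPT and CLWB (both `66 0F AE`) do not collide. The tuple layout is an illustrative assumption, and forms whose ModRM reg field holds a register operand are ignored.

```python
from collections import defaultdict


def find_opcode_conflicts(defs):
    """Group definitions that would be encoded with the same bytes.

    Each def is (name, enc_class, opcode, modrm_digit); modrm_digit is the
    /digit of an MRM<n>m form, or None when the form has no ModRM digit
    (e.g. RawFrm).
    """
    by_key = defaultdict(list)
    for name, enc_class, opcode, modrm_digit in defs:
        by_key[(enc_class, opcode, modrm_digit)].append(name)
    # Keep only keys claimed by more than one definition.
    return {key: names for key, names in by_key.items() if len(names) > 1}


defs = [
    ("PUSHES", "", 0x06, None),
    ("CACHEOP", "", 0x06, None),
    ("CLFLUSHOPT", "PD", 0xAE, 7),
    ("CLWB", "PD", 0xAE, 6),
]
```

Here `find_opcode_conflicts(defs)` reports only `("", 0x06, None)` for PUSHES and CACHEOP, and the conflict disappears once PUSHES is dropped from the list.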

Thanks again!
Gus

I’m not sure how thoroughly any of this is documented, particularly the X86 encoding bits. I know them because I implemented a lot of it. I should write some of it up.