Strategy for Portable and Constrained Instruction Generation

I am building a patching tool where I need to generate a “constrained”
machine instruction for various architectures. I would like to build
off of LLVM’s diverse code generation backends, but I’m at a loss
about the best way to do it.

The high-level problem is to generate a machine instruction from a
generic operation (e.g., add, sub), with constrained physical registers
and operand types (since I know a lot about the instruction I want to
generate).

For example, let’s say I want to generate an add instruction for an
ARM32 processor with 32-bit operands, dest register $r1, and source
registers $r2 and $r3. I would like to build a tool off the LLVM source
code that would tell me what ARM instruction to use, and/or generate
the full machine instruction so I can get the binary instruction
encoding (bits).
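
To make the expected output concrete, here is a minimal sketch that
hand-encodes this one example in Python. It does not use LLVM at all;
it only follows the A32 data-processing (register) layout for ADD with
no shift, so it can serve as a cross-check for whatever a backend
eventually emits. The function name encode_a32_add_reg is hypothetical.

```python
def encode_a32_add_reg(rd: int, rn: int, rm: int,
                       cond: int = 0b1110, set_flags: bool = False) -> int:
    """Encode A32 `ADD{S}<c> Rd, Rn, Rm` (register form, no shift).

    Hand-rolled for illustration only; this is not an LLVM API.
    """
    for name, value in (("rd", rd), ("rn", rn), ("rm", rm), ("cond", cond)):
        if not 0 <= value <= 15:
            raise ValueError(f"{name} must fit in 4 bits, got {value}")

    return (
        (cond << 28)              # condition field, 0b1110 = always
        | (0b0100 << 21)          # data-processing opcode for ADD
        | (int(set_flags) << 20)  # S bit: update flags when set
        | (rn << 16)              # first source register
        | (rd << 12)              # destination register
        | rm                      # second source register, shift fields zero
    )


word = encode_a32_add_reg(rd=1, rn=2, rm=3)
print(f"{word:#010x}")                   # 0xe0821003
print(word.to_bytes(4, "little").hex())  # 031082e0
```

The second line printed is the byte order as it would sit in a
little-endian binary, which is what a patching tool would write.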

I’m thinking of a few possibilities:

  • The gMIR looks nice. I might be able to create a gMIR instruction
    like G_ADD $r1, $r2, $r3. Then hopefully add some additional type
    information on the register operations, and have instruction
    selection pick the appropriate machine instruction. However, I
    don’t know if you can reference physical registers in gMIR.

  • Maybe I can generate an LLVM IR instruction (platform independent),
    and somehow constrain the operands. So something like add %dst,
    %src1, %src2, where for example, %dst is somehow constrained to be
    register allocated to $r1, etc.? Is this possible? Then I could go
    through the codegen pipeline to get the ‘add’ instruction selected
    to the correct machine instruction and then the final encoding?

Assuming that one of the above, or another strategy, would work, how
would I first specify the “constrained” input instruction? I
could generate it programmatically, but then I would have to figure
out how to run the codegen pipeline programmatically or query the
instruction selection pass programmatically. Or could I maybe embed
the input instruction in a .mir file and let llc generate some
intermediate results or a machine binary?
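
As a rough sketch of the .mir route, the Python below generates the
text of a small gMIR function for the ARM example. It assumes that
generic instructions such as G_ADD take typed virtual registers, so
the physical registers are attached through COPY instructions rather
than written into G_ADD directly. The header fields, the gprb register
bank name, and the BX_RET operands are assumptions modeled on ARM
GlobalISel tests in the LLVM tree and are untested against any
particular LLVM release; arm_gmir_add is a hypothetical helper name.

```python
def arm_gmir_add(dst: str, src1: str, src2: str, bits: int = 32) -> str:
    """Return MIR text wrapping a G_ADD between physical-register COPYs.

    Assumption: the layout mirrors ARM GlobalISel test inputs; field
    names and operand syntax may differ between LLVM versions.
    """
    ty = f"s{bits}"
    lines = [
        "---",
        "name:            constrained_add",
        "legalized:       true",
        "regBankSelected: true",
        "body:             |",
        "  bb.0:",
        f"    liveins: ${src1}, ${src2}",
        "",
        f"    %0:gprb({ty}) = COPY ${src1}",  # read first physical source
        f"    %1:gprb({ty}) = COPY ${src2}",  # read second physical source
        f"    %2:gprb({ty}) = G_ADD %0, %1",  # generic, target-independent add
        f"    ${dst} = COPY %2({ty})",        # pin the result to the destination
        f"    BX_RET 14, $noreg, implicit ${dst}",  # keeps the result live
        "...",
        "",
    ]
    return "\n".join(lines)


mir_text = arm_gmir_add("r1", "r2", "r3")
print(mir_text)
```

The printed text could be saved to a file and passed to something like
llc -mtriple=arm-- -run-pass=instruction-select to see which ARM opcode
replaces G_ADD. Whether the selected instruction ends up writing $r1
directly, instead of going through a copy, would depend on later passes
such as register allocation.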

Any help is greatly appreciated, and I’m happy to provide more details
if it is helpful.

Thank you for reading and helping with LLVM!

You should write an email to the llvm-dev mailing list to discuss this; it is currently the most active place for such discussions.