IR load and xor instruction pattern matching

I am trying to match a pattern consisting of two loads and one xor operation in LLVM IR using the RISC-V back-end. My sample .ll instructions are:

%3 = load i32, ptr %arrayidx9, align 4, !tbaa !6
%4 = load i32, ptr %arrayidx11, align 4, !tbaa !6
%xor12 = xor i32 %4, %3

Here are my edits to the file:

let mayLoad = 1 in {
def LXR : ALU_rr<0b0011011, 0b101, “lxr”>,
          Sched<[WriteIALU, ReadIALU, ReadIALU]>;
}

def : Pat< (xor (load GPR:$rs1),(load GPR:$rs2)),
(LXR GPR:$rs1,GPR:$rs2)>;
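For reference, here is a minimal C++ model of what the Pat above asks LXR to compute: the xor of the two 32-bit words at the addresses held in rs1 and rs2. It is only a sketch; `load32` and `lxr_model` are hypothetical names, memory is modeled as a byte array indexed by address, and nothing about the real encoding, alignment rules or timing of the instruction is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 32-bit word at byte address `addr` of the modeled memory.
// memcpy avoids unaligned-access undefined behavior on the host; the
// value is read in host byte order (RISC-V itself is little-endian).
std::uint32_t load32(const std::uint8_t *mem, std::uint32_t addr) {
  std::uint32_t v;
  std::memcpy(&v, mem + addr, sizeof v);
  return v;
}

// Hypothetical reference semantics of the pattern:
//   (xor (load rs1), (load rs2))  ==>  rd = mem32[rs1] ^ mem32[rs2]
std::uint32_t lxr_model(const std::uint8_t *mem, std::uint32_t rs1,
                        std::uint32_t rs2) {
  return load32(mem, rs1) ^ load32(mem, rs2);
}
```

A model like this can serve as a golden reference when checking a simulator or hardware implementation of the new instruction against compiled code.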

I am having problems with defining the load pattern. Is it possible to reduce these three operations into one instruction?

What problems are you having? That snippet looks like it works fine for me (after fixing the quotes to be normal). When I compile

define i32 @foo(ptr %p1, ptr %p2) {
  %a = load i32, ptr %p1
  %b = load i32, ptr %p2
  %res = xor i32 %a, %b
  ret i32 %res
}

for riscv32-linux-gnu I get the new lxr instruction in the output.

Thanks for your kind reply. I compiled your IR code and it works fine for me too. However, when I add an extra operation that takes %res as an input, it stops using the lxr instruction.

define i32 @foo(ptr %p1, ptr %p2) {
  %a = load i32, ptr %p1
  %b = load i32, ptr %p2
  %res = xor i32 %a, %b
  %res2 = and i32 %res, %a
  ret i32 %res2
}

I compiled this code and it gives the following instructions:

lw a0, 0(a0)
lw a1, 0(a1)
xor a1, a0, a1
and a0, a1, a0

My actual goal is to run the Ascon S-box algorithm in a single instruction. Since this pattern appears frequently in my .ll file, I started by reducing these instructions first. Is my approach wrong, and do you have any suggestions?

It looks like a profitability issue: because %a is used by the and as well as the xor, it would have to be emitted independently anyway, so LLVM decides it's not worth folding into an instruction. The responsible code is in SelectionDAGISel, where the default IsProfitableToFold only accepts a value with a single use; it is called when checking OPC_CheckFoldableChainNode as part of the pattern.

In most cases, particularly on RISC targets (where an instruction typically has only one memory operand), this is the right choice.

You can override IsProfitableToFold in the RISCV backend. In this case, allowing multiple uses when U (the user node) is an xor would be enough, though you might also want to check that its other operand is compatible with your LXR.
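To make the profitability rule concrete, the sketch below models it with toy node types. `Node`, `Op` and both function names are hypothetical; the real hook is the virtual IsProfitableToFold member of SelectionDAGISel, which works on SDValue/SDNode, so check its exact signature in your LLVM version. The default only folds a value with exactly one use; the override additionally lets a load fold into an xor user, which is the case that blocks LXR in the example above.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a SelectionDAG node: an opcode plus the nodes using it.
enum class Op { Load, Xor, And, Other };

struct Node {
  Op op;
  std::vector<const Node *> users;
  bool hasOneUse() const { return users.size() == 1; }
};

// Models the default rule: fold n into its user u only if n has one use.
bool defaultIsProfitableToFold(const Node &n, const Node & /*u*/) {
  return n.hasOneUse();
}

// Models the suggested override: also allow a multi-use load to be folded
// when the user is an xor, so LXR can still be selected; otherwise defer
// to the default behavior.
bool riscvIsProfitableToFold(const Node &n, const Node &u) {
  if (n.op == Op::Load && u.op == Op::Xor)
    return true;
  return defaultIsProfitableToFold(n, u);
}
```

Note the trade-off this encodes: if %a is folded into LXR while the and still needs it, the load is effectively performed twice (once inside LXR and once by a separate lw), which is why the default refuses.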

Arguably, once you go to that level you might as well select it manually in RISCVISelDAGToDAG.cpp; at least all the logic is in one place then.

I can’t thank you enough. LLVM amazes me at every step 🙂. I tried a version that doesn't use %a as an input to %res2, and it worked.

But I have one more question. Suppose I have two numbers in an array, stored consecutively in memory. The instruction I am trying to create will have the ALU_rr format, and one of its inputs will be a dummy. The other will hold an address; the instruction will load the value at that address and the value at the next address.

%arrayidx = getelementptr inbounds [5 x i32], ptr %state, i32 0, i32 1
%0 = load i32, ptr %state
%1 = load i32, ptr %arrayidx
%xor = xor i32 %1, %0

Here is a prototype of the IR. In the assembly I want the same lxr instruction, but instead of two inputs it should take a pointer input that points to %0 (i.e. %state). Is there any way to do this? I don’t know if I am asking too much; I am open to any suggestions.

I’m failing to understand what you’re trying to say. What assembly do you want that IR to give?

Hi, we are working together with @emre on this project.
Basically, what we want to capture are loads whose addresses are a fixed distance apart. In other words, the first load can have any arbitrary address “x”, but the next load should be at e.g. “x + 16”, or at x plus another constant such as 4, 8 or 12.
Our goal here is to capture loads of array elements and match them with R-type instructions.

In the end, what we want to produce is an instruction whose only meaningful operand is a pointer to the input array. The target uses the same pointer for both reading and writing, overwriting the memory in place. The second operand appears in the assembly but is ignored by the target. For example:

LXR ptr_to_the_array, xxx
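The arithmetic behind the fixed distances is simple: element k of an i32 array lives 4*k bytes after the base, so `getelementptr inbounds [5 x i32], ptr %state, i32 0, i32 1` yields %state + 4, and the offsets 4, 8, 12 and 16 correspond to elements 1 to 4 of `[5 x i32]`. The sketch below shows that, plus a hypothetical model of the single-pointer LXR computing the same value as %xor in the IR above. The names are made up, the dummy operand is simply ignored, and because the thread does not say where the target writes its result, the model only returns the value instead of writing it back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset that `getelementptr [N x i32], ptr %p, i32 0, i32 index`
// adds to %p: index * sizeof(i32) = index * 4.
constexpr std::size_t i32ElementOffset(std::size_t index) {
  return index * sizeof(std::uint32_t);
}

// Hypothetical single-pointer LXR: `ptr` points at %0 (state[0]); the
// instruction reads that word and the next one and xors them, matching
// `%xor = xor i32 %1, %0`. The second operand is a dummy and is ignored.
// Write-back to memory is deliberately not modeled (unspecified).
std::uint32_t lxrSinglePtrModel(const std::uint32_t *ptr,
                                std::uint32_t /*dummy*/) {
  return ptr[1] ^ ptr[0];
}
```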

I’m digging into the ISD::XOR case of the selector in RISCVISelDAGToDAG.cpp to match this pattern. I would appreciate any advice on matching it in C++, as we have only used TableGen until now.

I think that’s probably the right way to go, annoying as it is. The basic problem TableGen patterns can’t solve here is that the addresses of your two loads are not independent: one has to be exactly 4 greater than the other.

It should be a fairly easy pattern to write, since your addressing mode seems very simple (basically just a single register that has to contain the base address). You just need to look for an xor fed by two loads (not sign-extending!), where one of the addresses is an ISD::ADD of the other and the constant 4.

Exclude volatile and atomic loads for good measure, and you should be there.
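Putting the last two replies together, the matching logic could look roughly like the sketch below. To keep it self-contained it uses toy node types rather than LLVM's: `Node`, `Opc`, `ExtType`, `isSimpleLoad`, `isBasePlus4` and `matchLXR` are all hypothetical. In a real RISCVDAGToDAGISel::Select case for ISD::XOR, the corresponding checks would go through LoadSDNode (getExtensionType() against ISD::NON_EXTLOAD, isVolatile(), isAtomic()) and ConstantSDNode, and you would still have to build the machine node and rewire the loads' chain results, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for SelectionDAG nodes (not LLVM's real types).
enum class Opc { Xor, Load, Add, Constant, Register };
enum class ExtType { NonExt, SExt, ZExt };

struct Node {
  Opc opc;
  std::vector<const Node *> ops;  // Load: {address}; Add/Xor: {lhs, rhs}
  ExtType ext = ExtType::NonExt;  // only meaningful for Load
  bool isVolatile = false;        // only meaningful for Load
  bool isAtomic = false;          // only meaningful for Load
  std::int64_t value = 0;         // only meaningful for Constant
};

// A load we are willing to fold: non-extending, not volatile, not atomic.
static bool isSimpleLoad(const Node *n) {
  return n->opc == Opc::Load && n->ext == ExtType::NonExt &&
         !n->isVolatile && !n->isAtomic;
}

// True if `addr` is (add base, 4) or (add 4, base).
static bool isBasePlus4(const Node *addr, const Node *base) {
  if (addr->opc != Opc::Add)
    return false;
  const Node *l = addr->ops[0], *r = addr->ops[1];
  auto isFour = [](const Node *c) {
    return c->opc == Opc::Constant && c->value == 4;
  };
  return (l == base && isFour(r)) || (r == base && isFour(l));
}

// If `x` is xor(load p, load p+4) in either operand order, return the
// base pointer p (the operand LXR would take); otherwise nullptr.
const Node *matchLXR(const Node *x) {
  if (x->opc != Opc::Xor)
    return nullptr;
  const Node *a = x->ops[0], *b = x->ops[1];
  if (!isSimpleLoad(a) || !isSimpleLoad(b))
    return nullptr;
  const Node *pa = a->ops[0], *pb = b->ops[0];
  if (isBasePlus4(pb, pa))
    return pa;
  if (isBasePlus4(pa, pb))
    return pb;
  return nullptr;
}
```

Accepting the other distances mentioned earlier (8, 12, 16) would only mean changing the constant check in isBasePlus4, although the instruction would then need some way to know which distance was matched.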