Expanding a PseudoOp and accessing the DAG

I’ve got this PseudoOp defined:

def SDT_RELADDR : SDTypeProfile<1, 2, [SDTCisInt<0>, SDTCisInt<1>]>;
def XSTGRELADDR : SDNode<"XSTGISD::RELADDR", SDT_RELADDR>;

let Constraints = "$dst = $addr" in { //, Uses= [GRP] in {
  def RelAddr : XSTGPseudo<(outs GPRC:$dst),
                           (ins i64imm:$spoff, i64imm:$addr),
                           "! RELADDR $spoff, $dst",
                           [(set GPRC:$dst, (XSTGRELADDR i64:$spoff,
                                             (i64 (XSTGMVINI i64:$addr))))]>;
}

GlobalAddresses get lowered to RelAddr nodes in our ISelLowering code. Now I just need to expand this pseudo in our overridden expandPostRAPseudo function. However, I’m a bit worried that the expansion happens too late (by then everything should already be MachineInstrs, it seems), so patterns that try to match on that XSTGMVINI would already have been matched.

[as an aside, we’ve got patterns like:

def : Pat<(XSTGMVINI tglobaladdr:$off),
(MOVIMMZ_I64 tglobaladdr:$off)>;

]

So, first off, to expand that DAG for the RelAddr node I want something like the following (the first BuildMI is pseudocode, as I’m not sure how to accomplish it):

case XSTG::RelAddr:

BuildMI(MBB, MI, DL, MI->getOperand(1).DAG…) ???

BuildMI(MBB, MI, DL, get(XSTG::LOADI32_RI), MI->getOperand(0).getReg())
.addReg(MI->getOperand(0).getReg())
.addReg(XSTG::GRP);

Basically, I want to grab the XSTGMVINI that is the second operand to the XSTGRELADDR node from the pseudo op pattern above and then build an MI based on that DAG.

The resulting assembly for that DAG would be:

movimm rX, %rel(SYMBOL) #offset to SYMBOL

load rX, rX, GRP # rX ← mem[rX+GRP]

Where the ‘movimm’ corresponds to the first BuildMI above (and is the XSTGMVINI part of the DAG) and the ‘load’ corresponds to the second BuildMI above (I have that one working).

And secondly,

Would this RelAddr DAG:

[(set GPRC:$dst, (XSTGRELADDR i64:$spoff,
                              (i64 (XSTGMVINI i64:$addr))))]

Have been pattern matched (as per the Pat above) such that the XSTGMVINI would have been transformed into:

(MOVIMMZ_I64 tglobaladdr:$addr)

?

Phil

Since i64imm is an immediate, the constraint "$dst = $addr" doesn't make sense. The constraint is there to tie the input virtual register to the output virtual register, so that they will both be assigned the same physical register.
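
To illustrate the point about tying, here is a minimal sketch. It reuses XSTGPseudo, GPRC and i64imm from the RelAddr definition in the original post. RelAddrTied and its $src operand are hypothetical names, and the selection patterns are left empty to keep the focus on the operands.

```tablegen
// Both inputs are immediates, so there is no input register to tie to $dst
// and no Constraints string is needed.
def RelAddr : XSTGPseudo<(outs GPRC:$dst),
                         (ins i64imm:$spoff, i64imm:$addr),
                         "! RELADDR $spoff, $dst", []>;

// Hypothetical variant with a register input. Tying $src to $dst makes the
// register allocator assign both operands the same physical register.
let Constraints = "$src = $dst" in
def RelAddrTied : XSTGPseudo<(outs GPRC:$dst),
                             (ins GPRC:$src, i64imm:$spoff),
                             "! RELADDR $spoff, $dst", []>;
```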

GlobalAddresses get lowered to RelAddr nodes in our ISelLowering code.
Now I just need to be able to expand this in our overridden
expandPostRAPseudo function, however, I'm a bit worried that expansion
happens too late (after things should already be MI's, it seems). So
things like patterns that try to match on that XSTGMVINI would have
already been matched.

The function expandPostRAPseudo is called after instruction selection, after the MI-level SSA-based optimizations, and after register allocation. In other words, it runs quite late in the optimization sequence, when most of the actual optimization work is already done.

[as an aside, we've got patterns like:

def : Pat<(XSTGMVINI tglobaladdr:$off),
           (MOVIMMZ_I64 tglobaladdr:$off)>;

]

I'm not sure if the input will match if you have tglobaladdr in it. The 't' means "target", and in this context it means that the argument has already been handled by the target and is in a form that does not need any further work. The lowering from globaladdr to tglobaladdr would happen after the "bigger" pattern (i.e. the one matching XSTGMVINI) has been tried, so this pattern will likely never see this combination of arguments.

So, first off, if I wanted to expand that DAG for the RelAddr Node,

This is where I don't understand what you are trying to do. The machine instructions will be automatically generated and you don't have to build them yourself.

And secondly,

Would this RelAddr DAG:

[(set GPRC:$dst, (XSTGRELADDR i64:$spoff,
                              (i64 (XSTGMVINI i64:$addr))))]

Have been pattern matched (as per the Pat above) such that the XSTGMVINI
would have been transformed into:

(MOVIMMZ_I64 tglobaladdr:$addr)

AFAIK, no. Global address could be matched by i64 (or an integer type that could hold it), but not the other way around.

-Krzysztof

First off, I got this idea from the LLVM Cookbook, chapter 8: Writing an
LLVM Backend: Lowering to multiple instructions. (Now I'm having my doubts
as to whether this is the right approach.)

Let me explain at the assembly level what I'm trying to accomplish.

We're trying to make position independent executables, so we intend to have
a switch like -fPIE. In that case we've designated some registers to be
pointers to various address spaces (and our processor is rather complicated
so there are several address spaces).

Right now, given a global variable called 'answer' in C we end up with the
following in the .s file:

  movimm r1, %rel(answer) # r1 <- offset to 'answer' symbol
  load r1, r1, 0 # r1<-mem[r1+0]

This isn't correct: if PIE mode is chosen, the access should be relative to
the GRP register. What I'd like to get is either:

  movimm r1, %rel(answer)
  addI r1, GRP # r1 <- r1 + GRP
  load r1, r1, 0 # r1 <- mem[r1+0]

Or even better:

  movimm r1, %rel(answer)
  load r1, r1, GRP # r1 <- mem[r1+GRP]

What I'm getting at the moment is just this part:

  load r1, r1, GRP

So the movimm is missing. That's because I've added the pseudo instruction
RelAddr, and GlobalAddress nodes now get converted to RelAddr nodes in
LowerGlobalAddress. Before I added the RelAddr pseudo instruction, they were
converted to the MVINI node type there.

It feels like more of this needs to be done in the LowerGlobalAddress
function, but I have no idea how to do it there - you seem to only be able
to get one instruction out of a lowering like that, not multiple
instructions. It also seems like (as you point out) the expansion phase is
too late to be doing it.

First off, I got this idea from the LLVM Cookbook chapter 8: Writing an
LLVM Backend: Lowering to multiple instructions. (now I'm having my
doubts as to whether this is the right approach)

There is a pass "ExpandISelPseudos", which handles instructions with custom inserters. You can mark instructions as having custom inserters in the .td files and then override the EmitInstrWithCustomInserter function to deal with them.
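
As a rough sketch of the .td side, assuming the RelAddr pseudo from earlier in the thread (the empty pattern list is only there to keep the example short): usesCustomInserter is the standard Instruction field that routes an instruction to EmitInstrWithCustomInserter.

```tablegen
// Mark the pseudo so that ExpandISelPseudos hands it to the target's
// EmitInstrWithCustomInserter override, which can then build the real
// instruction sequence while the code is still in SSA form.
let usesCustomInserter = 1 in
def RelAddr : XSTGPseudo<(outs GPRC:$dst),
                         (ins i64imm:$spoff, i64imm:$addr),
                         "! RELADDR $spoff, $dst", []>;
```

Since this pass runs before register allocation, the expansion can still create new virtual registers, which expandPostRAPseudo cannot easily do.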

Here's what I would do (based on what I understand about your target so far):

Define two additional ISD opcodes, specific to your target. One to denote a "normal" address, the other to mean "address using GRP". For example (you can invent better names for them): XSTGISD::ADDR_NORMAL and XSTGISD::ADDR_USE_GRP. Each of them will take a global address as an operand and return an address, and their only function will be to serve as a "tag" for the instruction selection algorithm to be able to apply different selection patterns to them.

In the .td file, define SDNodes corresponding to these opcodes, e.g. "addr_normal" and "addr_use_grp". Then, you can have these patterns for loads:

// Match a load from a non-relocatable address to a simple load
// instruction (with offset 0):
def: Pat<(load (addr_normal tglobaladdr:$addr)),
          (load tglobaladdr:$addr, 0)>;
// Match load from a relocatable address to a load with GRP:
def: Pat<(load (addr_use_grp tglobaladdr:$addr)),
          (load (movimm tglobaladdr:$addr), GRP)>;

The patterns above should use tglobaladdr, because you will still need custom LowerGlobalAddress to generate them first, and it may need to attach special "target flags" to these addresses.
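
The SDNode definitions described above might look like the following sketch. The type profile name SDT_XSTGAddr is made up, and the profile itself (one pointer-typed result, one operand of the same type) is an assumption about what the target needs.

```tablegen
// One result, one operand; both have the target's pointer type.
def SDT_XSTGAddr : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>, SDTCisPtrTy<0>]>;

// "Tag" nodes that wrap a target global address.
def addr_normal  : SDNode<"XSTGISD::ADDR_NORMAL",  SDT_XSTGAddr>;
def addr_use_grp : SDNode<"XSTGISD::ADDR_USE_GRP", SDT_XSTGAddr>;
```

The ADDR_NORMAL and ADDR_USE_GRP enumerators would also have to be added to the target's XSTGISD opcode enum so that LowerGlobalAddress can create these nodes.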

Finally, in LowerGlobalAddress, you can check the relocation model, compilation options, etc. to see if you need to have relocatable addresses, or not:

SDValue XSTGISelLowering::LowerGlobalAddress(SDValue Addr, SelectionDAG &DAG) {
   ...
   if (NeedGRP) {
     SpecialTargetFlags = ...;
     SDValue TAddr = DAG.getTargetGlobalAddress(..., SpecialTargetFlags);
     return DAG.getNode(XSTGISD::ADDR_USE_GRP, ..., TAddr);
   }

   // Non-relocatable address:
   SDValue NAddr = DAG.getTargetGlobalAddress(...);
   return DAG.getNode(XSTGISD::ADDR_NORMAL, ..., NAddr);
}

-Krzysztof

I'm not entirely sure what to replace 'load' with in the patterns above.

I notice that we have these defm's in our XSTGInstrInfo.td file:

defm LOADI64 : LoadOp< 0b1001010, "load", OpInfo_I64, II_LOAD1 >;
defm LOADF64 : LoadOp< 0b1001010, "load", OpInfo_F64, II_LOAD1 >;
defm LOADI32 : LoadOp< 0b1001010, "load", OpInfo_I32, II_LOAD1 >;
defm LOADF32 : LoadOp< 0b1001010, "load", OpInfo_F32, II_LOAD1 >;
defm LOADI16 : LoadOp< 0b1001010, "load", OpInfo_I16, II_LOAD1 >;
defm LOADI8 : LoadOp< 0b1001010, "load", OpInfo_I8, II_LOAD1 >;

I tried replacing 'load' with 'LOADI64' in the pattern, like this:

def: Pat<(LOADI64 (XSTGADDR_NORMAL tglobaladdr:$addr)),
         (LOADI64 tglobaladdr:$addr, 0)>;

But that resulted in:

XSTGInstrPatterns.td:619:11: error: Variable not defined: 'LOADI64'
def: Pat<(LOADI64 (XSTGADDR_NORMAL tglobaladdr:$addr)),

Ah, I see, the defm is a multi-class so I needed to change it to:

def: Pat<(load (XSTGADDR_NORMAL tglobaladdr:$addr)),
         (LOADI64_RI tglobaladdr:$addr, 0)>;
// Match load from a relocatable address to a load with GRP:
def: Pat<(load (XSTGADDR_USE_GRP tglobaladdr:$addr)),
         (LOADI64_RI (MOVIMMZ_I64 tglobaladdr:$addr), GRP)>;

...at least that gets through TableGen.

Ah, I see, the defm is a multi-class so I needed to change it to:

  def: Pat<(load (XSTGADDR_NORMAL tglobaladdr:$addr)),
          (LOADI64_RI tglobaladdr:$addr, 0)>;
// Match load from a relocatable address to a load with GRP:
def: Pat<(load (XSTGADDR_USE_GRP tglobaladdr:$addr)),
          (LOADI64_RI (MOVIMMZ_I64 tglobaladdr:$addr), GRP)>;

Right.

...at least that gets through TableGen.

Excellent.

-Krzysztof

Actually, I realized that the second should be a LOADI64_RR (reg reg
instead of reg immediate). So it's really this:

// Match load from a relocatable address to a load with GRP:
def: Pat<(load (XSTGADDR_USE_GRP tglobaladdr:$addr)),
         (LOADI64_RR (MOVIMMZ_I64 tglobaladdr:$addr), GRP)>;

When I run llc on my example code, I get this:

ISEL: Starting pattern match on root node: 0x362d630: i64 = XSTGISD::ADDR_USE_GRP 0x3628c90 [ORD=10] [ID=29]

  Initial Opcode index to 0
  Match failed at index 0
LLVM ERROR: Cannot select: 0x362d630: i64 = XSTGISD::ADDR_USE_GRP 0x3628c90 [ORD=10] [ID=29]
  0x3628c90: i64 = TargetGlobalAddress<i32 addrspace(4)* @answer> 0 [TF=7] [ORD=10] [ID=22]
In function: main

I see the following in my SelectCode (in XSTGGenDAGISel.inc):

/*2235*/ OPC_SwitchOpcode /*2 cases */, 27, TARGET_VAL(XSTGISD::ADDR_NORMAL),// ->2266
/*2239*/ OPC_RecordChild0, // #1 = $addr
/*2240*/ OPC_MoveChild, 0,
/*2242*/ OPC_CheckOpcode, TARGET_VAL(ISD::TargetGlobalAddress),
/*2245*/ OPC_MoveParent,
/*2246*/ OPC_MoveParent,
/*2247*/ OPC_CheckPredicate, 5, // Predicate_unindexedload
/*2249*/ OPC_CheckPredicate, 6, // Predicate_load
/*2251*/ OPC_CheckType, MVT::i64,
/*2253*/ OPC_EmitMergeInputChains1_0,
/*2254*/ OPC_EmitInteger, MVT::i64, 0,
/*2257*/ OPC_MorphNodeTo, TARGET_VAL(XSTG::LOADI64_RI), 0|OPFL_Chain|OPFL_MemRefs,
             1/*#VTs*/, MVT::i64, 2/*#Ops*/, 1, 2,
         // Src: (ld:i64 (XSTGADDR_NORMAL:iPTR (tglobaladdr:iPTR):$addr))<<P:Predicate_unindexedload>><<P:Predicate_load>> - Complexity = 10
         // Dst: (LOADI64_RI:i64 (tglobaladdr:i64):$addr, 0:i64)

I'm not sure why the initial Opcode index is being set to 0 instead of 2235.

Phil

That's where matching of ADDR_NORMAL begins. Is the code matching ADDR_USE_GRP in the .inc file?

-Krzysztof

Oh, sorry, I copied the wrong section. Yes, it's there:

/*2266*/ /*SwitchOpcode*/ 35, TARGET_VAL(XSTGISD::ADDR_USE_GRP),// ->2304
/*2269*/ OPC_RecordChild0, // #1 = $addr
/*2270*/ OPC_MoveChild, 0,
/*2272*/ OPC_CheckOpcode, TARGET_VAL(ISD::TargetGlobalAddress),
/*2275*/ OPC_MoveParent,
/*2276*/ OPC_MoveParent,
/*2277*/ OPC_CheckPredicate, 5, // Predicate_unindexedload
/*2279*/ OPC_CheckPredicate, 6, // Predicate_load
/*2281*/ OPC_CheckType, MVT::i64,
/*2283*/ OPC_EmitMergeInputChains1_0,
/*2284*/ OPC_EmitNode, TARGET_VAL(XSTG::MOVIMMZ_I64), 0,
             1/*#VTs*/, MVT::i64, 1/*#Ops*/, 1, // Results = #2
/*2292*/ OPC_EmitRegister, MVT::i64, XSTG::GRP,
/*2295*/ OPC_MorphNodeTo, TARGET_VAL(XSTG::LOADI64_RR), 0|OPFL_Chain|OPFL_MemRefs,
             1/*#VTs*/, MVT::i64, 2/*#Ops*/, 2, 3,
         // Src: (ld:i64 (XSTGADDR_USE_GRP:iPTR (tglobaladdr:iPTR):$addr))<<P:Predicate_unindexedload>><<P:Predicate_load>> - Complexity = 10
         // Dst: (LOADI64_RR:i64 (MOVIMMZ_I64:i64 (tglobaladdr:i64):$addr), GRP:i64)
/*2304*/ 0, // EndSwitchOpcode

I don't know. Could you post the entire output from debug-only=isel?
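
In the meantime, one thing the log fragment does show: the node that could not be selected is the ADDR_USE_GRP node itself, not a load. The patterns so far only match ADDR_USE_GRP as the operand of a plain i64 load (note the OPC_CheckType, MVT::i64 in the table, while @answer is an i32 global), so any other user, such as a load of a different type, a store, or address arithmetic, leaves the node to be selected on its own. This is a guess rather than a diagnosis. If that is what is happening, stand-alone fallback patterns along these lines might help. ADDI64_RR is a hypothetical register-register add and may not exist under that name in the XSTG backend.

```tablegen
// Fallback: materialize a GRP-relative address in a register when the
// ADDR_USE_GRP node is not folded into a load.
// ADDI64_RR is an assumed instruction name.
def : Pat<(XSTGADDR_USE_GRP tglobaladdr:$addr),
          (ADDI64_RR (MOVIMMZ_I64 tglobaladdr:$addr), GRP)>;

// Fallback for the non-relocatable case: move the address into a register.
def : Pat<(XSTGADDR_NORMAL tglobaladdr:$addr),
          (MOVIMMZ_I64 tglobaladdr:$addr)>;
```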

-Krzysztof