Instruction selection for 'load' based on static vs. dynamic data

Hi,

I've been looking at the new AVR backend lately, and one issue I've
found is that all 'load' IR instructions are matched using the 'ld'
AVR instruction, including 'load's for lookup tables generated from
switches.

On the AVR architecture, RAM and the program image are in completely
separated namespaces. There's a distinct 'lpm' (Load from Program
Memory) instruction that can load data from .text. So the nice way of
getting this to work is to select 'lpm' for those switch lookup table
'load's, thereby avoiding the memory footprint and the runtime cost of
copying the tables to RAM at startup. However, we still need to select
'ld' for other 'load's.

To give a concrete example, here's an LLVM IR snippet that should be
lowered to 'lpm':

     @switch.table = private unnamed_addr constant [15 x i8] c"\01\01\01\01\01\00\00\00\00\00\00\00\00\00\01"

     switch.lookup: ; preds = %start
       %2 = sext i8 %0 to i16
       %switch.gep = getelementptr inbounds [15 x i8], [15 x i8]* @switch.table, i16 0, i16 %2
       %switch.load = load i8, i8* %switch.gep, align 1
       br label %bb8

and here's one that should be lowered to 'ld':

     %S = type { i8 }

     start:
       %1 = getelementptr inbounds %S, %S* %0, i16 0, i32 0
       %2 = load i8, i8* %1, align 1
       ret i8 %2
     }

So my question is, what kind of pattern matching can I do in my 'lpm'
and 'ld' instructions that it would be able to make a distinction
about these two usecases?

Thanks,
         Gergo

To give a concrete example, here's an LLVM IR snippet that should be
lowered to 'lpm':

    @switch.table = private unnamed_addr constant [15 x i8]
c"\01\01\01\01\01\00\00\00\00\00\00\00\00\00\01"

    switch.lookup: ; preds = %start
      %2 = sext i8 %0 to i16
      %switch.gep = getelementptr inbounds [15 x i8], [15 x i8]*
@switch.table, i16 0, i16 %2
      %switch.load = load i8, i8* %switch.gep, align 1
      br label %bb8

and here's one that should be lowered to 'ld':

    %S = type { i8 }

    start:
      %1 = getelementptr inbounds %S, %S* %0, i16 0, i32 0
      %2 = load i8, i8* %1, align 1
      ret i8 %2
    }

So my question is, what kind of pattern matching can I do in my 'lpm'
and 'ld' instructions that it would be able to make a distinction
about these two usecases?

​I can only think of examining the load operand, which is a result of
getelementptr​. Then check the second
argument of getelementptr​ (%0 or @switch.table).

Maybe Leslie (cc'ed) can answer this question.

You can create the differentiation between objects in the data memory and in the program memory in LowerGlobalAddress (e.g. you can create different ISD opcodes that generate these addresses) and then use those to select corresponding instructions.

-Krzysztof

Right, which is how I also understand Zhai Xiang's suggestion with the link to existing AVR target code. This is already done to some extent, since the AVR backend already inserts an AVRISD::WRAPPER node to mark the static data's address.

However, what's still missing with that plan is how to propagate that information "upwards". By the time it gets to instruction selection, the `getelementptr` is already expanded into some arithmetic expression of a base address and some arbitrary offset calculations.

So for example, with this code:

%switch.gep = getelementptr inbounds [15 x i8], [15 x i8]* @switch.table, i16 0, i16 %2
%switch.load = load i8, i8* %switch.gep, align 1

by instrucion selection time, the argument to 'load' is

(add
   (sign_extend (CopyFromReg %vreg2))
   (WRAPPER TargetGlobalAddress<@switch.table>))

not a WRAPPER directly.

However, what's still missing with that plan is how to propagate that information "upwards". By the time it gets to instruction selection, the `getelementptr` is already expanded into some arithmetic expression of a base address and some arbitrary offset calculations.

At this point it's a problem of writing appropriate selection patterns.

So for example, with this code:
[...]

(add
   (sign_extend (CopyFromReg %vreg2))
   (WRAPPER TargetGlobalAddress<@switch.table>))

not a WRAPPER directly.

Such cases can be treated with conjunction with the load or store instruction, for example:

def: Pat<(ld (add (WRAPPER RC:$addr), (sign_extend RC:$offset))),
          (load_instruction_rr RC:$addr, RC:$offset)>;

Where "load_instruction" is a machine load instruction with base address and an offset, both in registers, and RC is the corresponding register class.

For cases where the offset is an immediate, you can have

def: Pat<(ld (add (WRAPPER RC:$addr), imm:$offset)),
          (load_instruction_ri RC:$addr, imm:$offset)>;

The "imm" is a general predicate for any immediate. If you want to only match a subset of immediates (e.g. those that are within some range), you can define your own PatLeaf and then use it in place of imm:

def Test : PatLeaf<(i32 imm), [{
   int64_t V = N->getSExtValue();
   return V > -17 && V < 12; // Only true for [-16, 11].
}]>;

def: Pat<(ld (add (WRAPPER RC:$addr), Test:$offset)),
          (load_instruction_ri RC:$addr, imm:$offset)>;

The "imm" in the second line can remain, it's the first part of the "Pat" that matters, but if you want, you can have "Test" in both.

There is also a target hook "isOffsetFoldingLegal" in TargetLowering, which tells the DAG combiner if it can make the immediate offset a part of the global address itself.

-Krzysztof

Can I also use something more complex than a single machine load instruction here? I.e. can I write something like

def: Pat<(ld (add (WRAPPER RC:$addr), (sign_extend RC:$offset))),
          (load_instruction_rr (special_add RC:$addr, RC:$offset))>;

? Or do I have to go via some pseudo-instruction for that?

You can have any tree of machine instructions as the output.

-Krzysztof

Cool! Even better would be if I could use non-machine (LLVM) instructions in the output pattern, and let ISel do its job on it. Is there a way to do that?

You can either have a target-specific DAG-combine function, or you can use "PreprocessISelDAG", but there is no way to do it in .td files.

-Krzysztof

Thanks, I managed to get it working using this approach; see
https://github.com/gergoerdi/llvm-avr/commit/bf415da65d102d86596753a541af8804b863928b
if you're interested in the details.

Yes, that's exactly how it works.

-Krzysztof