Help with new backend: byte-sized loads being generated for 'int' array access

First, apologies because I'm quite new to LLVM backend development. I very much appreciate any help from more experienced folks.

I'm running into a problem in which byte-sized loads are _sometimes_ being generated for a read access to an external array of 4-byte ints, depending on how the array is declared.

I am hoping someone can perhaps point me to possible sources of the problem in my backend code. I would be happy to supply additional details; I'm trying to keep this message relatively short.

The issue arises when I compile the following C code:

 extern int EI[];
 int MYFUNC() { return EI[1288]; }

I run the code through 'clang -emit-llvm' and end up with bitcode of:

; ModuleID = './clang2.c'
target datalayout = "e-m:e-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64"
target triple = "dgc"
@EI = external global [0 x i32]
; Function Attrs: nounwind
define i32 @MYFUNC() #0 {
entry:
  %0 = load i32* getelementptr inbounds ([0 x i32]* @EI, i32 0, i32 1288), align 1
  ret i32 %0
}

attributes #0 = { nounwind "less-precise-fpmad"="false"
               "no-frame-pointer-elim"="true"
               "no-frame-pointer-elim-non-leaf"
               "no-infs-fp-math"="false"
               "no-nans-fp-math"="false"
               "stack-protector-buffer-size"="8"
               "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5.0 (209307)"}

When I then run the bitcode through llc, the memory load for the 'EI[1288]' reference is generated with a series of four byte-sized loads, followed by the appropriate shifting and OR'ing to get all the bytes into the proper place in the result.
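
To make the expanded sequence concrete, here is a rough C sketch of what the four byte loads plus shifting and OR'ing compute. It assumes a little-endian target (the leading "e" in the datalayout string above); the function name load_u32_bytewise is made up for illustration and is not anything llc emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: build a 32-bit little-endian value from four
   byte-sized loads. This is safe for any address, which is why it is
   the fallback when the load is only known to be 'align 1'. */
static uint32_t load_u32_bytewise(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0]            /* bits  0..7  */
         | ((uint32_t)p[1] << 8)     /* bits  8..15 */
         | ((uint32_t)p[2] << 16)    /* bits 16..23 */
         | ((uint32_t)p[3] << 24);   /* bits 24..31 */
}
```

A single word-sized load gives the same value, but on a strict-alignment machine only when the address is known to be a multiple of 4.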

This is not what I want, of course. I want a single, word-sized load to be generated. I have various sizes of load instructions defined in my TableGen file, an excerpt of which I've included at the end of this message.

Other backends built from the same source tree -- mipsel and xcore, for instance -- do indeed generate a single word-sized load, as expected, so I'm confident the problem is in my backend code.

What's interesting is that my backend *DOES* generate a single word-sized load if I make either of the following changes to the declaration of 'EI':

(1) Provide an array size in the EI declaration:
       extern int EI[5000];

   This yields the following in the bitcode, replacing the like lines from above:
       @EI = external global [5000 x i32]
       ; Function Attrs: nounwind
       define i32 @MYFUNC() #0 {
       entry:
         %0 = load i32* getelementptr inbounds ([5000 x i32]* @EI, i32 0, i32 1288), align 4
         ret i32 %0
       }

(2) Change EI to be an int*:
     extern int* EI;

   This yields the following in the bitcode:
     @EI = external global i32*
     ; Function Attrs: nounwind
     define i32 @MYFUNC() #0 {
     entry:
       %0 = load i32** @EI, align 4
       %arrayidx = getelementptr inbounds i32* %0, i32 1288
       %1 = load i32* %arrayidx, align 4
       ret i32 %1
     }

I have tried a number of things to figure out this issue, but to no avail. For some reason the 'EI[1288]' reference is being treated as possibly unaligned ("align 1" on the load), but I can't figure out why.
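
As a way of reasoning about what "align 1" means for the access, the C sketch below computes the alignment that can be assumed for base + offset from the alignment assumed for the base. It is only arithmetic: it does not explain why clang picks "align 1" for the unsized declaration, and the names min_align and access_align are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two dividing both a and b (not both zero). */
static uint64_t min_align(uint64_t a, uint64_t b) {
    uint64_t v = a | b;
    return v & (~v + 1);   /* isolate the lowest set bit */
}

/* Alignment that can be assumed for (base + byte_offset), given that
   base is aligned to base_align (a nonzero power of two). */
static uint64_t access_align(uint64_t base_align, uint64_t byte_offset) {
    assert(base_align != 0 && (base_align & (base_align - 1)) == 0);
    return min_align(base_align, byte_offset);
}
```

With the offset 1288 * 4 = 5152 bytes, a base assumed 4-byte aligned gives a 4-byte-aligned access, while a base assumed only 1-byte aligned gives a 1-byte-aligned access even though the offset itself is a multiple of 4.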

TD file excerpt (modeled after the MIPS .td file):

def DGCAddrDefault :
  ComplexPattern<iPTR, 2, "selectAddrDefault", [frameindex]>;
def DGCAddrInt :
  ComplexPattern<iPTR, 2, "selectAddrInt", [frameindex]>;

def DGCMemSrc : Operand<iPTR> {
  let MIOperandInfo = (ops ptr_rc, i32imm);
  let OperandType = "OPERAND_MEMORY";
}

let canFoldAsLoad = 1,
    mayLoad = 1 in
{
  def LB : InstrDGC64_s__s_s<
    (outs IntRegs:$rd),
    (ins DGCMemSrc:$addr),
    !strconcat("lb", "\t$rd, $addr"),
    [(set i32:$rd, (sextloadi8 DGCAddrInt:$addr))],
    0b10011, 0b000, 0, 0>;
  def LH : InstrDGC64_s__s_s<
    (outs IntRegs:$rd),
    (ins DGCMemSrc:$addr),
    !strconcat("lh", "\t$rd, $addr"),
    [(set i32:$rd, (sextloadi16 DGCAddrDefault:$addr))],
    0b10011, 0b001, 0, 0>;
  def LBU : InstrDGC64_s__s_s<
    (outs IntRegs:$rd),
    (ins DGCMemSrc:$addr),
    !strconcat("lbu", "\t$rd, $addr"),
    [(set i32:$rd, (zextloadi8 DGCAddrDefault:$addr))],
    0b10011, 0b100, 0, 0>;
  def LHU : InstrDGC64_s__s_s<
    (outs IntRegs:$rd),
    (ins DGCMemSrc:$addr),
    !strconcat("lhu", "\t$rd, $addr"),
    [(set i32:$rd, (zextloadi16 DGCAddrInt:$addr))],
    0b10011, 0b101, 0, 0>;
  def LW : InstrDGC64_s__s_s<
    (outs IntRegs:$rd),
    (ins DGCMemSrc:$addr),
    !strconcat("lw", "\t$rd, $addr"),
    [(set i32:$rd, (load DGCAddrDefault:$addr))],
    0b10011, 0b010, 0, 0>;
}

From: "Jeff Kuskin" <jk500500@yahoo.com>
To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 10, 2014 10:04:45 AM
Subject: [LLVMdev] Help with new backend: byte-sized loads being generated for 'int' array access

As for how the C is being translated into LLVM IR (why the load gets 'align 1' rather than 'align 4'), you should ask on the cfe-dev list, not here.

On a related point: if your target supports unaligned loads of 4-byte integers, you need to override the allowsUnalignedMemoryAccesses callback in your target's TargetLowering subclass (*TargetLowering::allowsUnalignedMemoryAccesses).

-Hal