load instruction erroneously removed by GVN v2

Hello to whom this may concern,

Versioned this as I saw identical title before. I’m compiling a clang project where I’m seeing GVN mess up and replace a load with a wrong def value. I am using LLVM-3.5, but the problem has been observed upto 3.8.
To illustrate the problem,

define i32 @main

scalar.ph:

<initialize [80 x i16] %dest>

preheader:

%index=0

br test, loop1, bb2

loop1:

… write to %dest in increasing index // ptr-based while loop

%ptr++;

br test, loop1, bb2

bb2:

%lcssa = phi [%ptr, loop1], [%ptr, preheader]
store i16 0, i16* %lcssa !dbg !20094 !tbaa 20030 // write null byte at end

%76 = getelementptr inbounds [80 x i16]* %dest, i64 0, i64 7, !dbg !20095 // load addr of null byte (7th index)
%77 = load i16* %76, align 2, !dbg !20095, !tbaa !20010
%78 = icmp eq i16 %77, 0, !dbg !20095
br i1 %78, label %80, label %79, !dbg !20095

GVN calls processNonLocalLoad() on “%77 = load…” and replaces it with init value from scalar.ph.

bb2:
%lcssa = phi [%ptr, loop1], [%ptr, preheader]
store i16 0, i16* %lcssa, !dbg !20094, !tbaa !20030
%76 = getelementptr inbounds [80 x i16]* %dest, i64 0, i64 7, !dbg !20095
br i1 icmp eq (i16 trunc (i128 lshr (i128 bitcast (<8 x i16> <i16 120, i16 120, i16 120, i16 120, i16 120, i16 120, i16 120, i16 120> to i128), i128 96) to i16), i16 0), label %78, label %77, !dbg !20095 // simplifies to “icmp eq (i16 120, i16 0) → false”

I first suspected problem might be TBAA; invoking clang with -fno-strict-aliasing makes the test pass (similarly, opt with -basicaa does not make GVN transform the load). When I look at the C/C++ source code, I cannot find any type-based aliasing violations from eyeballing.

I started looking at the aliasing and landed at getNonLocalPointerDepFromBB(), in which the worklist algorithm to find MemDep reported the result from the init block, ignoring all the kills after it. I did see one of the parm was SkipFirstBlock and this appears to ignore the store %ptr above the load. Is there a reason why it skips first block? Shouldn’t it look at the preceding instructions in the same block as they could contain kill points? I am still unfamiliar with the algorithm used here and would very much appreciate if someone could educate or point me towards right direction.

Regards,

Kevin

Hello to whom this may concern,

Versioned this as I saw identical title before. I'm compiling a clang
project where I'm seeing GVN mess up and replace a load with a wrong def
value.

Do you have a c testcase or a .ll file i can actually compile?

I am using LLVM-3.5, but the problem has been observed upto 3.8.
To illustrate the problem,

define i32 @main
scalar.ph:
    <initialize [80 x i16] %dest>
...
preheader:
%index=0
br test, loop1, bb2
loop1:
  ... write to %dest in increasing index // ptr-based while loop
  %ptr++;
  br test, loop1, bb2
bb2:
  %lcssa = phi [%ptr, loop1], [%ptr, preheader]
  store i16 0, i16* %lcssa !dbg !20094 !tbaa 20030 // write null byte at
end
  %76 = getelementptr inbounds [80 x i16]* %dest, i64 0, i64 7, !dbg
!20095 // load addr of null byte (7th index)
  %77 = load i16* %76, align 2, !dbg !20095, !tbaa !20010
  %78 = icmp eq i16 %77, 0, !dbg !20095
  br i1 %78, label %80, label %79, !dbg !20095

GVN calls processNonLocalLoad() on "%77 = load..." and replaces it with
init value from scalar.ph.

bb2:
%lcssa = phi [%ptr, loop1], [%ptr, preheader]
store i16 0, i16* %lcssa, !dbg !20094, !tbaa !20030
%76 = getelementptr inbounds [80 x i16]* %dest, i64 0, i64 7, !dbg !20095
br i1 icmp eq (i16 trunc (i128 lshr (i128 bitcast (<8 x i16> <i16 120, i16
120, i16 120, i16 120, i16 120, i16 120, i16 120, i16 120> to i128), i128
96) to i16), i16 0), label %78, label %77, !dbg !20095 // simplifies to
"icmp eq (i16 120, i16 0) --> false"

I first suspected problem might be TBAA; invoking clang with
-fno-strict-aliasing makes the test pass (similarly, opt with -basicaa does
not make GVN transform the load). When I look at the C/C++ source code, I
cannot find any type-based aliasing violations from eyeballing.

Can you excerpt the actual source code?

I started looking at the aliasing and landed at
getNonLocalPointerDepFromBB(), in which the worklist algorithm to find
MemDep reported the result from the init block, ignoring all the kills
after it. I did see one of the parm was SkipFirstBlock and this appears to
ignore the store %ptr above the load. Is there a reason why it skips first
block?

Yes. It is looking for *non-local* dependences. The block it should skip
is bb2, not loop1.
There is another call for local dependences. If you ask for local
dependences, then non-local ones, as GVN does, there is no point in having
it go and look at the local ones again :slight_smile:

Look at how GVN calls this:
MemDepResult Dep = MD->getDependency(L);

   // If it is defined in another block, try harder.
   if (Dep.isNonLocal())
     return processNonLocalLoad(L);

(IE get local dependence, if there is none, go looking at non-local
dependences).

The list of non-local dependences it includes should definitely include the
one from block "loop1", and if it is ignoring it, it is most likely thinks
it does not alias for some reason.

Thanks for quick reply Daniel,

I tried to make a simple C testcase, but could not reproduce the same condition with output from Clang. I suppose I could modify the C code to make it look similar with TBAA’s; I may be able to provide this by eod.

store %ptr above the load.

My mistake; I was referring to the store $lcssa in bb2. Looking at the C source code, it should definitely alias with %dest. I will try and see if I can find out why it thinks there is no local dependence.

Thanks,

Kevin

Thanks for quick reply Daniel,

I tried to make a simple C testcase, but could not reproduce the same
condition with output from Clang.

even if you have a .ll file you can share, that would be helpful.

I suppose I could modify the C code to make it look similar with TBAA's; I
may be able to provide this by eod.

> store %ptr above the load.

My mistake; I was referring to the store $lcssa in bb2. Looking at the C
source code, it should definitely alias with %dest. I will try and see if I
can find out why it thinks there is no local dependence.

Yes, i would start by seeing why the getdependency call is returning no dep
:slight_smile:

At a glance from your file, the load and that store are tagged with
different TBAA tags.
THat does not necessarily mean they don't alias, i'd have to see the TBAA
tree to say for sure.
The TBAA rule is that a tag aliases all of it's descendants and ancestors
in the tree.

So if the tree look like this:

!something
    / \
!20010 !20030

It will say no alias
but if it looks like

!something
     >
!20010
    >
!20030

TBAA will say they alias (something else may still say no-alias for other
reasons, of course)

before inlining
all 20005

after inlining
somewhere here changed made it NoAlias
after Global Variable Optimizer
20014
20373 20255
20372 20254

before GVN
19993
20011 19991
20010 20030

It appears that TBAA metadata certainly changed after inlining and subsequent passes. I have attached the .bc file. I think I will try to dump out more TBAA metadata between passes. The method in interest is @main.

test.bc (2.91 MB)

I think I’ve narrowed it to a failure point:

I am seeing that after
[0x1a5b080] %35 = call i16* @PAL_wcsncat(i16* %7, i16* %33, i64 %34), !dbg !20064

is inlined into main, the alias of load and the preceding store no longer are aliased. I verified this via inserting modulePrinterPass right after inliningPass.

I am wondering if this is because pointer aliasing is incorrectly set up on call to PAL_wcsncat or something is wrong in the inliner. I will see if I can pull the pointer aliases on the callsite.

-Kevin

Does anyone know how LLVM encodes aliasing on a call? I’m looking at the code in GVN and it seems like it’s pulling TBAA metadata as below.

(ImmutableCallSite*)CS.getInstruction()->getMetadata(LLVMContext::MD_tbaa))

but I do not see !tbaa attached to the call instruction, only !dbg. This is bitcode from LTO link step and I see that LTO pass manager uses both TypeAA and BasicAA.

Any input is appreciated,

-Kevin

So after checking the emitted bitcode in the LTO-link step, I found that the method wcsncat already had NoAlias relation with the destination buffer. I had assumed that the !tbaa metadata was module-wide and compared the load/store aliases of method wcsncat and its caller, main.

Dumped out LLVM bitcode at different steps:

test1.c.ll: // test case compiled with -flto and linked against the static lib
main: !179

wchar.cpp.ll: // compiled with -flto that is a part of the static lib
wcsncat: !362

lto.ll // bitcode emitted by LTO link
main: !20057
wcsncat: !20871 // these metadata tells us that they have a NoAlias relation

I have a few questions at this point:

  1. Is the !tbaa metadata module-wide as in, effective across methods? If not, is inliner suppose to resolve the aliasing of parms/args during inlining?

  2. If 1)'s first Q is true and second false, should the gold linker be fixing alias relations? (if true, possibly a gold bug?)

Thanks in advance,

Kevin Choi

Intel WOS