Inexplicable ASAN report. Code generation bug?

I'm struggling to explain an ASAN report I'm now getting that I didn't
get previously on the same code. In fact the report only happens with
-O2 and not when I remove the -O flags which makes it hard to debug
and makes me suspect it's dependent on exactly which instructions the
code generation decides to access the bytes involved. Afaict the C
code shouldn't be accessing the poisoned bytes.

The report looks like:

init_var_from_num: NUMERIC (short) w=0 d=1 POS 6 bytes: 18 00 00 00 80 80

So it looks to me like -O2 is causing the optimizer to turn the 2-byte
read into a 4-byte read and overrun the allocated object. But I
haven't tried looking at the assembly yet.

Fwiw the assembly is obviously a 4-byte load but identifying the
mapping from the C code reading two of those bytes to the assembly is
a bit beyond my level of familiarity with x86 assembly:

#define NUMERIC_WEIGHT(n) (NUMERIC_HEADER_IS_SHORT((n)) ? \
    (((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_SIGN_MASK ? \
      ~NUMERIC_SHORT_WEIGHT_MASK : 0) \
     > ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK)) \
    : ((n)->choice.n_long.n_weight))

...
   0x000000000144a6ba <+1050>: mov %r15d,%ecx
   0x000000000144a6bd <+1053>: and $0x7,%ecxn
   0x000000000144a6c0 <+1056>: add $0x3,%ecx
   0x000000000144a6c3 <+1059>: movsbl %al,%eax
   0x000000000144a6c6 <+1062>: cmp %eax,%ecx
   0x000000000144a6c8 <+1064>: jl 0x144a441 <numeric_out+417>
   0x000000000144a6ce <+1070>: mov %r15,%rdi
   0x000000000144a6d1 <+1073>: callq 0x526350 <__asan_report_load4>
==> 0x000000000144a6d6 <+1078>: mov $0x2e02680,%edi
   0x000000000144a6db <+1083>: mov %r14,0x10(%rbx)
   0x000000000144a6df <+1087>: mov %rsi,%r14
   0x000000000144a6e2 <+1090>: callq 0x53b1b0 <__sanitizer_cov()>
   0x000000000144a6e7 <+1095>: mov %r14,%rsi
   0x000000000144a6ea <+1098>: mov 0x10(%rbx),%r14
   0x000000000144a6ee <+1102>: jmpq 0x144a4a1 <numeric_out+513>
   0x000000000144a6f3 <+1107>: mov %edi,%ecx
   0x000000000144a6f5 <+1109>: and $0x7,%ecx
   0x000000000144a6f8 <+1112>: add $0x3,%ecx
   0x000000000144a6fb <+1115>: movsbl %al,%eax
...

So I'm guessing the logic is that the struct as that VLA so the
compiler sees that the struct size is at least two more bytes so it
assumes memory will always be allocated for the whole object and feels
free to reference those extra bytes? In practice this is a false
positive since the object will always be aligned so those two extra
bytes will always be on the same page. We already have another code
site with a similar false positive but it's a much narrower
circumstance. If this can happen anywhere there's a non 4-byte access
then that will remove a lot of the value in asan (fwiw msan doesn't
complain about this)

2 questions:

  • Do you see this with the fresh llvm trunk?
  • Can you prepare a minimized example?

Pretty recent, I updated a couple days ago. I tried to minimize the
attached but at the same time I didn't want to lose too many unions
and casts in case it didn't trigger any more.

$ clang -fsanitize=address -Wall numeric-asan-test.c
$ ./a.out
VARSIZE 6
NDIGITS 0
WEIGHT 0
SIGN 0
DSCALE 0
$ clang -fsanitize=address -O2 -Wall numeric-asan-test.c
$ ./a.out
VARSIZE 6

numeric-asan-test.c (4.36 KB)

Confirmed.

A smaller repro:

typedef union {
short q;
struct {
short x;
short y;
int for_alignment;
} w;
} U;
char *buf = new char[2];
int main() {
buf[0] = buf[1] = 0x0;
U *u = (U *)buf;
return u->q == 0 ? 0 : u->w.y;
}

With O2 asan produces false positive and with -O1/-O0 it does not.
One more case where an optimization is hostile to asan…
Looking.

BTW, this is very similar to https://github.com/google/sanitizers/issues/20 (fixed long ago),
apparently we need to add another check to gvn, similar to the one we already have.
Or figure out why this one does not work…

in lib/Analysis/MemoryDependenceAnalysis.cpp:

346 if (LIOffs + NewLoadByteSize > MemLocEnd &&
347 LI->getParent()->getParent()->hasFnAttribute(
348 Attribute::SanitizeAddress))
349 // We will be reading past the location accessed by the original program.
350 // While this is safe in a regular build, Address Safety analysis tools
351 // may start reporting false warnings. So, don’t do widening.
352 return 0;

Moving to https://llvm.org/bugs/show_bug.cgi?id=25550