Hi,
I was looking into the semantics of GEP inbounds and some BasicAA rules and I'm wondering if it's valid in LLVM IR to allocate more than half of the address space with a global variable or an alloca.
If that's a scenario we want to consider, then we have problems.
Consider this C code (32 bits):
#include <string.h>
char obj[0x80000008];
char f() {
  char *p = obj + 0x79999999;
  char *q = obj + 0x80000000;
  *q = 1;
  memcpy(p, "abcd", 4);
  return *q;
}
Clearly the stores alias, and the memcpy should overwrite the value written by "*q = 1".
I dunno if this is legal in C or not, but the IR produced by clang looks like (32 bits):
@obj = common global [2147483656 x i8] zeroinitializer, align 1
define signext i8 @f() {
  store i8 1, i8* getelementptr inbounds (i8, i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), i32 -2147483648), align 1
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 2040109465), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i32 4, i32 1, i1 false)
  %1 = load i8, i8* getelementptr inbounds (i8, i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), i32 -2147483648), align 1
  ret i8 %1
}
With -O2, the store to q gets forwarded, and so we get "ret i8 1".
So, BasicAA concluded that p and q don't alias. The culprit is an overflow in BasicAAResult::isGEPBaseAtNegativeOffset().
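To make the sign flip in the IR concrete, here is a small C sketch of how the two offsets from the example look once they are carried as i32 GEP indices. The helper name as_i32 is made up for illustration, and the sketch assumes a two's-complement int32_t. It only shows the wraparound of the offset itself and is not a reproduction of the BasicAA code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret an unsigned 32-bit byte offset as the
 * signed 32-bit value that an i32 GEP index would hold.
 * memcpy is used so the bit pattern is preserved without relying on
 * implementation-defined integer conversion. */
static int32_t as_i32(uint32_t offset) {
    int32_t v;
    memcpy(&v, &offset, sizeof v);
    return v;
}
```

With this, as_i32(0x79999999u) stays positive (2040109465, the index in the memcpy call), while as_i32(0x80000000u) becomes -2147483648, the index in the store and the load. So an offset past the middle of the object is indistinguishable, at the i32 level, from a negative offset off the base.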
So my question is: do we care about this use case, where a single allocation can take more than half of the address space?
Thanks,
Nuno