Constant folding inttoptr i32 0 to null pointer?

Hello,

It seems that ConstantFoldCastInstruction in ConstantFold.cpp folds an inttoptr instruction whose operand is 0 to a null pointer. This makes sense for a C-style frontend, since the C99 standard (6.3.2.3) states:

“An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.”

On the other hand, some architectures use 0 as a valid memory location, and this constant folding could be harmful when the code actually needs to access the memory location at address 0.

Is this behavior intentional? Am I missing something? Will a load from a null address actually access address 0, or may it be turned into an undef value?

Thanks

Guy

LLVM assumes that the null pointer in address space zero can never be
successfully dereferenced. You must use some other address space to
dereference a null pointer.

Thanks David,

It turns out that the address space I was using was not 0, and yet the pointer was constant folded to null.

Here is the sequence:

Unoptimized code:

define i32 @foo() #0 {
entry:
  %address.addr.i = alloca i32, align 4
  %value.i = alloca i32, align 4
  store i32 0, i32* %address.addr.i, align 4
  %0 = load i32* %address.addr.i, align 4
  %1 = inttoptr i32 %0 to i32 addrspace(1)*
  %std_ld.i = load volatile i32 addrspace(1)* %1
  store i32 %std_ld.i, i32* %value.i, align 4
  %2 = load i32* %value.i, align 4
  ret i32 %2
}

After optimization (EarlyCSE):

define i32 @foo() #0 {
entry:
  %std_ld.i = load volatile i32 addrspace(1)* null, align 536870912
  ret i32 %std_ld.i
}

The constant folder doesn’t seem to check the address space; it simply checks whether the integer in question is zero and folds the inttoptr to null:

Constant *llvm::ConstantFoldCastInstruction(unsigned opc, Constant *V,
                                            Type *DestTy) {
  if (V->isNullValue() && !DestTy->isX86_MMXTy())
    return Constant::getNullValue(DestTy);
  // ...

Is this a bug?

Thanks

Guy

‘load volatile i32 addrspace(1)* null’ seems fine to me. However, it looks like instcombine will turn:

define i32 @foo() {
entry:
  %std_ld.i = load volatile i32, i32 addrspace(1)* null
  ret i32 %std_ld.i
}

into:

define i32 @foo() {
entry:
  %std_ld.i = load volatile i32, i32 addrspace(1)* null, align 536870912
  ret i32 %std_ld.i
}

which is not ok.

On second thought, I think that high alignment is benign. It simply
indicates that your backend is free to load the value as if it were aligned
to an arbitrarily large boundary (address 0 is 1-byte, 2-byte, 4-byte, etc.
aligned).

Well, address 0 is pretty well aligned for whatever alignment we like.
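To see why address 0 picks up such a large alignment: an address is aligned to 2^k exactly when its low k bits are zero, and every bit of 0 is zero, so every power of two divides it. The sketch below is a hypothetical helper, not LLVM code. It computes the largest power of two dividing a constant address. The cap of 2^29 (= 536870912) is an assumption based on the `align 536870912` in the IR above, which matches the maximum alignment LLVM supported at the time.

```cpp
#include <cstdint>

// Assumption: the largest alignment the optimizer will ever record, here
// 2^29 = 536870912 bytes, matching the `align 536870912` seen in this thread.
constexpr std::uint64_t kMaxAlignment = std::uint64_t{1} << 29;

// Lowest set bit of `addr`, i.e. the largest power of two dividing it.
// Unsigned negation (~addr + 1) is well defined, so this is the usual
// `x & -x` trick.
constexpr std::uint64_t lowestSetBit(std::uint64_t addr) {
  return addr & (~addr + 1);
}

// Hypothetical helper: the alignment that may be assumed for a load from the
// constant address `addr`. Address 0 has no set bits, so every power of two
// divides it and the result is simply the cap.
constexpr std::uint64_t inferredAlignment(std::uint64_t addr) {
  return (addr == 0 || lowestSetBit(addr) > kMaxAlignment)
             ? kMaxAlignment
             : lowestSetBit(addr);
}
```

For example, `inferredAlignment(0)` is 536870912, the same value instcombine attached in the IR above, while `inferredAlignment(6)` is 2. A huge alignment on address 0 is therefore a true statement about the address, not a sign of miscompilation.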

So, to make sure I understand: “load volatile i32, i32 addrspace(1)* null” is a valid memory access that reads address 0?

Thanks

Guy

> Thanks David,
>
> It turns out that the address space I was using was not 0, and yet the
> pointer was constant folded to null.

The LangRef is not entirely clear on whether 0 is always equal to the
null pointer. It (at least to me) implies that it is:
"
Any memory access must be done through a pointer value associated with
an address range of the memory access, otherwise the behavior is
undefined. Pointer values are associated with address ranges according
to the following rules:
...

A pointer value is associated with the addresses associated with any
value it is based on.
A null pointer in the default address-space is associated with no address.
An integer constant other than zero or a pointer value returned from a
function not defined within LLVM may be associated with address ranges
allocated through mechanisms other than those provided by LLVM. Such
ranges shall not overlap with any ranges of addresses allocated by
mechanisms provided by LLVM.
....

A pointer value formed by an inttoptr is based on all pointer values
that contribute (directly or indirectly) to the computation of the
pointer’s value.

"
(Fun lawyering: this, and the remaining clauses, never define the
behavior of a null pointer constant in a non-default address space,
which means its behavior is undefined by the first sentence. I know
this is not what is intended, so this should probably be cleaned up.)

Anyway, to me the above implies that inttoptr of the integer constant
zero is the null pointer, because if it isn't, it's a pretty striking
omission to cover everything *but* the integer constant zero ;).

As such, folding it to null should be correct, and not cause wrong
behavior for your program.

As David says, whether it can be dereferenced is a separate question,
precisely so that the null pointer in non-default address spaces can
behave differently.

Do you have a case where it does something wrong?

> The constant folder doesn’t seem to check the address space; it simply
> checks whether the integer in question is zero and folds the inttoptr to null:

I believe this is correct by the above.
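The fold quoted above can be modeled roughly as follows. This is a toy model with hypothetical names (`PtrConst`, `nullValue`, `foldIntToPtr`), not LLVM's actual API. It only illustrates the argument: a zero operand becomes the null value of the *destination* type, so the address space travels with the type, and the folded constant denotes the same address as the unfolded cast.

```cpp
#include <cstdint>

// Toy model, hypothetical names (not LLVM's API): a pointer constant is an
// address space plus an integer address.
struct PtrConst {
  unsigned addrSpace;
  std::uint64_t address;

  // In this model "null" simply means address 0 in the pointer's own
  // address space.
  constexpr bool isNull() const { return address == 0; }
};

constexpr bool operator==(PtrConst a, PtrConst b) {
  return a.addrSpace == b.addrSpace && a.address == b.address;
}

// The null value of a pointer type in address space `as`.
constexpr PtrConst nullValue(unsigned as) { return PtrConst{as, 0}; }

// Folding `inttoptr <constant> to T addrspace(destAS)*`. Like the check quoted
// above, a zero operand is folded to the null value of the destination type
// without inspecting the address space. That null still lives in destAS, so
// it names the same address the unfolded cast would have produced.
constexpr PtrConst foldIntToPtr(std::uint64_t value, unsigned destAS) {
  return value == 0 ? nullValue(destAS) : PtrConst{destAS, value};
}
```

In this model `foldIntToPtr(0, 1)` equals `nullValue(1)` and differs from `nullValue(0)`. Folding to null loses nothing as long as null is understood per address space.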

Yes.

Currently it seems to work fine, but as you said, this behavior is not exactly well defined. I would really expect any access to null, in whatever address space, to be replaced with undef by some optimization pass.

"An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM."

Doesn't that mean that the integer constant zero cannot be associated with any kind of memory, including memory in a non-default address space? It seems that address 0 still has to be handled somehow...

Thanks

Guy

I agree it's confusing, but for what it's worth, we already do things like
loading from null in address spaces 256 and 257 to load from [gs:0] and
[fs:0], respectively, on x86. It's supposed to work.

> Currently it seems to work fine, but as you said, this behavior is not exactly well defined. I would really expect any access to null, in whatever address space, to be replaced with undef by some optimization pass.

You should not expect this.
It is explicitly understood that dereferencing null in an address space
other than 0 is okay.
Anything that does otherwise is a bug.
The fact that the LangRef says otherwise is a bug in the LangRef :)

> "An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM."
>
> Doesn't that mean that the integer constant zero cannot be associated with any kind of memory, including memory in a non-default address space? It seems that address 0 still has to be handled somehow...

This part is badly written, or rather was not updated for address spaces other than zero :)

IMHO (and I'm sure David can correct my wording of this), you should
read it as if it said:
"
A null pointer in the default address-space is associated with no address.
A null pointer in the non-default address-space is associated with an
implementation defined address."

Then, because the integer constant zero is the null pointer, you have no issue.
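Taken together, the reading proposed in this thread can be summarized as a toy predicate. This is a hypothetical sketch: `PtrConst` and `isKnownNonDereferenceable` are illustrative names, not LLVM APIs. Only null in address space 0 may be assumed non-dereferenceable. Null in any other address space, such as the x86 address spaces 256 and 257 mentioned above, refers to an implementation-defined address and must be treated as potentially valid.

```cpp
#include <cstdint>

// Hypothetical model (not LLVM's API): a pointer constant is an address
// space plus an integer address; null is address 0 in its own address space.
struct PtrConst {
  unsigned addrSpace;
  std::uint64_t address;
};

// True only when an optimizer may assume that a load through `p` can never
// succeed. Under the reading above, that holds for null in the default
// address space 0 and for nothing else: null in any other address space is
// an implementation-defined (possibly valid) address.
constexpr bool isKnownNonDereferenceable(PtrConst p) {
  return p.addrSpace == 0 && p.address == 0;
}
```

Under this rule, Guy's `load volatile i32, i32 addrspace(1)* null` is a real access to address 0 in address space 1 and must be left alone. Only a null load in address space 0 may be treated as undefined behavior.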