"icmp sgt" when it should be "ugt" ?

Hello,

While writing a new LLVM backend I have observed that in some cases the
optimizer produces an "icmp sgt i32 %a, 0" where I would have expected an
"icmp ugt i32 %a, 0".

For example when I feed "opt -O3 -S ..." (LLVM 2.9, Windows) with

Icmp sgt is correct. Note that "ugt x, 0" is the same as "x != 0" which is not what you want.

-Chris
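
The equivalence Chris mentions can be checked by modelling 32-bit values in Python. This is only a sketch: the helpers icmp_ugt, icmp_sgt and to_signed32 are names made up for illustration, not LLVM APIs, and the sample values are arbitrary.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def icmp_ugt(a, b):
    """Hypothetical model of 'icmp ugt i32 a, b' (unsigned compare)."""
    return (a & MASK32) > (b & MASK32)

def icmp_sgt(a, b):
    """Hypothetical model of 'icmp sgt i32 a, b' (signed compare)."""
    return to_signed32(a) > to_signed32(b)

samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]

# Unsigned "greater than 0" rejects only 0 itself, so it acts like "x != 0".
ugt_matches_ne = all(icmp_ugt(x, 0) == (x != 0) for x in samples)

# Signed "greater than 0" also rejects every value with the top bit set.
sgt_results = [icmp_sgt(x, 0) for x in samples]
```

With these samples the two predicates disagree exactly on the values whose top bit is set, which is where the signed and unsigned readings of the same bit pattern diverge.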

Hi Chris,

> Icmp sgt is correct.

While ugt would be wrong, I think sgt is too!

For example, suppose %buf is 0 and %bufLen is ~0U. Then %add.ptr is ~0U, and
%cmp is true, so control branches to %if.then. However in the optimized version
%cmp is false and control branches to %if.end.

The GEP does have an inbounds attribute, I'm not sure if that is relevant here.

Ciao, Duncan.
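
Duncan's counterexample can be replayed numerically. The IR from the original post is not reproduced in this thread, so the sketch below rests on assumptions: that %cmp is an unsigned comparison of %add.ptr against %buf, that the optimized form is "icmp sgt i32 %bufLen, 0", and that the GEP indexes byte-sized elements. It models plain wrapping arithmetic and deliberately ignores the inbounds attribute. The Python names are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

buf = 0                 # %buf
buf_len = 0xFFFFFFFF    # %bufLen = ~0U

# %add.ptr, computed with wrap-around modulo 2^32 (no inbounds semantics).
add_ptr = (buf + buf_len) & MASK32

# Assumed original comparison: icmp ugt %add.ptr, %buf
cmp_original = add_ptr > buf

# Assumed optimized comparison: icmp sgt %bufLen, 0
cmp_optimized = to_signed32(buf_len) > 0
```

Under these assumptions the two comparisons give different answers for this input, which is the discrepancy described above.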

It is relevant: in your proposed scenario, the GEP returns undef.

-Eli

Hi Eli,

By the way, is GEP arithmetic supposed to be signed or unsigned?

The LangRef says: "...if any of the addresses that would be formed by
successive addition of the offsets implied by the indices to the base address
with infinitely precise arithmetic are not an in bounds address of that
allocated object". But it doesn't say how the address (i.e. a number in the
ring of integers mod 2^32) gets represented as an integer (presumably what the
"infinitely precise arithmetic" refers to).

For example, consider the address ~0U. This could be represented by any of the
integers: ..., -1 - 2^32, -1, 2^32 - 1, 2*2^32 - 1, ...

If you choose -1, and also have an offset of -1, then the sum is -2 which may
well still be inside your object. If you choose 2^32-1, and also 2^32-1 for
the offset, then the sum is 2*2^32-2, which presumably is not considered to be
inside your object (it would be if you reduced modulo 2^32, but then there
would be no point in using infinite precision arithmetic, so I suppose that that
is not what is intended).

Ciao, Duncan.
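
The ambiguity can be made concrete with Python's arbitrary-precision integers, which stand in for the "infinitely precise arithmetic" of the LangRef. This is a sketch assuming 32-bit addresses; the function name representatives is hypothetical.

```python
MOD = 1 << 32  # assumed 32-bit address space

def representatives(addr, ks=range(-2, 2)):
    """Integers congruent to addr modulo 2^32, for a few multiples of 2^32."""
    return [addr + k * MOD for k in ks]

addr = 0xFFFFFFFF                # the address ~0U
reps = representatives(addr)     # -1 - 2^32, -1, 2^32 - 1, 2*2^32 - 1

# Choose -1 for both the address and the offset.
signed_sum = -1 + -1

# Choose 2^32 - 1 for both the address and the offset.
unsigned_sum = (MOD - 1) + (MOD - 1)

# The sums differ as integers but agree once reduced modulo 2^32.
same_mod = (signed_sum % MOD) == (unsigned_sum % MOD)
```

Without a reduction step the two choices name different integers, so an in-bounds check done in infinite precision depends on which representative is picked.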

Signed; is that really not stated anywhere in LangRef?

-Eli

It's a mix of both.

Indices are signed, but the other operand is a pointer, and pointers
can be interpreted in a variety of ways. On one hand, pointers are
unsigned, since SIZE_MAX/2 and SIZE_MAX/2+1 are contiguous addresses,
while SIZE_MAX and SIZE_MAX+1 are not, because address 0 is special.
On the other hand, it's generally considered to be impossible to
allocate more than half the address space to a single object, so it
often works to analyze them as if they were signed.

Dan
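
Dan's point about where the discontinuity falls can be illustrated as follows. The sketch assumes a 32-bit size_t, and as_unsigned and as_signed are hypothetical helpers; it looks at the step between two neighbouring addresses under each interpretation.

```python
BITS = 32                      # assumed pointer width
SIZE_MAX = (1 << BITS) - 1

def as_unsigned(x):
    """Reduce x to its unsigned BITS-wide value."""
    return x & SIZE_MAX

def as_signed(x):
    """Reinterpret the BITS-wide pattern of x as two's complement."""
    x &= SIZE_MAX
    return x - (1 << BITS) if x > SIZE_MAX // 2 else x

# Neighbouring addresses in the middle of the address space.
mid_lo, mid_hi = SIZE_MAX // 2, SIZE_MAX // 2 + 1
unsigned_step_mid = as_unsigned(mid_hi) - as_unsigned(mid_lo)
signed_step_mid = as_signed(mid_hi) - as_signed(mid_lo)

# Neighbouring bit patterns at the top, where SIZE_MAX + 1 wraps to 0.
top, wrapped = SIZE_MAX, as_unsigned(SIZE_MAX + 1)
unsigned_step_top = as_unsigned(wrapped) - as_unsigned(top)
signed_step_top = as_signed(wrapped) - as_signed(top)
```

The unsigned view is continuous in the middle and jumps at the wrap to 0, while the signed view is continuous across 0 and jumps in the middle, which is why neither interpretation fits every analysis.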

Hi all,

Thanks for your quick and helpful replies. Pointing me/us to the
"inbounds" has solved my confusion, as I originally had exactly the
same concerns as Duncan.

Eli: I just rescanned the documentation of "getelementptr" and it really
doesn't say anything about (un)signedness of the indices. The only
reference I've found is in "What happens if a GEP computation overflows?"
of the FAQ. So, maybe a short note in the reference could be useful.

Duncan: in the documentation of "ptrtoint" there is an implicit hint that
pointers are unsigned: "If value is smaller than ty2 then a zero extension
is done". Additionally, the above mentioned entry in the FAQ sheds a
little more light on it.

Cheers,
Jonas
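
To show why the zero extension in "ptrtoint" hints at unsigned pointers, here is a small sketch comparing zero and sign extension of a 32-bit pointer value to 64 bits. The widths and the helper names zext and sext are assumptions chosen for illustration.

```python
def zext(value, from_bits, to_bits):
    """Zero-extend: keep the low from_bits, upper bits become 0."""
    assert from_bits <= to_bits
    return value & ((1 << from_bits) - 1)

def sext(value, from_bits, to_bits):
    """Sign-extend: copy the top bit of the from_bits-wide value upward."""
    assert from_bits <= to_bits
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

ptr = 0xFFFFFFFF                 # assumed 32-bit pointer value ~0U

as_zext = zext(ptr, 32, 64)      # what the quoted ptrtoint wording describes
as_sext = sext(ptr, 32, 64)      # what a signed reading would produce
```

Zero extension keeps the value at 2^32 - 1, matching an unsigned reading of the pointer, whereas sign extension would turn the same bits into the 64-bit pattern for -1.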