nsw/nuw for trunc

Hi everyone,

we'd like to be able to check for loss of information in trunc operations in
our LLVM-based bounded model checker [1]. For this it is important to know
whether the trunc was applied to a signed or an unsigned integer, so we need
nsw and nuw flags for it. Would you accept a patch that adds these flags to
LLVM (and possibly Clang)?

Regards,
Florian

[1] http://baldur.iti.uka.de/llbmc/

Hi Florian,

> we'd like to be able to check for loss of information in trunc operations in
> our LLVM-based bounded model checker [1]. For this it is important to know
> whether the trunc was applied to a signed or an unsigned integer, so we need
> nsw and nuw flags for it. Would you accept a patch that adds these flags to
> LLVM (and possibly Clang)?

nsw/nuw don't mean signed/unsigned arithmetic. They mean that signed/unsigned
overflow in the operation results in undefined behaviour. As far as I know,
truncating a large signed value to a too small signed integer type does not
result in undefined behaviour. For example, the result of (signed char)999
is perfectly well defined. So it seems to me that nsw/nuw on truncate (which
is what this cast turns into in LLVM) don't make any sense. Also, a truncate
operation doesn't need to be signed or unsigned, since the operation performed
is exactly the same (the same set of input bits -> the same set of output bits)
regardless of the sign of the original types.

Ciao, Duncan.

Hi Duncan,

> Hi Florian,

>> we'd like to be able to check for loss of information in trunc operations
>> in our LLVM-based bounded model checker [1]. For this it is important to
>> know whether the trunc was applied to a signed or an unsigned integer, so
>> we need nsw and nuw flags for it. Would you accept a patch that adds
>> these flags to LLVM (and possibly Clang)?

> nsw/nuw don't mean signed/unsigned arithmetic. They mean that
> signed/unsigned overflow in the operation results in undefined behaviour.
> As far as I know, truncating a large signed value to a too small signed
> integer type does not result in undefined behaviour. For example, the
> result of (signed char)999 is perfectly well defined. So it seems to me
> that nsw/nuw on truncate (which is what this cast turns into in LLVM)
> don't make any sense. Also, a truncate operation doesn't need to be
> signed or unsigned, since the operation performed is exactly the same (the
> same set of input bits -> the same set of output bits) regardless of the
> sign of the original types.

You're perfectly right. But we'd like to check whether the number represented
is still the same after truncating, or whether its value changed due to the
truncation.

For example, let's consider truncating from 16 bits to 8 bits:

Truncating 0x0180 (384 in decimal) to 8 bits results in 0x80 (128 in
decimal). The value changed, so we'd like to report this to the user of our
model checker. This is easy to do so far, no problems there. It gets more
complicated if we consider signed integers:

0x0080 (128 in decimal) would be truncated to 0x80. If we interpret this as an
unsigned integer, this is still 128 in decimal and no information is lost.
But if we interpret it as signed, we turned a positive number into a negative
number (-128), and therefore the value was not preserved. So in order to know
if this instruction can have this kind of overflow, we need to know if the
result of the truncation is supposed to be interpreted as signed or unsigned.

If we had nsw and nuw flags for truncations we'd know when to check for this
kind of overflow and when not. The compiler likely doesn't need these flags
and can simply ignore them, but for us they would be useful.

I hope this explanation makes our intention more clear.

Regards,
Florian

I have a requirement where I need to get metadata on particular IR Instructions to survive instruction selection and find their way onto the MachineInstrs. In particular, I am interested in labeling Instructions for which I know there is a one-to-one mapping of Instruction to MachineInstr; however, if this were not the case, an approximation would be alright.

The problem I am trying to solve is one of calculating worst case execution time at source level. By labeling I/O operations on the source a later analysis program can analyze the generated binary to determine WCET for given paths between these labels.

What would be the best approach to solving this problem? Thanks for any input.

Andrew

Duncan's point is that this is totally different from the semantics of
nsw/nuw on other instructions, which means it would be
inappropriate to reuse nsw/nuw. That's especially true because
knowing that a trunc is nsw/nuw would be very useful to a lot of
clients — the only problem is that nobody can think of any languages
or situations where truncations actually have undefined behavior
semantics.

You should probably just modify your frontend to emit explicit range
checks in the cases you want.

John.

>> If we had nsw and nuw flags for truncations we'd know when to check for
>> this kind of overflow and when not. The compiler likely doesn't need
>> these flags and can simply ignore them, but for us they would be useful.

> Duncan's point is that this is totally different from the semantics of
> nsw/nuw on other instructions, which means it would be
> inappropriate to reuse nsw/nuw. That's especially true because
> knowing that a trunc is nsw/nuw would be very useful to a lot of
> clients — the only problem is that nobody can think of any languages
> or situations where truncations actually have undefined behavior
> semantics.

Ok, then let's not call it nsw/nuw but something different; would that be
acceptable? The information would be useful at least for software
verification and static analysis, though admittedly not for the compiler
itself. Our tool could spit out a warning that the value has changed because
of the truncation, which could be valuable information for the programmer.
Also, adding such a flag would likely not hurt the compiler, would it?

Btw I think it's quite similar to the addition of nsw/nuw to shl, with the
exception that that one actually allows some new optimizations, as far as I
know.

> You should probably just modify your frontend to emit explicit range
> checks in the cases you want.

That's certainly possible, but I guess it's even harder to push such a change
into Clang than it is to push the addition of a flag for an instruction,
which can be ignored by all passes, into LLVM.

Regards,
Florian

In contrast to the other folks, I think that this makes perfect sense, and NSW/NUW are the right thing to use.

It would be valid for instcombine to delete "sext(trunc_nsw x)" when the source and dest types are the same, similarly "zext(trunc_nuw x)".

The semantics are very similar to the left shift operator and many others.

-Chris

>> If we had nsw and nuw flags for truncations we'd know when to check for
>> this kind of overflow and when not. The compiler likely doesn't need
>> these flags and can simply ignore them, but for us they would be useful.

>> Duncan's point is that this is totally different from the semantics of
>> nsw/nuw on other instructions, which means it would be
>> inappropriate to reuse nsw/nuw. That's especially true because
>> knowing that a trunc is nsw/nuw would be very useful to a lot of
>> clients — the only problem is that nobody can think of any languages
>> or situations where truncations actually have undefined behavior
>> semantics.

> Ok, then let's not call it nsw/nuw but something different; would that be
> acceptable?

I can't imagine this going into the core IR. You should either use
metadata or explicit control flow in the frontend.

> Btw I think it's quite similar to the addition of nsw/nuw to shl, with the
> exception that that one actually allows some new optimizations, as far as I
> know.

nsw on shl satisfies the two crucial properties of (1) actually providing
some generic value, instead of just being a way to communicate to
your specific out-of-tree analysis, and (2) actually encoding the semantics
of a real operation, to wit, the fact that it's undefined behavior if << on C's
signed integer types causes overflow.

>> You should probably just modify your frontend to emit explicit range
>> checks in the cases you want.

> That's certainly possible, but I guess it's even harder to push such a
> change into Clang than it is to push the addition of a flag for an
> instruction, which can be ignored by all passes, into LLVM.

Clang is actually somewhat laxer about taking extensions than LLVM is
about modifying its core IR. That said, no, you would need to maintain
this out of tree.

John.

I agree that there's some abstract consistency in providing nsw/nuw
for trunc, even if no current frontends would ever emit it, but I do not
get the impression that Florian actually wants the optimizer making
random transformations based on the assumed undefined behavior
of these truncations.

John.

For clarity, I'm saying that it would make sense for LLVM IR to support this concept (instcombine could infer it for example), but I don't think it makes sense for clang to generate it.

-Chris

I can think of a use for it in my project if it were in the LLVM IR. It would be nice if LLVM optimizers could understand this feature. I would encourage you to consider accepting patches that implement this improvement.

Hi Duncan,

sorry for digging out such an old thread. You stated that
'(signed char) 999' is perfectly well defined in C/C++. I just looked into
the C++ standard [1] and could not find this. The section that seems to
apply is:

----------------------------------------------------
   3) If the destination type is signed, the value is unchanged if it
      can be represented in the destination type (and bit-field
      width); otherwise, the value is implementation-defined.
----------------------------------------------------

4.7.3 suggests to me that the standard does not define a result for
'(signed char) 999'. I assume you know this section, but I could not
find a reason why this section should not apply in this case. Any ideas?

It does apply: the value is implementation-defined, and the definition
that any sane implementation uses is 2's complement truncation.

-Eli

> It does apply: the value is implementation-defined, and the definition
> that any sane implementation uses is 2's complement truncation.

Right, implementation-defined is not the same as undefined: the
implementation must choose and document some behaviour, so the program
is still well defined.

Ciao, Duncan.

Thanks for your help. I found that the undefined behavior on integer overflow
is specified in 5 (5):

"If during the evaluation of an expression, the result is not mathematically
defined or not in the range of representable values for its type, the
behavior is undefined, unless such an expression is a constant expression
appears where an integral constant expression is required (5.19)"

This seems to apply to mathematical expressions, but not to explicit/implicit type conversions.

Hence, code like

int index = a + 100;
A[index] = ...

may result on 64-bit architectures in a sext(trunc(index)) expression that can be optimized out easily. Very interesting.

Cheers and thanks again for your help
Tobi

I meant '... that can not be ...'.

Also, some interesting detail I found: floating point to integer conversion may actually yield undefined behavior, as specified in C99:

6.3.1.4 Real floating and integer

1 When a finite value of real floating type is converted to an integer
   type other than _Bool, the fractional part is discarded (i.e., the
   value is truncated toward zero). If the value of the integral part
   cannot be represented by the integer type, the behavior is
   undefined.50)

Cheers
Tobi