I'm implementing an integer bounds analysis (i.e., for each variable

compute its lower and upper bound) as a LLVM pass. My objective is to

identify potential overflows in both signed and unsigned C integers.

To do that I need to know the sign of each integer. However, we know

that LLVM IR does not keep sign information explicitly.

My understanding is that the only instructions with explicit sign

information is ICmp (e.g., icmp ugt ....) for the case of

integers. Thus, sign information may be reconstructred from icmp

instructions. On the other hand, arithmetic instructions as Add, Sub,

and Mul do not have this information. (I'm aware of the "nsw" and

"nuw" flags but they are optional.) As a result, it's possible to have

a straight-line piece of code (without icmp instructions) where no

sign information can be inferred but potential overflows can happen

without knowing whether the integer is signed or not.

Recently, somebody posted:

Some time ago I posted here a couple times about integer overflow

checking. Since then we (at Utah) joined forces with Vikram Adve and

his student Will Dietz who are also looking at integer issues. Between

the four of us we've done a lot of looking at and thinking about integer

overflow in C/C++, and have written up a paper containing some data and

observations. Perhaps it'll be interesting to people here:

In this work, they implement a dynamic analysis for detecting integer

overflows in the front-end (using clang). I belive they moved to the

front-end rather than implementing a LLVM pass precisely due to, among

others, the sign problem I mention.

Is there a quick solution to infer sign information without moving to

the front-end? Maybe encoding some metadata?

Thanks!

Jorge