I'm implementing an integer bounds analysis (i.e., for each variable,
compute its lower and upper bound) as an LLVM pass. My objective is to
identify potential overflows in both signed and unsigned C integers.
To do that I need to know the sign of each integer. However, we know
that LLVM IR does not keep sign information explicitly.
My understanding is that, for integers, the only instruction with
explicit sign information is ICmp (e.g., icmp ugt ....). Thus, sign
information may be reconstructed from icmp instructions. On the other
hand, arithmetic instructions such as Add, Sub, and Mul do not carry
this information. (I'm aware of the "nsw" and "nuw" flags, but they are
optional.) As a result, it's possible to have a straight-line piece of
code (without icmp instructions) in which overflows can happen but
nothing tells the analysis whether the integers involved are signed or
unsigned.
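To make the problem concrete, here is a minimal sketch in Python that
models an 8-bit add on raw bit patterns, which is all an IR-level add
sees. The helper name add_overflows and the 8-bit width are made up for
illustration; this is not LLVM code.

```python
def add_overflows(a_bits, b_bits, width=8):
    """Return (unsigned_overflow, signed_overflow) for a `width`-bit add.

    a_bits and b_bits are raw bit patterns in [0, 2**width); the function
    checks the same addition under both interpretations.
    """
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)

    def as_signed(x):
        # Reinterpret a raw bit pattern as a two's-complement value.
        return x - (1 << width) if x & sign_bit else x

    unsigned_sum = a_bits + b_bits
    signed_sum = as_signed(a_bits) + as_signed(b_bits)

    unsigned_overflow = unsigned_sum > mask
    signed_overflow = not (-sign_bit <= signed_sum <= sign_bit - 1)
    return unsigned_overflow, signed_overflow


# 0x7F + 0x01: fine as unsigned (127 + 1 = 128), overflows as signed.
print(add_overflows(0x7F, 0x01))  # (False, True)
# 0xFF + 0x01: overflows as unsigned (255 + 1), fine as signed (-1 + 1 = 0).
print(add_overflows(0xFF, 0x01))  # (True, False)
```

The same pair of operands gives opposite answers depending on the
interpretation, so without sign information the analysis cannot decide
which check applies.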
Recently, somebody posted:
Some time ago I posted here a couple times about integer overflow
checking. Since then we (at Utah) joined forces with Vikram Adve and
his student Will Dietz who are also looking at integer issues. Between
the four of us we've done a lot of looking at and thinking about integer
overflow in C/C++, and have written up a paper containing some data and
observations. Perhaps it'll be interesting to people here:
In this work, they implement a dynamic analysis for detecting integer
overflows in the front-end (using clang). I believe they moved to the
front-end rather than implementing an LLVM pass precisely because of,
among other reasons, the sign problem I mention.
Is there a quick way to infer sign information without moving to
the front-end? Maybe by encoding it in some metadata?