# Dependence Analysis [was: Flow-Sensitive AA]

However, there is one issue I have ignored: the possibility of overflow in
the index expression. Suppose we have a loop like this:

    for (i8 i = 0; i != 200; ++i) {
      A[2 * i + 5] = ...
      ... = A[2 * i + 3]
    }

If both index expressions are evaluated in 8-bit arithmetic,
then the dependence equation should be solved in modular arithmetic:

    2 * i + 5 == 2 * (i + delta) + 3 (mod 256)

So the dependence distance is delta == 1 or delta == -127, which gives
two direction vectors: (<) and (>). My version will produce only the first
one. Taking the overflow issue into account during dependence analysis seems
very difficult to me. Does anybody have experience in this area?
Fortunately, programmers generally do not write programs in such a
nasty way.
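To make the wraparound concrete, here is a small sketch (the helper name is mine, not from the original mail) that checks which dependence distances delta satisfy the equation above when the indices are computed in 8-bit arithmetic:

```c
#include <stdint.h>

/* Sketch: in 8-bit arithmetic, the store A[2*i + 5] and the load
 * A[2*(i + delta) + 3] hit the same element whenever
 * 2*delta == 2 (mod 256), i.e. delta == 1 or delta == 129
 * (which is -127 as a signed byte). */
static int indices_collide(uint8_t i, uint8_t delta) {
    uint8_t store_idx = (uint8_t)(2 * i + 5);
    uint8_t load_idx  = (uint8_t)(2 * (uint8_t)(i + delta) + 3);
    return store_idx == load_idx;
}
```

Checking delta == 1 and delta == 129 against a few values of i confirms both solutions, which is exactly why the analysis above misses the second direction vector.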

I asked myself the same question. Without mod, how do you ensure that, for instance, the expression 2*i+255 was not actually 2*i-1?

In the general case, I think you have to be conservative about this because programmers may deliberately want this kind of "wraparound" behavior, e.g., with periodic boundary conditions. But 99.9% of programs probably don't need that so it would be bad to penalize them for this corner case. In such a situation, I think you just have to support both choices, but choose the default as sensibly as possible. I discussed this with Matthieu and this is what we came up with:

1. (Eventually) Support both choices: conservative or optimistic. Conservative means you assume that index arithmetic can wrap around; optimistic assumes it cannot. What I think you've implemented is the optimistic case.

2. Have a default choice but give the programmer an option to force this on or off.

3. (Matthieu's idea) Default to optimistic if the index expressions are i32, but conservative if i8 or i16. This assumes that there is no good reason to use i8 or i16 index variables except for wraparound behavior (and it is highly unlikely that programmers are using arrays of size 2^32 *and* requiring wraparound behavior for that).

Of course, for now, we don't have to implement the conservative version: what you've done should be good enough.

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign

> because programmers may deliberately want this kind of "wraparound"
> behavior, e.g., with periodic boundary conditions. But 99.9% of
> programs probably don't need that so it would be bad to penalize them
> for this corner case. In such a situation, I think you just have to
> support both choices, but choose the default as sensibly as possible.
> I discussed this with Matthieu and this is what we came up with:

C has a way to express this: signed integers are defined to never overflow, unsigned integers are defined to wrap gracefully on overflow. Other languages that target LLVM may want some sort of command-line option to control the behavior, or the language may define it one way or the other.

We don't model this in LLVM IR, but this is a desired feature if anyone is interested. Capturing this in the IR is a better way to go than giving the optimizers a big global knob.

-Chris

>> because programmers may deliberately want this kind of "wraparound"
>> behavior, e.g., with periodic boundary conditions. But 99.9% of
>> programs probably don't need that so it would be bad to penalize them
>> for this corner case. In such a situation, I think you just have to
>> support both choices, but choose the default as sensibly as possible.
>> I discussed this with Matthieu and this is what we came up with:

> C has a way to express this: signed integers are defined to never
> overflow,

More precisely, signed integer overflow is undefined behavior, which gives the compiler great latitude.
Assuming this will never happen and doing optimizations on that basis is valid, but so are other things.
An easy implementation, which often matches user expectations because it is what most hardware does,
is to treat signed and unsigned overflow alike, with clean wraparound.

>>> because programmers may deliberately want this kind of "wraparound"
>>> behavior, e.g., with periodic boundary conditions. But 99.9% of
>>> programs probably don't need that so it would be bad to penalize them
>>> for this corner case. In such a situation, I think you just have to
>>> support both choices, but choose the default as sensibly as possible.
>>> I discussed this with Matthieu and this is what we came up with:

>> C has a way to express this: signed integers are defined to never
>> overflow,

Ok, I thought they both had the same wraparound behavior: thanks for clarifying that.

> More precisely, signed integer overflow is undefined behavior, which
> gives the compiler great latitude.
> Assuming this will never happen and doing optimizations on that basis
> is valid, but so are other things.
> An easy implementation, which often matches user expectations because
> it is what most hardware does,
> is to treat signed and unsigned overflow alike, with clean wraparound.

That makes sense. But we would have to distinguish them during dependence analysis at least, if we want to use the solution Chris suggested. Which brings me to a question:

>> unsigned integers are defined to wrap gracefully on
>> overflow. Other languages that target this may want some sort of
>> command line option to control the behavior or the language may define
>> it one way or the other.

>> We don't model this in LLVM IR, but this is a desired feature if
>> anyone is interested. Capturing this in the IR is a better way to go
>> than having the optimizers have a big global knob.

But signed and unsigned values are not distinguished any more in the LLVM IR, and I don't think you're suggesting we restore unsigned types. What do you have in mind here?

--Vikram

And gcc has yet more fun in it:

-fstrict-overflow
    Allow the compiler to assume strict signed overflow rules, depending on the language
    being compiled. For C (and C++) this means that overflow when doing arithmetic with
    signed numbers is undefined, which means that the compiler may assume that it will
    not happen. This permits various optimizations. For example, the compiler will
    assume that an expression like "i + 10 > i" will always be true for signed "i". This
    assumption is only valid if signed overflow is undefined, as the expression is false
    if "i + 10" overflows when using twos complement arithmetic. When this option is in
    effect any attempt to determine whether an operation on signed numbers will overflow
    must be written carefully to not actually involve overflow.

    See also the -fwrapv option. Using -fwrapv means that signed overflow is fully
    defined: it wraps. When -fwrapv is used, there is no difference between
    -fstrict-overflow and -fno-strict-overflow. With -fwrapv certain types of overflow
    are permitted. For example, if the compiler gets an overflow when doing arithmetic
    on constants, the overflowed value can still be used with -fwrapv, but not otherwise.

    The -fstrict-overflow option is enabled at levels -O2, -O3, -Os.

-ftrapv
    This option generates traps for signed overflow on addition, subtraction,
    multiplication operations.

-fwrapv
    This option instructs the compiler to assume that signed arithmetic overflow of
    addition, subtraction and multiplication wraps around using twos-complement
    representation. This flag enables some optimizations and disables others. This
    option is enabled by default for the Java front-end, as required by the Java language
    specification.
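The last sentence of the -fstrict-overflow text is worth illustrating. Under two's-complement wraparound, "i + 10 > i" is false near INT_MAX, so a portable overflow check must avoid overflowing itself. A minimal sketch (the function name is my own illustration, not gcc's):

```c
#include <limits.h>

/* Sketch: check whether i + 10 would overflow WITHOUT computing i + 10.
 * Folding "i + 10 > i" to true is only valid when signed overflow is
 * undefined; with wraparound it is false for i > INT_MAX - 10. */
static int add10_fits(int i) {
    return i <= INT_MAX - 10;   /* no overflow in the check itself */
}
```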

Thanks! This is all very interesting, and tells me that LLVM has a way to go to fully support all of these capabilities (if that is the right thing to do, which isn't clear). OTOH, it looks like a lot of real-world software that is using LLVM already doesn't seem to be affected by the lack of them.

Does anyone know of any C/C++ programs that require integer overflow on signed arithmetic (even though it is not strictly allowed by the standard)?

Also, does anyone know how -ftrapv is implemented on processors that don't have hardware detection of integer overflow? It sounds very expensive to do entirely in software.

Thanks,

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign

In other words, our current implementation is correct. However, we are missing the opportunity to optimize some things. Trivial examples include some cases where you can't compute a simple loop count due to potential overflow.

-Chris

Yes, see the unending discussion on the gcc mailing list about
programs we are breaking that led to the introduction of this option.

My serious suggestion would be to tell any users who require this flag
to shove it.

(Otherwise you end up in a very dark place because there are a lot of
optimizations that make assumptions about overflow when rearranging
expressions, etc).

This should be an attribute on the underlying operation. For example, an i32 add should be tagged as:

1) being defined to wrap on overflow
2) being undefined on signed overflow
3) being defined to saturate on signed overflow
4) being undefined on unsigned overflow
5) being defined to saturate on unsigned overflow
etc.

-Chris
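As an illustration of behavior (3) in the list above, a saturating signed add can be written directly; the sketch below (my own, using a widening trick rather than any particular IR semantics) shows the semantics such a per-operation attribute could select:

```c
#include <stdint.h>

/* Sketch: a signed 32-bit add defined to saturate on overflow.
 * The sum is computed exactly in 64-bit, then clamped to the i32 range. */
static int32_t sat_add(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + b;      /* exact, cannot overflow in 64-bit */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```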

> Thanks! This is all very interesting, and tells me that LLVM has a
> way to go to fully support all of these capabilities (if that is the
> right thing to do, which isn't clear). OTOH, it looks like a lot of
> real-world software that is using LLVM already doesn't seem to be
> affected by the lack of them.
>
> Does anyone know of any C/C++ programs that require integer overflow
> on signed arithmetic (even though it is not strictly allowed by the
> standard)?

Yes, it is a common bug to depend on this. See (for example):

> Also, does anyone know how -ftrapv is implemented on processors that
> don't have hardware detection of integer overflow? It sounds very
> expensive to do entirely in software.

GCC's -ftrapv option is not fully implemented on any target, so it is not really dependable. The Ada frontend (which requires trap-on-overflow in some cases), for example, implements its own range checks instead of relying on GCC's -ftrapv support.

-Chris
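On the cost question: a purely software lowering of a trapping signed add might look like the sketch below (a hypothetical illustration, not gcc's actual -ftrapv code). It needs extra compares and a branch on every operation, which is why doing it entirely in software is expensive.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical software -ftrapv lowering for a signed add: detect the
 * overflow before it happens, then trap (modeled here as abort()). */
static int trapping_add(int a, int b) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        abort();   /* a real compiler would emit a trap instruction or libcall */
    return a + b;
}
```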

IBM (and Sun, but not Intel, IIRC) went even further: at -O3 or
above they actually assume *unsigned* integers don't overflow or wrap around
for loop induction variables.
They did this because it caused something like a 50% performance
regression on a bunch of serious applications when they were compiling
code that used 32-bit types as index variables on 64-bit machines (and
the resulting sign extensions, etc.).

See -qstrict_induction.

Personally, I think this goes too far, but it would be nice if you
could tell a user, "BTW, we can't optimize this loop because we are
assuming your index variables can overflow."

>> Thanks! This is all very interesting, and tells me that LLVM has a
>> way to go to fully support all of these capabilities (if that is the
>> right thing to do, which isn't clear). OTOH, it looks like a lot of
>> real-world software that is using LLVM already doesn't seem to be
>> affected by the lack of them.
>>
>> Does anyone know of any C/C++ programs that require integer overflow
>> on signed arithmetic (even though it is not strictly allowed by the
>> standard)?

> Yes, see the unending discussion on the gcc mailing list about
> programs we are breaking that led to the introduction of this option.
>
> My serious suggestion would be to tell any users who require this flag
> to shove it.

That sounds good to me! In any case, for dependence analysis and parallel code generation at least, we have many more fish to fry before this even becomes an issue.

> (Otherwise you end up in a very dark place because there are a lot of
> optimizations that make assumptions about overflow when rearranging
> expressions, etc).

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign

>> Thanks! This is all very interesting, and tells me that LLVM has a
>> way to go to fully support all of these capabilities (if that is the
>> right thing to do, which isn't clear). OTOH, it looks like a lot of
>> real-world software that is using LLVM already doesn't seem to be
>> affected by the lack of them.
>>
>> Does anyone know of any C/C++ programs that require integer overflow
>> on signed arithmetic (even though it is not strictly allowed by the
>> standard)?

> Yes, see the unending discussion on the gcc mailing list about
> programs we are breaking that led to the introduction of this option.
>
> My serious suggestion would be to tell any users who require this flag
> to shove it.

Right on!

> (Otherwise you end up in a very dark place because there are a lot of
> optimizations that make assumptions about overflow when rearranging
> expressions, etc).

Exactly right. One of my first jobs here was to fix a bunch of overflow
problems exposed by optimization. It's now become a tradition to
give this task to any new optimizer employee.

-Dave

> Thanks! This is all very interesting, and tells me that LLVM has a
> way to go to fully support all of these capabilities (if that is the
> right thing to do, which isn't clear). OTOH, it looks like a lot of
> real-world software that is using LLVM already doesn't seem to be
> affected by the lack of them.

LLVM's current choice is safe for all applications. The trapping behavior would be a really nice addition, though.

Has anyone quantified the optimizations afforded by undefined signed overflow? I'd expect that the benefits are minimal for most codes. On the other hand, I've seen reports that gcc's -fwrapv hurts the performance of generated code significantly. I'm not sure whether that is true; it could also be that -fwrapv is implemented poorly.

> Does anyone know of any C/C++ programs that require integer overflow
> on signed arithmetic (even though it is not strictly allowed by the
> standard)?

I've seen embedded programs that do this.

Of course, the C standard is clear: these programs are wrong.

John

In most cases, I agree. But for codes that depend heavily on dependence analysis, I would think that being conservative with index expressions would really kill any disambiguation capability and make many loop optimizations and other dependence-based optimizations much weaker. For example, static scheduling of array-intensive loops seems vulnerable to this.

--Vikram

But that sounds like Cray is being fairly conservative in treating overflow problems as errors during optimization. Is that correct? That's more or less the opposite of what Dan Berlin was suggesting.

This is not just idle curiosity -- I'm trying to get an understanding of what real-world compilers are actually doing on this issue.

--Vikram

No, it's not implemented badly. We've quantified its performance
before, and it hurts about 3-5 without high-level loop
opts/vectorization on. With them on, those opts do roughly nothing in
the presence of -fwrapv, because you can't determine the bounds of any
nontrivial loop.

take a simple loop:

    void foo(int a)
    {
      for (int i = 0; i < a; i += 2)
      {
      }
    }

If you assume signed overflow is undefined, this loop iterates at most
ceil(a/2) times.
If you assume it is defined to wrap around, it can iterate forever
(consider a == INT_MAX).
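Under the no-overflow assumption the trip count can be computed in closed form; a sketch (my own illustration) that does the arithmetic in 64-bit so that the formula itself cannot overflow even for a == INT_MAX:

```c
#include <stdint.h>

/* Sketch: trip count of 'for (int i = 0; i < a; i += 2)' assuming signed
 * overflow never happens, i.e. the iterations are i = 0, 2, ..., last < a. */
static int64_t trip_count(int32_t a) {
    if (a <= 0)
        return 0;
    return ((int64_t)a + 1) / 2;    /* ceil(a / 2), computed without overflow */
}
```

With wraparound semantics no such formula exists, because the loop may never terminate.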

3-5%, sorry.

There are many 'instcombine' level optimizations that become safe when overflow can't happen, e.g. x*2/2 -> x.
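A quick demonstration of why the x*2/2 -> x fold needs the no-overflow assumption (the wraparound is modeled through unsigned arithmetic so the demo itself has no undefined behavior; the cast back assumes the usual two's-complement representation):

```c
#include <stdint.h>

/* Sketch: with two's-complement wraparound, x*2/2 is NOT always x.
 * For x = 0x40000000, doubling wraps to INT32_MIN and halving gives
 * -0x40000000, not x. */
static int32_t mul2_div2_wrapping(int32_t x) {
    int32_t doubled = (int32_t)((uint32_t)x * 2u);  /* wrapping multiply */
    return doubled / 2;
}
```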

However, by far the most important case is loop induction variables for realistic code.

-Chris