Integer divide by zero

Hey guys,

I’m learning that LLVM does not preserve faults during constant folding. I realize that this is an architecture dependent problem, but I’m not sure if it’s safe to constant fold away a fault on x86-64.

A little testcase:

#include <stdio.h>

int foo(int j, int d) {
return j / d ;
}

int bar (int k, int d) {
return foo(k + 1, d);
}

int main( void ) {
int r = bar(5, 0);
printf( " r = %d\n", r);
}

At execution, on an x86-64 processor, this optimized code should fault with a divide-by-zero. At -O2, LLVM 3.1 folds the divide by zero into a constant load. This is, of course, undefined by the C standard, so the folding makes sense on that front. But on x86-64, I don’t think this is okay. On x86-64, an integer divide by zero is non-maskable, meaning it will always trap and has no result of any type.

My first question is, what’s the rationale behind not preserving faults on constant integer folding? Second, is this a bug? With my current understanding, this is most definitely a bug for me on x86-64. But, I’m wondering if there is a hole in my understanding.

Thanks in advance,
Cameron

Per C and C++, integer division by 0 is undefined. That means, if it happens, the compiler is free to do whatever it wants. It is perfectly legal for LLVM to define r to be, say, 42 in this code; it is not required to preserve the fact that the idiv instruction on x86 and x86-64 will trap.

Undefined behaviour aside, would be good to have a warning, I think. Does
the static analyser have something like it?

cheers,
--renato

...

Per C and C++, integer division by 0 is undefined. That means, if it
happens, the compiler is free to do whatever it wants. It is perfectly
legal for LLVM to define r to be, say, 42 in this code; it is not required
to preserve the fact that the idiv instruction on x86 and x86-64 will trap.

This is quite a conundrum to me. Yes, I agree with you on the C/C++
Standards interpretation. However, the x86-64 expectations are orthogonal.
I find that other compilers, including GCC, will trap by default at high
optimization levels on x86-64 for this test case. Hardly scientific, but
every other compiler on our machines issues a trap.

We take safety seriously in our compiler. Our other components go to great
lengths not to constant fold an integer division by zero. We would also
like LLVM to do the same. Are there others in the community who feel this
way?

I can envision an option which preserves faults during constant folding.
E.g. an integer version of -enable-unsafe-fp-math or gfortran's -ffpe-trap.
More accurately, this hypothetical option would suppress the folding of
unsafe integer expressions altogether. Would an option such as this benefit
the community at large?

To be complete, I've also explored the idea of generating a
__builtin_trap() call for such expressions before the IR level. However, I
have not yet convinced myself that this will generate the same fault as the
actual sdiv/udiv instruction would. Things to do.

-Cameron

Clang’s -fsanitize=integer-divide-by-zero option seems to provide what you want. See the Clang User Manual: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

–Owen

...

Clang's -fsanitize=integer-divide-by-zero option seems to provide what you
want. See the Clang User Manual:
http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

That looks pretty close. I assume there are performance penalties for the
runtime checks?

I would really like to produce the actual sdiv/udiv instruction and allow
the processor to fault on it. That seems like the right fix to me.

-Cameron

The platform is irrelevant; division by zero is undefined, and compilers
are allowed to optimize as if it can't happen. Both gcc and clang will lift
the division out of this loop, for example, causing the hardware trap to
happen in the "wrong" place if bar is passed zero:

void foo(int x, int y);

void bar(int x) {
for (int i = 0; i < 100; ++i) {
foo(i, 200/x);
}
}

If your program relies on the exact semantics of the 'idiv'/'udiv' hardware
instruction, you might want to use an asm volatile statement.

-Joe

Understood.

I don’t mean to nag, but I’m not arguing which is [more] correct: the Standards or the platform. Both have their own merits. At least to me they both have merits. The choice should be up to the compiler implementor. And, I would like to see LLVM make that choice easy for the compiler implementor. Maybe an anachronism, but there is some history of this option being built-in to other major compilers: GNU, Sun, HP, IBM, SGI, Compaq, etc.

But, if others can not (or will not) benefit from this option, I will not force my individual needs on the whole.

I’m less concerned about “where” the trap happens as I am about “if” it happens. For example, a Fortran program with division-by-zero is, by the Standard, non-conforming. Pragmatically, not a Fortran program. Rather than wrong answers, I would like to see a hard error indicating that a program is non-conforming to the Standard.

-Cameron

As I’ve pointed out, clang does provide such functionality as an opt-in feature through its -fsanitize options. A hypothetical Fortran frontend could do the same, and even make it an opt-out feature if it chose. I’m sorry if its implementation mechanism doesn’t match exactly what you want it to be, but it’s not like nobody else has thought about this problem. They have, and they’ve designed and shipped a solution!

Side note: even if the -fsanitize option introduces a branch around the division (which I haven’t verified), it’s quite unlikely to cause a significant performance regression. The branch to the error block should be perfectly predicted on any CPU made in the last 25 years.

–Owen

I'm also not fully happy with LLVM's behavior here. There is another
undefined case too, which is the minimum integer divided by -1. In
Julia I can get "random" answers by doing:

sdiv_int(-9223372036854775808, -1)

87106304

sdiv_int(-9223372036854775808, -1)

87108096

In other contexts where the arguments are not constant, this typically
gives an FPE trap.
More than insisting on a particular behavior, I'd like it to be
consistent. I know the result is undefined, so LLVM's behavior here is
valid, but I find that to be an overly lawyerly interpretation.
Presumably the optimizer benefits from taking advantage of the
undefined behavior, but to get a consistent result you need to check
for both zero and this case, which is an awful lot of checks. Yes they
will branch predict well, but this still can't be good, for code size
if nothing else. How much performance can you really get by constant
folding -9223372036854775808/-1 to an unspecified value?

Hey Owen,

Thanks for your reply. I definitely understand your perspective. Hopefully
I can change it. }:slight_smile:

...

As I've pointed out, clang does provide such functionality as an opt-in
feature through its -fsanitize options. A hypothetical Fortran frontend
could do the same, and even make it an opt-out feature if it chose. I'm
sorry if its implementation mechanism doesn't match exactly what you want
it to be, but it's not like nobody else has thought about this problem.
They have, and they've designed and shipped a solution!

Side note: even if the -fsanitize option introduces a branch around the
division (which I haven't verified), it's quite unlikely to cause a
significant performance regression. The branch to the error block should
be perfectly predicted on any CPU made in the last 25 years.

Even on a Xeon Phi? Just kidding around!

All kidding aside, an instruction is an instruction. We have the
functionality in hardware for free; it would be a shame not to use it. I
should also mention that for our team, performance is job #1. Extra
instructions, however insignificant, don't go over well without solid
justification.

Please don't misunderstand, I'm not trying to be unreasonable about this
behaviour change. I do believe that there are a few strong arguments for
this behaviour that we haven't touched upon yet. I also believe that this
option would be an improvement to LLVM overall. But, I also haven't heard
anyone in the community voicing interest. So, if no one speaks up and would
like to discuss further, I'll be happy to keep such a change in my local
branch.

-Cameron

A compromise might be to add an llvm.div.or.trap intrinsic, similar to the llvm.add.with.overflow intrinsics used to implement Clang’s optional integer overflow checks, which could be emitted efficiently on x86 or other platforms where division by zero or INT_MAX/-1 traps natively.

-Joe

Constant folding undefined expressions is sort of silly, but I appreciate that it makes undefined behavior problems in frontends immediately apparent with trivial cases before they creep up on you in more complicated optimized code. After all, even if the backend makes practical concessions to trivial cases, the underlying semantic problem is still there and will bite you eventually. For high-level languages like Julia that want to provide efficiency but also give defined behavior to all user-exposed cases, I think adding an LLVM intrinsic to represent division that’s defined to trap on division by zero or overflow would be the best solution. Since the trap is a side effect, it would stifle optimization of code that used the intrinsic, but the intrinsic could still be lowered to a single hardware instruction and avoid the branching and code bloat of manual checks on hardware where division by zero natively traps.

-Joe

The IBM XLC compiler has a -qcheck option, which takes arguments specifying what type of runtime checks are performed. For example, -qcheck=divzero would cause the program to abort with a SIGTRAP if a division by zero was detected (the compiler would insert a conditional trap instruction).

Producing a diagnostic message is one way to address this[1], but generating some sort of an exception has its merits as well---it all depends on the specific situation. I don't think that the existence of one precludes the other.

[1] I've never used fsanitize---this is based on what I've read in the clang's manual.

-Krzysztof

A division intrinsic with defined behavior on all arguments would be
awesome! Thanks for considering this.

'Tis a good compromise. If there are no objections/concerns, I would like
to move forward with it.

Thanks, Joe!

-Cameron

Hi Cameron,

Hey Duncan,

...

can't front-ends implement this themselves using the "select" constant
expression? For example, rather than dividing by "x" you can divide by
"x == 0 ? 1 : x".

I'm not sure that I follow. I would like to see the division by zero
execute and trap at runtime.

For example, x = 2/0. It looks like your suggestion would generate the
expression 2/1, which wouldn't trap as expected.

Am I misinterpreting your suggestion?

Thanks,
Cameron

P.S. Is it your intention to hide the simple constant expression from the
constant folder? I.e. (x == 0) ? 0 : x.

If you can find a way to implement -fsanitize=undefined to use the FP
trap instead of a branch when checking floating division by 0, I
suspect such a patch would stand a good chance of being accepted.
(Although I'm not intimately familiar with the implementation, so
there could be some subtlety that would prevent it.) You might need to
do this in the processor-specific backend to avoid other
undefined-behavior-based optimizations—that is, recognize "if (x == 0)
goto err_handler; else y/x;" and replace it with
"register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".

Jeffrey

Hi Cameron,

Hey Duncan,

...

    can't front-ends implement this themselves using the "select" constant
    expression? For example, rather than dividing by "x" you can divide by
    "x == 0 ? 1 : x".

I'm not sure that I follow. I would like to see the division by zero execute and
trap at runtime.

For example, x = 2/0. It looks like your suggestion would generate the
expression 2/1, which wouldn't trap as expected.

Am I misinterpreting your suggestion?

I didn't realize you wanted a trap, I just thought you wanted to avoid undefined
behaviour. If you want a trap you are going to have to (IMO) output IR
instructions rather than a constant expression. For example you can output the
equivalent of
   if (x == 0)
      llvm.trap() // or maybe some special trap routine
   y = whatever/x
   ... use y ...
rather than the constant expression "whatever/x". If the optimizers can prove
that x is never zero then this will be sunk back into a constant expression (if
"whatever" is a constant expression).

Ciao, Duncan.