Checked arithmetic

In looking at the LLVM reference manual, it is conspicuous that (a) the
IR does not define condition codes, and (b) the IR does not define
opcodes that return condition results in addition to their computational
results.

Given the IR as it stands, how does one go about *efficiently*
implementing a language that requires checked arithmetic? I do
understand that it can be done using intrinsics, but that implementation
is (to put it mildly) suboptimal.

For integer ops, I really want to be able to get at the carry/overflow
bit. For floating point ops, I really want to be able to get out the
floating point NaN state in order to exploit the NaN propagation
features provided by some hardware.

I'm sure this has been considered, but no means for dealing with this
sort of thing is jumping out at me from looking at the IR spec. What am
I missing?

Note: I can see at least two ways to deal with this by extending the IR,
but I would like to understand the current intentions first.

shap

> In looking at the LLVM reference manual, it is conspicuous that (a) the
> IR does not define condition codes, and (b) the IR does not define
> opcodes that return condition results in addition to their computational
> results.

We currently don't have this because no one has implemented it yet. It would be great to have this.

> Given the IR as it stands, how does one go about *efficiently*
> implementing a language that requires checked arithmetic? I do
> understand that it can be done using intrinsics, but that implementation
> is (to put it mildly) suboptimal.

Why?

-Chris

Hmm. Perhaps I shouldn't have written so quickly. My assumption was that
intrinsics would likely entail a procedure call.

More later -- my three-year-old just woke up.

shap

I want to background process this for a bit, but it would be helpful to
discuss some approaches first.

There would appear to be three approaches:

  1. Introduce a CC register class into the IR. This seems to be a
     fairly major overhaul.

  2. Introduce a set of scalar and fp computation quasi-instructions
     that accept the same arguments as their computational counterparts,
     but produce *only* the condition code. For example:

        add i32 ... produces result
        add_cc i32 ... produces condition codes

     Once this exists, re-combine the instructions in the back end with
     peepholes.

  3. Handle CC as a black magic special case, which at least has the
     merit of tradition. :)

Given the number of different ways that different bits of hardware
handle CCs, I am inclined to think that the right approach, in abstract,
is to introduce a CC register class and deal with CCs in a fully general
way. If so, this is not a small change.

Opinions/reactions?

Hi,

> There would appear to be three approaches:
>
>   1. Introduce a CC register class into the IR. This seems to be a
>      fairly major overhaul.
>
>   2. Introduce a set of scalar and fp computation quasi-instructions
>      that accept the same arguments as their computational counterparts,
>      but produce *only* the condition code. For example:
>
>         add i32 ... produces result
>         add_cc i32 ... produces condition codes
>
>      Once this exists, re-combine the instructions in the back end with
>      peepholes.
>
>   3. Handle CC as a black magic special case, which at least has the
>      merit of tradition. :)

4. Do arithmetic in a type with one more bit. For example, suppose you
want to know if an i32 add "x+y" will overflow. Extend x and y to 33
bit integers, and do an i33 add. Inspect the upper bit to see if it
overflowed. Truncate to 32 bits to get the result. Probably codegen
can be taught to implement this as a 32 bit add + inspection of the CC.

Ciao,

Duncan.

PS: In order for codegen to be able to handle i33, you need to turn
on the new LegalizeTypes infrastructure.

As a quick fix, that isn't a bad solution for bignum arithmetic, but it
doesn't deal with the upper bits from multiply, and it doesn't deal with
floating point condition codes at all.

shap

Hi Shap,

>> 4. Do arithmetic in a type with one more bit. For example, suppose you
>> want to know if an i32 add "x+y" will overflow. Extend x and y to 33
>> bit integers, and do an i33 add. Inspect the upper bit to see if it
>> overflowed. Truncate to 32 bits to get the result. Probably codegen
>> can be taught to implement this as a 32 bit add + inspection of the CC.

> As a quick fix, that isn't a bad solution for bignum arithmetic, but it
> doesn't deal with the upper bits from multiply, and it doesn't deal with
> floating point condition codes at all.

can you really use condition codes for multiply and floating point? You
can handle integer multiply by doing it in an integer twice as wide as the
one you are interested in, so an i64 on a 32 bit machine. Is there hardware
that supports checking for multiply overflow in a more efficient fashion,
and if so how does it work/get used?

Thanks,

Duncan.

> can you really use condition codes for multiply and floating point?

For scalar multiply, not really. What you *can* do is use the native
multiply instruction that does (32x32)->(lower,upper), and then decide
based on the upper 32 bits of the result whether you had a
multiply-carry. Similarly for i-divide 64->(quotient,remainder).

For floating point I am thinking about something else entirely. There is
a NaN propagation mode specified in IEEE that lets you run a chain of FP
ops and then check for NaN at the end, with the guarantee that the
intervening ops will be NaN-preserving. You can't store the results back
to memory without a check, but the technique avoids a check in the fp
pipeline that can be serializing on intermediate expression results.
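
In IR terms, what I am after looks something like this (an untested
sketch; comparing a value against itself with fcmp "uno" is the
standard NaN test):

    %t1 = fdiv double %a, %b          ; may produce a NaN
    %t2 = fmul double %t1, %c         ; a NaN input propagates through
    %t3 = fadd double %t2, %d         ; still propagating
    %nan = fcmp uno double %t3, %t3   ; true iff %t3 is NaN
    br i1 %nan, label %error, label %ok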

The catch is that you can't use that technique unless you can ensure
that the FPcc isn't clobbered by something else in the middle, which
seems to suggest that you want to track the FP condition code as a
register.

And then there are machines with multiple condition codes, and machines
that simply put the condition codes into a scalar register. These seem to
suggest that first-rate support for condition codes wants them to be
handled as a register class.

> You can handle integer multiply by doing it in an integer twice as wide
> as the one you are interested in, so an i64 on a 32 bit machine. Is there
> hardware that supports checking for multiply overflow in a more efficient
> fashion, and if so how does it work/get used?

On some machines there is. For example, both SPARC and MIPS have
instructions that do a 32x32 multiply producing a 64 bit result. The
upper bits of the result go into a side register, or in some cases the
output of the multiply unit goes into a distinct (lower, upper) register
pair that is not part of the normal register set.

This is an idea that comes and goes. The side register has to be renamed
specially, and can become a rename bottleneck in the issue unit, so it
may be an idea whose time came and went. On machines that do 32x32->32,
I do not know what outcome happens on the condition codes, but we can
surely look that up.

shap

Hi,

>> can you really use condition codes for multiply and floating point?

> For scalar multiply, not really. What you *can* do is use the native
> multiply instruction that does (32x32)->(lower,upper), and then decide
> based on the upper 32 bits of the result whether you had a
> multiply-carry. Similarly for i-divide 64->(quotient,remainder).

I think this is what you get if you do the multiply in an integer of twice
the width.
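
For instance, something like this (an untested sketch for the signed
case):

    %xx = sext i32 %x to i64
    %yy = sext i32 %y to i64
    %m  = mul i64 %xx, %yy            ; full 64 bit product
    %lo = trunc i64 %m to i32         ; the 32 bit result
    %re = sext i32 %lo to i64
    %ov = icmp ne i64 %m, %re         ; overflow iff the upper bits disagree

and codegen could in principle select a widening multiply for this and
read the overflow off the upper half.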

> For floating point I am thinking about something else entirely. There is
> a NaN propagation mode specified in IEEE that lets you run a chain of FP
> ops and then check for NaN at the end, with the guarantee that the
> intervening ops will be NaN-preserving. You can't store the results back
> to memory without a check, but the technique avoids a check in the fp
> pipeline that can be serializing on intermediate expression results.

Yes.

> The catch is that you can't use that technique unless you can ensure
> that the FPcc isn't clobbered by something else in the middle, which
> seems to suggest that you want to track the FP condition code as a
> register.
>
> And then there are machines with multiple condition codes, and machines
> that simply put the condition codes into a scalar register. These seem to
> suggest that first-rate support for condition codes wants them to be
> handled as a register class.

In any case, this is all at the codegen level. You suggested introducing
condition codes into the IR, and I tried to point out that you can get
at least some of what you want by using arbitrary precision integers and
floating point types (these last don't exist yet) in the IR. For sure there
needs to be some careful codegen work to have everything come out optimally
when you codegen.

>> You can handle integer multiply by doing it in an integer twice as wide
>> as the one you are interested in, so an i64 on a 32 bit machine. Is there
>> hardware that supports checking for multiply overflow in a more efficient
>> fashion, and if so how does it work/get used?

> On some machines there is. For example, both SPARC and MIPS have
> instructions that do a 32x32 multiply producing a 64 bit result. The
> upper bits of the result go into a side register, or in some cases the
> output of the multiply unit goes into a distinct (lower, upper) register
> pair that is not part of the normal register set.
>
> This is an idea that comes and goes. The side register has to be renamed
> specially, and can become a rename bottleneck in the issue unit, so it
> may be an idea whose time came and went. On machines that do 32x32->32,
> I do not know what outcome happens on the condition codes, but we can
> surely look that up.

Ciao,

Duncan.

Why not define an "add with overflow" intrinsic that returns its value and overflow bit as an i1?
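
Something along these lines, say (just a sketch -- the intrinsic name
and the aggregate return type here are hypothetical):

    declare {i32, i1} @llvm.sadd.with.overflow.i32(i32, i32)  ; hypothetical

    %pair = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
    %res  = extractvalue {i32, i1} %pair, 0
    %ov   = extractvalue {i32, i1} %pair, 1
    br i1 %ov, label %overflow, label %normal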

-Chris

Chris:

I understand several simple ways to implement add with carry. Your
suggestion is one of them. What I'm trying to understand is how to
handle the condition code issue generally.

shap

Hi Chris,

> Why not define an "add with overflow" intrinsic that returns its value and
> overflow bit as an i1?

what's the point? We have this today with apint codegen (if you turn on
LegalizeTypes). For example, this function

define i1 @cc(i32 %x, i32 %y) {
  %xx = zext i32 %x to i33
  %yy = zext i32 %y to i33
  %s = add i33 %xx, %yy
  %tmp = lshr i33 %s, 32
  %b = trunc i33 %tmp to i1
  ret i1 %b
}

codegens (on x86-32) to

cc:
        xorl %eax, %eax
        movl 4(%esp), %ecx
        addl 8(%esp), %ecx
        adcl $0, %eax
        andl $1, %eax
        ret

which uses the condition code as desired. Pity about the
redundant andl $1, %eax!

Ciao,

Duncan.

> Hi Chris,
>
>> Why not define an "add with overflow" intrinsic that returns its value and
>> overflow bit as an i1?
>
> what's the point? We have this today with apint codegen (if you turn on
> LegalizeTypes). For example, this function

The desired code is something like:

foo:
    addl %eax, %ecx
    jo overflow_happened
    use(%ecx)

etc.

-Chris

Hi Chris,

>> what's the point? We have this today with apint codegen (if you turn on
>> LegalizeTypes). For example, this function

> The desired code is something like:
>
> foo:
>     addl %eax, %ecx
>     jo overflow_happened
>     use(%ecx)

how's an intrinsic going to help with this? Also, is there any chance
of codegen being capable one day of combining carry arithmetic coming
from apint and the appropriate conditional branch into a "jo"?

Ciao,

Duncan.

I'm suggesting basically:

    res, overflow = add_with_carry(a, b);
    if (overflow) goto somewhere
    use(res)

-Chris

Yes. Sorry. I do see that that will work, and I had not intended to
sound ungrateful for your response. It is my nature, when I look at a
design problem, to try to consider the whole problem before implementing
some part. It may turn out here that there isn't really a "whole
problem" to consider. Even if there is, I agree that the shortest path
to solving my immediate problem is to do exactly as you suggest.

I guess my take is that when you face an architectural question that may
eventually have to be addressed in full, quick fixes tend to accrete, and
they have to be undone when you get around to the general solution. That
makes implementing the general thing harder -- unless you have thought it
out in advance and the quick fixes are in line with the eventual solution.

Now it may turn out that there isn't any "general thing" here at all.
It's just that I'd like to think that through before adopting what you
suggest.

shap

This particular use case pattern (cond branch using CC value generated
as a side effect of a useful operation) appears often enough to be worth
getting right.

So if it's beyond what codegen can do, then codegen wants to be fixed.

It's clear enough that I can get working code emitted today -- that's
not the issue. The concern is that languages supporting non-hardware
arithmetic precisions rely on this particular pattern a whole lot.

I definitely need to dig in harder and understand how LLVM works before
I try to seriously advocate anything on this topic (or any other), but
from a surface understanding of the IR, and experience building several
prior compilers, I would say that the current IR is very well suited to
C and not yet an ideal match to Ada, COBOL, or some other languages.

That's not a criticism. Just a statement of first impressions.

shap

Hey, you need to be careful with this reasoning or else you'll end up implementing a whole new language, compiler, and OS.

Oh wait, never mind :).

John

Don't forget prover. :)

shap

> Don't forget prover. :)

On that note, here's something that I want to see: a formal semantics for LLVM in, for example, higher-order logic. This would probably not be that difficult.

The problem that this solves is that current verified compiler efforts appear to be highly specific to both the language and the target.

Once the semantics exists, you can either prove once and for all that each LLVM->LLVM optimization pass preserves the semantics of the code, or else just do translation validation.

Verifying the LLVM->native backends should not be incredibly difficult.

Verifying a real LLVM front end would be a big project, but substantial headway is being made on that kind of thing, for example by Xavier Leroy's group.

I've tried to sell some of the HOL folks on this idea (they have a formal semantics for an ARM ISA, for example) but so far no dice.

John Regehr