Here is the next iteration of the patch. The only comment not
addressed is this one:
It would be better to implement a target-independent check for
overflow for the "Legal" case (like how SADDO does). Hacker's > Delight
has some hints on how to do this. It's not easy for the signed case,
but is do-able.
It can be lowered to a division + a branch, so it would be
inefficient, plus it would
be a lot of work to implement it correctly (for me at least).
I was subscribed to llvmdev in 'digest' mode, so the reply might not
be correctly
handled by mailers. Sorry about that.
If you can get the relevant high product (UMUL_LOHI and friends), it's
a relatively straightforward comparison. Otherwise, yes, the general
case is quite tricky; inserting a division here is non-trivial.
There's also the special-case of multiplication by a constant: here,
the computation can be done with a single straightforward comparison.
Here is the next iteration of the patch. The only comment not
addressed is this one:
Thanks! It's looking good.
It would be better to implement a target-independent check for
overflow for the "Legal" case (like how SADDO does). Hacker's > Delight
has some hints on how to do this. It's not easy for the signed case,
but is do-able.
It can be lowered to a division + a branch, so it would be
inefficient, plus it would
be a lot of work to implement it correctly (for me at least).
Okay. It would be tricky. Please put a "FIXME" in there indicating
that it would be nice to have at some point. It's okay to submit this.
The add.with.overflow instrinsics don't seem to work with constant
arguments, i.e.
changing the call in add-with-overflow.ll to:
%t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 0, i32 0)
causes the following exception when running the codegen tests:
Oh, and a few more possibilities:
1) Hackersdelight.org, for unsigned;
whether this is a good idea depends on the speed of the nlz
implementation. That version uses branches, but of course it can also
be done by ORing together the comparison results.
2) (C && A) | MulOverflow(B,C) | MulOverflow(A,D) | AddOverflow(B*C,
MulHi(B, D), A*D), where A, B are the two halves of Op1, C, D are the
two halves of Op2, and all operations are in the width of the halves.
This could be useful on x86 for 64-bit multiplies; it's not cheap, but
it can't really get much cheaper (besides the fact that Legalize can't
insert branches, which would be a nice improvement here). This is
roughly just adding overflow checks to every step of the normal 64-bit
multiply splitting algorithm.
3) (C && A) || ((C?B:D)*(C?C:A) + (B*D>>(Width/2)) >= 2^(Width/2)),
for unsigned, where A, B are the two halves of Op1, C, D are the two
halves of Op2, Width is the width of the original operands, and
everything is done zero-extended in the original width. Not the
greatest approach, but it's difficult to do much better with just
regular arithmetic and conditionals (besides the divide and compare
approach, which as far as I know isn't feasible here because of the
required chain?).
Just a possible issue from inspection: does this handle 64-bit
operations correctly on x86? Unless I'm missing something, it seems
likely to lead to a mysterious error for such operations.
Also, does DAGCombiner know not to touch operations with a known user
of the non-primary return value? This seems like it could lead to
strange code in some cases, and it seems likely to be the cause of the
crash Zoltan mentioned... would it be better to use target-specific
nodes for the calculation?