overflow + saturation stuff

Edwin was asking how we should handle PR3328 (how we should make GEP respect -fwrapv, etc.). I wrote up some thoughts here, if anyone is interested:
http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt

-Chris

Sounds ambitious! A comprehensive, efficient trapv would be excellent.
gcc's implementation seems quite incomplete; for example, it fails to trap overflows in the constant folder.
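For instance (a made-up illustration, not from the thread; the exact behavior depends on the gcc version and optimization level), an overflow that gets evaluated at compile time never reaches the runtime check:

    // Built with "gcc -O1 -ftrapv", the signed overflow below is
    // typically folded to a constant before the overflow-checking
    // libcalls (__addvsi3 and friends) are ever emitted, so it
    // silently wraps instead of trapping.
    int main() {
      int big = 2147483647;   // INT_MAX
      int wrapped = big + 1;  // signed overflow, folded away
      return wrapped < 0 ? 0 : 1;
    }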

John Regehr

The proposal suggests changing/splitting the existing sub/add/mul opcodes. This makes me wonder to what extent it is (currently, or ever) advisable for an external compiler to generate LLVM IR. Is there a plan to stabilise at some point and guarantee backwards compatibility to a certain extent, or should compilers that are not integrated into the LLVM infrastructure always target one particular release of LLVM?

Jonas

LLVM does guarantee backwards compatibility with compiled bitcode. The C++ interfaces are not frozen, so you may need to upgrade code targeting LLVM when upgrading; reasonable efforts are made to avoid making this process painful. Of course, what code is contributed to the project will be maintained through these changes.

— Gordon

Sorry for being unclear: I did not mean the C++ interface nor compiled bitcode, but the LLVM IR "assembler" interface (i.e., the .s files that llvm-gcc generates with "-emit-llvm -S").

Of course, what code is contributed to the project will be maintained through these changes.

In our case, I doubt that would happen since it's a self-hosting Pascal compiler (with several other code generators besides the under-development LLVM backend).

But basically, if I understand you correctly: the correct interface would be compiled bitcode rather than the "assembler" level interface?

Jonas

Hi Chris,

Would it be better to split add into multiple opcodes instead of using
SubclassData bits? Compare this:

    switch (I->getOpcode()) {
    case Instruction::Add: {
       switch (cast<AddInstruction>(I)->getOverflowBehavior()) {
       case AddInstruction::Wrapping:
          // ...
          break;
       case AddInstruction::UndefinedSigned:
          // ...
          break;
       case AddInstruction::UndefinedUnsigned:
          // ...
          break;
       }
       break;
    }
    }

with this:

    switch (I->getOpcode()) {
    case Instruction::Add:
       // ...
       break;
    case Instruction::SAdd_Open:
       // ...
       break;
    case Instruction::UAdd_Open:
       // ...
       break;
    }

I'm not sure about the name "Open"; fixed-size integers are "closed" under
wrapping and saturating add, so "open" sort of suggests an alternative,
and is concise. But regardless, a one-level switch seems more convenient
than a two-level one.

It's a little less convenient in the case of code that wants to handle all the
flavors of add the same way, but it still seems worth it.

Encoding might be a concern, as Sub, Mul, Div, and Rem would all have
variants, but there are plenty of bits in SubclassID, and it doesn't look like
the bitcode representation uses packed opcode fields.

Dan

Jonas Maebe wrote:

But basically, if I understand you correctly: the correct interface would be compiled bitcode rather than the "assembler" level interface?

The textual IR (generally .ll files, not .s) is run through an auto-upgrader, the same as bitcode. Once we reach LLVM 3.0, we may break support for 2.x series .ll and .bc files.

Of course, we might get to LLVM 3.0 before implementing this feature. What really happens is that once we've changed the .ll/.bc format enough that backwards compatibility is difficult to maintain, we'll declare LLVM 3.0.
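For a frontend that links against LLVM, producing either form is only a couple of calls. A minimal sketch against a recent C++ API (header and function names have moved around over the years, so treat the specifics as assumptions):

    #include <system_error>
    #include "llvm/Bitcode/BitcodeWriter.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    // Write one module both as textual IR (.ll) and as bitcode (.bc);
    // error handling is elided. Both forms are auto-upgraded on load.
    void emitModule(llvm::Module &M) {
      std::error_code EC;
      llvm::raw_fd_ostream LL("out.ll", EC);
      M.print(LL, /*AAW=*/nullptr);    // the "assembler"-level form
      llvm::raw_fd_ostream BC("out.bc", EC);
      llvm::WriteBitcodeToFile(M, BC); // the compact binary form
    }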

Nick

GCC's implementation has a huge number of problems, and I don't think a trapv implementation in llvm-gcc would fare much better (fold mangles trees severely). Clang preserves full, unmangled source-level ASTs and hands them to codegen, so codegen could handle this properly.

That said, I don't know of anyone interested in implementing this in the short term.

-Chris

Hi Chris,

Would it be better to split add into multiple opcodes instead of using
SubclassData bits?

No, I don't think so. The big difference here is that (like type) "opcode" never changes for an instruction once it is created. I expect that optimizations would want to play with these bits (e.g. convert an operation to 'undefined' overflow behavior when they can prove overflow never happens), so I think it is nice to not have them in the opcode field.

This also interacts with FP rounding mode stuff, which I expect to handle the same way with FP operations some day.
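For what it's worth, this flag-based design is essentially what later shipped as the nsw/nuw bits. A sketch of the in-place strengthening described above, against a later LLVM API (exact names vary by release):

    #include "llvm/IR/Instructions.h"

    // An optimization that has proven the addition cannot overflow in
    // the signed sense can strengthen it by flipping a subclass bit;
    // the Value, its class, and its opcode are all unchanged.
    void markNoSignedWrap(llvm::BinaryOperator *BO) {
      if (BO->getOpcode() == llvm::Instruction::Add)
        BO->setHasNoSignedWrap(true);
    }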

Sure, the nested switch is ugly. However, I think it would be much more common to look at these in "isa"-flavored tests than in switches:

if (isa<SAdd_OpenInst>(X))

is much nicer than:

if (BinaryOperator *BO = dyn_cast<BinaryOperator>(X))
   if (BO->getOpcode() == blah::Add && BO->getOverflow() == blah::Undefined)

However, we a) already suffer this just for Add, because we don't have an AddInst class, and b) don't care about the opcode anyway. IntrinsicInst is a good example of how we don't actually need opcode bits or concrete classes to make isa "work". It would be a nice cleanup to add new "pseudo instruction" classes like IntrinsicInst for all the arithmetic anyway.
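A pseudo-instruction class needs no new opcode bits and no storage of its own, only a classof(). A sketch of a hypothetical AddInst in the IntrinsicInst style (LLVM has no such class; headers as in recent releases):

    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // Never instantiated; it exists only so that isa<AddInst>(V) and
    // cast<AddInst>(V) work, exactly as IntrinsicInst does for calls.
    class AddInst : public BinaryOperator {
    public:
      static bool classof(const Instruction *I) {
        return I->getOpcode() == Instruction::Add;
      }
      static bool classof(const Value *V) {
        return isa<Instruction>(V) && classof(cast<Instruction>(V));
      }
    };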

If the switch case really does become important, we can just add a getOpcodeWithSubtypes() method that returns a new flattened enum.
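Such a method might look like this (all names hypothetical, reusing the AddInstruction enum from Dan's sketch):

    // Flatten (opcode, overflow behavior) into a single enum so the
    // classic one-level switch idiom keeps working.
    enum OpcodeWithSubtypes {
      Add_Wrapping, Add_UndefinedSigned, Add_UndefinedUnsigned // , ...
    };

    OpcodeWithSubtypes getOpcodeWithSubtypes(const Instruction *I) {
      switch (I->getOpcode()) {
      case Instruction::Add:
        switch (cast<AddInstruction>(I)->getOverflowBehavior()) {
        case AddInstruction::Wrapping:          return Add_Wrapping;
        case AddInstruction::UndefinedSigned:   return Add_UndefinedSigned;
        case AddInstruction::UndefinedUnsigned: return Add_UndefinedUnsigned;
        }
      // ... likewise for Sub, Mul, Div, and Rem ...
      }
      llvm_unreachable("unhandled opcode");
    }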

-Chris

Why is this? If SubclassData can be modified, why not SubclassID too?
Having it const may help guard against something accidentally changing
it to an opcode that would require a different subclass, but it's a private
member, so modifications to it could be fairly effectively controlled.

I agree that isa/dyn_cast can be quite flexible, but they can handle
ranges of opcodes just as well as they can handle opcodes composed
from multiple fields. The big-switch idiom is a staple of compiler
construction; it would be nice to be able to continue to use it directly.
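Concretely, in a pseudo-class like the AddInst sketch earlier in the thread, the classof() could test an opcode range rather than a single opcode (hypothetical opcodes, assuming the add variants are numbered contiguously):

    // Matches plain Add as well as the SAdd_Open/UAdd_Open variants
    // with one range test, so isa<> stays cheap under split opcodes.
    static bool classof(const Instruction *I) {
      return I->getOpcode() >= Instruction::Add &&
             I->getOpcode() <= Instruction::UAdd_Open;
    }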

Dan

There is no technical reason; it just provides a clearer API and makes the code easier to reason about. For example, since SubclassID can't change, you never have cases where you'd need to change the actual class of the impl.

-Chris