Meaning of `sub nsw`

Consider the following code:
  unsigned char x1 = atoi(argv[1]);
  unsigned char x2 = atoi(argv[2]);
  printf("%d\n", x1-x2);

The subtraction in the last statement is translated to the following IR:

%18 = load i8, i8* %6, align 1
%19 = zext i8 %18 to i32
%20 = load i8, i8* %7, align 1
%21 = zext i8 %20 to i32
%22 = sub nsw i32 %19, %21

I don't follow the explanation of nsw in langref. Could anybody help
explain what nsw means here? Thanks.

In simple terms, nsw (for no signed wrap) means you invoke C-style undefined behavior should the operation have an overflow when interpreted as a signed operation.
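
To make "overflow when interpreted as a signed operation" concrete, here is a minimal C sketch. The helper names sub_would_overflow and diff_uchar are made up for illustration, and the sketch assumes the usual case where int can represent every unsigned char value. It only shows which operand pairs fall outside the signed range; it is not how LLVM itself checks anything.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a - b does not fit in a signed int, i.e. the case
   that nsw promises will not happen. The checks are written so that
   they never overflow themselves. */
bool sub_would_overflow(int a, int b) {
    if (b > 0) return a < INT_MIN + b;   /* result would go below INT_MIN */
    if (b < 0) return a > INT_MAX + b;   /* result would go above INT_MAX */
    return false;
}

/* Mirrors x1-x2 from the question: both operands are promoted to int
   (the zext in the IR) before the subtraction, so the result can be
   negative. */
int diff_uchar(unsigned char x1, unsigned char x2) {
    return x1 - x2;
}
```

For the code in the question both operands come from unsigned char, so they lie in 0..255 and the difference lies in -255..255. For example, diff_uchar(0, 255) is -255, and sub_would_overflow is false for every such pair.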

Practically, this means (roughly) two things:

1. Optimisers can assume that overflow will not occur. For example, because %21 is known to be non-negative (it is the result of a zext), they can assume that %22 <= %19. Facts like this can be used, for example, to reason about loop termination.

2. If, after later optimisations, we learn enough about the values of %19 and %21 to determine that the operation will overflow, then we can replace its result with a poison value (loosely speaking, undef).

The latter is less useful, because this typically means the code is wrong (though it might also be dead, for example in C++ template instantiation where the programmer is relying on DCE removing unreachable code that would exhibit undefined behaviour). The former is a lot more useful for exposing later optimisation opportunities.
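
As a rough C-level illustration of the first point, compare a signed and an unsigned version of the same comparison. The function names are hypothetical. Clang typically lowers the signed subtraction to an nsw operation and the unsigned one to a plain sub, though the exact IR depends on compiler version and flags (for instance -fwrapv removes the assumption).

```c
#include <limits.h>

/* Signed: a - 1 overflowing would be undefined behaviour, so an
   optimiser is allowed to assume it does not happen and may fold the
   whole function to "return 1". */
int prev_is_smaller(int a) {
    return a - 1 < a;
}

/* Unsigned: wraparound is well defined (0u - 1u is UINT_MAX), so the
   comparison cannot be folded away; it is false for a == 0. */
int prev_is_smaller_unsigned(unsigned a) {
    return a - 1u < a;
}
```

Calling prev_is_smaller(INT_MIN) would itself be undefined behaviour, which is exactly the case the optimiser is allowed to ignore. The unsigned variant has to return 0 for an argument of 0.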

David

Hi,

If it helps, here is another view of it: nsw can be seen as a promise from the user (or the frontend language) that this instruction will never be executed at runtime with values that would trigger an overflow*. So this is a contract between the user/frontend and the IR (and the optimizations David mentions take advantage of this promise to generate better/faster code).

*Actually this is not quite correct. I intentionally simplified, because the detailed explanation is likely why it isn’t easy to understand in LangRef. A slightly more elaborate explanation is that the instruction is allowed to overflow as long as nothing in the program depends on the result of this instruction.
This is a way in LLVM to “delay” the effect of the overflow: there is no harm in overflowing if the result of the instruction does not impact the program behavior anyway!
This is what the section on poison value tries to illustrate in LangRef: https://llvm.org/docs/LangRef.html#poisonvalues
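
The "delayed" effect can be pictured with a toy model in C. This is only a sketch of the idea, not LLVM's implementation: the pval type and the functions pval_of, sub_nsw, observe and compute are invented for illustration, and an assert stands in for "undefined behaviour happens here".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an i32 value that may be poison. */
typedef struct {
    int32_t value;
    bool poison;
} pval;

pval pval_of(int32_t v) {
    pval r = { v, false };
    return r;
}

/* Models "sub nsw": on signed overflow the result is poison instead of
   a wrapped number. Poison operands make the result poison too. */
pval sub_nsw(pval a, pval b) {
    pval r = { 0, a.poison || b.poison };
    int64_t wide = (int64_t)a.value - (int64_t)b.value; /* cannot overflow */
    if (wide < INT32_MIN || wide > INT32_MAX)
        r.poison = true;
    else
        r.value = (int32_t)wide;
    return r;
}

/* Models a use the program's behaviour depends on (printing, storing to
   observable memory, ...). Only here does poison become a real problem. */
int32_t observe(pval v) {
    assert(!v.poison && "behaviour depends on a poison value");
    return v.value;
}

/* The subtraction always executes, but its result matters only when
   use_diff is true. */
int32_t compute(int32_t a, int32_t b, bool use_diff) {
    pval d = sub_nsw(pval_of(a), pval_of(b));
    return use_diff ? observe(d) : 0;
}
```

In this model compute(INT32_MIN, 1, false) is harmless and returns 0 even though the subtraction overflowed, because nothing depends on the poisoned result. The same call with use_diff set to true would trip the assert.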

If you want to understand more about undefined behavior in LLVM, this is a good read: https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf (and a blog post about this paper: https://blog.regehr.org/archives/1496 )
Also, a talk at the LLVM dev meeting: https://www.youtube.com/watch?v=_-3Iiads1EM

Best,