[RFC] Add nowrap flags to trunc

nikic · March 6, 2024, 4:19pm

Proposal

Add the nuw (no unsigned wrap) and nsw (no signed wrap) poison-generating flags to the trunc instruction, with the following semantics:

trunc nuw iN %x to iM returns poison if any of the truncated bits are non-zero.
trunc nsw iN %x to iM returns poison if any of the truncated bits are not the same as the top bit of the truncation result.

A corollary is that both zext (trunc nuw iN %x to iM) to iN and sext (trunc nsw iN %x to iM) to iN can be refined to %x.

Another corollary is that trunc nuw nsw implies that the truncation result is non-negative.
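The semantics and both corollaries above can be checked exhaustively for small bit widths. The following is only a toy Python model, not LLVM code: the POISON sentinel, the helper names, and the representation of iN values as unsigned bit patterns are all assumptions made for illustration.

```python
# Toy model of the proposed flags. POISON is a stand-in sentinel and does not
# capture LLVM's full poison semantics; values are unsigned bit patterns.
POISON = None

def trunc(x, n, m):
    """Plain trunc iN -> iM: keep the low m bits of the n-bit pattern x."""
    assert 0 <= x < (1 << n) and 0 < m < n
    return x & ((1 << m) - 1)

def trunc_nuw(x, n, m):
    """Poison if any of the truncated (high) bits is non-zero."""
    return POISON if x >> m else trunc(x, n, m)

def trunc_nsw(x, n, m):
    """Poison unless every truncated bit equals the top bit of the result."""
    res = trunc(x, n, m)
    top = (res >> (m - 1)) & 1
    expected_high = ((1 << (n - m)) - 1) if top else 0
    return POISON if (x >> m) != expected_high else res

def zext(x, m, n):
    """Zero-extend iM -> iN: the bit pattern is unchanged."""
    return x

def sext(x, m, n):
    """Sign-extend iM -> iN: replicate the top bit into the new high bits."""
    top = (x >> (m - 1)) & 1
    return x | (((1 << (n - m)) - 1) << m) if top else x

def check_corollaries(n, m):
    """Exhaustively check the two corollaries for iN -> iM."""
    for x in range(1 << n):
        u = trunc_nuw(x, n, m)
        s = trunc_nsw(x, n, m)
        if u is not POISON:
            assert zext(u, m, n) == x          # zext(trunc nuw x) == x
        if s is not POISON:
            assert sext(s, m, n) == x          # sext(trunc nsw x) == x
        if u is not POISON and s is not POISON:
            assert (u >> (m - 1)) & 1 == 0     # nuw + nsw: sign bit is clear
    return True
```

Calling check_corollaries(8, 4), for example, walks all 256 i8 bit patterns and raises an AssertionError if either refinement or the non-negativity claim fails in this model.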

While sharing the same name, these would be separate flags on TruncInst and not share logic with OverflowingBinaryOperator (as trunc is not a binary operator).

Motivation

The motivation for these flags is similar to the recently introduced nneg flag on zext and disjoint flag on or. We have certain transforms that may introduce no-op truncations, but this knowledge is later lost.

The prime example of this is IV widening. When an induction variable is replaced with a wider one, widening tries hard to also widen its users. However, if it fails to do this and inserts a truncation, the fact that it is always either nuw or nsw is lost, and likely not recoverable by later transforms (that do not use SCEV). For example, this currently happens for widenable LCSSA phi users.

Another motivation is to support a proper IR encoding for non-byte-sized memory accesses. For example, the following IR

%v = load i1, ptr %p
store i1 %v, ptr %p2

has unspecified behavior, but in practice behaves like:

%v.load = load i8, ptr %p
%v = trunc nuw i8 %v.load to i1
%v.store = zext i1 %v to i8
store i8 %v.store, ptr %p2

We strongly prefer only having byte-sized accesses on the IR level, but the lack of trunc nuw makes it hard to encode them without losing important information (namely, that the top bits of the non-byte-sized access must be zero).

With trunc nuw, the above truncation and extension can be omitted, without introducing a masking operation:

%v = load i8, ptr %p
store i8 %v, ptr %p2
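To see why the flag matters here, compare the trunc/zext round trip with and without nuw over every possible byte. This is a hedged Python sketch rather than anything LLVM does internally: POISON and the function names are hypothetical, and poison is modelled as a simple sentinel.

```python
# Toy model: POISON is a stand-in sentinel for a poison value.
POISON = None

def trunc_i8_to_i1(v):
    """Plain trunc i8 -> i1: keep only the low bit."""
    return v & 1

def trunc_nuw_i8_to_i1(v):
    """trunc nuw i8 -> i1: poison if any of the seven high bits is set."""
    return POISON if v >> 1 else v & 1

def zext_i1_to_i8(b):
    """zext i1 -> i8: the bit pattern is unchanged."""
    return b

def roundtrip_plain(v):
    return zext_i1_to_i8(trunc_i8_to_i1(v))

def roundtrip_nuw(v):
    t = trunc_nuw_i8_to_i1(v)
    return POISON if t is POISON else zext_i1_to_i8(t)

# With nuw, every non-poison round trip returns the loaded byte unchanged,
# so replacing the pair with the byte itself is a valid refinement.
nuw_is_identity = all(roundtrip_nuw(v) in (POISON, v) for v in range(256))

# Without nuw, the round trip computes v & 1, which differs from v for every
# byte >= 2, so dropping the pair would require keeping a masking operation.
plain_needs_mask = [v for v in range(256) if roundtrip_plain(v) != v]
```

In this model nuw_is_identity is true, while plain_needs_mask contains every byte from 2 to 255.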

Alternatives

Don’t add flags

Adding new poison-generating flags adds some additional degree of complexity, which may not be justified.

The IV widening problem could be partially addressed by trying even harder to widen users, even those located outside the loop. Ultimately, this will never catch all cases due to mismatches in analysis capabilities.

The problem of non-byte-sized accesses can be (and already is) partially addressed by using range metadata instead:

%v.load = load i8, ptr %p, !range !{i8 0, i8 2}
%v = trunc i8 %v.load to i1
%v.store = zext i1 %v to i8
store i8 %v.store, ptr %p2

Even without trunc nuw, the trunc/zext pair can be optimized away here. However, the !range metadata is more easily lost, e.g. because the memory access was eliminated.

Use different flag names

This proposal reuses the nuw and nsw flags that are already familiar from the overflowing binary operators add, sub and mul.

I think the flag names are quite fitting for trunc as well, but the reuse might cause confusion.

The main alternative I can think of would be something like trunc zero and trunc sign.

jdoerfert · March 6, 2024, 4:28pm

Names and approach seem very sensible to me.

scottmcm · March 6, 2024, 5:03pm

Yes please! Rust would use this all over the place because Option and Result have exactly the “we want an i1 for select but have to load/store it as i8 to meet other rules” problem. Making it obvious in the instruction stream — without depending on optimizations looking deeply enough to find !range on a load or soon parameter attributes — would I hope have it be much simpler for common 2-variant enum patterns to optimize well.

One thing the proposal might want to address directly:

It has seemed to me like opt today prefers to change trunc i8 %x to i1 to something involving icmp ne i8 %x, 0 https://llvm.godbolt.org/z/WKPTev64T – even if there’s !range metadata that the value is already in [0, 2) https://llvm.godbolt.org/z/3P3E6vc8P.

Once we have nowrap truncation, I wonder if the preferred canonicalization of that should be to trunc nuw instead of the icmp. That would presumably take backend work, but once there’s an icmp there it’s lost the information needed to undo it again, such as if it’s passed/returned in a register where it’s extended anyway.
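As a rough illustration of why the range information matters for this choice, here is a small Python sketch (the helper names are hypothetical, and it models only the arithmetic, not what opt actually emits). It compares taking the low bit of an i8 with comparing the whole byte against zero.

```python
def trunc_to_i1(x):
    """Low bit of the byte, as a plain trunc i8 -> i1 would produce."""
    return x & 1

def icmp_ne_zero(x):
    """Result of comparing the whole byte against zero, as 0 or 1."""
    return int(x != 0)

# Inside [0, 2) the two forms agree, so either encoding is usable there.
agree_in_range = all(trunc_to_i1(x) == icmp_ne_zero(x) for x in range(0, 2))

# Outside that range they diverge for every non-zero even byte, which is
# why the bare icmp form by itself no longer says the high bits were zero.
disagree = [x for x in range(256) if trunc_to_i1(x) != icmp_ne_zero(x)]
```

Here disagree is exactly the non-zero even bytes, the inputs for which trunc nuw would be poison but the comparison is still well defined.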

jayfoad · March 6, 2024, 5:14pm

I like it.

Mohamed · March 6, 2024, 5:19pm

I will prepare a patch for it.

preames · March 7, 2024, 9:44pm

+1 from me. I think this could be very useful in IV widening as well.

Shrep16 · March 24, 2024, 4:25pm

IV widening

A newbie question: where does this happen? Is it the function visitIVCast?

nikic · March 25, 2024, 12:08pm

The IV widening transform is implemented from llvm-project/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp at edcf65d40c4160db0a888e801560a671b10179d2 · llvm/llvm-project · GitHub to the end of the file.

Shrep16 · March 26, 2024, 11:44am

Okay, got it. Thank you so much!

nickdesaulniers · April 1, 2024, 5:26pm

Sounds like there’s a patch out for this: [IR] Add nowrap flags for trunc instruction by elhewaty · Pull Request #85592 · llvm/llvm-project · GitHub
