[RFC] Add 3-way comparison intrinsics

nikic · January 31, 2024, 3:56pm

Background

A 3-way comparison between a and b returns 1 if a > b, 0 if a == b and -1 if
a < b. Such comparisons are used in C++ via operator<=> and in Rust via the
PartialOrd and Ord traits.

Currently, there is no canonical form for how such comparisons should be represented. There are a few typical classes of representations:

Comparison-select sequences, such as a == b ? 0 (a < b ? -1 : 1), which is what operator<=> currently generates. There are a lot of variations of this general pattern, using different predicates and different select orderings.
Comparison-select sequence with extension, such as a < b ? -1 : sext(a > b). This is the canonical form of (1) for certain choices of predicates and select orderings.
Comparison-arithmetic sequences, such as zext(a > b) + sext(a < b).

Which of these choices produces the best codegen depends on the target. For example, option 3 is ideal for riscv32, while for many other targets it’s some variant of option 2.

The patterns also behave differently and unreliably in the middle-end optimizer. There is generally no choice that will produce optimal optimization effects in all cases, you just get different subsets.

Especially important is that patterns like (a <=> b) <= 0 fold to a <= b etc very reliably. This is currently not the case (see e.g. folds for icmp-of-sum-of-extended-i1 aren't happening in more complex code · Issue #73417 · llvm/llvm-project · GitHub for one representative issue).

Proposal

I propose to introduce two intrinsics with the following signatures:

; Returns 1 if (%a ugt %b), -1 if (%a ult %b), 0 if (%a eq %b)
iN @llvm.ucmp.iN.iM(iM %a, iM %b)

; Returns 1 if (%a sgt %b), -1 if (%a slt %b), 0 if (%a eq %b)
iN @llvm.scmp.iN.iM(iM %a, iM %b)

These return the unsigned or signed three-way comparison result of two M-bit integers into an N-bit integer result. N must be greater or equal to 2, as the result is otherwise not representable. Vector overloads must have the same number of elements for both types.

These intrinsics would ensure that there is a canonical pattern for 3-way comparisons which transforms can recognize, and targets expand into their preferred form.

Alternatives

Don’t add intrinsics

An alternative to adding intrinsics is to instead to pick some canonical form and teach optimization passes and backends about it.

A key problem with doing this is that the patterns involved are complex and fragile. It is very simple for optimizations to transform it in a way that makes it no longer a recognizable three-way comparison.

For example, a representation that uses the slt and sgt predicates might have one of the predicates canonicalized into an unsigned variant, while the other is left alone. A representation that uses ult and eq might optimize the operands of the eq comparison while leaving the ult comparison alone.

Use only one type overload

The intrinsics as proposed specify the result type separately from the input type. An alternative would be to say that the input and result types are required to be the same.

A disadvantage of this would be that 3-way comparison of booleans can not be presented using the intrinsics.

Support other types

This is not so much an alternative as an extension. I believe that integer are the most important use-case, but three-way comparisons can in principle also be supported for other primitive types.

For pointers, the llvm.ucmp and llvm.scmp intrinsics could be supported as-is, simply by also allowing pointer types in place of iM.

For floating-point numbers, the existence of NaN makes the definition of a 3-way comparison ambiguous. We could define an intrisic that returns poison for NaN inputs, but I’m not sure whether this would actually be useful to anyone.

As such, I’m open to a pointer extension, but don’t think we should support floating-point types at this time.

Implementation

I haven’t done any implementation work on this. I’m considering to offer this as a GSoC project, as it seems like a good way to get familiar with a broad slice of LLVM.

dmlloyd · January 31, 2024, 4:54pm

Regarding treatment of floating point, if it is helpful as a reference: Java supports 3-way comparison for floating point values (in fact, that’s the only way to compare them in Java) using two opcodes (F|D)CMPL and (F|D)CMPG. The L/G suffix indicates the treatment of NaN.

To quote the spec:

[Assume] at least one of value1 or value2 is NaN. The dcmpg instruction pushes the int value 1 onto the operand stack and the dcmpl instruction pushes the int value -1 onto the operand stack.

And this note:

Note: The dcmpg and dcmpl instructions differ only in their treatment of a comparison involving NaN. NaN is unordered, so any double comparison fails if either or both of its operands are NaN. With both dcmpg and dcmpl available, any double comparison may be compiled to push the same result onto the operand stack whether the comparison fails on non-NaN values or fails because it encountered a NaN. For more information, see §3.5.

The asymmetry with LLVM’s “ordered” vs “unordered” comparison did add a little bit of difficulty for us to map these opcodes to simple compares, but there is an unambiguous and direct mapping.

nikic · January 31, 2024, 5:56pm

Thanks for the reference, very interesting! Neither of those formulations would be usable in Rust, which needs to return a different value (like 2 or -2, I don’t know for sure) if the value is NaN.

A potential way to cover these different cases in one intrinsic would be to have iN @llvm.fcmp.iN.fM(fM %a, fM %b, iN %nan), where %nan is returned if one of the operands is NaN. Then %nan == 1 would be dcmpg, %nan == -1 would be dcmpl and %nan == 2 (or so) would be PartialOrd<f32>.

I’d expect that such a float intrinsic would be nearly entirely orthogonal to the integer intrinsics in terms of implementation though.

jcranmer · January 31, 2024, 7:42pm

Integer comparisons are a total order, meaning the results are <, =, or >, and the mapping to -1/0/1 is fairly common.

Floating comparisons are a partial order, meaning the results are <, =, >, or unordered, and there’s no consistent representation of partial orderings. Rust has PartialOrd trait, where unordered is None, and I don’t believe that Rust has a stable ABI for Option. C++20 has std::partial_ordering, but the values of std::partial_ordering::unordered is unspecified [1]. At a quick glance, libstdc++ seems to use 2 here while libc++ is using -127. I do agree that trying to define a convention for partial ordering doesn’t seem worthwhile.

Another potential thing to look at would be if you get any benefits from having a branch variant that lets you a three-way branch. It’s not clear to me that the answer is “yes,” but it could be worth at least investigating.

[1] Incidentally, so are the other values for less, equal, and greater, but everyone seems to use -1, 0, 1 for these anyways.

scottmcm · January 31, 2024, 7:48pm

Thanks for proposing this, @nikic ! I’d love to be able to update rustc to use this.

Rustc and Clang are using different encodings of <=> right now (Representing `<=>` in IR), so it’d be really nice to have the one clearly-best way to emit this where all the frontends can be confident the backend will handle it well.

To elaborate, there’s at least 12 different seemingly-equivalent-to-the-user ways to write the select-of-select form of this (for each signedness), some of which end up with materially different codegen and optimization implications.

Option 3 can also be quite elegant on x64, for example generating u8::cmp as

	cmp	dil, sil
	seta	al
	sbb	al, 0

To emphasize this one, while current approaches work for a single scalar comparison, in practise it’s not reliable even for a basic 2-tuple of primitives.

For example, with (i16, u16), our best attempt in the Rust standard library so far is that (a <=> b) < 0 on the tuple optimizes well but (a <=> b) <= 0 doesn’t.

Option<Ordering> does not have a stable ABI, correct. I agree with not defining a convention for partial ordering.

(We’re currently using 2 for None on that type, but there’s a reasonable chance it’ll end up using -2 instead at some point, and it’s likely that it’ll never be guaranteed to be any specific value.)

nikic · February 5, 2024, 4:02pm

For the record, I’ve put up the corresponding GSoC project here: [LLVM] Add 3-way comparison intrinsics

Oh, that’s very nice. This form seems to be great for i8 result types on X86, which presumably is the only one that occurs in Rust. For larger results it becomes not quite as nice due to the need to zero registers first.

nikic · March 6, 2024, 3:29pm

Forwarding two questions from Add 3 way compare <=> integer intrinsics to Langref by miguelraz · Pull Request #83227 · llvm/llvm-project · GitHub :

Intrinsic naming: I proposed llvm.ucmp and llvm.scmp here. The PR uses llvm.uthreecmp and llvm.sthreecmp. Wondering if anyone has strong opinions on what the name should be. (An even more explicit possibility would be llvm.uthreewaycmp.)

Types: I proposed using two type overloads, allowing to separately specify the return and the argument type. So you can have both i32 @llvm.ucmp.i32.i32(i32, i32) and i8 @llvm.ucmp.i8.i32(i8, i8).

A possible alternative that came up is to instead specify a fixed return type. This could be something like i2 @llvm.ucmp.i32(i32, i32) if we’re fine with the illegal type, or i8 @llvm.ucmp.i32(i32, i32) if we want to stick to something byte-sized.

I think for the ISD opcodes we’d definitely want to have the result type overload, but for the IR intrinsic I’m not sure whether just fixing the type has any disadvantages. Wondering if anyone has thoughts on this.

jayfoad · March 6, 2024, 3:39pm

I like llvm.ucmp and llvm.scmp.

I like this flexibility (and I don’t see any downside except for slightly longer mangled intrinsic names in textual IR) but I wouldn’t insist on it.

nlopes · March 6, 2024, 3:52pm

me too.

dmlloyd · March 6, 2024, 4:11pm

I prefer llvm.[s|u]cmp as well. FWIW it matches with the Java convention (I forgot to mention that in addition to the aforementioned operations, Java also has LCMP which would be equivalent llvm.scmp for 64-bit integers).

miguelraz · March 7, 2024, 5:04pm

FWIW, I now lean strongly in favor of llvm.sthreewaycmp and llvm.uthreewaycmp.

These instructions are much more likely to be read by non-experts and having the explicitness of the name will help us with bug detection and issue filing in a much easier fashion if beginner eyes are the ones running into the intrinsic the most.

nhaehnle · March 7, 2024, 9:45pm

What’s the expected failure mode here?

I think there is a good chance that it’s not understood at all and people will have to lookup its meaning, but that applies to lots of stuff and some amount of reading documentation is to be expected. (And reading documentation would be required even with a name that includes “threeway”, because that in itself doesn’t tell you what the result value means.) What would be worse is if llvm.[su]cmp could be misunderstood, but I don’t see how that would happen?

miguelraz · March 9, 2024, 4:23am

Well, I’m imagining cases where Rust programmers use match foo.cmp() { ... } and try to look at the assembly and a threewaycmp isn’t there and they can report it.

I’m a (stubborn!) beginner and I wouldn’t have thought scmp meant it was a threeway comparison, so I’m extrapolating that others will run into the same issue and I want to save them some grief. Additionally, the ability for ESL speakers (like myself) to find the much more relevant threewaycmp related searches will help people have an easier time understanding these topics.

I know it’s common practice, but a plethora of well-known acronyms to expert compiler engineers is not user friendly to people looking into these internals for the first time. I’d rather start building bridges now for easier understanding, given that the bitcode representation is what will be understood by the machine in the end anyways.

nikic · March 9, 2024, 8:45am

I get your general point, but at the same time…

…why would a Rust programmer using cmp() be looking specifically for uthreewaycmp rather than ucmp in the resulting IR? I don’t think that Rust has even a single mention of the “3-way comparison” terminology in its documentation right now. The proposed MIR primitive is called BinOp::Cmp as well.

We use the term “3-way comparison” to give a clear verbal distinction to “comparison with predicate”, but when it comes to actual usage/symbols in programming languages, I think a “3-way comparison” is pretty universally just called a “comparison”. You’ll find some variation of cmp/compare/compareTo in Rust, OCaml, Haskell, Scala, etc, all the way back to C with its memcmp/strcmp style functions. Offhand, I can’t really think of a single case where I’ve seen a method being called some variation of threeWayCmp rather than some variation of cmp.

miguelraz · March 10, 2024, 2:41am

That’s fair.

I’ve changed the PR to reflect said standard practices then.

scottmcm · March 10, 2024, 3:39am

I don’t find this particularly persuasive, because today if a Rust programmer calls a.div(b) they’re looking for udiv or sdiv in the assembly, and if they call a.lt(b) they’re looking for icmp ult or icmp slt.

So if when they call x.cmp(y) they’re looking for llvm.ucmp or llvm.scmp, that seems about as direct as could be.

Topic		Replies	Views
[LLVM] Add 3-way comparison intrinsics GSoC gsoc2024	15	2475	March 30, 2024
comparison pattern trouble LLVM Dev List Archives	14	90	October 5, 2010
[RFC] Optimizing Comparisons Chains LLVM Dev List Archives	3	93	June 13, 2017
Library updates for C++20/C++2a Clang Frontend	3	126	November 30, 2018
[IR canonicalization] 6 ways to choose {-1,0,1} LLVM Dev List Archives	11	109	July 11, 2017