Unsigned integer underflow in HardwareLoops pass (PPC, perhaps ARM)

Our team is developing a downstream target that uses the HardwareLoops pass, and we have found that the pass generates unexpected code for one of our regression tests. I have not fully vetted the test against the specifics of the C standard, but logically it makes sense.

I have the test up on Compiler Explorer, and the offending code can be reproduced with a stock trunk clang targeting PowerPC: https://godbolt.org/z/KzW3nYjra

The test is meant to ensure that small-width loop counters are not promoted. It does this by constructing a loop with an unsigned 8-bit counter and deliberately underflowing it on line 20 with '--count'. The expected behavior is that the 8-bit value wraps to 0xFF and the loop executes 256 times in total, then exits and returns 0. In the failure case, where p increments past the end of buffer, the test returns 1. I believe this failure case is optimized out as undefined behavior.
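For readers who cannot open the link, here is a minimal sketch of the kind of loop described above. It is not the actual test: the function name, the buffer size, and the iteration-count out-parameter are all assumptions made for illustration. Unlike the original as described, this version checks the pointer before writing, so the failure path has defined behavior.

```c
#include <stddef.h>
#include <stdint.h>

enum { BUF_LEN = 256 };  /* assumed size: one slot per expected iteration */

/* Hypothetical reconstruction, not the test behind the Compiler Explorer link.
 * Returns 0 if the loop stopped on its own, 1 if it would have run past the
 * end of the buffer (i.e. the 8-bit counter behaved like a wider type). */
int run_narrow_counter_loop(uint8_t count, unsigned *iterations)
{
    unsigned char buffer[BUF_LEN];
    unsigned char *p = buffer;
    unsigned char *const end = buffer + BUF_LEN;
    unsigned n = 0;
    int status = 0;

    do {
        if (p == end) {      /* checked before the store, so no out-of-bounds write */
            status = 1;
            break;
        }
        *p++ = 0;
        ++n;
    } while (--count != 0);  /* on a uint8_t holding 0, --count stores 0xFF */

    if (iterations)
        *iterations = n;
    return status;
}
```

Called with count equal to 0, a correct compilation runs the body 256 times and returns 0, because the decrement result is converted back to 8 bits before the comparison.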

In the PowerPC disassembly of the compiled test:

    mr 30, 3
    mtctr 30
.LBB0_1: # =>This Inner Loop Header: Depth=1
    bdnz .LBB0_1

Yep, that doesn’t look good and deserves a PR and some more looking into.
Wondering why we haven’t seen this before: I guess at higher optimisation levels this problem is hidden by the iteration count checks generated by the vectoriser or loop unroller.
It is a bit of a funny test, as the code produced at a higher opt level also shows, but I don’t think that should be an excuse.


I am just curious: was a PR ever opened for this? I can certainly confirm that this causes 2^64 iterations of the loop on PPC64, so this is probably something we should fix.
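To illustrate where a figure like 2^64 comes from, here is a rough model of a loop that consists only of a decrement-and-branch-if-nonzero instruction such as bdnz. It is a sketch, not taken from LLVM or from the test: the function name is made up, and the counter width is a parameter limited to 16 bits so the simulation finishes quickly.

```c
#include <stdint.h>

/* Hypothetical helper: counts how often the body of a
 * "decrement counter, branch if counter != 0" loop runs when the counter
 * register is width_bits wide (1..16 here) and starts at `initial`.
 * Returns 0 for an unsupported width. */
uint32_t simulate_bdnz_trips(uint32_t initial, unsigned width_bits)
{
    if (width_bits < 1 || width_bits > 16)
        return 0;

    const uint32_t mask = (1u << width_bits) - 1u;
    uint32_t ctr = initial & mask;
    uint32_t trips = 0;

    do {
        ++trips;                 /* one pass through the loop body */
        ctr = (ctr - 1u) & mask; /* decrement wraps within the register width */
    } while (ctr != 0);          /* branch back while the counter is nonzero */

    return trips;
}
```

In this model a nonzero starting value n gives n trips, while a starting value of 0 wraps on the first decrement and gives 2^width trips: 256 for an 8-bit counter, 65536 for a 16-bit one, and by the same reasoning 2^64 for a 64-bit count register.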

I did not open a PR for this myself. I believe our team has temporarily marked it as a known, lower-priority edge case, since the test case is sufficiently convoluted.

I’m more than willing to help review and get a fix upstream, though.