Our team is developing a downstream target that uses the HardwareLoops pass, and we have found that it generates unexpected code for one of our regression tests. I have not fully vetted the test against the specifics of the C standard, but logically it makes sense.
I have the test up on Compiler Explorer, and the offending code can be reproduced with a stock trunk clang targeting PowerPC: https://godbolt.org/z/KzW3nYjra
The test is intended to ensure that small-width loop counters are not promoted. It does this by constructing a loop with an unsigned 8-bit counter and purposefully underflowing it on line 20 with '--count'. The expected behavior is that the 8-bit value wraps to 0xFF and the loop body executes 256 times in total, after which the loop exits and the test returns 0. In the failure case, where p increments past the end of buffer, the test returns 1. I believe this failure path is being optimized out because it is treated as undefined behavior.
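For readers who do not want to open the link, here is a minimal sketch of the kind of loop described above. It is not the test from Compiler Explorer: the names small_counter_test, trips_out, and BUF_SIZE are hypothetical, and the bounds guard inside the loop is my addition so that a promoted counter reports failure instead of overrunning the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 256u

/* Hypothetical stand-in for the regression test, not the original source. */
static unsigned char buffer[BUF_SIZE];

/* Returns 0 if p ends exactly at the end of buffer, 1 otherwise.
 * trips_out (an assumption, may be NULL) receives the iteration count. */
int small_counter_test(uint8_t count, size_t *trips_out)
{
    unsigned char *p = buffer;
    size_t trips = 0;

    do {
        if (trips == BUF_SIZE) {
            /* Only reachable if the counter was widened past 8 bits.
             * Bail out before writing past the end of buffer. */
            if (trips_out)
                *trips_out = trips;
            return 1;
        }
        *p++ = 0xA5;
        ++trips;
        /* count is promoted to int, decremented, then converted back to
         * uint8_t modulo 256, so 0 becomes 0xFF. This wrap is well defined. */
    } while (--count);

    if (trips_out)
        *trips_out = trips;
    return p == buffer + BUF_SIZE ? 0 : 1;
}
```

Calling small_counter_test(0, &n) should return 0 with n equal to 256 when the 8-bit width of the counter is respected.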
In the PowerPC disassembly of the compiled test:
mr 30, 3
…
mtctr 30
.LBB0_1: # =>This Inner Loop Header: Depth=1
bdnz .LBB0_1
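To make the mismatch concrete, here is a rough C model of the two trip counts, assuming nothing in the elided instructions modifies r30 before the mtctr. bdnz decrements CTR and branches while the result is nonzero, so a CTR loaded with 0 does not give 256 iterations. The function names and the cap parameter are hypothetical; the cap exists only so the model terminates.

```c
#include <stdint.h>

/* Trip count the C source asks for: do { ... } while (--count)
 * with an 8-bit counter, where 0 wraps to 0xFF. */
unsigned expected_trip_count(uint8_t count)
{
    return count == 0 ? 256u : count;
}

/* Model of a body followed by bdnz with a 64-bit CTR.
 * Stops early at cap iterations (an assumption, to keep the model finite). */
uint64_t bdnz_trip_count(uint64_t ctr, uint64_t cap)
{
    uint64_t trips = 0;
    do {
        ++trips;   /* loop body runs once per pass */
        --ctr;     /* bdnz decrements CTR first ... */
    } while (ctr != 0 && trips < cap);   /* ... then branches if CTR != 0 */
    return trips;
}
```

With a starting value of 0, expected_trip_count gives 256, while bdnz_trip_count keeps running until it hits whatever cap it was given. For nonzero starting values the two agree.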