BSWAP matching in codegen

I’m looking into problems with the function DAGCombiner::MatchBSwapHWord and had a couple questions on how to proceed (I’m new here, and to llvm). It looks like there was never any test coverage for this code, and it doesn’t match the patterns it claims to be looking for. I found a fix for this in our fork, but the fix is even worse in that it matches a whole bunch of invalid patterns as well as the canonical patterns. I have simple test cases that break both the existing code and our local fix.

I see equivalent matching code upstream in InstCombiner::MatchBSwap() (InstCombineAndOrXor.cpp), but not an easy way to re-use it AFAICT. Questions:

–Is this functionality even worth fixing in DAGCombiner (ie, what is the probability that this complex pattern would re-emerge in codegen after the InstCombiner transform, weighed against the obviously error-prone nature of the matching code)?

–If so, I can patch up and test the existing logic, but since the original code is quite old I wanted to reach out for any modernization advice before embarking on that effort.

Thanks,

Jim

Are you sure there isn’t any test coverage? As far as I can tell, the tests from are still in the tree. I don’t think there is any existing code to match this particular pattern in instcombine. I guess it wouldn’t be hard to add, though. -Eli

Are you sure there isn’t any test coverage? As far as I can tell, the tests from https://reviews.llvm.org/rL133503 are still in the tree.

I looked at those, but none of them include the full pattern that decomposes into bswap and rol. I debugged through the X86 bswap.ll test and verified none of those cases make it through MatchBSwapHWord (they get handled in MatchBSwapHWordLow instead).

I don’t think there is any existing code to match this particular pattern in instcombine. I guess it wouldn’t be hard to add, though.

Yes, I guess the InstCombiner::MatchBSwap() function is very similar but not identical. It looks for patterns that match the bswap exactly, whereas the DAGCombiner version is looking for halfword swap patterns that decompose to a bswap and a 16-bit rotate.

Jim

The x86 tests don’t, but the ARM tests do: see, for example, test6 in test/CodeGen/ARM/rev.ll. If you compile that for x86, you get the expected bswap+rol. -Eli

Thanks, that helps enormously! The issue is that the match is supposed to support both cascade and tree OR patterns, but there appears to be a problem with the tree matching. Both test1 and test6 in the ARM tests exercise the cascade pattern, and I remember now our fix is confined to the tree case.

I hesitate to claim now that there’s no coverage for the tree pattern, but it is failing to match the patterns we use in our tests, one of which looks like this:

define i32 @test_tree(i32 %x) {

%byte0 = and i32 %x, 255 ; 0x000000ff

%byte1 = and i32 %x, 65280 ; 0x0000ff00

%byte2 = and i32 %x, 16711680 ; 0x00ff0000

%byte3 = and i32 %x, 4278190080 ; 0xff000000

%tmp0 = shl i32 %byte0, 8

%tmp1 = lshr i32 %byte1, 8

%tmp2 = shl i32 %byte2, 8

%tmp3 = lshr i32 %byte3, 8

%or0 = or i32 %tmp0, %tmp1

%or1 = or i32 %tmp2, %tmp3

%result = or i32 %or0, %or1

ret i32 %result

}

I’m still investigating exactly how it’s failing on this one; let me know if I’m missing something else here.

Jim

Oh, I see… yes, the existing tree matching logic is complete nonsense. It somehow manages to completely ignore N1, which is where half the value is supposed to come from. -Eli

Reported via https://llvm.org/bugs/show_bug.cgi?id=31357