[clang] memcmp-esque functions generating different assembly on x64

Recently I found myself writing a fixed size version of

std::memcmp(,,) == 0;

I was surprised to find that the assembly for each version is in fact different. This seems like a recognizable pattern that the optimizer could pick up on. There are different operator combinations that all accomplish the same goal, but each seems to compile to a 1:1 translation of the C++ code. I have not benchmarked anything yet, but it seems there may be a general performance gain available here.
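For concreteness, here is a rough sketch of the kind of variants I mean. The 16-byte size, the `Block` alias, and all function names are made up for illustration and are not the exact code I compared. All three return the same result for any input, so in principle a compiler could lower them to the same instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed size, chosen only for illustration.
constexpr std::size_t kSize = 16;
using Block = std::array<std::uint8_t, kSize>;

// Variant 1: std::memcmp with a compile-time constant length.
inline bool equal_memcmp(const Block& a, const Block& b) {
    return std::memcmp(a.data(), b.data(), kSize) == 0;
}

// Variant 2: byte-wise loop that returns at the first mismatch.
inline bool equal_short_circuit(const Block& a, const Block& b) {
    for (std::size_t i = 0; i < kSize; ++i) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}

// Variant 3: branch-free form that ORs together the XOR of every byte pair
// and tests the accumulated difference once at the end.
inline bool equal_accumulate(const Block& a, const Block& b) {
    unsigned diff = 0;
    for (std::size_t i = 0; i < kSize; ++i) {
        diff |= static_cast<unsigned>(a[i] ^ b[i]);
    }
    return diff == 0;
}

// Sanity check that the three variants agree on a given pair of blocks.
inline void check_variants_agree(const Block& a, const Block& b) {
    const bool expected = equal_memcmp(a, b);
    assert(equal_short_circuit(a, b) == expected);
    assert(equal_accumulate(a, b) == expected);
}
```

Compiling each function separately at the same optimization level and comparing the output is enough to see whether the generated code differs. I am not claiming any particular instruction sequence for any of them.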

I’m also not an SSE expert, so I’d be curious to learn what the right way to do this would be. I’m of course assuming std::memcmp is already written with care.