Improve the performance of JamCRC

LLDB relies heavily on CRC when loading shared libraries. The existing implementation is quite slow because it computes the CRC one byte at a time, which creates a long dependency chain.

Unfortunately, the polynomial is not the same as the one implemented by x86 processors in SSE 4.2 (the crc32 instruction there uses the CRC-32C polynomial), but there’s another way to make it faster: using more lookup tables.

Zlib implements this, but rather than require zlib, I instead added the relevant code to compute four bytes at a time in parallel.
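To illustrate the idea, here is a minimal sketch of the four-table ("slicing-by-4") technique. It is not the code from the attached patch. The names jamCRCBytewise, jamCRCSliced and makeTables are hypothetical, and it assumes JamCRC means the reflected CRC-32 polynomial 0xEDB88320 with an initial value of 0xFFFFFFFF and no final inversion. The word is assembled from individual bytes, so the sketch makes no assumption about alignment or host endianness.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Reflected CRC-32 polynomial (assumed here to be the one JamCRC uses).
constexpr uint32_t Poly = 0xEDB88320u;

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// T[0] is the classic byte-at-a-time table. T[K][B] is the CRC state
// reached after feeding byte B followed by K zero bytes.
static Tables makeTables() {
  Tables T{};
  for (uint32_t I = 0; I < 256; ++I) {
    uint32_t C = I;
    for (int Bit = 0; Bit < 8; ++Bit)
      C = (C & 1) ? (C >> 1) ^ Poly : (C >> 1);
    T[0][I] = C;
  }
  for (uint32_t I = 0; I < 256; ++I)
    for (int K = 1; K < 4; ++K)
      T[K][I] = (T[K - 1][I] >> 8) ^ T[0][T[K - 1][I] & 0xFF];
  return T;
}

static const Tables &tables() {
  static const Tables T = makeTables();
  return T;
}

// Reference version: one table lookup per byte, each lookup depending
// on the result of the previous one.
uint32_t jamCRCBytewise(const std::string &S, uint32_t CRC = 0xFFFFFFFFu) {
  const Tables &T = tables();
  for (unsigned char B : S)
    CRC = (CRC >> 8) ^ T[0][(CRC ^ B) & 0xFF];
  return CRC;
}

// Sliced version: four independent lookups per 32-bit word, so the
// dependency chain advances once per four bytes instead of once per byte.
uint32_t jamCRCSliced(const std::string &S, uint32_t CRC = 0xFFFFFFFFu) {
  const Tables &T = tables();
  const unsigned char *Data =
      reinterpret_cast<const unsigned char *>(S.data());
  size_t Size = S.size();
  while (Size >= 4) {
    // Build the little-endian word byte by byte (alignment-safe).
    uint32_t Word = uint32_t(Data[0]) | (uint32_t(Data[1]) << 8) |
                    (uint32_t(Data[2]) << 16) | (uint32_t(Data[3]) << 24);
    CRC ^= Word;
    CRC = T[3][CRC & 0xFF] ^ T[2][(CRC >> 8) & 0xFF] ^
          T[1][(CRC >> 16) & 0xFF] ^ T[0][CRC >> 24];
    Data += 4;
    Size -= 4;
  }
  // Finish the remaining 0 to 3 bytes one at a time.
  while (Size--)
    CRC = (CRC >> 8) ^ T[0][(CRC ^ *Data++) & 0xFF];
  return CRC;
}
```

Under those assumptions, both functions return 0x340BC6D9 for the standard check string "123456789", which is the bitwise complement of the usual CRC-32 check value 0xCBF43926.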

A separate patch changes LLDB to rely on JamCRC instead of its own implementation. This patch improves JamCRC’s performance, which brings my test (starting LLDB, breaking at main) from 47 seconds down to 36 seconds.

jamcrc.patch (16.4 KB)

Sorry, that last patch didn’t handle endianness very well. Here’s an updated patch that uses llvm::support::endian. I assume unaligned input, which is safer. I have no idea whether one can expect aligned input to this function. It also wouldn’t take much to process the first <=3 bytes one at a time, then blast through assuming aligned reads, and then finish up with another <=3 bytes. Let me know if you prefer that.
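For reference, the aligned alternative mentioned above boils down to splitting the buffer into a short head, a run of aligned 32-bit words, and a short tail. A rough sketch of just that split follows. The names AlignedSplit and splitForAlignment are hypothetical and not part of the patch, and it assumes a pointer converted to uintptr_t exposes the address bits, which holds on common platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper describing how a buffer would be processed.
struct AlignedSplit {
  size_t Head;  // leading bytes (0 to 3) handled one at a time
  size_t Words; // aligned 32-bit words handled four bytes at a time
  size_t Tail;  // trailing bytes (0 to 3) handled one at a time
};

AlignedSplit splitForAlignment(const void *Ptr, size_t Size) {
  uintptr_t Addr = reinterpret_cast<uintptr_t>(Ptr);
  // Bytes needed to reach the next 4-byte boundary (0 if already aligned).
  size_t Head = (4 - (Addr & 3)) & 3;
  if (Head > Size)
    Head = Size; // buffer ends before the boundary is reached
  size_t Rest = Size - Head;
  return {Head, Rest / 4, Rest % 4};
}
```

A 10-byte buffer starting at an address one past a 4-byte boundary would split into 3 head bytes, 1 aligned word, and 3 tail bytes.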

jamcrc.patch (16.4 KB)

Hi Scott,

Usually patches are sent to llvm-commits (unless I missed a specific reason to send this patch to llvm-dev instead), see: http://llvm.org/docs/DeveloperPolicy.html#making-and-submitting-a-patch
(we also have a Phabricator instance: http://llvm.org/docs/Phabricator.html)

Best,

Oops, I have no idea how I read that (and the preceding section) and determined I should send it to the *-dev list instead. Sorry!