Various targets have specialized crypto instructions for elliptic curve cryptography, and polynomial hash functions like CRC and GHASH. For instance, see x86/crypto in the kernel, or see the upcoming RISC-V vector crypto extension. There are innumerable implementations of crypto algorithms in a source language that is not assembly: this is an opportunity for LLVM to recognize these naively-written algorithms, and replace them with a more optimal version. An overwhelming majority of the work to be done to optimize these algorithms is to recognize them in middle-end IR: the transform component is relatively trivial.
To this effect, I propose a new analysis to recognize polynomial hashes. Most of the explanation can be found in a header comment, so I’ll paste it here:
//===----------------------------------------------------------------------===//
//
// The HashRecognize analysis recognizes unoptimized polynomial hash functions
// with operations over a Galois field of characteristic 2, also called binary
// fields, or GF(2^n): this class of hash functions can be optimized using a
// lookup-table-driven implementation, or with target-specific instructions.
// Examples:
//
// 1. Cyclic redundancy check (CRC), which is a polynomial division in GF(2).
// 2. Rabin fingerprint, a component of the Rabin-Karp algorithm, which is a
// rolling hash polynomial division in GF(2).
// 3. Rijndael MixColumns, a step in AES computation, which is a polynomial
// multiplication in GF(2^3).
// 4. GHASH, the authentication mechanism in AES Galois/Counter Mode (GCM),
// which is a polynomial evaluation in GF(2^128).
//
// All of them use an irreducible generating polynomial of degree m,
//
// c_m * x^m + c_(m-1) * x^(m-1) + ... + c_0 * x^0
//
// where each coefficient c is can take values in GF(2^n), where 2^n is termed
// the order of the Galois field. For GF(2), each coefficient can take values
// either 0 or 1, and the polynomial is simply represented by m+1 bits,
// corresponding to the coefficients. The different variants of CRC are named by
// degree of generating polynomial used: so CRC-32 would use a polynomial of
// degree 32.
//
// The reason algorithms on GF(2^n) can be optimized with a lookup-table is the
// following: in such fields, polynomial addition and subtraction are identical
// and equivalent to XOR, polynomial multiplication is an AND, and polynomial
// division is identity: the XOR and AND operations in unoptimized
// implmentations are performed bit-wise, and can be optimized to be performed
// chunk-wise, by interleaving copies of the generating polynomial, and storing
// the pre-computed values in a table.
//
// A generating polynomial of m bits always has the MSB set, so we usually
// omit it. An example of a 16-bit polynomial is the CRC-16-CCITT polynomial:
//
// (x^16) + x^12 + x^5 + 1 = (1) 0001 0000 0010 0001 = 0x1021
//
// Transmissions are either in big-endian or little-endian form, and hash
// algorithms are written according to this. For example, IEEE 802 and RS-232
// specify little-endian transmission.
//
//===----------------------------------------------------------------------===//
Now, cryptographic functions are usually operations in a Galois field of prime characteristic, but we limit ourselves to characteristic 2 for an initial narrowing of the problem space, and because these algorithms can be optimized with a table-lookup in the fallback case.
The analysis I propose currently recognizes the cyclic redundancy check (CRC) algorithm, and can be found at:
The canonical reference for CRC can be found in the kernel.
The implementation rests on a lot of existing infrastructure, including KnownBits, ConstantRange, ValueTracking, and ScalarEvolution, and roughly works by computing KnownBits over the iterations of the loop, and checking that the result is zero.
If the rationale is sound, we can begin reviewing the patch, and land it without any risk, since it is purely an analysis. The next steps would be to thoroughly test the analysis, and add a bit of code in LoopIdiomRecognize to use the results and optimize CRC algorithms so we can check it against llvm-opt-benchmark.
Footnote: GCC got a similar analysis last year, and optimizes CRC as of GCC 15.