The builtins library provides optimized implementations of this
and other low-level routines, either in target-independent C form,
or as a heavily-optimized assembly.
Compile the following source with both GCC -m32 -O3 and clang -m32 -O3,
run both programs at least on an AMD Ryzen and on different Intel Core
processors, and compare the measured execution times ... then remove
every occurrence of the word "optimized" from the above web page.
--- 64-bit.c ---