Disastrous performance of 128/128-bit integer division

<https://compiler-rt.llvm.org/index.html> boasts:

The builtins library provides optimized implementations of this
and other low-level routines, either in target-independent C form,
or as a heavily-optimized assembly.

Compile the following source with both GCC -m64 -O3 and clang -m64 -O3,
run both programs on at least an AMD Ryzen and several different Intel
Core processors, compare the measured execution times ... then remove
every occurrence of the word "optimized" from the above web page.

--- 128-bit.c ---