"Optimized implementations"?

<https://compiler-rt.llvm.org/index.html> boasts:

The builtins library provides optimized implementations of this
and other low-level routines, either in target-independent C form,
or as a heavily-optimized assembly.

Really?

Left: poorly performing code shipped in   # Right: proper code, just one or
      clang_rt.builtins-*                 #        two bits faster and shorter

___paritysi2:
        mov eax, [esp+4]                  # mov ax, [esp+4]
        mov ecx, eax                      #
        shr ecx, 16                       #
        xor ecx, eax                      # xor ax, [esp+6]
        mov eax, ecx                      #
        shr eax, 8                        #
        xor eax, ecx                      # xor al, ah
        mov ecx, eax                      #
        shr ecx, 4                        #
        xor ecx, eax                      #
        mov eax, 0x6996                   #
        and cl, 15                        #
        shr eax, cl                       # setnp al
        and eax, 1                        # movzx eax, al
        ret                               # ret

___paritydi2:
        mov eax, [esp+8]                  # mov ax, [esp+4]
        xor eax, [esp+4]                  # xor ax, [esp+6]
        push eax                          # xor ax, [esp+8]
        call ___paritysi2                 # xor ax, [esp+10]
        add esp, 4                        # xor al, ah
                                          # setnp al
                                          # movzx eax, al
        ret                               # ret

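For both functions together, the proper code needs 14 instead of 21
instructions in 48 instead of 57 bytes. Per call, ___paritysi2 executes
6 instead of 15 instructions and ___paritydi2 executes 8 instead of 21
(6 of its own plus the 15 of the called ___paritysi2), more than halving
the instructions executed per function call!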
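
For readers who prefer C to assembly, here is a minimal sketch of what
the left-hand listing appears to compute. It is reconstructed from the
disassembly above, not quoted from the compiler-rt sources, and the
names parity32_lookup and parity64_lookup are hypothetical. The value is
folded down to 4 bits with XOR, then 0x6996 serves as a 16-entry table
whose bit n is the parity of n.

```c
#include <stdint.h>

/* Hypothetical name; mirrors the shipped ___paritysi2 listing.
   Returns 1 if x has an odd number of set bits, else 0. */
int parity32_lookup(uint32_t x)
{
    x ^= x >> 16;                    /* fold 32 bits into the low 16 */
    x ^= x >> 8;                     /* fold 16 bits into the low 8  */
    x ^= x >> 4;                     /* fold 8 bits into the low 4   */
    return (0x6996 >> (x & 15)) & 1; /* bit n of 0x6996 = parity(n)  */
}

/* Hypothetical name; mirrors the shipped ___paritydi2 listing:
   XOR the two halves, then reuse the 32-bit routine. */
int parity64_lookup(uint64_t x)
{
    return parity32_lookup((uint32_t)(x >> 32) ^ (uint32_t)x);
}
```

The right-hand listing stops folding at 8 bits, because x86 already sets
the parity flag from the low byte of the result of "xor al, ah", so no
table and no further shifts are needed.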

AGAIN:
Remove every occurrence of the word "optimized" on the above web page.

'nuff said
Stefan