How does this affect freestanding implementations?
This transform is disabled with -fno-builtin or -ffreestanding.
Thinking a bit more about this, it looks like the way -ffreestanding is implemented in Clang is too conservative. Currently -ffreestanding disables optimization of all builtin functions, just like -fno-builtin. However, the GCC manual says:
GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp.
Should we enable those functions selectively in a freestanding environment? LLVM would then start optimizing calls to them, and would turn loops into calls to one of those functions where possible. I'm not sure if that behavior is acceptable with -ffreestanding.
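To make the concern concrete, here is a minimal sketch of the kind of loop in question. The function name clear_buffer is hypothetical and only for illustration. With builtins enabled, the optimizer may recognize this loop as a memset idiom and emit a call to memset; with -fno-builtin or -ffreestanding it is currently left as a plain loop.

```c
#include <stddef.h>

/* Hypothetical helper: zero a buffer one byte at a time.
 * With builtins enabled, the optimizer may replace this loop with a
 * call to memset. With -ffreestanding or -fno-builtin, that transform
 * is disabled and the loop stays a loop. */
void clear_buffer(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = 0;
}
```

Whether the call is actually emitted depends on the optimization level and target, so comparing the assembly output (for example with -O2 -S, with and without -ffreestanding) is the reliable way to check.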
If GCC requires it, then pragmatically it's probably safe to do. Is
there any -ffreestanding code that LLVM compiles that wasn't already
being compiled with a GCC toolchain? My guess is that there isn't.
A related question: what freestanding code does LLVM currently
compile in production? The FreeBSD and Darwin kernels come to mind
(and Linux is coming along). We also have a couple of
microcontroller backends, but given the lack of maintenance (from a
quick look at git log), I'm not sure how much they are being used.
Maybe some embedded ARM and MIPS systems. AFAIK, all of these come
from a background of being built with a GCC toolchain.
I think it would be permissible according to the GCC documentation, but it is still a bad idea. We regularly get bug reports from people who use -ffreestanding and get grumpy about calls to memcpy being emitted for struct copies...
I think that if someone is asking for freestanding we should just forget about any performance win for stuff like this and be as unsurprising as possible.
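For reference, the struct-copy case mentioned above looks roughly like the sketch below. The type and function names (struct packet, copy_packet) are made up for illustration. There is no memcpy in the source, yet the compiler may lower the assignment to a memcpy call when the struct is large enough, which is what surprises people building freestanding code.

```c
/* Hypothetical type, large enough that a compiler may prefer a
 * library call over inline loads and stores for the copy. */
struct packet {
    unsigned char payload[256];
    unsigned int length;
};

/* Plain struct assignment: no memcpy appears in the source, but the
 * generated code may still contain a call to memcpy. */
void copy_packet(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}
```

This is why a freestanding environment is expected to supply memcpy even when the program never calls it by name.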