In feedback from game studios, a common complaint is that the compiler
replaces loops with calls to memcpy/memset. These loops are often
hand-optimised and highly efficient, and the developers strongly want
a way to control the compiler (i.e. "leave my loop alone").
The culprit is of course the loop-idiom recognizer, which replaces any
loop that looks like a memset/memcpy with a call. This affects loops
with both variable and constant trip counts. The question is, does
this make sense in all cases? Also, should the compiler provide a way
to turn it off for certain types of loop, or for an individual loop?
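
To make the discussion concrete, here is a minimal sketch of the kind of
loops in question. The function names are ours and purely illustrative, and
whether a particular compiler actually rewrites them depends on the
optimisation level and target.

```c
#include <assert.h>
#include <stddef.h>

/* Fill loop: stores a loop-invariant byte to consecutive addresses.
   A loop-idiom recognizer may rewrite this as memset(dst, 0, n). */
void clear_bytes(unsigned char *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = 0;
}

/* Copy loop: loads and stores consecutive bytes. The restrict qualifiers
   promise that the buffers do not overlap, which is what may allow the
   loop to be rewritten as memcpy(dst, src, n). */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Both loops behave identically whether or not they are rewritten; the only
difference is whether the generated code contains the loop or a library call.
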
The standard answer is to use -fno-builtin, but this does not provide
fine-grained control (e.g. we may want the loop-idiom recognizer to
handle constant trip-count loops but not variable trip-count loops).
As an example, it could be argued that replacing constant trip-count
loops always makes sense. Here the compiler knows how big the
memset/memcpy is and can make an accurate decision. For small sizes the
memcpy/memset will be expanded inline, while for larger sizes it will
remain a call, but the call overhead will be negligible relative to the
amount of data processed.
On the other hand, the compiler knows very little about variable
trip-count loops (the loop could be used primarily for copying 10 bytes
or 10 Mbytes; the compiler doesn't know). The compiler will replace the
loop with a call, but as the size is variable the call will not be
expanded inline. In this case small sizes may see significant overhead
in comparison to the original loop. The game studio examples all fall
into this category.
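
The distinction between the two cases can be sketched as follows. The names
and the 16-byte size are hypothetical, chosen only for illustration; the
comments describe what a compiler may do, not what any particular version is
guaranteed to do.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_BYTES = 16 }; /* hypothetical fixed size, for illustration */

/* Constant trip count: the size is known at compile time, so if the loop
   is turned into a memset the compiler can still choose to expand that
   memset inline (e.g. as a few wide stores). */
void clear_block(unsigned char *dst)
{
    assert(dst != NULL);
    for (size_t i = 0; i < BLOCK_BYTES; ++i)
        dst[i] = 0;
}

/* Variable trip count: n is only known at run time. If the loop becomes
   a memset call, the call overhead is paid even when n is very small. */
void clear_n(unsigned char *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = 0;
}
```

A caller that mostly invokes clear_n with n of a few bytes is the situation
described above: the original loop would finish in a handful of
instructions, whereas the replacement goes through a library call.
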
The loop-idiom recognizer also has no notion of "quality" - it always
assumes that replacing the loop makes sense. While it might be the
case for a naive byte-copy, some of the examples we've seen have been
So, to summarise, we feel that there's sufficient justification to add
some sort of user control. However, rather than propose a specific
solution, we would prefer to start a discussion and gather opinions. So
to start, how do people feel about:
- A switch to disable the loop-idiom recognizer completely?
- A switch to disable the loop-idiom recognizer for loops with a variable
  trip count?
- A switch to disable the loop-idiom recognizer for loops with a constant
  trip count (can't see this being much use)?
- Per-function control of the loop-idiom recognizer (which must work with
  LTO)?
Thanks for any feedback!