We have a bunch of code that determines the OS version it’s running on and makes decisions accordingly. We’re able to place a lower bound on that version at compile time, and I’m using `__builtin_assume` to propagate that information to LLVM so it can optimize accordingly. Compiler Explorer is a simplified example, and Clang is able to use the assumption to determine that `bar` can never call `foo` and eliminate the branch, which is great.
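For readers who don’t want to follow the link, here is a minimal sketch of the pattern. The names `kMinOSVersion` and `ASSUME` and the stubbed return value of 14 are placeholders made up for illustration, not our real code. In the real project `getOSVersionInternal` lives in a separate translation unit, so the optimizer can’t see its body.

```cpp
#include <cassert>

// Hypothetical lower bound known at compile time (placeholder value).
constexpr int kMinOSVersion = 12;

// Stub standing in for the real query; the real one is defined elsewhere.
int getOSVersionInternal() { return 14; }

// __builtin_assume is Clang-specific; fall back to the
// __builtin_unreachable idiom on other GCC-compatible compilers.
#if defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#else
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#endif

inline int getOSVersion() {
    int version = getOSVersionInternal();
    assert(version >= kMinOSVersion);  // debug-build check of the assumption
    ASSUME(version >= kMinOSVersion);  // tell the optimizer about the bound
    return version;
}

int fooCalls = 0;
void foo() { ++fooCalls; }

// With the assumption in place, the condition is provably false,
// so the optimizer is free to drop the branch and the call to foo.
void bar() {
    if (getOSVersion() < kMinOSVersion) {
        foo();
    }
}
```

Even when the branch is removed, nothing here tells the compiler that the call to `getOSVersionInternal` itself can be dropped.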
However, we still leave the function call to `getOSVersionInternal` in place even when its result is unused, which is unnecessary and can also throw off inlining thresholds at `-Oz` in more complex cases. This call will have no side effects and will return the same value throughout the execution of the program, and marking it as `[[gnu::pure]]` does allow calls to it to be eliminated, as I want: Compiler Explorer.
The problem is that `getOSVersionInternal` calls an OS function, which isn’t marked `pure`, to get the version, and it also caches the return value internally. Compiler Explorer is a possible implementation, for reference. The GCC documentation for `pure` says:
> Calls to functions that have no observable effects on the state of the program other than to return a value may lend themselves to optimizations such as common subexpression elimination. Declaring such functions with the `pure` attribute allows GCC to avoid emitting some calls in repeated invocations of the function with the same argument values.
>
> The `pure` attribute prohibits a function from modifying the state of the program that is observable by means other than inspecting the function’s return value. However, functions declared with the `pure` attribute can safely read any non-volatile objects, and modify the value of objects in a way that does not affect their return value or the observable state of the program.
It’s a bit vague, but I think my function meets these criteria, specifically the “modify the value of objects in a way that does not affect their return value or the observable state of the program” part, since the internal cached version isn’t observable state IMO.
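For concreteness, the kind of internal caching I mean looks roughly like the sketch below. `queryOSVersionFromSystem` is a hypothetical stand-in for the real OS call, the `systemQueries` counter exists only to make the caching visible in this example, and using 0 as the "not yet queried" sentinel is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>

int systemQueries = 0;  // illustration only: counts trips to the "OS"

// Hypothetical stand-in for the OS query; not marked pure.
int queryOSVersionFromSystem() {
    ++systemQueries;
    return 14;  // placeholder value
}

int getOSVersionInternal() {
    // 0 means "not queried yet". The cache is private to this function.
    static std::atomic<int> cached{0};
    int version = cached.load(std::memory_order_relaxed);
    if (version == 0) {
        version = queryOSVersionFromSystem();
        assert(version > 0);  // a real version never equals the sentinel
        cached.store(version, std::memory_order_relaxed);  // write to memory
    }
    return version;
}
```

The store to `cached` is exactly the write that a read-only memory attribute would not allow, even though no caller can observe it other than through the return value.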
Unfortunately, Clang appears to have a stricter definition of `pure`, because it translates it to `memory(read)`: see `attributes #1` in Compiler Explorer. According to the LangRef:
> `memory(read)`: May read (but not write) any memory.
Which pretty clearly rules out what I’m doing.
My question is: how badly could things go if I marked my function as `pure` anyway? It’ll be compiled separately, and I can prevent the definition from getting LTO’d with anything else if need be. The definition itself doesn’t seem to be miscompiled when it’s marked `pure` (Compiler Explorer), but I can also keep the annotation from being visible to the definition if need be. Is there anything else I should be worried about in terms of potential misoptimizations if I went this route?
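To illustrate what I mean by keeping the annotation away from the definition, here is a rough sketch. The file split, the `OS_VERSION_IMPL` and `OS_VERSION_PURE` macros, the `isAtLeast` helper, and the stubbed value are all hypothetical. Everything is shown in one file so it compiles on its own, which means the definition here does see the attribute. In the real layout the implementation file would be built separately with `-DOS_VERSION_IMPL`.

```cpp
#include <cassert>

// --- os_version.h (hypothetical header) ---
// Callers see the pure declaration; the implementation file defines
// OS_VERSION_IMPL before including this header and sees a plain one.
#ifdef OS_VERSION_IMPL
#define OS_VERSION_PURE
#else
#define OS_VERSION_PURE [[gnu::pure]]
#endif

OS_VERSION_PURE int getOSVersionInternal();

// --- caller.cpp (hypothetical) ---
// Calls whose result is unused may now be removed by the optimizer.
bool isAtLeast(int wanted) {
    assert(wanted > 0);  // versions are positive in this sketch
    return getOSVersionInternal() >= wanted;
}

// --- os_version.cpp (hypothetical, a separate TU in the real layout) ---
int getOSVersionInternal() {
    return 14;  // placeholder; the real body queries the OS and caches
}
```

Whether this arrangement is actually safe is the question I’m asking; the sketch only shows the mechanics.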
I found “Deterministic function return attribute”, which is closely related. In particular:
> Though one question interests me: what attributes can be given to a lazy-init singleton or memoized function (which do access memory, but does not change output and has no visible side-effects)?
>
> Short answer: None (right now).
Is that still the case, or do we have a better way to express what I want in LLVM now (and is it exposed through Clang)?