What can the optimizer assume about the memory a global function pointer points to?

A function declaration introduces a global value that is a pointer to the memory where the function's machine code will reside at runtime. Besides providing the ability to call the function, that pointer can also be used, after bitcasting it, to modify the machine code implementing the function. What does the optimizer assume about the memory containing the machine code?

The following is an example where Alive2 assumes that transforming @src to @tgt is correct (note that @f is marked readnone):

declare i32 @f() readnone
declare void @modify_f()

define i32 @src() {
   call void @modify_f()
   %r = call i32 @f()
   ret i32 %r
}

define i32 @tgt() {
   %r = call i32 @f()
   call void @modify_f()
   ret i32 %r
}

Is this actually a correct transformation?

If yes, what is the exact rule?

Several possibilities come to my mind:

* The memory at @f is assumed to be constant. If this is the case, how can it be communicated to the optimizer that the memory is modified?
* In the following part of the definition of the "readnone" attribute, "memory" includes the machine code of the callee: "On a function, this attribute indicates that the function computes its result (or decides to unwind an exception) based strictly on its arguments, without dereferencing any pointer arguments or otherwise accessing any mutable state (e.g. memory, control registers, etc) visible to caller functions.". However, then the following part would be inconsistent (if executing machine code is considered reading): "If a readnone function reads or writes memory visible to the program, or has other side-effects, the behavior is undefined.".

I am not sure whether the IR spec explicitly talks about this, but I’m under the impression that code memory is assumed to be constant, or is abstracted away at the IR level, so the IR optimizer either does not need to think about the code being modified or simply treats that as undefined behavior.

One case where the Language Reference explicitly mentions the possibility of modifying machine code is Prologue Data: http://llvm.org/docs/LangRef.html#prologue-data

So at least the prologue part is seemingly not considered constant. The question of what ordering restrictions are placed between calls and code potentially modifying the callee machine code is still unclear, though.
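To make the ordering concern concrete, here is a rough C model of the @src/@tgt example. It does not patch real machine code; swapping a function pointer stands in for rewriting the code at @f. The names f_original, f_patched, f_impl and reset_f are hypothetical helpers introduced only for this sketch.

```c
/* Hypothetical stand-ins for two versions of the machine code behind f. */
static int f_original(void) { return 1; }
static int f_patched(void)  { return 2; }

/* Models the code memory at @f; swapping the pointer models patching it. */
static int (*f_impl)(void) = f_original;

static int  f(void)        { return f_impl(); }
static void modify_f(void) { f_impl = f_patched; }
static void reset_f(void)  { f_impl = f_original; }  /* restores the unpatched state */

/* Same call order as @src: patch first, then call. */
static int src(void) {
    modify_f();
    return f();
}

/* Same call order as @tgt: call first, then patch. */
static int tgt(void) {
    int r = f();
    modify_f();
    return r;
}
```

Starting from the unpatched state, src() returns 2 while tgt() returns 1, so in this model the two orderings are observably different. The reordering is only sound if the optimizer may assume that the code behind @f never changes, or that changing it is undefined behavior.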

Hi Manuel,

So I haven’t thought about this much, but here are some initial reactions:

I guess if you want to modify the code of a definition, you need to mark it as naked or similar. We analyze the code, after all; if it changes, whatever we derived is pretty much wrong. However, you also could not reliably change the code if we are allowed to modify it, so to make such changes “sane” you need naked anyway.

I don’t think the declaration should be __attribute__((pure)) ~ readnone if its code can change. One could argue similarly to the above case, e.g., it needs to be naked and therefore cannot be readnone (not that we have that restriction). Or one could say you break the implicit property of pure/readnone that is used all over the place: no matter when or where it is called, the result is the same.

I agree that the LangRef is pretty light on this topic.