Maybe OT -- what is this compiler optimization formally known as?

This is actually not a LLVM issue, but more an issue of my higher language,that I’m trying to target LLVM on. I’m seeing that if I do a naive translation of my language, it generates additional load stores.

For example (in my high level language):

var1 = expression1;
var2 = expression2;
var3 = call(var1, var2);
var4 = expression3;
var5 = expression4;
var6 = call(var4, var5);
var7 = call(var6);
var8 = call(var3, var7);

This can be effectively represented in my high level language as

var8 = call(call(expression1, expression2), call(call(expression3, expression4));

When I think about it naively. I would write code that would start from var8 and traverse and try to substitute all “vars” with expressions.

I can imagine not always wanting to do this, for example if the expression is duplicated, maybe I do want to pay for the load/store. And maybe sometimes the semantics are such that I need to reuse it.

Is there a formal algorithm or classical technique that describes this?


Real CPUs don’t have the ability to directly calculate expressions such as your nested set of calls. If you write it that way, the internals of the compiler, and the eventual machine code, will be much more like the first version.

But (unless you have “volatile”, or -O0) there are no loads or stores there. On any reasonable modern machine the values will all be in registers. Only var3 will typically even need any special handling to save it for later, but that will be in a register.

e.g. on Thumb, and assuming this is the entire function, you want to return var8 as the result, and the vars can be stored in integer registers, and the expressions are not too complex:

push r4,lr
mov r0,expression1 // more likely an add, sub, etc with r0 as the destination
mov r1,expression2
bl call // leaves result in r0
mov r4,r0
mov r0,expression3
mov r1,expression4
bl call
bl call
mov r1,r0
mov r0,r4
bl call
pop r4,pc

The only instructions here which do loads or stores to memory are the push and the pop. Even the calls don’t touch memory (unless those functions are not self-contained and need to call other functions).

Note that in Thumb and ARM code any function is free to do anything it wants with r0,r1,r2,r3 (“caller save” or “volatile”). They do not need to be saved, and you can’t expect anything you leave there to still be there after you call another function. r4,r5,r6,r7 (“callee save”) you can assume will still have their original values after you call a function, and if you use them then you have to save and restore the value you found there,

This is typical of RISC and newer CPU conventions. PowerPC, MIPS, SPARC and even x86_64 are essentially the same, other than some not using a “link register” style of function call.