copy instructions

This is a simple SSA code generation 101 question.

If I follow the IR code generation techniques in the Dragon book the
statement
  x = y + z
would translate into something like this in SSA/LLVM
  %0 = add %y, %z
  %x = %0
Obviously "copy instructions" like %foo = %bar are senseless in SSA
since %foo and %bar are immutably fixed to the same value and there
is no need for two aliases for the same thing (this is my own observation,
please tell me if my thinking is off).

What are the general code generation techniques to avoid "copy instructions"?

For example, the simple code generation methods that yield the translation
above might look like the following:

Value *AddExpression::codeGen() {
  Value *l = left->codeGen();
  Value *r = right->codeGen();
  Value *result = new TempValue; // get unique temporary
  emit(result->str() + " add " + l->str() + ", " r-str());
  return result;
}

Value *assignExpression::codeGen() {
  Value *rval = rvalue->codeGen();
  Value *lval = new NameValue(ident);
  emit(lval->str() + " = " + rval->str()); // emit (silly) copy instruction
  return lval;
}

What I have suggested to my students is to omit the (non-existent) copy instruction
and use the "rval" above as a replacement for all future occurrences of "ident."
i.e., something like the following:

Value *assignExpression::codeGen() {
  Value *rval = rvalue->codeGen();
  update symbol table so that all future reference to "ident" are replaced with rval
  return rval;
}

Using this scheme, the following
  x = y + z
  u = x * y + foo(x)
would be translated into
  %0 = add %y, %z
  %1 = mul %0, %y
  %2 = call foo(%0)
  %3 = add %1, %2

Is there a more obvious approach to avoiding "copy instructions"?

--w

Wayne O. Cochran
Clinical Assistant Professor, Computer Science
Washington State University Vancouver
wcochran@vancouver.wsu.edu
http://ezekiel.vancouver.wsu.edu/~wayne

The recommended approach to generating LLVM IR is simply not to try to
generate code in SSA form; see
http://llvm.org/docs/tutorial/LangImpl7.html#memory .

-Eli

Wayne,

The short answer is “copy instruction” (of which LLVM has none, outside of ADDs with 0, ORs with 0, etc.) never show up in SSA forms. You would have to specifically force your compiler to generate such instructions. I’ve never read the dragon book, but I expect such instruction sequences are there only for instructional purposes, not as anything a real compiler would ever generate. The only place copy instructions should be inserted is in lowering PHI instructions - which is often heavily coupled with register operations.

It looks as if you are giving exactly the right advice, as far as replacing the entry in the symbol table. As far as I know, this is not just the most obvious way to do it, but the only way when generating SSA form. The only time an assignment expression should involve generating a copy (move) is if you are directly emitting a linear IR (or machine code). Java and .NET both do this.

Hope this helps.

-Joshua

It is my understanding, the alloca memory routines are used
for forcing variables to be allocated on the stack frame – which
you would want for source level debugging.
When SSA registers are used, LLVM will decide what goes into
registers and what will spill over to the stack frame.
I want the latter.
–w

Wayne O. Cochran
Assistant Professor Computer Science
wcochran@vancouver.wsu.edu

The mem2reg pass will rewrite allocas and loads and stores to SSA
virtual registers. Essentially it's a transformation from non-SSA to
SSA form. That said, I don't know if you want your students to
implement their own SSA transformation.

Reid

That’s cool. The students grok the SSA thing fine – especially since all the var’s
in the language we’re translating from are immutable (functional style) which makes SSA fairly
straight forward; the phi instruction is needed in a few places where branching occurs.

Wayne O. Cochran
Assistant Professor Computer Science
wcochran@vancouver.wsu.edu