Assignment of large objects, optimization?

Hi,

My front end generates (bad) code, which I see that LLVM is unable to optimize. For example, code similar to:

%a = type [32 x i16]

declare void @set_obj(%a*)
declare void @use_obj(%a*)

define void @foo() {
entry:
%a1 = alloca %a
%a2 = alloca %a
call void @set_obj(%a* %a2)
%a3 = load %a* %a2
store %a %a3, %a* %a1
call void @use_obj(%a* %a1)
ret void
}

(Or with load/store replaced with memcpy).

In C pseudo-code this is similar to:

a a1;
a a2 = set_obj();
a1 = a2;
use_obj(a1);

and the corresponding LLVM IR in foo() can be simplified to:

%a1 = alloca %a
call void @set_obj(%a* %a1)
call void @use_obj(%a* %a1)

Is it unreasonable to expect LLVM to do this kind of simplification?

On a side note: Why isn’t there an assignment operator in the LLVM IR? Other compilers I have seen have some kind of assignment operator in the IR.

/Patrik Hägglund

Hi Patrik,

> My front end generates (bad) code, which I see that LLVM is unable to optimize.
> For example, code similar to:
> %a = type [32 x i16]
> declare void @set_obj(%a*)
> declare void @use_obj(%a*)
> define void @foo() {
> entry:
> %a1 = alloca %a
> %a2 = alloca %a
> call void @set_obj(%a* %a2)
> %a3 = load %a* %a2
> store %a %a3, %a* %a1
> call void @use_obj(%a* %a1)
> ret void
> }
> (Or with load/store replaced with memcpy).
> In C pseudo-code this is similar to:
> a a1;
> a a2 = set_obj();
> a1 = a2;
> use_obj(a1);
> and the corresponding LLVM IR in foo() can be simplified to:
> %a1 = alloca %a
> call void @set_obj(%a* %a1)
> call void @use_obj(%a* %a1)

No, it can't. That's because set_obj may have remembered the address passed to
it, for example by storing it in a global variable. Then use_obj might compare
the address passed to it with the address that set_obj stashed away, and make
decisions based on whether they compare equal or not.
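
To make the capture argument concrete, here is a minimal C sketch. The bodies of set_obj and use_obj are hypothetical (the original only declares them), and obj_t merely stands in for the [32 x i16] type. It shows a case where removing the copy changes observable behaviour:

```c
typedef struct { short v[32]; } obj_t;   /* stands in for %a = [32 x i16] */

static const obj_t *stashed;             /* address remembered by set_obj */
static int saw_same_address;             /* result recorded by use_obj */

/* Hypothetical set_obj: initializes the object and captures its address. */
static void set_obj(obj_t *p) {
    for (int i = 0; i < 32; i++)
        p->v[i] = (short)i;
    stashed = p;
}

/* Hypothetical use_obj: its behaviour depends on the address it receives. */
static void use_obj(const obj_t *p) {
    saw_same_address = (p == stashed);
}

/* Original shape: initialize a2, copy it into a1, then use a1. */
static int foo_with_copy(void) {
    obj_t a1, a2;
    set_obj(&a2);
    a1 = a2;
    use_obj(&a1);
    return saw_same_address;
}

/* "Simplified" shape: a2 is gone and a1 is passed to both calls. */
static int foo_without_copy(void) {
    obj_t a1;
    set_obj(&a1);
    use_obj(&a1);
    return saw_same_address;
}
```

With these definitions, foo_with_copy() returns 0 and foo_without_copy() returns 1, so the two shapes are not interchangeable unless the optimizer knows the address does not escape.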

> Is it unreasonable to expect LLVM to do this kind of simplification?

Try adding the nocapture attribute to the argument of set_obj.

> On a side note: Why isn't there an assignment operator in the LLVM IR? Other
> compilers I have seen have some kind of assignment operator in the IR.

That's because LLVM IR is always in SSA form. SSA form makes assignments
pointless. For example, suppose you could write
   %x := %y
(assignment). Thanks to SSA form, you know that %x can only get a value
once, and thus %y is that value: %x is equal to %y throughout the function.
But then what's the point of %x? You might as well just use %y wherever
you see %x.
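
The same argument can be illustrated in C, with a const local playing the role of a value that is defined exactly once. This is only an analogy (C locals are not SSA values), and the function names are made up for the example:

```c
/* With an explicit copy: x is defined once and never reassigned. */
int scaled_sum_with_copy(int y) {
    const int x = y;          /* the would-be "%x := %y" */
    return x * 3 + x;
}

/* The same function after replacing every use of x with y. */
int scaled_sum_without_copy(int y) {
    return y * 3 + y;
}
```

Both functions compute the same result for every input, which is why the copy carries no information of its own.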

Ciao, Duncan.

Hi Duncan,

> Try adding the nocapture attribute to the argument of set_obj.

Thanks! My fault. However, that doesn't seem to make any difference in this example. Adding nocapture to the argument of use_obj as well does the trick, but I don't think that can be applied to the code from my front end. (And it doesn't seem to work when replacing load+store with memcpy.)

Here is the corresponding C code:

typedef struct obj {
  unsigned arr[32];
} obj_t;

void use_obj(obj_t *a1);
obj_t set_obj(void);

void foo() {
  obj_t a1, a2 = set_obj();
  a1 = a2;
  use_obj(&a1);
}

(Both clang-trunk and gcc-4.6.2 retain the a1 = a2 copying. But one of our other compilers, partly developed in-house, seems to remove the copying.)
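
For comparison, the copy-free form being asked for could be sketched as follows. The bodies of set_obj and use_obj, the last_sum variable, and the names foo_copy and foo_direct are hypothetical, added only so the two variants can be compared. The stubbed use_obj reads the contents only and never inspects the address:

```c
typedef struct obj {
  unsigned arr[32];
} obj_t;

static unsigned last_sum;   /* hypothetical: lets a caller observe use_obj */

/* Hypothetical stub: fills the array with 0..31. */
obj_t set_obj(void) {
  obj_t o;
  for (unsigned i = 0; i < 32; i++)
    o.arr[i] = i;
  return o;
}

/* Hypothetical stub: sums the contents, ignoring the address itself. */
void use_obj(obj_t *a1) {
  unsigned sum = 0;
  for (unsigned i = 0; i < 32; i++)
    sum += a1->arr[i];
  last_sum = sum;
}

/* As in the message: a2 is initialized, then copied into a1. */
unsigned foo_copy(void) {
  obj_t a1, a2 = set_obj();
  a1 = a2;
  use_obj(&a1);
  return last_sum;
}

/* Copy removed by hand: set_obj's result goes straight into a1. */
unsigned foo_direct(void) {
  obj_t a1 = set_obj();
  use_obj(&a1);
  return last_sum;
}
```

Under these assumptions both variants hand identical contents to use_obj, so the copy is redundant here. Whether a compiler can prove that for arbitrary callees is the open question.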

> SSA form makes assignments pointless.

At the LLVM assembler level (the interface for the front end), redundancies are sometimes helpful, and therefore not completely pointless. For example, being able to write something like %x = add %y, 0 (which implies %x := %y) may be convenient.

However, in this case, I mostly thought of memory objects, i.e. *a := *b instead of using memcpy (or load+store). (For example, byval parameters seem to be constructed by targets using memcpy nodes.)

I found another example (simple, but contrived), using Clang, where the reasoning about memory copies seems suboptimal:

typedef struct obj {
  unsigned arr[32];
} obj_t;

obj_t a;

obj_t bar(void) {
  obj_t b = a, c = b;
  return b; // ignoring c!
}

Using clang -c -Os (on x86-64) I get code that is far from space-optimized (because the memcpy is replaced with load+store).

/Patrik Hägglund