alloca combining, not (yet) possible ?

Hello

since my broad RFC request didn't catch any responses, let me get a bit more into the nitty-gritty:

I tried to get llvm (3.7) to optimize superflous allocas away, but so far I haven't figured out how. Is there no optimizer for this ? Should I adorn something with some attributes ? As far as I can tell no, but I am no llvm expert...

For what I want to do, i will probably have a few dozen combinable allocas in my produced code each function. So it would be worthwhile to reduce stack space needs.

Ciao
    Nat!

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of Nat! via llvm-dev
Subject: [llvm-dev] alloca combining, not (yet) possible ?

I tried to get llvm (3.7) to optimize superflous allocas away, but
so far I haven't figured out how. Should I adorn something with
some attributes ?

Yes.

void g( void)
{
struct a_b x;
struct a_b y;

x.a = 1;
x.b = 2;
f( &x);

// x no longer needed
// expect y to reuse x space
y.a = 1;
y.b = 3;
f( &y);
}

You have not provided us with the declaration for f(). Unless its argument is marked with the nocapture attribute, the compilation of g() cannot assume that f() has not retained a pointer to the x struct and is using it in the second call.

- Chuck

Caldarale, Charles R schrieb:

You have not provided us with the declaration for f(). Unless its argument is marked with the nocapture attribute, the compilation of g() cannot assume that f() has not retained a pointer to the x struct and is using it in the second call.

thanks a lot for the input. Yes, I forgot to that. The C function declaration would have been

  void f( struct a_b *p);

which compiled into

  declare void @f(%struct.a_b*) #2

with

  attributes #2 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core2" "target-features"="+cx16,+sse,+sse2,+sse3,+ssse3" "unsafe-fp-math"="false" "use-soft-float"="false" }

HI Nat,

LLVM currently only performs stack coloring to merge allocas if you
use lifetime intrinsics to tell it exactly where the lifetimes of the
alloca start and end. With your code, the lifetimes of both x and y
cover the entire function. Introducing a lexical scope to limit the
lifetime of x gives clang the necessary information to emit the
lifetime.end intrinsic, and declaring y after that scope makes it emit
the lifetime.start intrinsic appropriately as well.

struct a_b {
  long a;
  long b;
};

void f(struct a_b*);

void g(void)
{
  { // Lifetime of x starts here
    struct a_b x;

    x.a = 1;
    x.b = 2;
    f(&x);
  } // Lifetime of x ends here

  // Lifetime of y starts here
  struct a_b y;
  y.a = 1;
  y.b = 3;
  f(&y);
  // Lifetime of y ends here
}

It would be nice if LLVM could do this for non-escaping allocas
without the need for those intrinsics, but currently, this is the way
to go.

Cheers,
Björn

Björn Steinbrink schrieb:
> ...

   // Lifetime of y starts here
   struct a_b y;
   y.a = 1;
   y.b = 3;
   f(&y);
   // Lifetime of y ends here
}

Nice, thanks very much. This does the alloca combining (even without having to specify "nocapture"). Wrapping my clang output with lifetime calls shouldn't be a problem.

The code that does that optimization, is I assume:

  http://www.llvm.org/docs/doxygen/html/StackColoring_8cpp_source.html

I would like to take the alloca combining a step further still, which is the combining of allocas across functions, at least on tail calls.
My current idea would be to

* invent an attribute to mark my parameter. Lets say "reusealloca"

* at the beginning of the optimization pass, collect all parameters of type reusealloca and place them in the alloca map with lifetimes ending before the tail call (figure out how to find it)

Just to wrap this up from two previous threads.

I decided against doing alloca combining (with a passed in alloca) in llvm altogether. Instead I do some AST analysis in the codegen stage and output code in non-combined or combined form, depending on the result. This works fine and keeps all my stuff contained within clang.

Ciao
    Nat!

P.S. TailCallElim, pretty much only does call elimination as the header and the name suggest (and not the optimization) but it is an important analysis step, marking calls as tail calls. So I would have had to put by llvm optimization pass AFTER TailCallElim.