repeated recursion with "frozen" arguments

Hi llvm-devels,

there is a very simple using case concerning LLVM and I am wondering whether it might be optimized at LLVM layer:

//-----------
int rec_func(int x, int y) {
  if(y<0) return x;
  return y + rec_func(x, y-1); // should we really push x?
}
void main() {
  rec_func(1, 1000);
  rec_func(2, 2000);
}
//-----------

Guys, don't focus on a stupid mathematics beyond this function :slight_smile:
Please focus your attention at the fact, that argument `x'
is once supplied at the top level calls in `main' and
never changed anymore from within `rec_func' function.

This example is just quite close to a "lexical closures" specifics.

The similar things we have when we pass `this' pointer
to non-static member functions in C++ (why should we
pass `this' pointer always if it was not changed?).

However, I guess in classical code generation approaches
argument `x' (like `this' pointer) is always pushed into
a stack with _every_ call.

But, it is possible to generate individual code implementations
for every top-level call of `rec_func' to avoid big number of
extra `push'.

In terms of C++: there should be a possibility to regenerate
non-static member functions for a given instantiated object
of the classes where performance is critical, e.g. for classes
like valarray.

So, my questions are:

Q1. should/might the optimization like this be resolved at LLVM layer?

if "yes" then

Q2. is LLVM capable of doing things like that today?

Thanks.

there is a very simple using case concerning LLVM and I am wondering
whether it might be optimized at LLVM layer:

//-----------
int rec_func(int x, int y) {
  if(y<0) return x;
  return y + rec_func(x, y-1); // should we really push x?
}

Probably not, at least not in the near future. At some point I have had
thoughts about implementing a tail recursion pass, but without prior
transformation, this function would not be a candidate.

Please focus your attention at the fact, that argument `x'
is once supplied at the top level calls in `main' and
never changed anymore from within `rec_func' function.
This example is just quite close to a "lexical closures" specifics.

The problem with "lexical closures", at least in this instance, is that
you have to pass a pointer to the closure around: if you only have one
parameter which can use it, you aren't saving anything.

The similar things we have when we pass `this' pointer
to non-static member functions in C++ (why should we
pass `this' pointer always if it was not changed?).

In C++, the 'this' object pointer can actually be thought of as a closure
in some sense already...

However, I guess in classical code generation approaches
argument `x' (like `this' pointer) is always pushed into
a stack with _every_ call.

Absolutely.

But, it is possible to generate individual code implementations
for every top-level call of `rec_func' to avoid big number of
extra `push'.

In terms of LLVM you could certainly do this. For example you could
translate it to use an explicit stack:

int rec_func(int x, int y) {
  if(y<0) return x;
  return y + rec_func(x, y-1); // should we really push x?
}

Or you could just perform an aggressive dead code elimination pass which
would realize that this function returns the same value for X no matter
what the value of Y is. There are other transformations that you can do
to work on this kind of thing, the functional language community has a lot
of ways to deal with this.

In terms of C++: there should be a possibility to regenerate
non-static member functions for a given instantiated object
of the classes where performance is critical, e.g. for classes
like valarray.

Sure, but you still need the 'this' pointer, to know WHICH valarray you
are talking about... ?

So, my questions are:
Q1. should/might the optimization like this be resolved at LLVM layer?

Yes, LLVM is designed for spiffy optimizations like this. :slight_smile:

if "yes" then
Q2. is LLVM capable of doing things like that today?

Nope, not yet. Contributions are always appreciated though. :slight_smile:

-Chris

Hello Chris,

Tuesday, August 26, 2003, 11:02:45 PM, you wrote:

there is a very simple using case concerning LLVM and I am wondering
whether it might be optimized at LLVM layer:

//-----------
int rec_func(int x, int y) {
  if(y<0) return x;
  return y + rec_func(x, y-1); // should we really push x?
}

Probably not, at least not in the near future. At some point I have had
thoughts about implementing a tail recursion pass, but without prior
transformation, this function would not be a candidate.

wait... if you break this C-example at the place as you did then it is
absolutely not what I have meant. Indeed, sense of my example comes
iff the top-level call is given, i.e. it might be optimized in
the context of concrete external call.

The problem with "lexical closures", at least in this instance, is
that you have to pass a pointer to the closure around: if you only
have one parameter which can use it, you aren't saving anything.

generally you are right. But only generally :slight_smile:
In particular, my example showed a feature of typical lexical closure.

The similar things we have when we pass `this' pointer
to non-static member functions in C++ (why should we
pass `this' pointer always if it was not changed?).

In C++, the 'this' object pointer can actually be thought of as a
closure in some sense already...

this is exactly what I have meant. If the C example I gave might be
optimized with LLVM engine, than `this' and lexical closures might be
boosted as well.

But, it is possible to generate individual code implementations
for every top-level call of `rec_func' to avoid big number of
extra `push'.

In terms of LLVM you could certainly do this. For example you could
translate it to use an explicit stack:

int rec_func(int x, int y) {
  if(y<0) return x;
  return y + rec_func(x, y-1); // should we really push x?
}

then I have to dig LLVM docs about explicit stack :slight_smile:

Or you could just perform an aggressive dead code elimination pass
which would realize that this function returns the same value for
X no matter what the value of Y is.

you misunderstood this function a bit, but it is not important, dead
code elimination was out of my scope.

In terms of C++: there should be a possibility to regenerate
non-static member functions for a given instantiated object
of the classes where performance is critical, e.g. for classes
like valarray.

Sure, but you still need the 'this' pointer, to know WHICH valarray you
are talking about... ?

sure. But what if this `this' was set once for 1mio calls?
Today I have a lot heavy math function working with two spectra and
every time I have to think twice before making another member function
which is invoked in deeper stack frames.

In other words, just imagine the situation, when for a given object
really a lot of *its* members function are called without changing
`this'. Today compilers can't eliminate this _extra_ `this'. And real
C++ code has tendency to be slower.

I'd say it is not just a problem of current compilers. It is problem
of ultimate static compilation. In cases when `this' comes in run-time
even a super-smart static compilers can't recompile a function with
"built-in" `this'.

So, my questions are:
Q1. should/might the optimization like this be resolved at LLVM layer?

Yes, LLVM is designed for spiffy optimizations like this. :slight_smile:

nicccce.

if "yes" then
Q2. is LLVM capable of doing things like that today?

Nope, not yet. Contributions are always appreciated though. :slight_smile:

now I have to prepare for sleeping, that's for sure. Probably this
nice patch from me doesn't come today :))

Have a nice day!

> Probably not, at least not in the near future. At some point I have had
> thoughts about implementing a tail recursion pass, but without prior
> transformation, this function would not be a candidate.

wait... if you break this C-example at the place as you did then it is
absolutely not what I have meant. Indeed, sense of my example comes
iff the top-level call is given, i.e. it might be optimized in
the context of concrete external call.

Yup, I think I completely missed your point. :slight_smile:

generally you are right. But only generally :slight_smile:
In particular, my example showed a feature of typical lexical closure.

Can you explain more about what you mean. I don't think I understand.

> In terms of LLVM you could certainly do this. For example you could
> translate it to use an explicit stack:

then I have to dig LLVM docs about explicit stack :slight_smile:

I just mean you have an explicit stack data structure to store just the
elements the recursive call needs.

> Sure, but you still need the 'this' pointer, to know WHICH valarray you
> are talking about... ?

sure. But what if this `this' was set once for 1mio calls?
Today I have a lot heavy math function working with two spectra and
every time I have to think twice before making another member function
which is invoked in deeper stack frames.

varray's do not have any recursive function calls, and the methods tend to
be simple. For that reason, they are typically all inlined, eliminating
the parameter passing completely...

In other words, just imagine the situation, when for a given object
really a lot of *its* members function are called without changing
`this'. Today compilers can't eliminate this _extra_ `this'. And real
C++ code has tendency to be slower.

I really am not sure I understand what transformation you are proposing.
Can you elaborate more? :slight_smile:

Thanks,

-Chris

The optimization you are describing is called procedure cloning, where the
body of a procedure is copied and then specialized for particular call
sites. It is usually driven by interprocedural constant propagation, and
often by profiling to find which cases are frequent enough to make it
worthwhile (here the recursive calls would make both cases -- x=1 and x=2 --
worthwhile).

LLVM would be very well suited to this though we don't have any of the above
capabilities yet.

--Vikram
http://www.cs.uiuc.edu/~vadve

Hi LLVM-devels

The optimization you are describing is called procedure cloning,
[...]
LLVM would be very well suited to this though we don't have any
of the above capabilities yet.

oh, anyway nice to here that it might come.
I am looking towards LLVM abilities like that, because I believe
they could really boost the family of languages for functional
programming. And I do believe that with a help of LLVM those
languages should bring better performance then classically
used imperative languages with single static compilation.

Hi LLVM-devels,

Yup, I think I completely missed your point. :slight_smile:

funny, that if even so, I got two similar and concrete
answers already:
"it is suitable for LLVM, though not implemented yet"
:slight_smile:

> generally you are right. But only generally :slight_smile:
> In particular, my example showed a feature of typical lexical closure.

Can you explain more about what you mean. I don't think I understand.

I will try. But then perhaps I should separte statements in
order to know where I was unclear.

S1. Example just shows simple phenomenon when every recursive
    (i.e. not from main) call of function `rec_func' does
    *not* change variable `x' anymore. It means, that every
    recursive call just makes extra/redundant copy of the
    `x'.

S2. Theoretically this redundant copy might be eliminated.
    The idea is simple: at first (i.e. non-recursive) call
    generate `rec_func' code which does not push `x' on
    the stack because it will remain the same from the
    first non-recursive call. For example initial
    non-recursive call `rec_func(1, 1000)' should be
    instantiated as:

    int rec_func(int y) {
      if(y<0) return 1;
      return y + rec_func(y-1);
    }

    so, this instantiation of function `rec_func' doesn't
    need to push `x' on stack anymore.

S3. This example is just a C-analogy of how pushing `this'
    onto stack might be eliminated. Just think of `x' as
    it would be `this', and think of `rec_func' as it
    would be a non-static member function. For example:

struct Math {

  Math(int _y) : y(_y) {}

  int rec_func() {
    if(y<0) return 42;
    y--;
    return y
           + rec_func(); // here `this' will be the same
                            // up to destruction of the object
  }

};

void main() {
  Math m1(1000), m2(2000);

  m1.rec_func(); // initial `this' is supplied here and will
                 // be the same for evry next call of rec_func
                 // from this object

  m2.rec_func(); // as above,
                 // initial `this' is supplied here and will
                 // remain the same for evry next call of
                 // rec_func from this object
}

S4. in optimization of C-example (see S2.) we just made a
    lexical closure in respet to variable `x'. (And by analogy,
    in C++ example we could consider similar optimization which
     in a sense is a lexical closure in respect to `this')

Now, Chris, please locate the point where I am unclear :slight_smile:

varray's do not have any recursive function calls, and the
methods tend to be simple. For that reason, they are
typically all inlined, eliminating
the parameter passing completely...

you are right.

However, my classes for spectra analysis I deal with should be as
fast as valarray is, but they may not be so oversimplified
as valarray... I have to break my fat non-static member functions
into smaller pieces and those simple pieces are called a lot
of times with the same `this'. So, the `valarray'-like classes
in real life are not always that nice.

I really am not sure I understand what transformation you are proposing.
Can you elaborate more? :slight_smile:

perhaps, I showed the point above. It looks like
Vikram S. Adve got my point already, and if he is near
you then probably he could reformulate my thoughts in
correct English just orally without typing :slight_smile:

Kind regards,

funny, that if even so, I got two similar and concrete
answers already:
"it is suitable for LLVM, though not implemented yet"
:slight_smile:

:slight_smile:

S2. Theoretically this redundant copy might be eliminated.
    The idea is simple: at first (i.e. non-recursive) call
    generate `rec_func' code which does not push `x' on
    the stack because it will remain the same from the
    first non-recursive call. For example initial
    non-recursive call `rec_func(1, 1000)' should be
    instantiated as:

Ok, Vikram is right. This is known as procedure cloning, where the
compiler makes a specialized version of the function for an invocation
with constant arguments (thus making it so you don't have to pass the
arguments).

S3. This example is just a C-analogy of how pushing `this'
    onto stack might be eliminated. Just think of `x' as
    it would be `this', and think of `rec_func' as it
    would be a non-static member function. For example:

Ok, I see why I was confused here. The problem is that a particular
"this" is not, in general, a compile time constant: if the object is
allocated dynamically on the stack or heap, the static compiler doesn't
know which value to specialize with. Obviously a dynamic compiler could
though...

S4. in optimization of C-example (see S2.) we just made a
    lexical closure in respet to variable `x'. (And by analogy,
    in C++ example we could consider similar optimization which
     in a sense is a lexical closure in respect to `this')

Ok, I understand now. When I think of lexical closures, I think if a
piece of memory which holds the variables used by a block of code, coupled
with a generic version of the code. Converting a "this" pointer to use
this style of closure means that you have to add a closure pointer :slight_smile:

So, in sum, yes, LLVM is a good representation to do this kind of thing
on. :slight_smile:

-Chris

Hello Chris,

Wednesday, August 27, 2003, 4:27:21 PM, you wrote:

Ok, Vikram is right. This is known as procedure cloning, where the
compiler makes a specialized version of the function for an invocation
with constant arguments (thus making it so you don't have to pass the
arguments).

just one remark: it would be better to say "known arguments" instead
of "constant arguments" :slight_smile:

Ok, I see why I was confused here. The problem is that a particular
"this" is not, in general, a compile time constant: if the object is
allocated dynamically on the stack or heap, the static compiler doesn't
know which value to specialize with. Obviously a dynamic compiler could
though...

exactly :slight_smile:

Ok, I understand now. When I think of lexical closures, I think if a
piece of memory which holds the variables used by a block of code, coupled
with a generic version of the code. Converting a "this" pointer to use
this style of closure means that you have to add a closure pointer :slight_smile:

:slight_smile:

So, in sum, yes, LLVM is a good representation to do this kind of thing
on. :slight_smile:

is LLVM a kind of paradise promises?..
I should definitely check this out :slight_smile: