why is llvm.stacksave() necessary?

Hi all,

the language reference for the alloca instruction states:
The ‘alloca‘ instruction allocates memory on the stack frame of the
currently executing function, to be automatically released when this
function returns to its caller.

when using come c code like
void myfunc(void){
int i=4;
double d[i];
}

the ir shows enclosing llvm.stackSave & restore constructs, enclosing
the alloca related to 'd'.

thellvm.stacksafe reference explains:
This intrinsic is used to remember the current state of the function
stack...
In practice, this pops any alloca blocks from the stack that were
allocated after the llvm.stacksave was executed.

shouldn't the alloca block related to 'd' be automatically released when
the function returns to its caller?
isn't the 'stack frame of the currently executing function'=='the
function stack'?

Can I programmatically decide when to insert calls to stacksafe/restore?
Suppose I would have a pass which sometimes would introduce such dynamic
alloca's - do I need to explicitely track this and insert
stacksave/restore? Or could I check for dynamic alloca (one ore several)
afterwards, and then introduce stacksave/restore once (enclosing the
complete function code) after the completed pass?

Thx
Alex

Clang does this because clang intentionally generates IR naïvely and relies on LLVM optimisation passes to clean it up.

In the case of C VLAs, the size of the alloca changes whenever the declaration goes into scope. By emitting a stack save and restore for d, clang doesn’t need to have different IR generating code for your example and for this one:

void myfunc(void
{
  for (int i=0 ; i<4; i++)
  {
    double d[i];
    doSomethingWith(d);
  }
}

In this case, each time that the loop is entered you will get a new d with a different size. Clang emits code that handles this case, and an early optimisation pass will remove the stack save and stack restore intrinsics if they’re not actually needed.

David

Clang does this because clang intentionally generates IR naïvely and relies on LLVM optimisation passes to clean it up.

In the case of C VLAs, the size of the alloca changes whenever the declaration goes into scope. By emitting a stack save and restore for d, clang doesn’t need to have different IR generating code for your example and for this one:

void myfunc(void
{
  for (int i=0 ; i<4; i++)
  {
    double d[i];
    doSomethingWith(d);
  }
}

In this case, each time that the loop is entered you will get a new d with a different size. Clang emits code that handles this case, and an early optimisation pass will remove the stack save and stack restore intrinsics if they’re not actually needed.

David

Hi David,
good example, thank you.
This means than, when generating VLA's, one would need to check for the
'minimal' part which needs to be enclosed by safe/remove (in this
example the related calls need to enclose the inner part of the loop).

what would happen if stack safe/remove would be neglected?
at the end of the function everything should get cleaned-up - right?;
the only problem could be that one consumes much more stack than
necessary - not?

Thx
Alex

Hi David,
good example, thank you.
This means than, when generating VLA's, one would need to check for the
'minimal' part which needs to be enclosed by safe/remove (in this
example the related calls need to enclose the inner part of the loop).

In C, the ‘minimal’ part is called the ‘scope’. Variables are always destroyed in the inverse order to the order in which they are created and so it’s always trivial to translate each into a stack save followed by an alloca when the variable comes into scope and a stack restore when the variable goes out of scope.

what would happen if stack safe/remove would be neglected?
at the end of the function everything should get cleaned-up - right?;
the only problem could be that one consumes much more stack than
necessary - not?

Exactly. Each alloca would allocate more memory. gcc used to have a bug that did this, which bit me in the past (about 15 years ago), where using a VLA in the form from my example would consume stack space in proportion to the sum of all values of i, rather than in proportion to the maximum value of i.

David

In C, the ‘minimal’ part is called the ‘scope’. Variables are always destroyed in the inverse order to the order in which they are created and so it’s always trivial to translate each into a stack save followed by an alloca when the variable comes into scope and a stack restore when the variable goes out of scope.

what would happen if stack safe/remove would be neglected?
at the end of the function everything should get cleaned-up - right?;
the only problem could be that one consumes much more stack than
necessary - not?

Exactly. Each alloca would allocate more memory. gcc used to have a bug that did this, which bit me in the past (about 15 years ago), where using a VLA in the form from my example would consume stack space in proportion to the sum of all values of i, rather than in proportion to the maximum value of i.

David

The 'scope'in should be related to the block in IR - not?
This would then be easy: just insert a stackSafe right infront of the
alloca (as you suggested) and a restore at the end of the block...

Alex

In C, the ‘minimal’ part is called the ‘scope’. Variables are always destroyed in the inverse order to the order in which they are created and so it’s always trivial to translate each into a stack save followed by an alloca when the variable comes into scope and a stack restore when the variable goes out of scope.

what would happen if stack safe/remove would be neglected?
at the end of the function everything should get cleaned-up - right?;
the only problem could be that one consumes much more stack than
necessary - not?

Exactly. Each alloca would allocate more memory. gcc used to have a bug that did this, which bit me in the past (about 15 years ago), where using a VLA in the form from my example would consume stack space in proportion to the sum of all values of i, rather than in proportion to the maximum value of i.

David

The 'scope'in should be related to the block in IR - not?

No. LLVM IR has no notion of nested scopes (global and function are the only two scopes that exist), only of liveness. A value is considered live.

This would then be easy: just insert a stackSafe right infront of the
alloca (as you suggested) and a restore at the end of the block…

Unless your source language permits flow control, in which case you must insert the stack restore in the block that unifies flow control at the end of the scope. Consider this example:

void myfunc(void)
{
  for (int i=0 ; i<4; i++)
  {
    double d[i];
    if (i % 2)
      doSomethingWith(d);
    else
      doSomethingElseWith(d);
  }
}

The IR that clang generates for this is quite hard to read, but the key point is that the stack save and stack restore are in different basic blocks.

David

Unless your source language permits flow control, in which case you must insert the stack restore in the block that unifies flow control at the end of the scope. Consider this example:

void myfunc(void)
{
  for (int i=0 ; i<4; i++)
  {
    double d[i];
    if (i % 2)
      doSomethingWith(d);
    else
      doSomethingElseWith(d);
  }
}

The IR that clang generates for this is quite hard to read, but the key point is that the stack save and stack restore are in different basic blocks.

David

Another good example, thanks.
I had a look at the ll and found that the restore is in the if.end block.
I think it could also go into the for.inc block (just before jumping
back to cond).

Is there any way (from AST in C++) to easily identify 'the block that
unifies flow control at the end of the scope'? does it have any special
attribute one could search for?

Alex

Another good example, thanks.
I had a look at the ll and found that the restore is in the if.end block.
I think it could also go into the for.inc block (just before jumping
back to cond).

Either would work, but the part of clang that generates the IR for the if statement doesn’t know that the variable goes out of scope after the if statement. The variable goes out of scope at the end of the loop and so that’s where the stack restore gets generated. Various optimisation passes may move it later (if you give the loop a small fixed trip count, you’ll see that at -O2 it gets unrolled and a single alloca used for all iterations).

Is there any way (from AST in C++) to easily identify 'the block that
unifies flow control at the end of the scope'? does it have any special
attribute one could search for?

The C++ AST directly maps to scopes, because the structure of C/C++ source code is a tree[1]. Clang’s CodeGen (really IRGen) layer generates new basic blocks for each branch in a divergent flow control situation and generates code for them independently. On exit from generating the divergent flow path, it generates a new block to unify the flow. There isn’t a direct correspondence between IR blocks and AST nodes (for an example of why there can’t be, consider exceptions and C++ destructors).

David

[1] This is common across the vast majority of programming languages and unfortunately is one of the things that makes programming difficult to learn for most people. Only about 10% of people naturally think in terms of this kind of hierarchical structure (which is also why programmers tend to hate iTunes).