Promoting malloc to alloca

I have a compiler which generates a lot of malloc instructions for
basically all data. I was rather hoping that these mallocs would be
converted to allocas if the pointers didn't escape. This would require
an escape analysis, and I'm not sure if LLVM has such a thing.

For instance, consider this code (typical of the output of my compiler):

define i32 @dontescape(i32 %x) {
entry:
%t = tail call i8* @malloc(i32 4)
%t1 = bitcast i8* %t to i32*
store i32 %x, i32* %t1
%t2 = load i32* %t1
ret i32 %t2
}

Running with opt --std-compile-opts, LLVM is smart enough to figure
out that the result is actually %x. So it generates:

define i32 @dontescape(i32 %x) nounwind {
entry:
%t = tail call i8* @malloc(i32 4) ; <i8*> [#uses=1]
%t1 = bitcast i8* %t to i32* ; <i32*> [#uses=1]
store i32 %x, i32* %t1
ret i32 %x
}

This saves a load, but I was rather hoping to eliminate the malloc and
store altogether. I was hoping it would realise that %t doesn't
escape, and generate this code:

define i32 @dontescape(i32 %x) nounwind {
entry:
%t1 = alloca i32, i32 1 ; <i32*> [#uses=1]
store i32 %x, i32* %t1
ret i32 %x
}

which would then get optimised to:

define i32 @dontescape(i32 %x) nounwind readnone {
entry:
ret i32 %x
}

and in general the allocas would go on to become registers due to
mem2reg, which would be wonderfully efficient.

So firstly, is there anything I'm doing wrong which is preventing this
from working? For example, I tried declaring malloc as "readnone" (is
this safe?). That would partly fix the problem (if you allocate memory
but don't return it, it can optimise away the malloc), except that it
isn't clever enough to realise the "store" instruction is redundant.
Again, I think this would require an escape analysis to fix. Maybe
there's an optimisation pass I'm missing here.

Secondly, if there is currently no way to magically make this work,
has this proposal been discussed before? Are there any known gotchas
which prevented it from being implemented?

I've found this file. I'm not sure if it's part of the LLVM
distribution, or a branch:
https://llvm.org/svn/llvm-project/llvm/tags/Apple/llvmCore-2092/lib/Analysis/EscapeAnalysis.cpp
It contains a comment: "Returns allow the return value to escape.
This is mostly important for malloc to alloca promotion."

I'm using LLVM 2.7.

I'm guessing that you're using or planning on using a garbage
collector. IIRC Evan Phoenix wanted something similar for Rubinius.
Through constant propagation, all uses of allocated memory would be
deleted, but the malloc would be left behind. I think he may have
implemented a custom pass that just eliminates mallocs with no uses
except as the target of stores.

I haven't heard anything about escape analysis in LLVM, though.

Reid

We don't have any such optimization.

Firstly, the pointer has to not make it into any function call at all, since any function might in turn call free(). Then we need to do escape analysis as you pointed out, but that's not difficult. We do similar analysis to determine pointer capture already.

Matt Giuca wrote:

I have a compiler which generates a lot of malloc instructions for
basically all data. I was rather hoping that these mallocs would be
converted to allocas if the pointers didn't escape. This would require
an escape analysis, and I'm not sure if LLVM has such a thing.

For instance, consider this code (typical of the output of my compiler):

define i32 @dontescape(i32 %x) {
entry:
   %t = tail call i8* @malloc(i32 4)
   %t1 = bitcast i8* %t to i32*
   store i32 %x, i32* %t1
   %t2 = load i32* %t1
   ret i32 %t2
}

When do you free it? Why not just use an alloca?

Running with opt --std-compile-opts, LLVM is smart enough to figure
out that the result is actually %x. So it generates:

define i32 @dontescape(i32 %x) nounwind {
entry:
   %t = tail call i8* @malloc(i32 4) ;<i8*> [#uses=1]
   %t1 = bitcast i8* %t to i32* ;<i32*> [#uses=1]
   store i32 %x, i32* %t1
   ret i32 %x
}

This saves a load, but I was rather hoping to eliminate the malloc and
store altogether. I was hoping it would realise that %t doesn't
escape, and generate this code:

define i32 @dontescape(i32 %x) nounwind {
entry:
   %t1 = alloca i32, i32 1 ;<i32*> [#uses=1]
   store i32 %x, i32* %t1
   ret i32 %x
}

LLVM doesn't realize that @malloc has no other side-effects. It's treating it like a function that might perform I/O.

which would then get optimised to:

define i32 @dontescape(i32 %x) nounwind readnone {
entry:
   ret i32 %x
}

and in general the allocas would go on to become registers due to
mem2reg, which would be wonderfully efficient.

So firstly, is there anything I'm doing wrong which is preventing this
from working? For example, I tried declaring malloc as "readnone" (is
this safe?).

No. If it's readnone then two calls with the same argument can be folded to one.

  That would partly fix the problem (if you allocate memory

but don't return it, it can optimise away the malloc), except that it
isn't clever enough to realise the "store" instruction is redundant.
Again, I think this would require an escape analysis to fix. Maybe
there's an optimisation pass I'm missing here.

Secondly, if there is currently no way to magically make this work,
has this proposal been discussed before? Are there any known gotchas
which prevented it from being implemented?

Doing this in general requires either that the free can't be found or that it really is right before the function exit, or else there might be a good reason the memory is on the heap. Then we have to make sure that we don't transform malloc calls inside loops (unless we also prove they're independent and Just Leaked).

How large a block of memory are you willing to move from heap to stack? What if it's a variable, how hard should we work to try to determine its upper bound (we probably can't with any reasonable amount of effort). And what type is the alloca? We could take the 4 from malloc(i32 4) and create an alloca i8, 4 and let instcombine try to sort it out.

Where should we look for malloc calls? Note that we only do stack to register transformation on alloca's in the first basic block if they're stacked together in a row as the very first instructions in the function.

I've found this file. I'm not sure if it's part of the LLVM
distribution, or a branch:
https://llvm.org/svn/llvm-project/llvm/tags/Apple/llvmCore-2092/lib/Analysis/EscapeAnalysis.cpp
It contains a comment: "Returns allow the return value to escape.
This is mostly important for malloc to alloca promotion."

That does exist in head, but strangely enough I can't find any implementation of it? It looks like it was removed when we straightened out value tracking + capture tracking.

OK thanks for the replies.

Yes, I was planning to use a garbage collector. This is for a
functional language, so there's no real way to determine when memory
needs to be freed without one.

Firstly, the pointer has to not make it into any function call at all, since any function might in turn call free(). Then we need to do escape analysis as you pointed out, but that's not difficult. We do similar analysis to determine pointer capture already.

OK. Is there any way to access the results of this pointer capture
analysis (eg. have LLVM tell me which pointers will escape and which
won't). That way I could manually change the mallocs into allocas.

Now I can see why LLVM doesn't do this -- it would be unsafe in
general. In my language (and many other high-level garbage-collected
languages which don't generate free() calls), it would be useful, but
LLVM itself can't guarantee that it's safe.

define i32 @dontescape(i32 %x) {
entry:
%t = tail call i8* @malloc(i32 4)
%t1 = bitcast i8* %t to i32*
store i32 %x, i32* %t1
%t2 = load i32* %t1
ret i32 %t2
}

When do you free it? Why not just use an alloca?

Well I can't use an alloca because the code is generated by a compiler
which just knows that it has to allocate memory, and expects it to be
freed by a garbage collector. I would like to generate an alloca
instead of a malloc in some situations, such as this, so the purpose
of this email was to work out whether I can get LLVM to promote it for
me, or whether I have to do the analysis myself.

LLVM doesn't realize that @malloc has no other side-effects. It's treating it like a function that might perform I/O.

Right, which is why I was trying to change its annotations to show
LLVM that @malloc doesn't have I/O side-effects, so it can be removed
if its result is unused.

So firstly, is there anything I'm doing wrong which is preventing this
from working? For example, I tried declaring malloc as "readnone" (is
this safe?).

No. If it's readnone then two calls with the same argument can be folded to one.

Good point. Is there any annotation which says LLVM can remove the
call if the result is unused (i.e., there are no side-effects), but it
can't combine two calls into one (i.e., the result might be different
each time). That seems like what "readonly" is for, but I thought I'd
check.

In other words, is it safe to annotate malloc as "readonly"?

Doing this in general requires either that the free can't be found or that it really is right before the function exit, or else there might be a good reason the memory is on the heap. Then we have to make sure that we don't transform malloc calls inside loops (unless we also prove they're independent and Just Leaked).

How large a block of memory are you willing to move from heap to stack? What if it's a variable, how hard should we work to try to determine its upper bound (we probably can't with any reasonable amount of effort). And what type is the alloca? We could take the 4 from malloc(i32 4) and create an alloca i8, 4 and let instcombine try to sort it out.

Where should we look for malloc calls? Note that we only do stack to register transformation on alloca's in the first basic block if they're stacked together in a row as the very first instructions in the function.

Thanks, a useful summary of the problems.

It sounds like this analysis is best implemented in a higher-level
compiler, before generating LLVM code.

Good point. Is there any annotation which says LLVM can remove the
call if the result is unused (i.e., there are no side-effects), but it
can't combine two calls into one (i.e., the result might be different
each time). That seems like what "readonly" is for, but I thought I'd
check.

In other words, is it safe to annotate malloc as "readonly"?

You can think of readonly as meaning "purely functional", which malloc
certainly isn't.

Doing this in general requires either that the free can't be found or that it really is right before the function exit, or else there might be a good reason the memory is on the heap. Then we have to make sure that we don't transform malloc calls inside loops (unless we also prove they're independent and Just Leaked).

How large a block of memory are you willing to move from heap to stack? What if it's a variable, how hard should we work to try to determine its upper bound (we probably can't with any reasonable amount of effort). And what type is the alloca? We could take the 4 from malloc(i32 4) and create an alloca i8, 4 and let instcombine try to sort it out.

Where should we look for malloc calls? Note that we only do stack to register transformation on alloca's in the first basic block if they're stacked together in a row as the very first instructions in the function.

Thanks, a useful summary of the problems.

It sounds like this analysis is best implemented in a higher-level
compiler, before generating LLVM code.

If you are programming in C++ and using the C++ API, then it would
probably be best to implement a custom LLVM pass that does this
transformation.

However, I'm going to go out on a limb and guess that since you're
writing a compiler for a functional language, you're programming in a
functional language and generating text LLVM assembly files to feed to
llc. In that case, then you're probably better off doing your own
escape analysis. :slight_smile:

Reid

Matt Giuca wrote:

OK thanks for the replies.

Yes, I was planning to use a garbage collector. This is for a
functional language, so there's no real way to determine when memory
needs to be freed without one.

Firstly, the pointer has to not make it into any function call at all, since any function might in turn call free(). Then we need to do escape analysis as you pointed out, but that's not difficult. We do similar analysis to determine pointer capture already.

OK. Is there any way to access the results of this pointer capture
analysis (eg. have LLVM tell me which pointers will escape and which
won't). That way I could manually change the mallocs into allocas.

That depends what you mean by "escape". The LLVM concept of 'capture' means, "did the pointer outlive the function call?" We use this in an interprocedural context, for example, you can call strlen() with a pointer and rest assured that no copies of the pointer were made.

While this is useless for malloc elimination in the general case (you care whether the callee might call free() not just whether it makes a copy) in your case with no explicit frees I think it's probably equivalent. The API is in include/llvm/Analysis/CaptureTracking.h, and link to libAnalysis.

Now I can see why LLVM doesn't do this -- it would be unsafe in
general. In my language (and many other high-level garbage-collected
languages which don't generate free() calls), it would be useful, but
LLVM itself can't guarantee that it's safe.

define i32 @dontescape(i32 %x) {
entry:
   %t = tail call i8* @malloc(i32 4)
   %t1 = bitcast i8* %t to i32*
   store i32 %x, i32* %t1
   %t2 = load i32* %t1
   ret i32 %t2
}

When do you free it? Why not just use an alloca?

Well I can't use an alloca because the code is generated by a compiler
which just knows that it has to allocate memory, and expects it to be
freed by a garbage collector. I would like to generate an alloca
instead of a malloc in some situations, such as this, so the purpose
of this email was to work out whether I can get LLVM to promote it for
me, or whether I have to do the analysis myself.

LLVM doesn't realize that @malloc has no other side-effects. It's treating it like a function that might perform I/O.

Right, which is why I was trying to change its annotations to show
LLVM that @malloc doesn't have I/O side-effects, so it can be removed
if its result is unused.

So firstly, is there anything I'm doing wrong which is preventing this
from working? For example, I tried declaring malloc as "readnone" (is
this safe?).

No. If it's readnone then two calls with the same argument can be folded to one.

Good point. Is there any annotation which says LLVM can remove the
call if the result is unused (i.e., there are no side-effects), but it
can't combine two calls into one (i.e., the result might be different
each time). That seems like what "readonly" is for, but I thought I'd
check.

In other words, is it safe to annotate malloc as "readonly"?

Again, no. Sorry! LLVM is smart enough to know that if it sees two readonly calls with the same arguments -- and there's nothing in between that could write to memory -- then they can once again be folded as the calls must return the same thing.

The attribute you're looking for, "delete if result is unused" doesn't exist in LLVM. I've considered it in the past, but the truth is that most of the time I want to eliminate dead malloc's, they *do* have a use: the matching free. At some point I expect I'm going to teach LLVM to remove dead malloc+free / new+delete / new+delete pairings, but I haven't gotten around to it yet. Teaching it to kill an allocation with no uses at all will be dead simple too.

Doing this in general requires either that the free can't be found or that it really is right before the function exit, or else there might be a good reason the memory is on the heap. Then we have to make sure that we don't transform malloc calls inside loops (unless we also prove they're independent and Just Leaked).

How large a block of memory are you willing to move from heap to stack? What if it's a variable, how hard should we work to try to determine its upper bound (we probably can't with any reasonable amount of effort). And what type is the alloca? We could take the 4 from malloc(i32 4) and create an alloca i8, 4 and let instcombine try to sort it out.

Where should we look for malloc calls? Note that we only do stack to register transformation on alloca's in the first basic block if they're stacked together in a row as the very first instructions in the function.

Thanks, a useful summary of the problems.

It sounds like this analysis is best implemented in a higher-level
compiler, before generating LLVM code.

You could try implementing it as an LLVM pass which is specific to your language. That way you would benefit from any effect where LLVM has managed to optimize away the uses of a malloc to let you remove the malloc later on.

Nick

Hi Nick,

The attribute you're looking for, "delete if result is unused" doesn't
exist in LLVM. I've considered it in the past, but the truth is that
most of the time I want to eliminate dead malloc's, they *do* have a
use: the matching free. At some point I expect I'm going to teach LLVM
to remove dead malloc+free / new+delete / new+delete pairings, but I
haven't gotten around to it yet. Teaching it to kill an allocation with
no uses at all will be dead simple too.

LLVM already removes malloc+(perhaps compare with null)+free.

Ciao,

Duncan.

Duncan Sands wrote:

Hi Nick,

The attribute you're looking for, "delete if result is unused" doesn't
exist in LLVM. I've considered it in the past, but the truth is that
most of the time I want to eliminate dead malloc's, they *do* have a
use: the matching free. At some point I expect I'm going to teach LLVM
to remove dead malloc+free / new+delete / new+delete pairings, but I
haven't gotten around to it yet. Teaching it to kill an allocation with
no uses at all will be dead simple too.

LLVM already removes malloc+(perhaps compare with null)+free.

What?

   declare noalias i8* @malloc() nounwind
   declare void @free(i8* nocapture) nounwind
   define i8 @test() {
     %A = call i8* @malloc()
     call void @free(i8* %A)
     ret i8 undef
   }

doesn't optimize at all. While we're at it, we also don't get:

   declare noalias i8* @malloc() nounwind
   define i8 @test() {
     %A = call i8* @malloc()
     %B = load i8* %A
     ret i8 %B
   }

either, though that's fixable by inserting llvm.lifetime.start right after the malloc.

Nick

Hi Nick,

The attribute you're looking for, "delete if result is unused" doesn't
exist in LLVM. I've considered it in the past, but the truth is that
most of the time I want to eliminate dead malloc's, they *do* have a
use: the matching free. At some point I expect I'm going to teach LLVM
to remove dead malloc+free / new+delete / new+delete pairings, but I
haven't gotten around to it yet. Teaching it to kill an allocation with
no uses at all will be dead simple too.

LLVM already removes malloc+(perhaps compare with null)+free.

What?

declare noalias i8* @malloc() nounwind
declare void @free(i8* nocapture) nounwind
define i8 @test() {
%A = call i8* @malloc()

^ malloc with wrong signature

call void @free(i8* %A)
ret i8 undef
}

doesn't optimize at all. While we're at it, we also don't get:

declare noalias i8* @malloc() nounwind
define i8 @test() {
%A = call i8* @malloc()
%B = load i8* %A
ret i8 %B
}

either, though that's fixable by inserting llvm.lifetime.start right
after the malloc.

Take a look at:

Instruction *InstCombiner::visitMalloc(Instruction &MI) {
   // If we have a malloc call which is only used in any amount of comparisons
   // to null and free calls, delete the calls and replace the comparisons with
   // true or false as appropriate.
   if (IsOnlyNullComparedAndFreed(MI)) {
...

Ciao,

Duncan.

Hi Nick,

The attribute you’re looking for, “delete if result is unused” doesn’t
exist in LLVM. I’ve considered it in the past, but the truth is that
most of the time I want to eliminate dead malloc’s, they do have a
use: the matching free. At some point I expect I’m going to teach LLVM
to remove dead malloc+free / new+delete / new+delete pairings, but I
haven’t gotten around to it yet. Teaching it to kill an allocation with
no uses at all will be dead simple too.

LLVM already removes malloc+(perhaps compare with null)+free.

What?

declare noalias i8* @malloc() nounwind
declare void @free(i8* nocapture) nounwind
define i8 @test() {
%A = call i8* @malloc()

^ malloc with wrong signature

Duh, thanks!

Then at some point, I should generalize this to handle C++ allocators too. :slight_smile:

Nick