allocating an array

How can I allocate an array with a size that is not known at compile time?

The language reference says that the array size must be a constant
integer value. It also says that variable sized arrays are represented
by using zero as the number of elements. Obviously I cannot use zero
in the array type when it is allocated. llvm-gcc seems to use a
pointer type instead of an array type, so compiling a simple C program
that uses an array with a runtime size and looking at the resulting
bitcode didn't provide an answer to my question.

Regards,
Kevin André

Call malloc()?

--Owen

Do you mean a variable length array (as in C99) or something else?
LLVM will convert VLAs into an "alloca":

int foo(int X) {
  int a;
...
}

define i32 @foo(i32 %X) nounwind {
entry:
  %tmp23 = alloca i32, i32 %X ; <i32*> [#uses=2]
...
}

If you are doing the zero-sized array hack, then you'll have to call
malloc on it anyway. Something like:

struct a {
  char c;
  int d[0];
};

And then in your program do something like this:

void bar(int z) {
  struct a *fnord = malloc(sizeof(struct a) + z * sizeof(int));
...
}

LLVM will generate the obvious code for that.

-bw

I guess I should have explained what I'm trying to do. I'm working on
a front end for my programming language and using the LLVM libraries
as back end. The arrays in my language are currently represented by an
LLVM structure type that contains integer fields to hold the size for
each array dimension, and then an array type with the actual data for
the array.

A simple example, an array with 3 integers:

  var x : [3] int

For this example the following code is generated:

%x = alloca { i32, [3 x i32] } ; <{ i32, [3 x i32] }*> [#uses=4]
getelementptr { i32, [3 x i32] }* %x, i32 0, i32 0 ; <i32*>:1 [#uses=1]
store i32 3, i32* %1

(and followed by a loop that initializes the three elements)

But I only tested this with arrays that have an integer constant as
size. Now if I try it with a size that comes from an expression like
this:

  var size : nat = 9
  var x : [size] int

Then my compiler does not know the size of the array and uses a zero
for the type:

%size = alloca i32 ; <i32*> [#uses=2]
store i32 9, i32* %size
%x = alloca { i32, [0 x i32] } ; <{ i32, [0 x i32] }*> [#uses=4]
load i32* %size ; <i32>:1 [#uses=1]
getelementptr { i32, [0 x i32] }* %x, i32 0, i32 0 ;
<i32*>:2 [#uses=1]
store i32 %1, i32* %2

Obviously this is wrong, but I don't see how I can use the value from
'size' to allocate the necessary space for this array with alloca. I
could use a pointer to an array in the struct type, instead of a
direct array, and use two allocas (one for the struct and one for the
array data). But then there is no real reason to use the array type
anymore, a pointer to the first element (like llvm-gcc generates for
C/C++ arrays) would be much simpler. Or am I missing something?

Regards,
Kevin André

Okay, so it's more like the second example I gave. The equivalent C
program would be something like this:

int x = 37;

struct a {
  int f1;
  int a1[0];
};

int foo(int b, int c) {
  struct a *s = malloc(sizeof(struct a) + x * sizeof(10));
  int i;

  for (i = 0; i < x; ++i)
    s->a1[i] = i;

  return s->a1[b] * c;
}

int bar(int b, int c, int d) {
  struct a *s = malloc(sizeof(struct a) + b * sizeof(10));
  int i;

  for (i = 0; i < b; ++i)
    s->a1[i] = i;

  return s->a1[c] * d;
}

Compile this program at -O0 to see the initial code. Then compile it
at -O3 to see what it is after optimizations. At -O3, there's still
the calls to malloc, but you might not be able to get around that. One
thing that gets rid of the malloc in "foo" is to mark x as static. The
compile then knows it's size and that it's not modified in the current
file, so it can use the value it's been assigned. If your programming
language has this capacity, it would improve your code generation.

Hope this helps.

-bw

Hi,

Obviously this is wrong, but I don't see how I can use the value from
'size' to allocate the necessary space for this array with alloca. I
could use a pointer to an array in the struct type, instead of a
direct array, and use two allocas (one for the struct and one for the
array data). But then there is no real reason to use the array type
anymore, a pointer to the first element (like llvm-gcc generates for
C/C++ arrays) would be much simpler. Or am I missing something?

if the length is %n, you can alloca %n i8's:
  %x = alloca i8, i32 %n
[Well, I guess you need to allocate a bit more to include
the overhead of the length field in your struct.]
Then bitcast %x (which is an i8*) to a pointer to
your struct type.

Ciao,

Duncan.

if the length is %n, you can alloca %n i8's:
  %x = alloca i8, i32 %n

If the array element type is i32 (like the size
field), then you may as well use an alloca i32;
if you use an alloca i8 then you would have to
use 4*%n rather than %n in order to get the right
amount of memory.

Ciao,

Duncan.

Thanks, Bill and Duncan, for your suggestions. I've changed my
approach, and use a pointer instead of an array now.

So this code:

  var size : nat = 9
  var x : [size] int

now becomes:

%size = alloca i32 ; <i32*> [#uses=2]
store i32 9, i32* %size
%x = alloca { i32, i32* } ; <{ i32, i32* }*> [#uses=6]
load i32* %size ; <i32>:1 [#uses=1]
getelementptr { i32, i32* }* %x, i32 0, i32 0 ; <i32*>:2 [#uses=1]
store i32 %1, i32* %2
%arr_storage = alloca i32, i32 %1 ; <i32*> [#uses=1]
%arr_dataptr_addr = getelementptr { i32, i32* }* %x, i32 0, i32 1 ;
<i32**> [#uses=1]
store i32* %arr_storage, i32** %arr_dataptr_addr

(did some manual optimizations on this code, as my front end generates
really dumb code)

So I need two allocas now, one for the structure and one for the
storage of the elements.

Now I'm wondering: do you think it's a bad idea to use alloca for
allocating an unknown number of elements? In that case, should I just
always use the malloc instruction instead, or do a runtime decision to
choose between alloca and malloc?

Regards,
Kevin André

Hi,

Now I'm wondering: do you think it's a bad idea to use alloca for
allocating an unknown number of elements? In that case, should I just
always use the malloc instruction instead, or do a runtime decision to
choose between alloca and malloc?

on some operating systems there is a limit to how much you can allocate
on the stack. This is 16k on windows and something bigger but not huge
on Sun OS (IIRC). On the other hand, allocating things on the stack is
usually more efficient than using malloc.

Ciao,

Duncan.

PS: Not to mention that if you use malloc then
you will need to explicitly reclaim memory using
free when the variable goes out of scope, while
this happens automatically if you use alloca.