Building aggregate types with dynamically sized elements

Hi,

I’m looking into ways to represent complex aggregate types whose members and/or sub-members may be dynamically sized. This would conceptually correspond to a C struct containing other structs and arrays, and some of those arrays would have a length that is not known in compile time.

LLVM IR allows me to do this for a top-level array (using 0 as its LLVM data type length and manually computing its needed allocation size).

As far as I understand, if I want to represent structs containing dynamic arrays containing structs and so on, I will have to roll my own data size computation, element pointer computation and so on. This however seems to be a significant risk to portability and alignment handling.

Does anyone here have any experience with similar use cases, or any advice on how to handle this?

Thanks

I'm looking into ways to represent complex aggregate types whose members and/or sub-members may be dynamically sized. This would conceptually correspond to a C struct containing other structs and arrays, and some of those arrays would have a length that is not known in compile time.

Note: C does not support this. GCC has an extension (variable length arrays in structs), but clang doesn’t support it and it’s a terrible idea.

LLVM IR allows me to do this for a top-level array (using 0 as its LLVM data type length and manually computing its needed allocation size).

As far as I understand, if I want to represent structs containing dynamic arrays containing structs and so on, I will have to roll my own data size computation, element pointer computation and so on. This however seems to be a significant risk to portability and alignment handling.

Does anyone here have any experience with similar use cases, or any advice on how to handle this?

You will have to do this manually, via pointer casts of the results of GEPs.

David

Note: C does not support this. GCC has an extension (variable length arrays in structs), but clang doesn’t support it and it’s a terrible idea.

It might be complex but it is not terrible. Depends on the language capabilities implementing it. Delphi for example allows exactly that and finalizes its records through RTTI which allows to automatically free up the memory.

The Pascal type system, unlike C, does not erase array lengths in various places, so doing the same thin in Pascal would be far less problematic than C. That doesn’t alter the fact that it’s a terrible idea for C, where it encourages to error-prone code.

David

I would agree that such a capability is a questionable fit for C.

However as others point out I think it depends on the language in question. In this case the language has runtime information about the array lengths and enforces the bounds strictly. The language’s semantics for these aggregate, occasionally dynamically sized types I think are good; my problem is how best to map it onto LLVMs type model.

My concern as I said is about portability and alignment, if I roll my own type layout and offset computations. Any advice regarding this is much appreciated!

Thanks

As I said, you will need to handle accesses via a sequence of GEPs and bitcasts of pointers (i.e. use a GEP to get the address of the variable-sized pointer field, then cast it to a pointer type for the array, then use a GEP on that, and so on). For allocas, you will need to compute the size yourself and probably allocate an i8 array that you can then bitcast to the type of your top-level struct. Getting the address of elements after the first variable-sized element is more tricky, and I’d generally suggest that you lower something like (C-style syntax):

struct foo
{
  int a;
  int array[somevariable];
  int b;
  int c;
};

to something more like:

struct foo
{
  int a;
  int array[1];
  struct __internal
  {
    int b;
    int c;
  } rest;
}

You can then do a fixed GEP to get the size of array. Then a GEP with a variable index to get the address of the end of array. You can then bitcast this to a pointer to struct __internal and do a GEP to get at the later elements.

This will give the wrong answer if the alignment of struct __internal is such that there is padding. To work around this, you can do a GEP to get the offset of array and a GEP to get the address of rest, add the size to the offset of array and subtract this from the offset of rest. This will give you the size of any padding, and if it is non-zero then you can do a bitcast to i8* in the middle and do a GEP to realign the pointer.

Note: If your algorithm for accessing the layouts is more complex than this, then you may be better off writing it in C, compiling it with clang, and inserting it into the module as an always_inline function that you just call. The LLVM optimisers will then inline it and handle the constant propagation for you. If it refers to type introspection structures that are in constant globals, then constant propagation should handle all of this correctly.

David