Order of fields and structure usage

I'd like to be able to make use of a structure type and its fields before
it is completely defined. To be specific, let me ask detailed questions
at various stages in the construction of a recursive type. I copy from

http://llvm.org/docs/ProgrammersManual.html#TypeResolve

    // Create the initial outer struct
    PATypeHolder StructTy = OpaqueType::get();

Is it possible to declare variables of type StructTy at this point?
Is it possible to define other structured types that have fields of type
StructTy at this point? I'm guessing the answers to these questions are
"Yes".

    std::vector<const Type*> Elts;
    Elts.push_back(PointerType::get(StructTy));

Is it possible to build an expression that uses the newly generated Elt
as field-selector at this point? I'm hoping yes, but I suspect No,
because the elements of Elts are clearly Type* instead of being fields.
In particular, if I use the same type twice to make two fields, the
corresponding elements of Elts will be indistinguishable.

    Elts.push_back(Type::Int32Ty);
    StructType *NewSTy = StructType::get(Elts);

Presumably at this point it is definitely possible to declare variables
of type NewSTy and use field-selectors from NewSTy. But it's a little
too late for my purposes.

The types I'm dealing with are not recursive, but of course I'll have to
perform the rest of the steps so that the variables I declared long ago
finally get well-defined types.

    // At this point, NewSTy = "{ opaque*, i32 }". Tell VMCore that
    // the struct and the opaque type are actually the same.
    cast<OpaqueType>(StructTy.get())->refineAbstractTypeTo(NewSTy);

    // NewSTy is potentially invalidated, but StructTy (a PATypeHolder) is
    // kept up-to-date
    NewSTy = cast<StructType>(StructTy.get());

    // Add a name for the type to the module symbol table (optional)
    MyModule->addTypeName("mylist", NewSTy);

If the answers to my questions are "yes", I can generate code easily for
all the variations on the source language I'm compiling. If any are "no"
I'll be significantly constrained in what I'll be able to do easily
(i.e., without extra code generation passes and intermediate data
structures).

-- hendrik

I'd like to be able to make use of a structure type and its fields before
it is completely defined. To be specific, let me ask detailed questions
at various stages in the construction of a recursive type. I copy from

http://llvm.org/docs/ProgrammersManual.html#TypeResolve

   // Create the initial outer struct
   PATypeHolder StructTy = OpaqueType::get();

Is it possible to declare variables of type StructTy at this point?

I think you can, although you have to be careful; if you don't make
sure the variable eventually has a computable size, the module won't
be valid.

Declaring variables of type pointer to StructTy is completely safe.

   std::vector<const Type*> Elts;
   Elts.push_back(PointerType::get(StructTy));

Is it possible to build an expression that uses the newly generated Elt
as field-selector at this point? I'm hoping yes, but I suspect No,
because the elements of Elts are clearly Type* instead of being fields.
In particular, if I use the same type twice to make two fields, the
corresponding elements of Elts will be indistinguishable.

I'm not following; are you trying to access the first member of NewSTy
here? You can't use a type that hasn't been created yet. You might
be able to pull some tricks with incomplete types or casts, though.

http://llvm.org/docs/LangRef.html#i_getelementptr and
http://llvm.org/docs/GetElementPtr.html might be useful here.

   Elts.push_back(Type::Int32Ty);
   StructType *NewSTy = StructType::get(Elts);

Presumably at this point it is definitely possible to declare variables
of type NewSTy and use field-selectors from NewSTy. But it's a little
too late for my purposes.

Basically, the rule for opaque types is that in a valid module, you
can do anything you could do with a declaration like "struct S;" in C.
And in a module under construction, I'm pretty sure you can pull some
more tricks, like declaring variables with types of unknown size, or
accessing structs with members of unknown size.

-Eli

I'd like to be able to make use of a structure type and its fields
before it is completely defined. To be specific, let me ask detailed
questions at various stages in the construction of a recursive type. I
copy from

http://llvm.org/docs/ProgrammersManual.html#TypeResolve

   // Create the initial outer struct
   PATypeHolder StructTy = OpaqueType::get();

Is it possible to declare variables of type StructTy at this point?

I think you can, although you have to be careful; if you don't make sure
the variable eventually has a computable size, the module won't be
valid.

Of course, eventually the type will be fully defined.

Declaring variables of type pointer to StructTy is completely safe.

   std::vector<const Type*> Elts;
   Elts.push_back(PointerType::get(StructTy));

Is it possible to build an expression that uses the newly generated Elt
as field-selector at this point? I'm hoping yes, but I suspect No,
because the elements of Elts are clearly Type* instead of being
fields. In particular, if I use the same type twice to make two fields,
the corresponding elements of Elts will be indistinguishable.

I'm not following; are you trying to access the first member of NewSTy
here? You can't use a type that hasn't been created yet. You might be
able to pull some tricks with incomplete types or casts, though.

What I want is to be able to use the fields that have already been
defined, even though the type isn't complete yet. The vector<const
Type*> is all I have at that moment, and it isn't a type. But by the
time I have a type it's frozen and I can't add new fields to it.

Do I gather that I keep making new types, each slightly larger than the
previous ones, cast each pointer to my growing type to the type-of-the-
moment, and field-select from it; then finally complete the type when
all is known? That might just work, if field-allocation is independent
of later fields, but it is ugly.

The trouble is that llvm won't believe in fields until the structure is
complete, and then it believes in all of them. While that's fine for
semantics of the completed module, it makes less sense while the module
is under construction. I view type-declaration syntax as being syntax,
and I'd like it to be as flexible and modifiable as syntax anywhere else
in the parse tree. The time to interpret type declarations as actually
defining specific types with known semantics is after the syntax has been
constructed, not before. If it's possible to do some of it statically
during parse tree construction, that's fine, but it shouldn't be
*required*. But it's evidently not the way llvm thinks.

http://llvm.org/docs/LangRef.html#i_getelementptr and
http://llvm.org/docs/GetElementPtr.html might be useful here.

These operations also require a completed type.

   Elts.push_back(Type::Int32Ty);
   StructType *NewSTy = StructType::get(Elts);

Presumably at this point it is definitely possible to declare variables
of type NewSTy and use field-selectors from NewSTy. But it's a little
too late for my purposes.

Basically, the rule for opaque types is that in a valid module, you can
do anything you could do with a declaration like "struct S;" in C.

Which is, basically, nothing but point to it and statically know that
it's the same or different from (possibly) other types.

And in a module under construction, I'm pretty sure you can pull some
more tricks, like declaring variables with types of unknown size, or
accessing structs with members of unknown size.

But not their fields, because llvm doesn't believe in fields of a
structure until it has them all. If the elements of Elts were fields of
a yet-to-be-identified structure, I'd be able to use them; it would make
sense then to use fields explicitly in the getelementptr instruction
instead of integers, which have to be looked up in a completed type to
find the fields they denote.

Field-allocation is guaranteed to be independent of later fields, so
the casting solution would work.
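To make the "independent of later fields" point concrete, here is a
minimal sketch of the usual C-ABI layout rule in plain C++. It is not
LLVM code: Field, Layout and layOut are hypothetical names invented for
this illustration, and the sizes and alignments passed in are whatever
the caller assumes for the target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one field: its size and ABI alignment in bytes.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Result of laying out a struct: one offset per field, plus total size/alignment.
struct Layout {
    std::vector<std::size_t> offsets;
    std::size_t size = 0;
    std::size_t align = 1;
};

// Lay fields out in order using the usual C rule: each field goes at the
// next offset that is a multiple of its own alignment. An offset depends
// only on the fields before it, never on the fields after it.
Layout layOut(const std::vector<Field>& fields) {
    Layout out;
    std::size_t offset = 0;
    for (const Field& f : fields) {
        assert(f.align != 0 && "alignment must be nonzero");
        offset = (offset + f.align - 1) / f.align * f.align;  // round up
        out.offsets.push_back(offset);
        offset += f.size;
        if (f.align > out.align) out.align = f.align;
    }
    // Only the total size is padded to the struct's alignment, and that
    // is the one thing that can change as more fields are appended.
    out.size = (offset + out.align - 1) / out.align * out.align;
    return out;
}
```

Laying out any prefix of a field list gives the same offsets as laying
out the whole list; only the total size and alignment can differ. That
is why a pointer cast to the type-of-the-moment selects the right field.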

It might be slightly cleaner to define the types recursively... for
example, define a struct as { i32, { float, { i32* } } }. That way, you
wouldn't have a bunch of partial types floating around.

-Eli

What I want is to be able to use the fields that have already been
defined, even though the type isn't complete yet. The vector<const
Type*> is all I have at that moment, and it isn't a type. But by the
time I have a type it's frozen and I can't add new fields to it.

Do I gather that I keep making new types, each slightly larger than the
previous ones, cast each pointer to my growing type to the type-of-the-
moment, and field-select from it; then finally complete the type when
all is known? That might just work, if field-allocation is independent
of later fields, but it is ugly.

Field-allocation is guaranteed to be independent of later fields, so the
casting solution would work.

Thanks for the idea. I was starting to despair about making the compiler
as flexible as I wanted it without abandoning llvm.

It might be slightly cleaner to define the types recursively... for
example, define a struct as { i32, { float, { i32* } } }. That way, you
wouldn't have a bunch of partial types floating around.

Just curious -- would struct{struct{i32, i8}, i8} take just 6 bytes on the
usual architectures?

-- hendrik

No... it would take 12 bytes. Assuming i32 has 4 byte ABI alignment,
struct {i32, i8} has to have alignment 4 and size 8, and therefore
struct{struct{i32, i8}, i8} would have to have alignment 4 and size 12.
The rules for LLVM struct alignment and size come from the usual C ABI
for structs.

struct {i32,i8,i8}, on the other hand, only takes 8 bytes.
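Since the rules come from the usual C ABI, the same numbers can be
checked with ordinary C++ structs. This is only a sketch: the struct
names are made up for the example, and it assumes a mainstream target
where a 32-bit integer has 4-byte alignment.

```cpp
#include <cstdint>

// Stand-in for the LLVM type {i32, i8}: 5 bytes of data, padded to 8.
struct Inner {
    std::int32_t a;
    std::int8_t b;
};

// Stand-in for {{i32, i8}, i8}: the inner struct keeps its 8-byte size,
// so the trailing i8 sits at offset 8 and the total is padded to 12.
struct Nested {
    Inner inner;
    std::int8_t c;
};

// Stand-in for {i32, i8, i8}: the second i8 fits into Inner's padding.
struct Flat {
    std::int32_t a;
    std::int8_t b;
    std::int8_t c;
};

// These hold under the stated assumption (4-byte alignment for int32_t).
static_assert(sizeof(Inner) == 8, "i32 + i8 padded to a multiple of 4");
static_assert(sizeof(Nested) == 12, "nested struct cannot reuse inner padding");
static_assert(sizeof(Flat) == 8, "flat struct packs both i8 fields after the i32");
```

So nesting costs 4 extra bytes here, because the tail padding of the
inner struct is part of its size and cannot be reused by the outer one.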

-Eli

Looking at this again, the conceptual problem is this.

It's natural in writing a code generator to want to generate code out of
order.

Given a suitable representation of strings, or temporary files, it's
rather easy to do this if the generated code is text. You just make sure
you can make insertions where you want, or stratify the code into
different temporary files that are later concatenated, or something like
that. With today's gigabyte RAM chips, this isn't a big deal. So all
would be well generating llvm assembler.
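As a rough sketch of what that stratification could look like for a
type declaration, here is a small C++ helper that keeps a struct's field
list open until the text is finally rendered. PendingStruct, addField
and render are hypothetical names for this illustration, not part of
any llvm API, and the emitted line only imitates the llvm assembler
type-declaration syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a struct declaration that stays editable until
// render() is called at the very end of code generation.
struct PendingStruct {
    std::string name;
    std::vector<std::string> fieldTypes;

    // Append a field and return its index. The index can be used right
    // away when emitting field-selection text into another section,
    // even though more fields may still be added later.
    std::size_t addField(const std::string& type) {
        assert(!type.empty() && "field type text must not be empty");
        fieldTypes.push_back(type);
        return fieldTypes.size() - 1;
    }

    // Produce the declaration text once all fields are known.
    std::string render() const {
        std::string text = "%" + name + " = type { ";
        for (std::size_t i = 0; i < fieldTypes.size(); ++i) {
            if (i != 0) text += ", ";
            text += fieldTypes[i];
        }
        return text + " }\n";
    }
};
```

The code that uses field 0 can be generated long before field 1 exists;
the declaration itself is only written out when everything is known.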

But then I see the API to llvm that allows one to build the llvm parse
tree directly, without making a huge string that has to be written out
and parsed. It seems designed for the typical case -- that code will be
generated out of order. You can remember insertion points into the parse
tree, and inject things as needed.

Except that this does not work with types. llvm assembler has type
declarations, which are as jugglable and expandable as any other piece of
text -- until the generated code is complete and everything is written
out for reading and parsing. The parse tree, however, doesn't seem to
have a syntax for type declarations -- it only has types. It is not a
parse tree for the llvm assembler. It is something else, something
slightly different, but different enough to cause trouble.

And that's the whole difference.

-- hendrik