code generation order

One more question. I hope you're not getting tired of me already. Does
generating LLVM code have to proceed in any particular order?

Of course, if I am writing LLVM assembler by appending characters to the
end of a sequential file, I'd have to write everything in the order
prescribed by the assembler syntax.

But if I'm using the C interface to build an LLVM parse tree, does that
have to be in any particular time-order? Can I, for example, define a
few functions, start scattering code into them, decide I'd like to declare
some more local variables in one of them, generate code for another,
return to the first one and stick in a new basic block at its start,
discover I should have declared some more global variables, and so forth?

That could be very convenient.

-- hendrik

Yes, you can absolutely do this.

— Gordon
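One way to picture why order doesn't matter is with a toy model. When code is held as in-memory objects linked into lists, rather than as text appended to a file, any function can be extended, or have a block put in front of existing ones, at any time. The sketch below is plain C and is not LLVM's implementation; `Function`, `Block`, `append_block`, `insert_block_before` and `emit` are hypothetical names. The closest real counterparts in the LLVM C API are `LLVMAppendBasicBlock` and `LLVMInsertBasicBlock`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model (hypothetical, not LLVM's API): a function owns a doubly
   linked list of basic blocks, so blocks can be added anywhere, any time. */
typedef struct Block {
    const char *name;
    int ninstrs;                 /* stands in for the instruction list */
    struct Block *prev, *next;
} Block;

typedef struct Function {
    const char *name;
    Block *first, *last;
} Function;

static Block *new_block(const char *name) {
    Block *b = calloc(1, sizeof *b);
    assert(b != NULL);
    b->name = name;
    return b;
}

/* Add a block at the end of the function. */
Block *append_block(Function *f, const char *name) {
    Block *b = new_block(name);
    b->prev = f->last;
    if (f->last) f->last->next = b; else f->first = b;
    f->last = b;
    return b;
}

/* Insert a block in front of an existing one, even after code was emitted. */
Block *insert_block_before(Function *f, Block *before, const char *name) {
    Block *b = new_block(name);
    b->next = before;
    b->prev = before->prev;
    if (before->prev) before->prev->next = b; else f->first = b;
    before->prev = b;
    return b;
}

/* "Emit" an instruction into a block; this toy only counts it. */
void emit(Block *b) { b->ninstrs++; }
```

For example, after `Block *body = append_block(&f, "body");` and a few `emit(body)` calls, work can move to another function; later, `insert_block_before(&f, body, "entry")` makes `entry` the first block of `f` without disturbing `body`.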

Great. You probably have a very good idea how much easier this makes
life.

-- hendrik

I think I may have found an exception to this -- the API seems to require
me to have all the fields for a struct ready before I construct the
struct. I don't have the ability to make a struct type, use it to
declare some variables, and still contribute fields to it during the rest
of the compilation.

Is there a reason for this limitation other than no one thinking of it?
Does it need to have all the type information early in building the
parse tree? I can't really imagine that. I for one could do without
this limitation.

I won't even ask to be able to contribute more fields at link time,
though that would be useful, too. Such link-time-assembled structures
would resemble the DXD dummy control sections that PL/1 used on
OS/360.

-- hendrik


You really can't do this, because LLVM types are shape-isomorphic: any two types with the same structure are the same type. Observe what happens to the types of @x and @y:

     gordon$ cat input.ll
     %xty = type {i32}
     %yty = type {i32}
     @x = external constant %xty
     @y = external constant %yty

     gordon$ llvm-as < input.ll | llvm-dis
     ; ModuleID = '<stdin>'
             %xty = type { i32 }
             %yty = type { i32 }
     @x = external constant %xty ; <%xty*> [#uses=0]
     @y = external constant %xty ; <%xty*> [#uses=0]

(This is not a side-effect of llvm-as or llvm-dis, but a fundamental property of the LLVM 'Type' class.)

The only type that is not shape-isomorphic is 'opaque'. Each mention of 'opaque' in LLVM IR is a distinct type:

     gordon$ cat input2.ll
     %xty = type opaque
     %yty = type opaque
     @x = external constant %xty
     @y = external constant %yty

     gordon$ llvm-as < input2.ll | llvm-dis
     ; ModuleID = '<stdin>'
         %xty = type opaque
         %yty = type opaque
     @x = external constant %xty ; <%xty*> [#uses=0]
     @y = external constant %yty ; <%yty*> [#uses=0]
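To make the distinction concrete, here is a toy type table in C that uniques struct types by shape while giving every opaque type its own identity. It is a sketch of the idea, not LLVM's implementation; `struct_get`, `opaque_get`, the `K_*` kinds and the fixed-size table are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy type table (hypothetical): struct types are uniqued by shape,
   opaque types never are. */
enum Kind { K_INT32, K_STRUCT, K_OPAQUE };

typedef struct Type {
    enum Kind kind;
    int nfields;
    const struct Type *fields[8];
} Type;

static const Type Int32 = { K_INT32, 0, { NULL } };

static Type *table[64];
static int ntable;

/* Return the unique struct type with these fields, creating it if needed.
   Field types are themselves unique, so comparing pointers compares shape. */
const Type *struct_get(const Type *const *fields, int n) {
    assert(n <= 8);
    for (int i = 0; i < ntable; i++) {
        Type *t = table[i];
        if (t->kind == K_STRUCT && t->nfields == n &&
            memcmp(t->fields, fields, n * sizeof *fields) == 0)
            return t;                  /* same shape => same type object */
    }
    assert(ntable < 64);
    Type *t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->kind = K_STRUCT;
    t->nfields = n;
    memcpy(t->fields, fields, n * sizeof *fields);
    table[ntable++] = t;
    return t;
}

/* Every call yields a new, distinct type, like each 'opaque' in LLVM IR. */
const Type *opaque_get(void) {
    Type *t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->kind = K_OPAQUE;
    return t;
}
```

In a model like this, asking for `{i32}` twice returns the same object, so appending a field to "%xty" after the fact would also change "%yty". That is one way to see why a struct's fields must be fixed when it is created, while two opaque types stay distinct even with identical definitions.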

I won't even ask to be able to contribute more fields at link time, though that would be useful, too. Such link-time-assembled structures would resemble the DXD dummy control sections that PL/1 used on OS/360.

This is absolutely possible:

     @Type.field.offs = external constant i32
...
     ; Load the field's byte offset; its value is supplied at link time.
     %Type.field.offs = load i32* @Type.field.offs
     ; Step through the object in bytes, then view the result as the field.
     %obj.start = bitcast %object* %obj to i8*
     %obj.field = getelementptr i8* %obj.start, i32 %Type.field.offs
     %field.ptr = bitcast i8* %obj.field to %field*
     %field.val = load %field* %field.ptr
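The same technique in C, as a sketch with hypothetical names (`struct Object`, `Object_count_offs`, `load_count`): the accessor knows only a byte offset stored in a constant, not the struct layout. To keep it in one file the constant is defined right here; in the link-time scenario it would be declared `extern const size_t Object_count_offs;` and defined in another translation unit, so the compiler would have to load it at run time, much like the IR above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout; only the module owning the type needs it. */
struct Object { int header; double weight; long count; };

/* Analogue of @Type.field.offs. Defined here for self-containment; in the
   link-time scenario this would be an extern const from another file. */
const size_t Object_count_offs = offsetof(struct Object, count);

/* Analogue of the IR sequence: treat the object as bytes, step by the
   offset, then reinterpret as a pointer to the field's type and load. */
long load_count(void *obj) {
    assert(obj != NULL);
    char *start = obj;                        /* bitcast to i8*      */
    char *field = start + Object_count_offs;  /* getelementptr i8*   */
    return *(long *)field;                    /* bitcast + load      */
}
```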

This is completely analogous to opaque data types in C. You can use any of the following techniques:

     typedef struct OpaqueFoo *FooRef; /* like %object = type opaque in LLVM */

     typedef void *FooRef; /* like %object = type i8 in LLVM */

     typedef struct {
       struct Vtable *VT;
     } Base;
     typedef Base *FooRef; /* like %object = type { %vtable* } in LLVM */
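The first technique as a self-contained sketch (the names `Foo`, `foo_create`, `foo_value`, `foo_destroy` and `twice_value` are hypothetical): code written against `FooRef` compiles while `struct Foo` is still incomplete, and only the implementation section needs the fields. This mirrors holding an opaque type in LLVM and reaching its contents only through functions or offsets defined elsewhere.

```c
#include <assert.h>
#include <stdlib.h>

/* Client-visible part: only an incomplete type. FooRef values can be
   declared and passed around without knowing any fields. */
typedef struct Foo *FooRef;

FooRef foo_create(int value);
int foo_value(FooRef f);
void foo_destroy(FooRef f);

/* Client code, written before the fields of struct Foo exist. */
int twice_value(FooRef f) { return 2 * foo_value(f); }

/* Implementation part: the fields are supplied later (here, further down
   the same file; usually in a separate .c file). */
struct Foo {
    int value;
};

FooRef foo_create(int value) {
    FooRef f = malloc(sizeof *f);
    assert(f != NULL);
    f->value = value;
    return f;
}

int foo_value(FooRef f) { return f->value; }

void foo_destroy(FooRef f) { free(f); }
```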

— Gordon

So it appears that types are processed for identity the moment they are
made during parse tree construction? This means that a type has to be
completely known on creation. Presumably there's some mechanism for a
type that isn't completely known yet -- or is that avoided by having a
type 'pointer' instead of 'pointer-to-foo'?

-- hendrik

So it appears that types are processed for identity the moment they are made during parse tree construction?

Yes.

This means that a type has to be completely known on creation.

Yes.

Presumably there's some mechanism for a type that isn't completely known yet -- or is that avoided by having a type 'pointer' instead of 'pointer-to-foo'?

Partially opaque types can be refined. This section of the programmer's manual is applicable:

http://llvm.org/docs/ProgrammersManual.html#BuildRecType

— Gordon
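The idea behind refinement can be sketched with a toy model in C, loosely after the `PATypeHolder` / `refineAbstractTypeTo` pattern in the manual but not LLVM's code: an opaque placeholder is created first, a struct is built that points to it, and the placeholder is then told it "is" that struct, giving a recursive list type. The names `resolve`, `refine`, `TypeHolder`, `holder_get` and `pointer_to` are hypothetical, and this toy models only the forwarding, not LLVM's re-uniquing.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (hypothetical): an opaque type can later be refined,
   i.e. forwarded, to a concrete type. */
enum Kind { K_I32, K_PTR, K_STRUCT, K_OPAQUE };

typedef struct Type {
    enum Kind kind;
    struct Type *pointee;      /* used by K_PTR */
    struct Type *fields[4];    /* used by K_STRUCT */
    int nfields;
    struct Type *forward;      /* set when an opaque type is refined */
} Type;

static Type *new_type(enum Kind k) {
    Type *t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->kind = k;
    return t;
}

/* Follow refinements to the current, concrete type. */
Type *resolve(Type *t) {
    while (t->forward) t = t->forward;
    return t;
}

/* A holder remembers a type and always hands back its resolved form. */
typedef struct { Type *t; } TypeHolder;
Type *holder_get(TypeHolder *h) { return h->t = resolve(h->t); }

/* Declare that a (still unrefined) opaque type is really 'concrete'. */
void refine(Type *opaque, Type *concrete) {
    assert(opaque->kind == K_OPAQUE && opaque->forward == NULL);
    opaque->forward = concrete;
}

Type *pointer_to(Type *t) {
    Type *p = new_type(K_PTR);
    p->pointee = t;
    return p;
}
```

After building `{ opaque*, i32 }` and calling `refine` on the placeholder, the struct's pointer field resolves to the struct itself, which is the recursive "mylist" shape from the manual.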

Here it is:

: // Create the initial outer struct
: PATypeHolder StructTy = OpaqueType::get();
: std::vector<const Type*> Elts;
: Elts.push_back(PointerType::get(StructTy));
: Elts.push_back(Type::Int32Ty);
: StructType *NewSTy = StructType::get(Elts);

Here NewSTy is a pointer to StructType.

: // At this point, NewSTy = "{ opaque*, i32 }". Tell VMCore that
: // the struct and the opaque type are actually the same.
: cast<OpaqueType>(StructTy.get())->refineAbstractTypeTo(NewSTy);

: // NewSTy is potentially invalidated, but StructTy (a PATypeHolder) is
: // kept up-to-date
: NewSTy = cast<StructType>(StructTy.get());

At first I couldn't see the purpose of this statement, because presumably
NewSTy is already that StructType. But then I noticed that here we are
assigning a StructType to a variable that is pointer-to-StructType
instead. So I don't even know what I should be wondering about.
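On the types: `cast<StructType>(...)` applied to a `Type*` yields a `StructType*`, so that line assigns a pointer to a pointer variable. Its point is that the pointer value may change. Assuming, as the manual's "potentially invalidated" comment suggests, that refinement can fold the freshly built struct into an identical type that already exists, the old `NewSTy` object may be discarded while the holder follows the survivor. The toy C model below illustrates that assumption; `make_type`, `reunique`, `resolve` and the `shape` field are hypothetical and stand in for LLVM's much richer machinery.

```c
#include <assert.h>

/* Toy model (hypothetical): after refinement a type may turn out to be
   identical to one that already exists. The newer object is then merged
   away and forwarded to the survivor, so raw pointers to it go stale. */
typedef struct Type {
    int shape;                 /* stands in for the type's structure */
    int dead;                  /* set when this object was merged away */
    struct Type *forward;      /* where a merged-away type now points */
} Type;

static Type pool[16];
static int npool;

static Type *make_type(int shape) {
    assert(npool < 16);
    Type *t = &pool[npool++];
    t->shape = shape;
    return t;
}

/* Re-unique t: if another live type has the same shape, merge t into it. */
static void reunique(Type *t) {
    for (int i = 0; i < npool; i++) {
        Type *u = &pool[i];
        if (u != t && !u->dead && u->shape == t->shape) {
            t->dead = 1;
            t->forward = u;
            return;
        }
    }
}

/* What a holder does: follow forwards to the live type. */
static Type *resolve(Type *t) {
    while (t->forward) t = t->forward;
    return t;
}
```

In this model, re-reading through the holder (`NewSTy = resolve(holder)`) is what turns a stale pointer back into a valid one, which is the role the manual's `NewSTy = cast<StructType>(StructTy.get());` plays.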

: // Add a name for the type to the module symbol table (optional)
: MyModule->addTypeName("mylist", NewSTy);

-- hendrik

And another question:

: // Create the initial outer struct
: PATypeHolder StructTy = OpaqueType::get();
: std::vector<const Type*> Elts;

Is it possible to start generating parse tree for code that accesses the
fields of the structure-to-be at this point, knowing that everything will
be there by the time parse-tree generation is complete?

: Elts.push_back(PointerType::get(StructTy));
: Elts.push_back(Type::Int32Ty);
: StructType *NewSTy = StructType::get(Elts);

...