Correct way to resolve recursive type information?

Hi,

I’ve followed the instructions on constructing recursive types (http://llvm.org/docs/ProgrammersManual.html#BuildRecType) and I can succesfully create simple recursive types using the C bindings (e.g. struct Test { struct Test *t };). I want to generalize this to get type information from my language into generated LLVM code. My language allows arbitrary forward type declarations that I resolve using two passes - first all type names are entered into the symbol table in turn and then all type structures are built in turn with references to other types being resolved from the symbol table.

To make this work with LLVM I plan to:

  • in the first type resolution pass, for every structured type in the compiled source create a type handle referencing an opaque type with LLVMCreateTypeHandle(LLVMCreateOpaqueType() and store it in the type’s symbol table entry
  • in the second type resolution pass, create an LLVM structured type for every structured type in the program. The element types for any referenced types will be those types’ opaque types
  • in a third pass, for every structured type in the program, resolve its opaque type to its structured type with LLVMRefineType

Will I have a problem with TypeRefs becoming invalid underneath me as I repeatedly call LLVMRefineType in the third pass? If so how can I construct a web of mutually recursive types - is there some kind of atomic LLVMRefineType alternative that can refine the whole lot in one go?

I’d be grateful for any advice,
– James Williams

In my compiler, the multiple passes for resolving type names is done first, using high-level type classes, before any LLVM IR is involved. The high level classes are needed anyway since the LLVM type model does not capture concepts such as const, up-casting, and other aspects of high-level types. (Actually, I don’t have discrete passes, what I have is a work-queue of symbols to be processed and analysis tasks to be performed on those symbols, with the ability to perform certain analysis tasks “just in time” if needed by other tasks. My friend jokingly refers to this as the “breadth-first compiler”. However, that level of complexity is only required if you are doing things like type inference.)

Only when the types are fully resolved do I convert to LLVM types. This only requires a single pass:

for each type:
if the type has not yet been constructed:
set the ‘under construction’ bit for that type.
create an opaque type as a placeholder
for each member type (recursively) construct the type
(using the member type’s placeholder if the member type is still under construction.)
create the type from the member types.
clear the ‘under construction’ bit.
replace the placeholder with the constructed type.

2010/1/6 Talin <viridia@gmail.com>

In my compiler, the multiple passes for resolving type names is done first, using high-level type classes, before any LLVM IR is involved. The high level classes are needed anyway since the LLVM type model does not capture concepts such as const, up-casting, and other aspects of high-level types. (Actually, I don’t have discrete passes, what I have is a work-queue of symbols to be processed and analysis tasks to be performed on those symbols, with the ability to perform certain analysis tasks “just in time” if needed by other tasks. My friend jokingly refers to this as the “breadth-first compiler”. However, that level of complexity is only required if you are doing things like type inference.)

Only when the types are fully resolved do I convert to LLVM types. This only requires a single pass:

for each type:
if the type has not yet been constructed:
set the ‘under construction’ bit for that type.
create an opaque type as a placeholder
for each member type (recursively) construct the type
(using the member type’s placeholder if the member type is still under construction.)
create the type from the member types.
clear the ‘under construction’ bit.
replace the placeholder with the constructed type.

That’s much cleaner than what I’m doing (actually one pass over the syntax tree to declare all namespaces and type names, then recursively specialize templates and infer types for variable definitions, then a pass to construct structured types including template specializations). However, I want to get LLVM in quickly and with the minimum upheaval so I don’t want to start a major refactoring of this code if I don’t need to. Having read read the manual I think I’ll be OK so long as I use type handles for any type I want to hold references to until all refineAbstractTypeTo()'s are done.

– James