Unique shape and types

Hi all,

In its implementation of types, LLVM only has one instance of a given shape, mostly for type equality (and I suppose projects like pool allocation require it).

However, this leads to a somewhat misleading bytecode representation. For example, consider this C++ program compiled with llvm-g++:

class AAA {
  int b;
};

class FFF {
  int a;
};

extern int foo(AAA * aaa);
extern int bar(FFF * aaa);

This gets compiled to

%struct.AAA = type { i32 }
%struct.FFF = type { i32 }

declare i32 @foo(AAA*)(%struct.AAA*)

declare i32 @bar(FFF*)(%struct.AAA*)

This is misleading because the bytecode tells us the argument type of bar is AAA. I suppose the name "@bar(FFF*)" helps the programmer find that bar actually takes an FFF struct pointer.

Is this a non-issue for LLVM? Are types just considered as layouts?

The issue I'm facing is object inheritance: how to find, from an LLVM type, which types it inherits from. Currently this can't be implemented in LLVM, so I need to implement a higher-level representation for types.

Best,
Nicolas

In its implementation of types, LLVM only has one instance of a given
shape, mostly for type equality
Is this a non-issue for LLVM? Are types just considered as layouts?

LLVM uses a structural type system, which is different from the type systems of many source languages. This is useful for the optimizer, but is not so useful if you want source-level names.
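
To illustrate what "structural" means here, the following is a toy C++ model, not LLVM's actual implementation: a table that hands out exactly one instance per field list and treats names as labels attached afterwards. The names TypeTable, StructShape, and fffPrintsAsAaa are hypothetical, and the rule that the first registered name is the one printed is an assumption of this sketch, chosen only to reproduce the output shown earlier in the thread.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy model only: a struct type is identified purely by its field list.
struct StructShape {
    std::vector<std::string> fields;  // e.g. {"i32"}
};

class TypeTable {
public:
    // Return the single shared instance for this shape, creating it on first use.
    const StructShape* get(const std::vector<std::string>& fields) {
        auto it = shapes_.find(fields);
        if (it == shapes_.end()) {
            std::unique_ptr<StructShape> fresh(new StructShape{fields});
            it = shapes_.emplace(fields, std::move(fresh)).first;
        }
        return it->second.get();
    }

    // Attach a name to a shape. Assumption of this sketch: the first name
    // registered for a shape is the one used when printing.
    void name(const std::string& n, const StructShape* s) {
        names_by_shape_.emplace(s, n);  // emplace does not overwrite an existing entry
        shape_by_name_[n] = s;
    }

    const StructShape* lookup(const std::string& n) const {
        return shape_by_name_.at(n);
    }

    const std::string& printedName(const StructShape* s) const {
        return names_by_shape_.at(s);
    }

private:
    std::map<std::vector<std::string>, std::unique_ptr<StructShape>> shapes_;
    std::map<const StructShape*, std::string> names_by_shape_;
    std::map<std::string, const StructShape*> shape_by_name_;
};

// Mirrors the example in the thread: AAA and FFF are both { i32 }.
bool fffPrintsAsAaa() {
    TypeTable table;
    const StructShape* aaa = table.get({"i32"});
    table.name("AAA", aaa);
    const StructShape* fff = table.get({"i32"});  // same shape, so same instance
    table.name("FFF", fff);
    return aaa == fff && table.printedName(table.lookup("FFF")) == "AAA";
}
```

In this model, type equality is a pointer comparison, which is the benefit for the optimizer. The cost is that nothing in a shape records which source-level class it came from.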

The issue I'm facing is object inheritance and how to find from a LLVM
type what types it inherits. Currently this can't be implemented in LLVM
and I need to implement a higher representation for types.

Yep. Sorry :(. Depending on your application, you could read debug info, which captures all of this.

-Chris

Chris Lattner wrote:

The issue I'm facing is object inheritance and how to find from a LLVM
type what types it inherits. Currently this can't be implemented in LLVM
and I need to implement a higher representation for types.
    
Yep. Sorry :(.

"Sorry", like "this can really not be integrated in LLVM" or like "it is possible but it requires a lot of work to integrate it"? :)

Depending on your application, you could read debug info,
  
Actually I am not using llvm-gcc. I'm just targeting a new language and it would have been a lot easier implementing the compiler with LLVM if it had this kind of feature (string <-> type). But if it's not feasible, well I'll just stick with my higher type representation.

This leads to another question, about type inference for dynamic languages (I refer to your slides from the LLVM meeting day). I do not see how you can integrate type inference on objects in LLVM without knowing the inheritance between types. Maybe you only target type inference to distinguish floating-point values from integer or object values?

Nicolas

Can't this association be established with the new annotation feature?

Sandro

Yes, it probably could. This would require the programmer to manually annotate things, though. The recently added annotation capability is only meant to propagate source-level annotations, not to capture arbitrary information from the front-end.

-Chris

Yep. Sorry :(.

"Sorry", like "this can really not be integrated in LLVM" or like "it is
possible but it requires a lot of work to integrate it"? :)

It is not a desired feature for the LLVM type system itself.

Depending on your application, you could read debug info,

Actually I am not using llvm-gcc. I'm just targeting a new language and
it would have been a lot easier implementing the compiler with LLVM if
it had this kind of feature (string <-> type). But if it's not feasible,
well I'll just stick with my higher type representation.

If you control the front-end, you have many options available to you. For example, you could emit your own tables along the same lines as debug info but specialized to capture the metadata you need.
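
As a rough illustration of such a front-end table, here is a minimal C++ sketch. Everything in it is hypothetical (ClassInfo, ClassTable, and the "%struct.…" layout strings are names invented for this example), and it assumes single inheritance with no cycles in the parent chain. The inheritance information lives beside the LLVM types, keyed by source-level class name, instead of inside them.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical front-end metadata, kept beside the LLVM types rather than in them.
struct ClassInfo {
    std::string parent;      // empty when the class has no base class
    std::string llvmLayout;  // name of the LLVM struct that holds the fields
};

class ClassTable {
public:
    void add(const std::string& name, const std::string& parent,
             const std::string& layout) {
        classes_[name] = ClassInfo{parent, layout};
    }

    // Walk the parent chain; true when `derived` is `base` or inherits from it.
    // Assumes the chain is acyclic.
    bool inheritsFrom(const std::string& derived, const std::string& base) const {
        std::string cur = derived;
        while (!cur.empty()) {
            if (cur == base) return true;
            auto it = classes_.find(cur);
            if (it == classes_.end()) return false;  // unknown class
            cur = it->second.parent;
        }
        return false;
    }

    // Look up which LLVM layout a source-level class uses.
    const std::string& layoutOf(const std::string& name) const {
        return classes_.at(name).llvmLayout;
    }

private:
    std::map<std::string, ClassInfo> classes_;
};
```

Two classes with identical fields can then share one LLVM layout while remaining distinct entries in the table, so the string <-> type mapping is answered by the front-end rather than by the IR.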

This leads to another question of type inference for dynamic languages
(I refer to your slides from the LLVM meeting day). I do not see how you
can integrate type inference on objects in LLVM without knowing
inheritance between types. Maybe you only target type inference to
dissociate floating point values from integer or object values?

I don't propose to do that analysis on the LLVM IR itself. :)

-Chris

Chris Lattner wrote:

If you control the front-end, you have many options available to you. For example, you could emit your own tables along the same lines as debug info but specialized to capture the metadata you need.

Yes, but this always turns into having two representations for one type: the LLVM representation for its layout,
and a personal representation for inheritance. Consider this example from a dynamic language which creates
a new list of size four:

(define a (new List 4))

If LLVM had type inheritance integrated, all that would be needed is to emit the (new List 4) into LLVM instructions,
get the returned type, and set it to the GlobalVariable a. This would simplify a lot of things (for me! :))

This leads to another question of type inference for dynamic languages
    
I don't propose to do that analysis on the LLVM IR itself. :)

Ah, OK. So you propose to do type inference in (all) front-ends? Or maybe on a higher IR?

Again, wouldn't it be easier for dynamic language front-end implementations if type inference were just a pass on the LLVM IR? I would surely _love_ to have this kind of feature :)

Nicolas

Chris Lattner wrote:

If you control the front-end, you have many options available to you. For
example, you could emit your own tables along the same lines as debug
info but specialized to capture the metadata you need.

Yes, but this always turns into having two representations for one type:
the LLVM representation for its layout,
and a personal representation for inheritance. Consider this example
from a dynamic language which creates
a new list of size four:

(define a (new List 4))

If LLVM had type inheritance integrated, all that would be needed is to
emit the (new List 4) into LLVM instructions,
get the returned type, and set it to the GlobalVariable a. This would
simplify a lot of things (for me! :))

Unfortunately, that comes at a high cost. The optimizers would then be tasked with maintaining this representation and not invalidating it. Also, the types would get in the way of many low-level optimizations.

This leads to another question of type inference for dynamic languages

I don't propose to do that analysis on the LLVM IR itself. :)

Ah, OK. So you propose to do type inference in (all) front-ends? Or
maybe on a higher IR?

Yes, I'm proposing a common higher-level IR for languages like Ruby, Python, Perl, etc. I'll be giving a talk about it at OSCON this year. After the talk, I'll put slides up.

Again, wouldn't it be easier for dynamic language front-end
implementations if type inference was just a pass on the LLVM IR? I
would surely _love_ to have this kind of feature :)

It certainly would be easier for some people :). That said, the LLVM IR shouldn't try to be all things to all people. That is a sure recipe for doing all things really poorly for all people.

-Chris