Reflection

I'm looking to possibly add support for reflection to clang & llvm.
I'm thinking it would work something like the following:

- clang inserts reflection information into the compiled bitcode
  - do debugging symbols provide enough information already?
  - should this be a new symbol table, or an extension of the debugging symbols?
- I write a C library which, using the symbols from the compiled
binary, allows you to do reflection
  - A typeof() or similar builtin will be necessary
- I'll need to eventually modify llvm's optimizer somehow so that it
doesn't break the reflection information

Thoughts?
This is going to be my introduction to the llvm & clang code base, so
any advice on where to start?

-Russell Harmon

This is actually very interesting, and I was also looking to achieve something similar. However, I chose to do things differently.

As C++ is statically typed, why bother with dynamic reflection when you can achieve a more efficient, cleaner static reflection?
I was thinking instead of a kind of preprocessing phase that would add traits to classes describing their members, methods, etc… Reflection would be implemented on top of these generated traits. We could then add a post-reflection target that simply prints the traits, producing standard C++ code buildable with any compiler.

struct Foo
{
void f();
int a;
};

The generated traits may look like (very roughly):

reflect_trait::attribute::count
reflect_trait::attribute<1>::type
reflect_trait::attribute<1>::name
reflect_trait::method<1>::type
reflect_trait::method<1>::signature
reflect_trait::method<1>::name
reflect_trait::method<1>::param::count
etc…
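
A minimal sketch of what such generated traits could look like for the Foo struct above, written by hand here. The exact shape (a `reflect_trait<T>` template, 0-based indices, the `name()` and `pointer()` accessors, the `first_attribute` helper) is an assumption made for illustration and not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Foo {
    void f() {}
    int a;
};

// Primary template is only declared: a type is reflectable only if a
// specialization has been generated for it.
template <typename T> struct reflect_trait;

// What a generator might emit for Foo (hand-written, names hypothetical).
template <> struct reflect_trait<Foo> {
    static constexpr std::size_t attribute_count = 1;
    static constexpr std::size_t method_count = 1;
    template <std::size_t I> struct attribute;  // specialized per data member
    template <std::size_t I> struct method;     // specialized per member function
};

template <> struct reflect_trait<Foo>::attribute<0> {
    using type = int;
    static constexpr const char* name() { return "a"; }
    static constexpr type Foo::* pointer() { return &Foo::a; }
};

template <> struct reflect_trait<Foo>::method<0> {
    using type = void (Foo::*)();
    static constexpr std::size_t param_count = 0;
    static constexpr const char* name() { return "f"; }
    static constexpr type pointer() { return &Foo::f; }
};

// Generic code can read a member through the trait without naming it.
template <typename T>
typename reflect_trait<T>::template attribute<0>::type first_attribute(const T& object) {
    return object.*reflect_trait<T>::template attribute<0>::pointer();
}
```

Everything here is resolved at compile time, so a compiler that has never heard of the generator can still build the output.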

Just my 2 cents

Static reflection seems to be very limited. Reflection is a powerful mechanism for analysing a 3rd party executable or dynamic library and interacting with it at runtime, dynamically invoking APIs. Objective-C already has this, so you should be able to leverage the same concepts for a C++ version. You already have a great example in Clang; Objective-C provides all its reflection through the runtime APIs, and therefore a C++ extension could do the same thing. No built-ins, alterations to LLVM or changes to the C++ language.

The reflection information is merely metadata generated at compile time to describe the symbols in a specific module. A lot of the work is around the runtime.

I imagine some of the things you would need are:

  • A runtime, similar to Objective-C’s, which needs some of the following capabilities
    • Collection of metadata classes which describe methods, types, etc
    • APIs to read over data sections to extract metadata and generate aforementioned classes
    • Public APIs to query for this metadata, create instances of classes via name, etc
  • Changes to compiler
    • Generation of metadata into special data section
    • Generation of entry points / APIs to compiled libraries, executables to allow you to do things like create instances of classes / types by name, etc
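
As a rough illustration of the "read over data sections" item, here is a sketch that decodes a raw byte range into metadata classes. The record layout (NUL-terminated strings, an empty string ending each class record) and the names `ClassInfo`, `parse_metadata` and `find_class` are all invented for this example. A real design would define its own format and would locate the bytes in the object file first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical metadata class: one per reflected class.
struct ClassInfo {
    std::string name;
    std::vector<std::string> methods;
};

// Assumed layout: class name, then method names, each NUL-terminated;
// an empty string closes the current class record.
std::vector<ClassInfo> parse_metadata(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<ClassInfo> classes;
    bool in_record = false;
    std::size_t pos = 0;
    while (pos < size) {
        // Find the NUL that terminates the current string.
        std::size_t end = pos;
        while (end < size && data[end] != '\0') ++end;
        if (end == size) break;  // truncated string: stop instead of reading past the buffer
        std::string text(data + pos, end - pos);
        pos = end + 1;
        if (text.empty()) {
            in_record = false;                       // end of this class record
        } else if (!in_record) {
            classes.push_back(ClassInfo{text, {}});  // first string names the class
            in_record = true;
        } else {
            classes.back().methods.push_back(text);  // later strings are methods
        }
    }
    return classes;
}

// Public query API: look a class up by name, or return nullptr.
const ClassInfo* find_class(const std::vector<ClassInfo>& classes, const std::string& name) {
    for (const ClassInfo& c : classes)
        if (c.name == name) return &c;
    return nullptr;
}
```

Because the parser only touches bytes, it can run over a file on disk without loading the library for execution.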

Cheers,

Stu

Stuart Carnie, CTO
manomio | in retro we trust!

2010/12/13 Aurélien Vallée <vallee.aurelien@gmail.com>

Static reflection is useful, but I agree it is a (completely) different problem. However, your proposal still seems quite complex and invasive. Why not just store the reflection data in C++ structures and expose them by means of global const static objects?

something like:

**** input
struct Foo
{
   void f();
   int a;
};

**** generated by clang
// Contains functions for decoding the reflection information
namespace reflection {
struct class_data_base {
   void EnumMembers(Fct callback);
   int GetValue(const char* name);
   void Call(const char* name);
   //...
};
// specialized per class, for name mangling
template <typename T> struct class_data;
template <> struct class_data<Foo> : class_data_base {
   static const int num_members=2;
   // obviously pseudo code :)
   static const datatable=[ "f", &f, "a", &a ... ];
};
}

This would avoid modifying llvm, and the C++ code could be generated by clang for other compilers. Clang does not need to generate the reflection code as source text; it could create an AST or codegen it directly.

This would also make it portable, no need to access metadata in an executable...
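
One way the pseudo code above might be fleshed out in plain C++ is sketched below. It is only an illustration: the `field_entry` and `method_entry` tables, the extra object parameter on `GetValue` and `Call`, the `get()` accessor and the `calls` counter in Foo are assumptions added to make the example self-contained, and only `int` fields and no-argument methods are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Foo {
    void f() { ++calls; }
    int a = 0;
    int calls = 0;  // added so the example can observe that f() ran
};

namespace reflection {

// One table row per member; accessors are plain function pointers.
struct field_entry  { const char* name; int  (*get)(const void* object); };
struct method_entry { const char* name; void (*call)(void* object); };

// Class-independent decoding logic over the generated tables.
struct class_data_base {
    const field_entry*  fields;  std::size_t num_fields;
    const method_entry* methods; std::size_t num_methods;

    // Reads the int field called `name`; returns false if there is none.
    bool GetValue(const void* object, const char* name, int& out) const {
        assert(object && name);
        for (std::size_t i = 0; i < num_fields; ++i)
            if (std::strcmp(fields[i].name, name) == 0) {
                out = fields[i].get(object);
                return true;
            }
        return false;
    }

    // Invokes the no-argument method called `name`; returns false if there is none.
    bool Call(void* object, const char* name) const {
        assert(object && name);
        for (std::size_t i = 0; i < num_methods; ++i)
            if (std::strcmp(methods[i].name, name) == 0) {
                methods[i].call(object);
                return true;
            }
        return false;
    }
};

template <typename T> struct class_data;

// What the compiler would generate for Foo (hand-written here).
template <> struct class_data<Foo> {
    static const class_data_base& get() {
        static const field_entry fields[] = {
            {"a", [](const void* o) { return static_cast<const Foo*>(o)->a; }},
        };
        static const method_entry methods[] = {
            {"f", [](void* o) { static_cast<Foo*>(o)->f(); }},
        };
        static const class_data_base data = {fields, 1, methods, 1};
        return data;
    }
};

}  // namespace reflection
```

Since the tables hold ordinary function pointers, the linker sees every reflected member as referenced, which sidesteps hand-computed offsets and addresses.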

Hope this makes sense.
Anyway, just my two cents. Good luck on your project, it is really interesting and useful (especially with the rise of fast scripting languages).

regards,
Cédric

Cédric,

Your suggestion is quite reasonable; however, in your scenario you must load the shared object / library into memory and execute code in order to enumerate the metadata. You will incur all the performance penalties of loading shared libraries for execution, such as rebasing, etc. A particularly useful scenario for reflection is to scan a directory of libraries and determine available features or plugins; that would be quite costly if they all had to be dynamically loaded. It would be fairly easy to abstract the reading of the metadata from a specific data section.

My scenario still does not require changes to LLVM or the C++ language, only changes to Clang to generate the metadata.

Good discussion, either way, as it would make it easy to generate scriptable C++ libraries in a non-invasive way.

Cheers,

Stu

Hi,

Good point, I didn’t think of this. However, when not loading the shared library, do you need the full reflection information? Or could some metadata (custom attributes) and perhaps a list of classes be enough (in fact this is already available from the symbol list)? The main drawbacks I can see in your approach are:

  • the data loader depends on the executable format
  • you must generate the data yourself (what about member offsets, function addresses and interaction with relocation,…)
  • some functions referred to by your data could be pruned by the linker because they are unused or inlined, yet still be referenced by the reflection data

However, I do not know what your use case is. Mine would be things like script binding, a generic property explorer, (de)serialisation, data dumping/printing for debug… In all these cases the DLL would be loaded anyway. But since I do not have time to work on it, do whatever you want. I was just thinking your way needed more work.

regards,
Cédric

Ideally, you would use reflection as a dynamic API, and your use of
the API would be optimized into static reflection if possible.

My goal with this is to make it as language-agnostic as possible so as
to make it relatively easy to add generics to another language.

It's worth noting that C++ already has some of this via RTTI, although not all of it is public. The classes declared in <typeinfo> give you some information about an object via public interfaces (e.g. what its superclasses are and so on). They also include a name field which contains an encoding of the field types.

Unlike Objective-C, this encoding is defined by the ABI, rather than by the language, although you can use the typeid operator to get a type info object for a specific type and then use dynamic_cast<> to see which subclass of std::type_info it is. The (BSD-licensed) libelf-tc library in the elftoolchain project has code for parsing these encodings for the old-GNU, new-GNU (Itanium), and ARM ABIs, so it wouldn't be too much effort to create something that gives you more useful introspection metadata.

This information does not, as far as I am aware, give you type encodings for methods, so that would need to be added, which would modify the ABI somewhat (although not necessarily in an incompatible way).
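
For reference, a small sketch of what standard RTTI exposes today through <typeinfo>: the identity of an object's dynamic type and an implementation-defined name string, but nothing about members or methods. The `Shape` and `Circle` types and the two helper functions are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct Shape { virtual ~Shape() {} };  // polymorphic, so typeid sees the dynamic type
struct Circle : Shape {};

// True if the dynamic type of `s` is exactly T (derived types do not match).
template <typename T>
bool is_exactly(const Shape& s) {
    return typeid(s) == typeid(T);
}

// The returned string is implementation-defined; under the Itanium ABI it is
// a mangled name that a separate demangler has to decode.
std::string dynamic_type_name(const Shape& s) {
    return typeid(s).name();
}
```

Anything beyond this, such as listing fields or method signatures, needs either ABI-specific knowledge or additional compiler-generated metadata.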

One question though: When you say reflection, do you actually mean reflection, or do you just mean introspection? For example, true reflection would allow you to add methods to a C++ class, which would require modifying the vtable. If you're only replacing methods, this is quite simple (although, again, ABI-dependent), but adding methods would be much harder. Creating new types at run time would also be relatively difficult. Simply extending C++ RTTI to provide useful introspection would be a lot easier.

David

I was talking about Java-style reflection, which is apparently really
introspection.

I didn't really think it mattered since C++ is (mostly) just a
superset of C, but I had actually intended to do this for C, not C++.

David,

I believe you are correct, we are talking about introspection; I don’t believe anyone has mentioned extending or modifying existing types. Example use cases would be to inspect existing libraries and examine types of interest, such as plugins, or those conforming to specific interfaces to be used in a generic way. At the risk of stating the obvious, a good example would be to drop a library into a plugin folder of a paint application, and be able to enumerate all the types derived from “ImageReader”, enabling my fictitious application to load several new image file types with zero configuration.
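
The ImageReader scenario can be sketched as a query over metadata alone, without instantiating anything. The `TypeRecord` layout (a type name plus a single base name) and both function names are assumptions for illustration; real metadata would need to handle multiple inheritance and namespaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-type record extracted from a plugin's metadata.
struct TypeRecord {
    std::string name;
    std::string base;  // empty if the type has no base class
};

// True if `name` derives, directly or indirectly, from `ancestor`.
bool derives_from(const std::vector<TypeRecord>& types,
                  const std::string& name, const std::string& ancestor) {
    assert(!ancestor.empty());
    std::string current = name;
    // Bounded walk so that a malformed, cyclic table cannot loop forever.
    for (std::size_t step = 0; step <= types.size(); ++step) {
        const TypeRecord* record = nullptr;
        for (const TypeRecord& t : types)
            if (t.name == current) { record = &t; break; }
        if (!record || record->base.empty()) return false;
        if (record->base == ancestor) return true;
        current = record->base;
    }
    return false;
}

// Names of all types that derive from `ancestor`, in table order.
std::vector<std::string> types_derived_from(const std::vector<TypeRecord>& types,
                                            const std::string& ancestor) {
    std::vector<std::string> result;
    for (const TypeRecord& t : types)
        if (derives_from(types, t.name, ancestor)) result.push_back(t.name);
    return result;
}
```

An application could run this over each library's cached metadata and only load the libraries that actually contain a matching type.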

Clang could benefit from this greatly too, by having an extensions path where you could drop shared libraries, and no longer have to specify -load -plugin, simply specifying switches to enable desired plugins. There are elegant ways to cache metadata extracted from a plugins folder so you don’t have to inspect the libraries on every invocation of the application.

Cheers,

Stu

David

What? The struct definition doesn't provide an authoritative definition?

David's saying that there isn't a unique translation unit responsible for defining the struct. This is true even in C++, unless the class in question has a key function.

John.

The initial way I'd think to deal with this is to add introspective
metadata for structs into every compilation unit and have the linker
discard redundant introspection metadata (iff the data is equal, if
not the linker should probably fail).

Alternatively, the introspective metadata could be spit out into a
separate file which is pulled in by the linker at link time.
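
The merge rule in the first alternative (keep one copy when duplicates are identical, fail when they differ) can be sketched as follows. This only models the rule in ordinary C++; it is not how any real linker is implemented, and the `StructMeta` record with fields stored as strings is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical metadata for one struct as emitted by one compilation unit.
struct StructMeta {
    std::string name;
    std::vector<std::string> fields;  // e.g. "int a"
};

using MetaTable = std::map<std::string, StructMeta>;

// Folds one compilation unit's metadata into the merged table.
// Identical duplicates are discarded; conflicting ones are an error.
void merge_unit(MetaTable& merged, const std::vector<StructMeta>& unit) {
    for (const StructMeta& meta : unit) {
        assert(!meta.name.empty());
        MetaTable::iterator it = merged.find(meta.name);
        if (it == merged.end()) {
            merged.emplace(meta.name, meta);  // first definition seen
        } else if (it->second.fields != meta.fields) {
            throw std::runtime_error(
                "conflicting introspection metadata for struct " + meta.name);
        }
        // Otherwise the copy is redundant and is simply dropped.
    }
}
```

The failure case corresponds to two translation units having seen different definitions of the same struct, which is worth diagnosing in its own right.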

The struct definition can be in a header, and very often is. The header can be included in, potentially, hundreds of compilation units. For example, any file that includes Cocoa.h or Foundation.h (i.e. pretty much any Objective-C source file) will contain definitions of the NSRange structure. Where would you emit the metadata? One copy in every single compilation unit that included this header?

In contrast, every Objective-C class has exactly one @implementation directive for the class, which contains all of the instance variable metadata and all of the method / property metadata for ones that are declared on the class. It can have additional @implementation directives for categories, but these contain abridged metadata (i.e. just the additions made by the category).

For C++, the vtable and RTTI data are emitted in the compilation unit that contains the definition of the first virtual member function (I think that's specified by C++, but I'm not overly familiar with the spec, so it may just be specified by the Itanium ABI).

David

Another possibility just occurred to me: for headers containing
structs on which introspection will be used, the compiler could
require that the header be compiled as a precompiled header. Then
the introspective metadata could be inserted into the pch.

It's just the Itanium ABI. It's enabled by a language rule which says that non-inline virtual functions must be defined in exactly one place, though.

John.

Introspecting C++/Objective-C classes is what I am currently trying to do
with clang. The idea of adding reflection to C/C++/ObjC has been on my
mind for quite a long time. My main purpose is the
serialization/deserialization of more-or-less complex objects into
different formats, without needing boost.serialization or something
similar. For this purpose, traits would be sufficient, and I think they
should be included in any approach to realizing introspection. Of course,
there should also be a dynamic runtime for scripting and the other
use cases which have already been mentioned.
Also, the use of reflection-specific attributes should be borne in mind.

Thus, I wouldn't choose Russell's approach, as you couldn't use
the introspection information at either compile time or run time,
but only through a third tool that reads the compiled bitcode -
am I right?
Nor would I choose a kind of preprocessing phase, as proposed by
Aurélien, as a preprocessing phase sounds like too much overhead.
My approach aims at generating the introspection data immediately after
clang has built the AST of a class/method/and so on. I currently don't
know in what form the introspection data should be generated - whether
it is possible to generate an AST and inject it somewhere into clang
itself, or whether it has to be written out as code temporarily and then
processed by clang, or something else - I have no idea which approach is
possible and reasonable. The introspection data should be accessible as
classes in their own namespace - maybe configurable by the user through a
macro or an attribute. You should be able to use them inside the same
translation unit in which the class/method/... itself is defined.

These are my thoughts so far, still at the very beginning.
It would be a great pleasure to see something going on in this area,
and I think I could participate if those thoughts were to turn into a
project.

Objective-C classes already contain introspection information. In the EtoileSerialise framework, we use the information about instance variable layout to automatically serialise most Objective-C classes (anything that doesn't contain pointers to C types - those need some manual assistance, since we can't tell if they're arrays, and if they are what their size is). LanguageKit, similarly, uses this information when compiling classes from Smalltalk or EScript to access instance variables in the superclass and to determine the correct types when calling Objective-C methods.

If you want to add metadata to Objective-C classes, be careful that you're not duplicating something that's already there...

David

My suggestion is:

1) Take the C++ elements of a compilation and put them in a database. Whether the database must be a separate file or somehow connected directly to a C++ output is for you to study. Once the linker runs, take all the separate database information and put it together, again either in its own file or somehow within an executable, shared library, or static library output.

2) Write once a separate library which at run-time can introspect C++ type information using the database format you created based on the type of C++ information you have collected. This library should work with any information you have collected for any module.

3) Add a new hook into clang so that a keyword extension allows the programmer to introspect run-time information based on a type or an expression ( not easy at all, of course ). Something like 'extended_typeinfo' which can be applied to any type or expression. Using something like this should enable a programmer to introspect the type at run-time.

A further difficulty of 3) is that introspecting any type should respect the access level of that type. What I mean is that if the code is within the type itself, then 'private','protected', and 'public' information for that type is accessible; if the code is within a derived type of that type then 'protected' and 'public' information is accessible; if the code is outside the type then only 'public' information is accessible.

Another goal is to allow the programmer to actually create an object of a particular type at run-time based on the name of the type as a C++ string. For that your separate library needs a function in order to do this and needs to create the object for the caller and return it in some way.

4) Interest others in your run-time reflection facilities and your clang extension to allow it to work, and try to interest the C++ standard committee in it also if it works well and finds excellent usages.
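
The goal mentioned above of creating an object from its type name given as a string can be sketched with a small registry. This is only one possible shape: the `Object` base class, `TypeRegistry`, `register_type` and the `Widget` example type are all hypothetical, and in the scheme described the registrations would come from compiler-generated data and not from hand-written calls.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common base so that created objects can be returned uniformly.
struct Object {
    virtual ~Object() {}
    virtual std::string type_name() const = 0;
};

class TypeRegistry {
public:
    using Factory = std::function<std::unique_ptr<Object>()>;

    // Associates a type name with a function that constructs that type.
    void add(const std::string& name, Factory factory) {
        assert(factory);
        factories_[name] = std::move(factory);
    }

    // Creates an instance by name, or returns null for an unknown name.
    std::unique_ptr<Object> create(const std::string& name) const {
        std::map<std::string, Factory>::const_iterator it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Registers a default-constructible type T under `name`.
template <typename T>
void register_type(TypeRegistry& registry, const std::string& name) {
    registry.add(name, [] { return std::unique_ptr<Object>(new T()); });
}

// Example type standing in for any reflected class.
struct Widget : Object {
    std::string type_name() const override { return "Widget"; }
};
```

A caller would then write something like `registry.create("Widget")` and get back an owning pointer, or null when the name is unknown.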

Yes, I have thought about this very much, but it is a huge undertaking and needs to be done in the right way. Someday with clang I may try myself as I believe that a C++ RAD environment can only be built with adequate run-time reflection. I wish you the best of luck in your pursuit.