Crefl - a Clang plug-in and C-type-reflection-API

Hi Folks,

Writing to let you know about a component I have been working on that could be of interest to the LLVM and Clang community:

> "Crefl - a Clang plug-in and C-type-reflection-API"

The Crefl API and plugin provide access to runtime reflection metadata for C interfaces supporting arbitrarily nested combinations of: intrinsic, enum, struct, union, field, array, constant, and function.

Crefl focuses on addressing the following areas:

- a clang plug-in that outputs portable reflection metadata.
- a reflection database format for portable reflection metadata.
- an API that provides task-oriented access to reflection metadata.

I am aware that the C++ standards committee is focusing on compile-time type reflection for C++, and I am aware of similar in-tree work on Clang AST serialization. This work is intended to be complementary. Crefl is a small runtime dependency, and I am focusing on a portable format and on exposing reflection metadata to C. Crefl itself is written in C++.

The Crefl plugin is nearing beta-level interface stability and reliability. I have been ironing out bugs and API usability issues. There is now a rudimentary reflection metadata linker that performs de-duplication using recursive tree hash sums. Linking still needs work with regard to incomplete types, modules, name indexing and namespaces.
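To illustrate the de-duplication idea, here is a minimal sketch of a recursive tree hash. It is not the Crefl linker code: the `node` structure, the choice of FNV-1a as the hash, and the function names are all assumptions made for illustration. Each node's sum covers its own fields plus the sums of its children, so structurally identical subtrees produce identical sums wherever they live in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical declaration node: a tag, a name, and child nodes. */
struct node {
    uint32_t tag;
    const char *name;
    const struct node **children;
    size_t nchildren;
};

/* FNV-1a over a byte range, continuing from a running hash value. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;          /* 64-bit FNV prime */
    }
    return h;
}

/* Hash a node from its own fields and the hashes of its children. */
static uint64_t node_hash(const struct node *n)
{
    assert(n != NULL && n->name != NULL);
    uint64_t h = 0xcbf29ce484222325ULL; /* 64-bit FNV offset basis */
    h = fnv1a(h, &n->tag, sizeof n->tag);
    h = fnv1a(h, n->name, strlen(n->name));
    for (size_t i = 0; i < n->nchildren; i++) {
        uint64_t child = node_hash(n->children[i]);
        h = fnv1a(h, &child, sizeof child);
    }
    return h;
}
```

A linker along these lines would presumably key a table on the sum and confirm equality structurally before merging, since a 64-bit non-cryptographic hash can collide.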

The Crefl repo contains samples/example2_embed which shows reflection metadata embedding into a binary. An ASN.1 implementation exists in the tree and I am currently working on structure packing and alignment. The intention is to write an ASN.1 serializer for C structures and then use that to read and write the reflection metadata itself. The metadata is currently stored using C native structure packing and alignment. The reflection API is nearly complete and I am now starting on structure serialization which is the major planned use of the API.

There is a CMake macro that handles invoking the Crefl plugin, merging the metadata and embedding it into a linkable object file.

     include(cmake/crefl_macro.cmake)

     add_executable(example2_embed samples/example2_embed/main.c)
     crefl_target_reflect(example2_embed example2_embed_refl)
     target_link_libraries(example2_embed example2_embed_refl cmodel)

Here is a sample showing how to access the embedded metadata:

     int main(int argc, const char **argv)
     {
         decl_db *db = crefl_db_new();
         crefl_db_read_mem(db, __crefl_main_data, __crefl_main_size);

         size_t nsources = 0;
         crefl_archive_sources(crefl_root(db), NULL, &nsources);
         assert(nsources == 1);
         decl_ref *_sources = calloc(nsources, sizeof(decl_ref));
         assert(_sources);
         crefl_archive_sources(crefl_root(db), _sources, &nsources);

         size_t ntypes = 0;
         crefl_source_decls(_sources[0], NULL, &ntypes);
         decl_ref *_types = calloc(ntypes, sizeof(decl_ref));
         assert(_types);
         crefl_source_decls(_sources[0], _types, &ntypes);

         for (size_t i = 0; i < ntypes; i++) {
             _print(_types[i], 0);
         }

         /* release the temporary arrays before destroying the database */
         free(_types);
         free(_sources);
         crefl_db_destroy(db);
     }

I would also like to implement a reference counted allocator wrapper that supports serialization of arbitrary C object graphs. Handling arrays would need some sort of `alloc(T)(n)` for typed array buffers, perhaps using negative array indices to find object metadata containing the count. That would support serialization of strings and arrays ...

   ((struct _alloc*)ptr)[-1].rc

... using some structures to support reference counting:

   struct _alloc { size_t count; dtor_notify_t dtor_notify; rc_t rc; };
   struct _ref { void* obj; size_t base; };
   struct _weakref { void* obj; size_t base; dtor_notify_t dtor_notify; };
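As a rough sketch of the negative index idea, the header can be placed immediately before the payload so that `[-1]` on the payload pointer reaches it. This is not implemented in Crefl; `alloc_hdr`, `arr_alloc`, `arr_ref`, `arr_unref` and `ARR_HDR` are hypothetical names, and the header is reduced to a count and a reference count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before each payload.
   Assumes sizeof(struct alloc_hdr) keeps the payload suitably aligned. */
struct alloc_hdr {
    size_t count;   /* number of elements in the payload */
    size_t rc;      /* reference count */
};

/* Reach the header through a negative index on the payload pointer. */
#define ARR_HDR(ptr) (&((struct alloc_hdr *)(ptr))[-1])

/* Allocate n zeroed elements of elem_size bytes behind a hidden header.
   Returns the payload pointer, or NULL on failure or size overflow. */
static void *arr_alloc(size_t n, size_t elem_size)
{
    if (elem_size != 0 &&
        n > (SIZE_MAX - sizeof(struct alloc_hdr)) / elem_size)
        return NULL;
    struct alloc_hdr *h = calloc(1, sizeof *h + n * elem_size);
    if (h == NULL)
        return NULL;
    h->count = n;
    h->rc = 1;
    return h + 1;
}

/* Take an additional reference. */
static void arr_ref(void *ptr)
{
    assert(ptr != NULL);
    ARR_HDR(ptr)->rc++;
}

/* Drop a reference; returns 1 if the allocation was freed, else 0. */
static int arr_unref(void *ptr)
{
    assert(ptr != NULL);
    struct alloc_hdr *h = ARR_HDR(ptr);
    if (--h->rc == 0) {
        free(h);
        return 1;
    }
    return 0;
}
```

With this layout a serializer handed only the payload pointer can still recover the element count, which is the property needed for strings and arrays.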

Support for arrays and references has not been implemented. To keep the memory overhead down it may be necessary to compress array dimensions using a scheme similar to ASN.1 object identifiers, a reverse LEB encoding or the sds string library scheme for compressed array sizes.
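For reference, here is a minimal sketch of one of the candidate size encodings. Note that it is the standard forward unsigned LEB128, not the reverse variant mentioned above, and the function names are my own. Counts below 128 take a single byte, which is where the saving for small arrays would come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as unsigned LEB128 into out (needs up to 10 bytes).
   Returns the number of bytes written. */
static size_t uleb128_encode(uint64_t value, unsigned char *out)
{
    assert(out != NULL);
    size_t n = 0;
    do {
        unsigned char byte = value & 0x7f;  /* low 7 bits */
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                   /* continuation bit */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode unsigned LEB128 from at most len bytes of in.
   Returns bytes consumed, or 0 if the input is truncated or too long. */
static size_t uleb128_decode(const unsigned char *in, size_t len,
                             uint64_t *value)
{
    assert(in != NULL && value != NULL);
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len && shift < 64; i++, shift += 7) {
        result |= (uint64_t)(in[i] & 0x7f) << shift;
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```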

The idea behind destructor notifiers is to support zeroing weak references, and to avoid needing to maintain a secondary weak reference count and separate allocations for the reference count. This assumes that code using this library to serialize object graphs would use a reference counting object allocator consistently, i.e. a shared_from_this would conceptually be a no-op and references could degrade to pointers. The challenge would be cramming what we need into, say, 16 bytes and minimizing alloc/unref overhead. Destructor notifiers might need 32-bit relative addresses to sufficiently compress function pointers. One might even need the support of some special relocations in the linker.
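A minimal sketch of zeroing weak references without a second count might look like the following. Everything here is hypothetical (the `sk_` names, the intrusive watcher list, the header layout). It replaces the function pointer notifier with a plain list walk to keep the example short, and it omits unregistering a weak reference, which a real design would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sk_weakref;

/* Hypothetical object header: a strong count and a watcher list. */
struct sk_hdr {
    size_t rc;
    struct sk_weakref *watchers;
};

/* Weak reference that is itself the list node, so registering one
   needs no extra allocation. */
struct sk_weakref {
    void *obj;
    struct sk_weakref *next;
};

static struct sk_hdr *sk_hdr_of(void *obj)
{
    assert(obj != NULL);
    return (struct sk_hdr *)obj - 1;
}

/* Allocate a zeroed object of size bytes with one strong reference. */
static void *sk_obj_new(size_t size)
{
    struct sk_hdr *h = calloc(1, sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->rc = 1;
    return h + 1;
}

static void sk_obj_retain(void *obj)
{
    sk_hdr_of(obj)->rc++;
}

/* Register w to be zeroed when obj is destroyed. */
static void sk_weakref_init(struct sk_weakref *w, void *obj)
{
    struct sk_hdr *h = sk_hdr_of(obj);
    w->obj = obj;
    w->next = h->watchers;
    h->watchers = w;
}

/* Drop a strong reference; on the last one, zero all watchers and free. */
static void sk_obj_release(void *obj)
{
    struct sk_hdr *h = sk_hdr_of(obj);
    if (--h->rc != 0)
        return;
    for (struct sk_weakref *w = h->watchers; w != NULL; ) {
        struct sk_weakref *next = w->next;
        w->obj = NULL;
        w->next = NULL;
        w = next;
    }
    free(h);
}
```

The header here is two words, so it fits in 16 bytes on a typical 64-bit target, but only because the notifier function pointer has been dropped.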

The ultimate goal is to expose something like the following to C:

   obj_write(T)(stream, obj)
   obj_read(T)(stream, obj)

and a reference counting interface with destructor notification:

   obj_alloc(T)(obj)
   obj_ref(T)(obj)
   obj_unref(T)(obj)
   obj_dtor_notify(T)(obj,target,func)
   obj_dtor_denotify(T)(obj,target,func)
   obj_weakref(T)(obj)
   obj_weakunref(T)(obj)
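The `name(T)(args)` call shape can already be written in plain C with token pasting, as in the sketch below. The `stream` and `point` types and the hand-written per-type functions are assumptions for illustration; the intent described above is that such functions would be driven by reflection metadata rather than written by hand. `T` must be a single identifier, so struct types need a typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* obj_write(T) expands to the name of a per-type function, so
   obj_write(T)(stream, obj) becomes obj_write_T(stream, obj). */
#define obj_write(T) obj_write_##T
#define obj_read(T)  obj_read_##T

/* Hypothetical fixed-size in-memory stream. */
struct stream {
    unsigned char buf[64];
    size_t wpos, rpos;
};

typedef struct point { int x, y; } point;

/* Append the raw bytes of *p; returns 0 on success, -1 if full. */
static int obj_write_point(struct stream *s, const point *p)
{
    assert(s != NULL && p != NULL);
    if (sizeof s->buf - s->wpos < sizeof *p)
        return -1;
    memcpy(s->buf + s->wpos, p, sizeof *p);
    s->wpos += sizeof *p;
    return 0;
}

/* Read the next point; returns 0 on success, -1 if no data remains. */
static int obj_read_point(struct stream *s, point *p)
{
    assert(s != NULL && p != NULL);
    if (s->wpos - s->rpos < sizeof *p)
        return -1;
    memcpy(p, s->buf + s->rpos, sizeof *p);
    s->rpos += sizeof *p;
    return 0;
}
```

A call then reads `obj_write(point)(&s, &pt)`, matching the shape above.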

The plan is to make something that can be used from C++ as a foundation for simulation object state serialization, while also making sure that components using this architecture can be written in C. It could be that we model simplified classic inheritance and Rust or Zig style traits and interfaces, the main rationale being that whatever we implement, we expose a mapping to C. The ideas regarding arrays and references are still somewhat sketchy. It might be that we need compiler support for references, bounded arrays and closure scoped destructors in C. Pie in the sky?

In any case, this is mainly a heads up to see if folk are interested and would care to give feedback or collaborate. The Git repository is here:

- https://github.com/michaeljclark/crefl/

Regards,
Michael