========================================
Mapping High-Level Constructs to LLVM IR
========================================

.. contents::
   :local:
   :depth: 2

Introduction
============

In this document we will take a look at how to map various classic high-level
programming language constructs to LLVM IR. The purpose of the document is to
make the learning curve for LLVM less steep so that more people may come to
use it.

Background
==========

The language constructs that will initially be covered here are those found in
most contemporary *imperative* languages.

The document consists of two parts:

#. The first part shows how to map advanced constructs into plain C.
#. The second part shows how to map plain C constructs into LLVM IR.

Part 1: Mapping Object-Oriented Constructs to C
===============================================

In this chapter we'll look at various object-oriented constructs and see how
they can be mapped to C. I personally feel that the vast majority of compiler
construction books skip all too lightly over this part because it may
initially seem complicated, even though it is in fact quite simple.

Classes
-------

A class is basically nothing more than a structure with an associated set of
functions that take an implicit first parameter, namely a pointer to the
structure. It is therefore straightforward to map a class to C:

.. code-block:: cpp

  #include <stddef.h>

  class Foo
  {
  public:
      Foo()
      {
          _length = 0;
      }

      size_t GetLength() const
      {
          return _length;
      }

      void SetLength(size_t value)
      {
          _length = value;
      }

  private:
      size_t _length;
  };

We first transform this code into two separate pieces:

#. The structure definition.
#. The list of methods, including the constructor.

.. code-block:: c

  struct Foo
  {
      size_t _length;
  };

  /* The constructor: initializes an already allocated instance. */
  void Foo_Create(struct Foo *this)
  {
      this->_length = 0;
  }

  /* A const method takes a pointer to a const structure. */
  size_t Foo_GetLength(const struct Foo *this)
  {
      return this->_length;
  }

  void Foo_SetLength(struct Foo *this, size_t value)
  {
      this->_length = value;
  }

Then we make sure that the constructor (``Foo_Create``) is invoked whenever an
instance of the structure is created:

.. code-block:: cpp

  Foo foo;

.. code-block:: c

  struct Foo foo;
  Foo_Create(&foo);

Virtual Methods
---------------

A virtual method is basically no more than a compiler-controlled function
pointer. Each virtual method is recorded in the ``vtable``, which is a
structure of all the function pointers needed by a given class:

.. code-block:: cpp

  class Foo
  {
  public:
      virtual int GetLengthTimesTwo() const
      {
          return _length * 2;
      }

      void SetLength(size_t value)
      {
          _length = value;
      }

  private:
      size_t _length;
  };

  Foo foo;
  foo.SetLength(4);
  return foo.GetLengthTimesTwo();

.. code-block:: c

  struct Foo_vtable_type;    /* Forward declaration. */

  struct Foo
  {
      struct Foo_vtable_type *_vtable;
      size_t _length;
  };

  struct Foo_vtable_type
  {
      int (*GetLengthTimesTwo)(const struct Foo *this);
  };

  int Foo_GetLengthTimesTwo(const struct Foo *this)
  {
      return (int) (this->_length * 2);
  }

  struct Foo_vtable_type Foo_vtable_data =
  {
      &Foo_GetLengthTimesTwo
  };

  void Foo_Create(struct Foo *this)
  {
      this->_vtable = &Foo_vtable_data;
      this->_length = 0;
  }

  struct Foo foo;
  Foo_Create(&foo);
  Foo_SetLength(&foo, 4);
  return foo._vtable->GetLengthTimesTwo(&foo);

Please note that some C++ compilers store ``_vtable`` at a negative offset
into the structure so that things like ``memset(this, 0, sizeof(*this))``
work, even though such calls should always be avoided in an OOP language.

Interfaces
----------

An interface is basically nothing more than a base class with no data
members. As such, we've already described how to convert an interface to C.

Run-Time Type Identification (RTTI)
-----------------------------------

As far as I know, RTTI is simply done by adding two fields to the ``_vtable``
structure: ``parent`` and ``signature``.
The former is a pointer to the vtable of the parent class and the latter is
the mangled (encoded) name of the class. To check whether an object is of a
given class, you simply compare the ``signature`` fields. To check whether it
is of a class derived from some other class, you walk the chain of ``parent``
fields, checking at each step whether the signature matches. I am by no means
an expert on RTTI, so I may be wrong about the above.

The ``new`` Operator
--------------------

The ``new`` operator is generally nothing more than a type-safe version of the
C ``malloc`` function; in some implementations of C++, they may even be called
interchangeably without causing unseen or unwanted side-effects.

All calls of the form ``new X`` are mapped into:

.. code-block:: c

  X *tmp = (X *) malloc(sizeof(X));
  X_Create(tmp);

Calls of the form ``new X(Y, Z)`` are mapped into:

.. code-block:: c

  X *tmp = (X *) malloc(sizeof(X));
  X_Create(tmp, Y, Z);

Exceptions
----------

Exceptions can be implemented in one of two ways:

#. The easy way (by using a propagated return value).
#. The hard way (by using zero-overhead stack unwinding).

As I only know how the first is implemented, that is the method I'll describe
here:

.. code-block:: cpp

  void Bar()
  {
      Foo foo;
      try
      {
          foo.SetLength(17);
          throw new Error("Out of sensible things to do!");
      }
      catch (Error *that)
      {
          foo.SetLength(24);
          delete that;
      }
  }

This maps to the following code:

.. code-block:: c

  struct Error *Bar()
  {
      struct Error *status = NULL;
      struct Foo foo;
      Foo_Create(&foo);

      /* "try" statement becomes this: */
      status = NULL;

      /* Body of "try" statement becomes this: */
      Foo_SetLength(&foo, 17);
      status = (struct Error *) malloc(sizeof(struct Error));
      Error_Create(status, "Out of sensible things to do!");

      /* "catch" block becomes this (matches "catch (Error *that)"): */
      if (inheritsfrom(status, "Error"))
      {
          Foo_SetLength(&foo, 24);
          Error_Delete(status);
          free(status);
          status = NULL;
      }

      /* The destructor of "foo" runs on every path out of the function. */
      Foo_Delete(&foo);

      /* A non-NULL status propagates the exception to the caller. */
      return status;
  }

Part 2: Mapping C Constructs to LLVM IR
=======================================

In this chapter we'll take a closer look at how to map the simple C constructs
into equivalent LLVM IR.

Epilogue
========

If you discover any errors in this document or need more information than is
given here, please write to the friendly LLVM developers and they'll surely
help you out or add the requested information to this document.
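
Returning to the RTTI scheme described in Part 1: the walk along the
``parent`` chain, which is also what the ``inheritsfrom`` helper in the
exception example relies on, could be sketched roughly as below. This is only
an illustration under stated assumptions, not how any particular compiler does
it. The names ``struct vtable`` and ``struct Object``, the layout with the
vtable pointer as the first member, and the two example vtables are all
hypothetical; only the ``parent`` and ``signature`` fields and the
``inheritsfrom`` name come from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical RTTI part of a vtable: the two fields named in the text. */
struct vtable
{
    const struct vtable *parent;   /* vtable of the parent class, NULL for a root class */
    const char *signature;         /* mangled (encoded) name of the class */
};

/* Assumption: every object starts with a pointer to its vtable. */
struct Object
{
    const struct vtable *_vtable;
};

/* Returns 1 if the object is of the named class or of a class derived from
   it, and 0 otherwise. */
int inheritsfrom(const void *object, const char *signature)
{
    const struct vtable *vt;

    if (object == NULL)
        return 0;

    /* Start at the object's own class and walk up the chain of parents. */
    for (vt = ((const struct Object *) object)->_vtable; vt != NULL; vt = vt->parent)
    {
        if (strcmp(vt->signature, signature) == 0)
            return 1;
    }

    return 0;
}

/* Example class hierarchy (hypothetical): Error derives from Exception. */
const struct vtable Exception_vtable = { NULL, "Exception" };
const struct vtable Error_vtable = { &Exception_vtable, "Error" };
```

With these definitions, an object whose ``_vtable`` points at ``Error_vtable``
matches both ``"Error"`` and ``"Exception"``, while an object of the base
class matches only ``"Exception"``. Comparing strings keeps the sketch simple;
a real implementation might compare pointers to unique signature objects
instead.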