Lost in the documentation

In http://llvm.org/docs/FAQ.html, when talking about writing a compiler
that uses LLVM (at least I think that's what the FAQ question is asking),
the FAQ recommends:

# Call into the LLVM libraries code using your language's FFI (foreign
function interface).

    * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
           format
    * for: enables running LLVM optimization passes without a
           emit/parse overhead
    * for: adapts well to a JIT context
    * against: lots of ugly glue code to write

Now, which particular libraries would that be, and where are their API(s)
documented?

-- hendrik

In http://llvm.org/docs/FAQ.html, when talking about writing a compiler
that uses LLVM (at least I think that's what the FAQ question is asking),
the FAQ recommends:

# Call into the LLVM libraries code using your language's FFI (foreign
function interface).

   * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
          format
   * for: enables running LLVM optimization passes without a
          emit/parse overhead
   * for: adapts well to a JIT context
   * against: lots of ugly glue code to write

Now, which particular libraries would that be

With the exception of the 'util' and 'tools' directories, the entire LLVM source tree consists of libraries.

where are their API(s) documented?

http://llvm.org/docs/
http://llvm.org/doxygen/
http://llvm.org/docs/tutorial/
etc etc etc.

— Gordon

In http://llvm.org/docs/FAQ.html, when talking about writing a compiler
that uses LLVM (at least I think that's what the FAQ question is asking),
the FAQ recommends:

# Call into the LLVM libraries code using your language's FFI (foreign
function interface).

   * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
          format
   * for: enables running LLVM optimization passes without a
          emit/parse overhead
   * for: adapts well to a JIT context
   * against: lots of ugly glue code to write

Now, which particular libraries would that be

With the exception of the 'util' and 'tools' directories, the entire
LLVM source tree consists of libraries.

Indeed, quite a lot of them. Most of them appear to be internal. I'm
trying to identify the ones that are intended for use by LLVM users.

I have to say I missed the crucial paragraph:

: If you go with the first option, the C bindings in include/llvm-c
: should help a lot, since most languages have strong support for
: interfacing with C. The most common hurdle with calling C from managed
: code is interfacing with the garbage collector. The C interface was
: designed to require very little memory management, and so is
: straightforward in this regard.

Evidently I have to go look in include/llvm-c, since I strongly suspect
you didn't go to the trouble of writing a C wrapper for anything that
wasn't needed by an LLVM user. Anything internal you'd have left in C++.

So the API for a C++ *user* could be described as "those parts of the
internals API that happen to be used in implementing llvm-c."

What I found in llvm-c was Core.h. Is that what I need to know for
writing a compiler front-end? Let's see. Core.h seems to describe
building the LLVM code. BitWriter.h says how to write it to a file, should
that be desired. It's not clear what lto.h, Analysis.h, and
ExecutionEngine.h do or why I'd need them. Target.h looks useful if I
have to build machine dependencies into my code generator. Some things
I do may depend on the size of pointers and the like.

Putting this together with the tutorial, http://llvm.org/docs/tutorial/,
which uses OCaml instead of C, I think I may be able to get a clue.

where are their API(s) documented?

http://llvm.org/docs/
http://llvm.org/doxygen/
http://llvm.org/docs/tutorial/
etc etc etc.

— Gordon

The doxygen page describes the complete internal structure of LLVM. It
explicitly says,

; This documentation describes the internal software that makes up LLVM,
; not the external use of LLVM. There are no instructions here on how to
; use LLVM, only the APIs that make up the software. For usage
; instructions, please see the programmer's guide or reference manual.

I haven't yet found a "programmer's guide".

The only reference manual I've found so far was "LLVM Language Reference
Manual", linked from the llvm.org/docs page. It describes a programming
language with a syntax. No doubt it is a textual representation of the
information to be transmitted using the API I'm looking for, but it
doesn't document the API. I can probably find what I'm looking for by
prowling the source code that implements this LLVM language, and seeing
what it calls, then looking up those classes and methods in the doxygen
stuff. That's another way, complementary to guessing the relationship
between the OCaml tutorial and Core.h.

-- hendrik

In http://llvm.org/docs/FAQ.html, when talking about writing a compiler
that uses LLVM (at least I think that's what the FAQ question is asking),
the FAQ recommends:

# Call into the LLVM libraries code using your language's FFI (foreign
function interface).

  * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
         format
  * for: enables running LLVM optimization passes without a
         emit/parse overhead
  * for: adapts well to a JIT context
  * against: lots of ugly glue code to write

Now, which particular libraries would that be

With the exception of the 'util' and 'tools' directories, the entire
LLVM source tree consists of libraries.

Indeed, quite a lot of them. Most of them appear to be internal. I'm trying to identify the ones that are intended for use by LLVM users.

include/llvm is all public (modulo some implementation details as required by the nature of C++). Private includes are in lib. But realize that not all users are front-end compilers. A back-end code generator is also a user of the framework, as is an IR optimization or analysis. The C++ interfaces support all of these clients equally.

VMCore and BitWriter are the libraries absolutely necessary for any static compiler that outputs bitcode. You'll likely want Analysis for the verifier, and Target for memory layout information. That's the basics.

I have to say I missed the crucial paragraph:

: If you go with the first option, the C bindings in include/llvm-c
: should help a lot, since most languages have strong support for
: interfacing with C. The most common hurdle with calling C from managed
: code is interfacing with the garbage collector. The C interface was
: designed to require very little memory management, and so is
: straightforward in this regard.

Evidently I have to go look in include/llvm-c, since I strongly suspect
you didn't go to the trouble of writing a C wrapper for anything that
wasn't needed by an LLVM user. Anything internal you'd have left in C++.

So the API for a C++ *user* could be described as "those parts of the
internals API that happen to be used in implementing llvm-c."

That's a rather poor definition. Bindings are written only for those features that someone has actually needed so far. Still, if this helps you make sense of the framework, then that's fantastic; but remember that it is an imperfect rule.

Using the C bindings, it's still very important to understand the underlying C++ object model; otherwise, the type rules for the bindings will appear to be rather capricious.

Putting this together with the tutorial, http://llvm.org/docs/tutorial/,
which uses OCaml instead of C, I think I may be able to get a clue.

If you're not using OCaml, the C++ tutorial (the first one on that page) is probably more pertinent, even if you do intend to use the C bindings. Searching the implementation of the bindings (lib/VMCore/Core.cpp, etc.) is helpful for "going backwards" from C++ to C once you begin to understand the object model.

where are their API(s) documented?

http://llvm.org/docs/
http://llvm.org/doxygen/
http://llvm.org/docs/tutorial/
etc etc etc.

— Gordon

The doxygen page describes the complete internal structure of LLVM. It
explicitly says,

; This documentation describes the internal software that makes up LLVM,
; not the external use of LLVM. There are no instructions here on how to
; use LLVM, only the APIs that make up the software. For usage
; instructions, please see the programmer's guide or reference manual.

I haven't yet found a "programmer's guide".

http://llvm.org/docs/ProgrammersManual.html

— Gordon

Thanks. Here's what I have in mind to do with LLVM. I have a few
languages to compile; all of them require garbage collection. I'll be
looking at the OCaml experience with some interest. How far I get into
implementing them depends on the available time and the state of my
enthusiasm. It has been known to go missing, and it often gets diverted
to so-called real life.

One of these languages, Algol 68, I was working on about 35 years ago.
It was not finished mainly because at some point the machinery I was
developing it on became unavailable. It correctly ran over half of a
demanding test suite when the project stopped. It's now something I'd
like to finish more for old times' sake than for any serious use. 35 years
ago, this compiler would run in about 900K of memory. That was a dream
machine back then. Using an overlay linker, it could be crammed into
400K. It was written in Algol W, and could use a new portable code
generator. It used garbage collection at compile time, but on today's
machines I could probably get away with wholesale memory leakage.

To get it working, of course I need something that implements Algol W.
I've tinkered with translating Algol W to C or something similar. I
originally intended to translate the Algol 68 compiler into Algol 68, to
make it self-supporting, but I never got that far. I have an Algol W
parser, and at least one ancient attribute grammar that (too slowly)
translates it to something else. Since I'll only be using it to develop
Algol 68, which runs in 900K, I can probably dispense with garbage
collection and just use my 4 gigabytes of RAM instead.

I also have a self-implementing program-transformation tool. It consists
of a recursive-descent parser generator, a tree-rewriting system, and an
unparser. In principle, it needs garbage collection. In practice, well,
I've said it before. Memories are large these days.

-- hendrik