Dynamic configuration for llvmc2

I've been working on some minor enhancements for llvmc2, but before I submit them, I'd like to know more about where the driver is going. Right now, llvmc2 uses TableGen at build time to statically create C++ files with hardcoded paths and command lines. The llvmc2 documentation seems to imply that the static TableGen-based configuration will eventually be replaced by some kind of dynamically loadable configuration system: "At the time of writing LLVMC does not support on-the-fly reloading of configuration..."

It seems to me that a dynamic configuration not based on TableGen would be more in line with the "one-binary-fits-all" spirit of LLVM: that is, it seems strange that LLVM and clang don't have to be recompiled in order to run under different architectures, but llvmc2 does have to be recompiled in order to cross-assemble or cross-link, since the paths to the assembler and linker are hard-coded.

Is there work being done on this?

Thanks,
Patrick

Hi, Patrick

First of all, thanks for your interest in llvmc2! Your feedback is very valuable.

The llvmc2 documentation
seems to imply that the static TableGen-based configuration will
eventually be replaced by some kind of dynamically loadable
configuration system: "At the time of writing LLVMC does not support
on-the-fly reloading of configuration..."

Actually, that line was removed from the documentation not long ago :-)
We now support dynamic plugins (via .so files and the -load option). So TableGen
is going to stay, but you won't have to recompile the whole driver to alter
llvmc2's behaviour (only your plugin).

The plan now is to make this functionality easier to use, so you'll be able to
just say:

llvmc2 --load MyPlugin.td MyFile.cpp

and have llvmc2 build and load MyPlugin.td behind the scenes.

If you still find this scheme cumbersome, I'll be glad to hear your comments.

Thanks for the reply!

The plan now is to make this functionality easier to use, so you'll be able to
just say:

llvmc2 --load MyPlugin.td MyFile.cpp

and have llvmc2 build and load MyPlugin.td behind the scenes.

That sounds like a good idea.

Part of the reason I suggested making the system less dependent on TableGen is that in the process of developing a plugin I've found some things to be difficult to express in the current framework. For instance, when parsing options, I've found that often I want to save some kind of information in a variable and then use that variable when constructing the command line. This is not possible without hooks, but hooks are limited in what they can do (they can only do simple string substitution).

Basically, my concern with TableGen is that a lot of work has gone into creating a small domain-specific scripting language in TableGen, when maybe a real scripting language would be easier to use (not to mention to maintain), and would have the added benefit of not having to recompile.

How would you feel about a plugin that reads its configuration from an external specs file written in, say, Lua? It looks like the design of llvmc2 allows plugins that dynamically set up nodes and edges instead of statically reading them, and it could be a good solution for drivers with complex requirements. I could start work on such a thing if you thought it was a good idea.

Patrick

Hello, Patrick

Basically, my concern with TableGen is that a lot of work has gone into
creating a small domain-specific scripting language in TableGen, when
maybe a real scripting language would be easier to use (not to mention
to maintain), and would have the added benefit of not having to recompile.

One of the main design concerns is that llvmc2 should be completely
self-hosted: no extra perl/python/lua/php/whatever_language_you_know
interpreter. That's why C++ sources are generated from .td descriptions.

Allowing you to write plugins in an arbitrary language, which will
dynamically populate compilation graphs etc., is just a nice "side
effect" :-)

Why is this, though? It doesn't seem to me that creating a new domain-specific language buys us much, if anything, over using a well-tested, small embedded language. I keep hitting bugs in the llvmc2 TableGen implementation. Correct me if I'm wrong, but it looks to me like TableGen was never really intended to be a programming language, which is how llvmc2 uses it in some cases.

Patrick

Hi,

I've found that often I want to save some kind of information in a
variable and then use that variable when constructing the command
line. This is not possible without hooks, but hooks are limited in
what they can do (they can only do simple string substitution).

Would it help if it was allowed to pass arguments to hooks? So that
you could write, for example:

(cmd_line "$CALL(MyHook, $INFILE, $OUTFILE)")
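For illustration, a minimal C++ sketch of how such a `$CALL(Hook, args...)` expansion could work, assuming hooks receive their already-expanded arguments as strings. `Hook`, `ExpandCommandLine` and the argument-taking form of `MyHook` are hypothetical names for this example; per this thread, today's hooks take no arguments, and llvmc2's generated code may look quite different.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hook signature: a hook receives its expanded arguments.
using Hook = std::function<std::string(const std::vector<std::string>&)>;

// Strip leading and trailing spaces.
static std::string Trim(const std::string& S) {
  const auto B = S.find_first_not_of(' ');
  if (B == std::string::npos) return "";
  const auto E = S.find_last_not_of(' ');
  return S.substr(B, E - B + 1);
}

// Substitute $INFILE and $OUTFILE in a piece of plain text.
static std::string ExpandVars(std::string S, const std::string& In,
                              const std::string& Out) {
  const std::pair<std::string, std::string> Vars[] = {{"$INFILE", In},
                                                     {"$OUTFILE", Out}};
  for (const auto& [Name, Value] : Vars)
    for (auto P = S.find(Name); P != std::string::npos;
         P = S.find(Name, P + Value.size()))  // skip past the inserted value
      S.replace(P, Name.size(), Value);
  return S;
}

// Expand every "$CALL(Name, arg, ...)" by calling the named hook;
// $INFILE/$OUTFILE are substituted in plain text and in hook arguments.
std::string ExpandCommandLine(const std::string& Cmd,
                              const std::map<std::string, Hook>& Hooks,
                              const std::string& In, const std::string& Out) {
  const std::string CallPrefix = "$CALL(";
  std::string Result;
  std::size_t Pos = 0;
  for (;;) {
    const std::size_t Start = Cmd.find(CallPrefix, Pos);
    if (Start == std::string::npos)
      return Result + ExpandVars(Cmd.substr(Pos), In, Out);
    Result += ExpandVars(Cmd.substr(Pos, Start - Pos), In, Out);

    const std::size_t ArgsBegin = Start + CallPrefix.size();
    const std::size_t Close = Cmd.find(')', ArgsBegin);
    if (Close == std::string::npos)
      throw std::runtime_error("unterminated $CALL");

    // Split "Name, arg1, arg2" on commas (no quoting or nesting supported).
    std::vector<std::string> Parts;
    std::stringstream SS(Cmd.substr(ArgsBegin, Close - ArgsBegin));
    for (std::string Item; std::getline(SS, Item, ',');)
      Parts.push_back(Trim(Item));
    if (Parts.empty()) throw std::runtime_error("$CALL without a hook name");

    const auto It = Hooks.find(Parts[0]);
    if (It == Hooks.end())
      throw std::runtime_error("unknown hook: " + Parts[0]);

    std::vector<std::string> Args;
    for (std::size_t I = 1; I < Parts.size(); ++I)
      Args.push_back(ExpandVars(Parts[I], In, Out));
    Result += It->second(Args);
    Pos = Close + 1;
  }
}
```

With a `MyHook` that joins its two arguments with `+`, the command line `as $CALL(MyHook, $INFILE, $OUTFILE) -o $OUTFILE` expands to `as a.s+a.o -o a.o` for input `a.s` and output `a.o`. Because the sketch splits on commas, file names containing commas or parentheses are not handled.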

Basically, my concern with TableGen is that a lot of work has gone
into creating a small domain-specific scripting language in
TableGen, when maybe a real scripting language would be easier to
use.

As Anton said, that was intentional. We wanted to minimize the number
of dependencies and keep the driver lean and mean.

How would you feel about a plugin that reads its configuration from
an external specs file written in, say, Lua? It looks like the
design of llvmc2 allows plugins that dynamically set up nodes and
edges instead of statically reading them, and it could be a good
solution for drivers with complex requirements. I could start work
on such a thing if you thought it was a good idea.

I think that such a project could be useful - as long as it is
implemented strictly as a plugin (of course, some things could be
changed/added to the core to make life easier for you - but we're not
going to add a dependency on the whole lua VM).

However, to make this work you will need more than just the ability to
modify the compilation graph. One thing that comes to mind is the need
to provide Lua implementations for the various Edge* classes
(otherwise you can't use edges with non-default weights). You'll also
need to interface with LLVM's CommandLine library somehow (easy to do
in C++, where we just auto-generate the appropriate objects).
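To make the point about edge weights concrete, a minimal C++ sketch, assuming an edge interface in which each edge reports a weight for the current input languages and the heaviest edge is followed. `Edge`, `SimpleEdge`, `ScriptedEdge` and `ChooseEdge` are simplified hypothetical stand-ins, not llvmc2's actual classes; the `std::function` plays the role of the Lua function a scripting plugin would have to bind.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

using InputLanguages = std::set<std::string>;

// Hypothetical edge interface: an edge points at a tool and reports a weight
// for the current set of input languages.
class Edge {
public:
  explicit Edge(std::string ToolName) : ToolName_(std::move(ToolName)) {}
  virtual ~Edge() = default;
  const std::string& ToolName() const { return ToolName_; }
  virtual unsigned Weight(const InputLanguages& Langs) const = 0;

private:
  std::string ToolName_;
};

// Edge with a constant default weight.
class SimpleEdge : public Edge {
public:
  using Edge::Edge;
  unsigned Weight(const InputLanguages&) const override { return 1; }
};

// Edge whose weight comes from a callback; a scripting plugin would make the
// callback forward to a function defined in the user's script.
class ScriptedEdge : public Edge {
public:
  using WeightFn = std::function<unsigned(const InputLanguages&)>;
  ScriptedEdge(std::string ToolName, WeightFn F)
      : Edge(std::move(ToolName)), F_(std::move(F)) {
    assert(F_ && "a scripted edge needs a weight callback");
  }
  unsigned Weight(const InputLanguages& Langs) const override {
    return F_(Langs);
  }

private:
  WeightFn F_;
};

// Return the edge with the largest weight (the first one wins ties),
// or nullptr if there are no edges.
const Edge* ChooseEdge(const std::vector<std::unique_ptr<Edge>>& Edges,
                       const InputLanguages& Langs) {
  const Edge* Best = nullptr;
  unsigned BestWeight = 0;
  for (const auto& E : Edges) {
    const unsigned W = E->Weight(Langs);
    if (!Best || W > BestWeight) {
      Best = E.get();
      BestWeight = W;
    }
  }
  return Best;
}
```

If a scripted edge returns 10 for C input and 0 otherwise, it beats a default edge of weight 1 only when C is among the input languages.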

Would it help if it was allowed to pass arguments to hooks? So that
you could write, for example:

(cmd_line "$CALL(MyHook, $INFILE, $OUTFILE)")

Well, what I found myself wanting was a dynamic (strconcat) dag that could join together strings and (call MyHook, INFILE, OUTFILE) dags.
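One possible reading of that proposal, as a minimal C++ sketch: the command line is a small tree of `str`, `var`, `call` and `strconcat` nodes that is evaluated when the command line is built. The `Node` structure and the `Eval` function are hypothetical and do not reflect how TableGen actually represents dags.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a dag such as
//   (strconcat "gcc -c ", (call MyHook, INFILE, OUTFILE))
struct Node {
  enum Kind { Str, Var, Call, StrConcat } K;
  std::string Text;            // literal text, variable name or hook name
  std::vector<Node> Children;  // operands of call and strconcat
};

using Env = std::map<std::string, std::string>;
using HookFn = std::function<std::string(const std::vector<std::string>&)>;

// Evaluate a node to a string, recursively evaluating its children.
std::string Eval(const Node& N, const Env& Vars,
                 const std::map<std::string, HookFn>& Hooks) {
  switch (N.K) {
  case Node::Str:
    assert(N.Children.empty() && "literals have no operands");
    return N.Text;
  case Node::Var: {
    assert(N.Children.empty() && "variables have no operands");
    const auto It = Vars.find(N.Text);
    if (It == Vars.end())
      throw std::runtime_error("unbound variable: " + N.Text);
    return It->second;
  }
  case Node::Call: {
    const auto It = Hooks.find(N.Text);
    if (It == Hooks.end()) throw std::runtime_error("unknown hook: " + N.Text);
    std::vector<std::string> Args;
    for (const Node& C : N.Children) Args.push_back(Eval(C, Vars, Hooks));
    return It->second(Args);
  }
  case Node::StrConcat: {
    std::string Result;
    for (const Node& C : N.Children) Result += Eval(C, Vars, Hooks);
    return Result;
  }
  }
  throw std::logic_error("unknown node kind");
}
```

In this representation `(call MyHook, INFILE, OUTFILE)` is a `Call` node with two `Var` children, and `(strconcat ...)` simply joins whatever its operands evaluate to.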

As Anton said, that was intentional. We wanted to minimize the number
of dependencies and keep the driver lean and mean.

Definitely a good idea, which is why I wouldn't suggest Python or Perl :-) For my plugin I would probably just add the Lua VM into the tree, so that there wouldn't be a dependency at all. It's under a compatible MIT/X11 license and is only 17k lines of ANSI C that should add around 150k to the driver. For me the driver is about 350k, so that would mean a driver around 500k, which doesn't seem that big of a difference.

I think that such a project could be useful - as long as it is
implemented strictly as a plugin (of course, some things could be
changed/added to the core to make life easier for you - but we're not
going to add a dependency on the whole lua VM).

However, to make this work you will need more than just the ability to
modify the compilation graph. One thing that comes to mind is the need
to provide Lua implementations for the various Edge* classes
(otherwise you can't use edges with non-default weights). You'll also
need to interface with LLVM's CommandLine library somehow (easy to do
in C++, where we just auto-generate the appropriate objects).

Right. I'll work on a proof of concept when I get some time. I anticipated this would be a bit of a hard sell, but I really think that a scriptable llvmc2 would be the right thing for several use cases.

Patrick

Hi,

Patrick Walton <pcwalton <at> cs.ucla.edu> writes:

Well, what I found myself wanting was a dynamic (strconcat) dag that
could join together strings and (call MyHook, INFILE, OUTFILE) dags.

Can you give an example of how you would use (strconcat)? Wouldn't
you also need (strcmp)?

:-) For my plugin I would probably just add the Lua VM into the tree,
so that there wouldn't be a dependency at all.

Okay, but please make sure that llvmc2 still compiles with lua
scripting disabled.

I.e. plain "make" should build a bare-bones version of llvmc2, and

make BUILTIN_PLUGINS="lua ..."

should build llvmc2 with lua scripting enabled.
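One common C++ way to keep such a plugin optional is to let each built-in plugin register itself through a static object, so that leaving a plugin out of the build simply means not compiling its registration. A minimal sketch; `RegisterPlugin`, `PluginRegistry`, the plugin names and the `LLVMC_ENABLE_LUA` macro are hypothetical, and how `BUILTIN_PLUGINS` would map onto such a define is an assumption, not something llvmc2's makefiles are known to do.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry of built-in plugins, filled in by static initializers.
std::vector<std::string>& PluginRegistry() {
  static std::vector<std::string> Registry;  // constructed on first use
  return Registry;
}

// Constructing one of these adds a plugin to the registry.
struct RegisterPlugin {
  explicit RegisterPlugin(const std::string& Name) {
    PluginRegistry().push_back(Name);
  }
};

// Always built: the plugin generated from the stock .td files.
static RegisterPlugin BaseReg("Base");

#ifdef LLVMC_ENABLE_LUA
// Only compiled in when the build passes -DLLVMC_ENABLE_LUA.
static RegisterPlugin LuaReg("Lua");
#endif

// True if a plugin with this name was registered.
bool HasPlugin(const std::string& Name) {
  const auto& R = PluginRegistry();
  return std::find(R.begin(), R.end(), Name) != R.end();
}
```

In a multi-file build the `#ifdef` is usually unnecessary: the Lua plugin's registration lives in its own source file, and the bare-bones build just does not compile or link that file.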

Right. I'll work on a proof of concept when I get some time.

Good luck!

Hello, Patrick

As Anton said, that was intentional. We wanted to minimize the number
of dependencies and keep the driver lean and mean.

Definitely a good idea, which is why I wouldn't suggest Python or Perl :-)
For my plugin I would probably just add the Lua VM into the tree, so that
there wouldn't be a dependency at all. It's under a compatible MIT/X11
license and is only 17k lines of ANSI C that should add around 150k to the
driver. For me the driver is about 350k, so that would mean a driver around
500k, which doesn't seem that big of a difference.

Honestly speaking, size does not matter much. We do care about speed;
that's why we always prefer generated sources to scripting
languages. Consider a pretty typical situation where the compiler is
invoked on a bunch of small files at -O0, or during generation of a
PCH. In such situations the overhead of the compiler driver is quite
visible, and an extra VM would lead to the funny situation where the
driver's own running time dominates the compilation time itself.

If you don't care about such things, go ahead and think about a sane
design proposal for hooking an extra scripting language into
llvmc2 without slowing down the 'main path'. That would surely be accepted!

If you don't care about such things, go ahead and think about a sane
design proposal for hooking an extra scripting language into
llvmc2 without slowing down the 'main path'. That would surely be accepted!

Okay, I'll focus on this. The idea is that the TableGen-based configuration would be preloaded into llvmc2, and then user-specified Lua scripts, if supplied, would be able to augment or modify the options, edges, and tools. This would give users an easy way to customize llvmc2 to fit their needs.

Another interesting use case would be with out-of-tree front ends. They could supply a Lua script so that when the user installs a new language that uses the LLVM framework, their llvmc2 driver automatically gains support for it without having to recompile. Cross-compiling would be another example (and this is actually why I proposed this in the first place): the user could augment the stock llvmc2 with a script that replaces the system assembler and linker with the assembler and linker for the target.
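A minimal C++ sketch of that layering, assuming the script's effect can be summarized as a table of tool overrides: the built-in table stands in for the TableGen-generated configuration, and the overlay is consulted only when one was supplied, so the default invocation does no extra work. `ToolTable`, `BuiltinTools`, `EffectiveTools` and the `arm-elf-*` tool names are made up for the example; a real plugin would also have to handle options and edges.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Tool name -> command used to invoke it.
using ToolTable = std::map<std::string, std::string>;

// Stand-in for the configuration compiled in from the .td files.
ToolTable BuiltinTools() {
  return {{"assembler", "as"}, {"linker", "ld"}, {"compiler", "llvm-gcc"}};
}

// Build the effective tool table. With no overlay (the common case) this is
// just the built-in table; no script machinery is involved at all.
ToolTable EffectiveTools(const std::optional<ToolTable>& Overlay) {
  ToolTable Tools = BuiltinTools();
  if (!Overlay) return Tools;
  for (const auto& [Name, Command] : *Overlay)
    Tools[Name] = Command;  // replace an existing tool or add a new one
  return Tools;
}
```

For the cross-compiling case, an overlay that maps `assembler` and `linker` to the target's tools leaves every other entry of the stock configuration untouched.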

How does this sound?

Patrick

Hello, Patrick

How does this sound?

You need to be really careful to separate the different aims. As it
stands, you will definitely sacrifice speed for scriptability. As for
cross-compilers: all the $ENV and hooks machinery was introduced to
switch transparently from one compiler tree to another. Maybe this
approach needs to be generalized or rethought.

In general, I wouldn't go this way, for the reasons I've already mentioned.
It may be better to formulate the aims you want to achieve and
think about how they can be met within the current infrastructure, or, if
that isn't enough, how the infrastructure can be extended in full
generality.

Ok, so runtime scripting languages are out, owing to speed concerns.

I still don't think that TableGen is good as a programming language though. I think it's inevitable that a simple graph description is not enough for an industrial-grade compiler driver, with the plethora of options it needs to support.

IMO a tiny domain-specific programming language would be better. Maybe such a language could be implemented with LLVM itself. That is, the llvmc configuration language could be compiled into .so files with the aid of LLVM. I keep running into bugs in TableGen because it uses token pasting to create C++ program logic. This works fine for data description, but LLVMC tries to make function bodies this way. I think this approach is more trouble than it's worth.
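A small C++ illustration of the kind of fragility described here (a generic example, not code taken from the llvmc2 TableGen backend): an emitter that pastes an option value straight into a C++ string literal produces broken code as soon as the value contains a quote, so every pasted construct needs its own escaping. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Naive emitter: pastes the value straight into a C++ string literal.
std::string EmitPushBackNaive(const std::string& Value) {
  return "Args.push_back(\"" + Value + "\");";
}

// Escape characters that are special inside a C++ string literal.
std::string EscapeForCxx(const std::string& S) {
  std::string Out;
  for (char C : S) {
    switch (C) {
    case '\\': Out += "\\\\"; break;
    case '"':  Out += "\\\""; break;
    case '\n': Out += "\\n"; break;
    default:   Out += C;
    }
  }
  return Out;
}

// Safer emitter: the generated literal is valid for any input string.
std::string EmitPushBack(const std::string& Value) {
  return "Args.push_back(\"" + EscapeForCxx(Value) + "\");";
}
```

For the value `-DNAME="x"` the naive emitter produces `Args.push_back("-DNAME="x"");`, which does not compile, while the escaping emitter produces `Args.push_back("-DNAME=\"x\"");`.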

Would a minimalist custom language that statically compiles configuration into LLVMC be the right design, or would you suggest something else?

Patrick

Yeah, let's call it specs. :-(