That seems like an interesting concept, more or less like guarded
headers in C or the import mechanisms in Java or Python. However, it
goes a bit off track with regard to the IR.

The more semantics you add to the IR, the more complex the middle-end
needs to be to deal with the idiosyncrasies, and the less powerful the
compiler becomes. I for one always welcome changes in the IR regarding
readability and correct representation. I have proposed a few
modifications to the list (union types, more complex structures,
bit-fields, meta-attributes, sub-target properties, etc.) and failed to
convince people that they were worthwhile: the IR has to work with any
language and any target.

Note that this is not the same as saying that *the same* IR has to work
across languages and targets (as I originally thought). LLVM fails to
accomplish that, and it's clear how that hinders PNaCl's model.

But adding more semantics for your particular problem will complicate
things for others. Ultimately, I think there are only two feasible
approaches to changing the IR (apart from clear representation changes,
like exception handling and debug information):

1. Domain-specific wrappers: In your case, having a domain-specific
header engine would enable you to distribute pieces of non-functional
IR and grab them via this mechanism to join in the target and execute
code in a less cumbersome way. That is less than ideal, but it doesn't
push deployment capabilities into a clearly focused intermediate
representation, and it provides you with the state machine you wanted.

2. Meta-IR: The IR that is compiled by the back-end doesn't
necessarily need to be the one your front-end generates. You can have
some meta-features (not metadata) in your IR, in the form of
instructions that you think are the right way of doing things but that
don't work as they stand. Then, just before the middle-end starts, you
run a few *correction* passes that understand your meta-features and
transform the IR into a less-readable, less-semantic IR that the
middle-end and the back-end understand.
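
As a rough illustration of the second approach, here is a minimal Python
sketch of a correction pass over a toy instruction list. Everything in it
is an assumption made for the example: the tuple encoding, the
hypothetical `meta.union_store` meta-instruction, and the `bitcast` and
`store` instructions it is rewritten into. It is not LLVM's pass API,
only the shape of the idea: the front-end emits a meta-instruction, and a
pass rewrites it into plain instructions before the middle-end sees it.

```python
# Toy model only: instructions are (opcode, operands) tuples, not real LLVM IR.
# "meta.union_store" is a hypothetical meta-instruction a front-end might emit.


def lower_union_store(instr, fresh):
    """Rewrite one meta.union_store into a pointer cast plus a plain store."""
    _, (value, union_ptr, member_type) = instr
    tmp = fresh()
    return [
        ("bitcast", (tmp, union_ptr, member_type + "*")),
        ("store", (value, tmp)),
    ]


def correction_pass(instructions):
    """Return a new instruction list with every meta-instruction lowered."""
    counter = 0

    def fresh():
        # Hand out a new temporary name for each lowered instruction.
        nonlocal counter
        counter += 1
        return f"%tmp{counter}"

    lowered = []
    for instr in instructions:
        if instr[0] == "meta.union_store":
            lowered.extend(lower_union_store(instr, fresh))
        else:
            lowered.append(instr)  # plain instructions pass through untouched
    return lowered
```

The pass leaves ordinary instructions alone, so it can run unconditionally
ahead of the middle-end; only front-ends that emit the meta-instruction
pay for it.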

You can do both, and we have considered (but not implemented) the
second approach for our front-end representation. For now, we generate
the same thing as Clang and llvm-gcc, which is less than ideal.

In essence, I'm proposing to wrap semantics around the IR, because
that gives you the freedom to implement your functionality without
loss of semantics, but that also probably means the IR will never grow
out of its scope. But the more I think of it, the more it makes sense
not to. It's the same as run-time optimisation: once you have done it
for one machine, it doesn't make sense to transport that IR to another
machine, even if it's of the same type, because its use of it will be

I hope not to have created more doubts, but that more or less answers
why the IR hasn't changed much for a while. As for the cost of keeping
a third-party wrapper, it depends. I'd try to upstream the wrapper
(maybe as a plugin) rather than try to change the IR.

I've given a lot of thought to how I would like to add the header block to the bitcode format. The bitcode format already has 6-bit character string support. I would like to add an optional block, similar to the MODULE_CODE_DEPLIB record format, that would contain a list of 6-bit string records holding the filenames of the headers to be loaded by the bitcode reader. The loader would read the block and add all of the filenames to a folding set and, once recursion into all of the files is complete, link them into the module.
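
To make the intended loading order concrete, here is a small Python sketch of it. It is only a model under stated assumptions: bitcode files are stood in for by an in-memory dictionary, `headers` and `symbols` are hypothetical field names, a plain Python set plays the role of the folding set, and "linking" is just merging symbol lists. None of it is the real bitcode reader API.

```python
# Hypothetical stand-in for bitcode files on disk: each entry lists the
# headers named in its optional header block and the symbols it defines.
FILES = {
    "main.bc": {"headers": ["io.bc", "math.bc"], "symbols": ["main"]},
    "io.bc": {"headers": ["math.bc"], "symbols": ["print"]},
    "math.bc": {"headers": ["io.bc"], "symbols": ["sqrt"]},  # cycle on purpose
}


def collect_headers(filename, files, seen, order):
    """Recurse into every header reachable from filename, visiting each once."""
    for header in files[filename]["headers"]:
        if header in seen:
            continue  # already recorded, so cycles and repeats are skipped
        seen.add(header)
        order.append(header)
        collect_headers(header, files, seen, order)


def load_module(filename, files):
    """Gather all headers first, then 'link' them into the root module."""
    seen = {filename}  # the root itself must not be pulled in as a header
    order = []
    collect_headers(filename, files, seen, order)

    symbols = list(files[filename]["symbols"])
    for header in order:  # linking happens only after recursion is complete
        symbols.extend(files[header]["symbols"])
    return symbols
```

Keeping the set separate from the linking step means a header named by several files, or by a cycle of files, is still linked exactly once.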

Since it would be an optional block, existing bitcode files would be unaffected. Only new code would contain the new block, and even then only if the frontend uses it. I want this to be an unobtrusive change, after all. As for writing my own wrapper, I'd just as soon not have to reinvent the LLVM assembly parser just to add one optional keyword at the beginning followed by a comma-separated list of filenames.

Some of the other features you mentioned in your message, such as unions, would make the IR a high-level language, defeating the purpose of calling it low-level. My proposal is different in that I'm just trying to make it competitive with other system-specific assembly language packages. I do, however, intend to make it functional as a virtual machine, to the point of being able to run bitcode on multiple platforms. It has already been tested to work. Perhaps the LLVM Wrapper project might not take off, but it's nonetheless very frustrating that the llvm-as program has no header include functionality.

My purpose here isn't to make more work for everyone. Just the opposite. Still, I'd like to know whether my patches will be rejected on the grounds Renato mentioned, or whether I may submit my series of patches to make this work.

My own two cents' worth,