Header in bitcode format 3.0?

Hello LLVM team,

In the past I've worked on a PEG parser generator for any LLVM-based language to use. One obstacle we ran into when generating LLVM IR assembly was that we ended up copying and pasting a list of declarations and aliases into every .ll file that needed to link with the others. I'd propose that, in the Bitcode 3.0 format, a header definition be added to the IR assembly format, using a FoldingSet to make sure that headers are fetched recursively and each unique header only once. This would be primarily useful for making the bitcode a true virtual machine instead of just an intermediate representation of code written in other languages.
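The header mechanism proposed above can be illustrated with a minimal C sketch. Everything here is hypothetical: the `IrHeader` table, the `.llh` names and `fetch_header` are invented for illustration, and a linear-search `VisitedSet` stands in for LLVM's `FoldingSet`. Each header's imports are emitted before the header itself, and the visited set guarantees every header is emitted once, even with diamond-shaped or cyclic imports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IMPORTS 4
#define MAX_VISITED 32

/* Hypothetical IR "header": a name plus the headers it imports. */
typedef struct {
    const char *name;
    const char *imports[MAX_IMPORTS]; /* NULL-terminated if fewer than MAX_IMPORTS */
} IrHeader;

/* Stand-in for a FoldingSet: remembers which headers were already pulled in. */
typedef struct {
    const char *names[MAX_VISITED];
    size_t count;
} VisitedSet;

/* Returns 1 if the name was newly inserted, 0 if it was already present. */
static int visited_insert(VisitedSet *set, const char *name) {
    for (size_t i = 0; i < set->count; ++i)
        if (strcmp(set->names[i], name) == 0)
            return 0;
    assert(set->count < MAX_VISITED);
    set->names[set->count++] = name;
    return 1;
}

static const IrHeader *find_header(const IrHeader *table, size_t n, const char *name) {
    for (size_t i = 0; i < n; ++i)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Depth-first fetch: imports are emitted before the header that needs them,
   and every header is emitted at most once. */
static void fetch_header(const IrHeader *table, size_t n, const char *name,
                         VisitedSet *seen, const char **out, size_t *out_len) {
    if (!visited_insert(seen, name))
        return;
    const IrHeader *h = find_header(table, n, name);
    assert(h != NULL);
    for (size_t i = 0; i < MAX_IMPORTS && h->imports[i] != NULL; ++i)
        fetch_header(table, n, h->imports[i], seen, out, out_len);
    out[(*out_len)++] = name;
}
```

Because each header enters the visited set before its imports are followed, a cycle such as A importing B importing A terminates instead of recursing forever.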

The real reason I'd like to do this is that PNaCl and other projects trying to build a target-neutral back-end for cross-platform bitcode cannot currently implement a target-neutral runtime library without a sandbox running on every platform. If accepted, my proposal would allow a much thinner OS abstraction layer, made of headers that act as link-time substitutions.

I've already started some of the work of abstracting the C runtime library with a project called the LLVM Wrapper, found at http://sourceforge.net/projects/llvmlibc/ . At present, file handles are typecast from FILE * to byte pointers and back again by a small linker library. When running the link-time optimizer, most of the library optimizes away into oblivion, leaving equivalent code for x86 on Windows, Mac, and Linux. The caveat with this implementation is that the target triple must be specified in the call to llc. But I digress.
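A rough sketch of the handle-casting idea, assuming the portable side only ever sees an `unsigned char *` handle and a small per-target shim converts it to and from `FILE *`. The `wrap_*` names are hypothetical and are not taken from the LLVM Wrapper project.

```c
#include <assert.h>
#include <stdio.h>

/* Target-neutral handle: portable bitcode only ever sees a byte pointer. */
typedef unsigned char *wrap_handle;

/* Per-target shim (hypothetical names). Converting FILE * to a character
   pointer and back yields the original pointer in C, and after link-time
   optimization these casts cost nothing. */
static wrap_handle wrap_from_file(FILE *fp) { return (wrap_handle)fp; }
static FILE *wrap_to_file(wrap_handle h) { return (FILE *)h; }

static wrap_handle wrap_tmpfile(void) { return wrap_from_file(tmpfile()); }

static int wrap_fputs(const char *s, wrap_handle h) {
    assert(h != NULL);
    return fputs(s, wrap_to_file(h));
}

static void wrap_rewind(wrap_handle h) { rewind(wrap_to_file(h)); }

static int wrap_fgetc(wrap_handle h) { return fgetc(wrap_to_file(h)); }

static int wrap_fclose(wrap_handle h) { return fclose(wrap_to_file(h)); }
```

Portable code written against `wrap_handle` never depends on the layout of the platform's `FILE`; only the shim does.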

Would such an arrangement be welcome to the LLVM project? If you have any questions, just ask me on the list.

Thanks,

--Sam Crow

Hi Samuel,

That seems an interesting concept, more or less like include guards
in C or import mechanisms in Java or Python. However, that goes a bit
off track with regard to IR.

The more semantics you add to IR, the more complex the middle-end
needs to be to deal with the idiosyncrasies, and the less powerful
the compiler becomes. I for one always welcome changes in the IR
regarding readability and correct representation. I have proposed a
few modifications to the list (union types, more complex structures,
bit-fields, meta-attributes, sub-target properties, etc.) and failed
to convince anyone they were worthwhile: IR has to work with any
language and any target.

Note that that's not the same as saying that *the same* IR has to work
across languages and targets (as I originally thought). LLVM fails to
accomplish that, and it's clear how that hinders PNaCl's model.

But adding more semantics for your particular problem will complicate
things for others. Ultimately, I think there are only two feasible
approaches to changing the IR (apart from clear representation changes,
like exception handling and debug information):

1. Domain-specific wrappers: In your case, a domain-specific
header engine would let you distribute pieces of non-functional
IR and pull them in via this mechanism, joining them on the target to
execute code in a less cumbersome way. That is less than ideal, but it
doesn't push deployment capabilities into a clearly focused
intermediate representation, and it still provides the state machine
you wanted.

2. Meta-IR: The IR that is compiled by the back-end doesn't
necessarily need to be the one your front-end generates. You can have
some meta-features (not metadata) in your IR, in the form of
instructions that you think are the right way of doing things but
that the middle-end and back-end don't support. Then, just before the
middle-end starts, you run a few *correction* passes that understand
your meta-features and transform the IR into a less readable, less
semantic IR that the middle-end and the back-end understand.

You can do both, and we have considered (but not implemented) the
second approach for our front-end representation. For now, we generate
the same thing as Clang and llvm-gcc, which is less than ideal.

One example is the struct byval. The ARM back-end still doesn't
support struct byval (maybe now it does, I was away for a while), but
it does implement array byval. So we had to convert every structure
into an array pointer, changing the signature of every function taking
a struct byval, and you can imagine the delicate consequences for
interaction with other modules. C++ structures, bit-fields and unions
provide a plethora of examples of messing up the IR representation, so
all C++ front-ends could benefit from such a pre-middle-end pass.
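To make the struct-to-array workaround concrete, here is a C-level sketch, assuming a hypothetical `struct Pair` and treating a word array as the stand-in for an IR array byval. It shows why both the callee signature and every call site have to change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type that the source language passes by value. */
struct Pair { int32_t lo; int32_t hi; };

/* Original signature, as the source language wrote it. */
static int32_t sum_pair(struct Pair p) { return p.lo + p.hi; }

/* Number of 32-bit words needed to hold one struct Pair. */
#define PAIR_WORDS ((sizeof(struct Pair) + sizeof(uint32_t) - 1) / sizeof(uint32_t))

/* Rewritten signature: the struct travels as an array of words, and the
   callee rebuilds the struct from its own copy. */
static int32_t sum_pair_words(const uint32_t words[PAIR_WORDS]) {
    struct Pair p;
    memcpy(&p, words, sizeof p);
    return p.lo + p.hi;
}

/* Every call site changes too: pack the struct into words, then call. */
static int32_t call_sum_pair(struct Pair p) {
    uint32_t words[PAIR_WORDS];
    memset(words, 0, sizeof words);
    memcpy(words, &p, sizeof p);
    return sum_pair_words(words);
}
```

Any module that still calls the original signature would now disagree with the rewritten one, which is the cross-module problem described above.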

In essence, I'm proposing to wrap semantics around the IR, because
that gives you the freedom to implement your functionality without
loss of semantics, but it also probably means the IR will never grow
beyond its scope. And the more I think about it, the more it makes
sense for it not to. It's the same as with run-time optimisation: once
you have done it on one machine, it doesn't make sense to transport
that IR to another machine, even one of the same type, because the way
it is used there will be different.

I hope I haven't created more doubts; that more or less answers
why the IR hasn't changed much for a while. As for the cost of
maintaining a third-party wrapper, it depends. I'd try to upstream the
wrapper (maybe as a plugin) rather than try to change the IR
structure.

My tuppence.

cheers,
--renato

[snip]

> One example is the struct byval. The ARM back-end still doesn't
> support struct byval (maybe now it does, I was away for a while),

The ARM backend now supports struct byval for APCS. Extending it to support AAPCS shouldn't be too difficult. Alas, I won't have time to revisit this in the near future.

stuart

It's slightly unclear to me what byval means for an ABI that passes
some structs in registers, such as AAPCS-VFP.

deep

Hi Sandeep,

>> The ARM backend now supports struct byval for APCS. Extending it to support AAPCS shouldn't be too difficult. Alas, I won't have time to revisit this in the near future.
>
> It's slightly unclear to me what byval means for an ABI that passes
> some structs in registers, such as AAPCS-VFP.

I think in that case the front-end is supposed to extract the bits passed in
registers and pass them as registers (i.e. as float/integer parameters) and
pass the rest of the struct (i.e. the bit that is passed on the stack) as a
byval parameter.

Ciao, Duncan.
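A sketch of the split Duncan describes, assuming (hypothetically) that the first two words of `struct Big` are passed in registers and the rest on the stack; which parts really go where depends on the ABI. Plain `int32_t` parameters stand in for the register part, and a pointer to a caller-made copy stands in for the byval remainder. All names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct too large to fit entirely in registers. */
struct Big { int32_t a; int32_t b; int32_t tail[4]; };

/* The part assumed to be passed on the stack, as a separate byval copy. */
struct BigTail { int32_t tail[4]; };

/* Lowered callee: leading words arrive as scalar parameters (the "register"
   part); the remainder arrives as a pointer to the caller's private copy. */
static int32_t big_sum_lowered(int32_t a, int32_t b, const struct BigTail *rest) {
    assert(rest != NULL);
    int32_t s = a + b;
    for (int i = 0; i < 4; ++i)
        s += rest->tail[i];
    return s;
}

/* Lowered caller: splits the struct the way the front-end would. */
static int32_t big_sum(struct Big v) {
    struct BigTail rest;
    memcpy(rest.tail, v.tail, sizeof rest.tail);
    return big_sum_lowered(v.a, v.b, &rest);
}
```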

That's not what the EABI says.

Section 6.1.2 of the AAPCS lists the only cases (floating-point
scalars or homogeneous aggregates) in which it uses the VFP
registers. For all other cases (as per 6.1.2.2), it uses the base
standard.

In AAPCS-VFP mode, the code below returns the structure in {d0, d1}:

typedef struct { double x; double y; } vector;
vector g() {
  vector a = { 1.2, 2.3 };
  return a;
}

whereas the code below returns the structure in memory, through a pointer supplied by the caller:

struct Foo {
  int a;
  double b;
};

struct Foo f() {
  struct Foo f = { 42, 3.14 };
  return f;
}

cheers,
--renato
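The VFP rule being discussed can be sketched as a classifier over a flattened list of member types. This is a simplification: `MemberKind` and `is_vfp_candidate` are hypothetical, and the real AAPCS rules also cover nested aggregates, arrays and containerized vectors. Under this sketch, `vector` (two doubles) is a candidate, while `struct Foo` (an int and a double) falls back to the base standard.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified member kinds for a flat struct. */
typedef enum { KIND_INT, KIND_FLOAT, KIND_DOUBLE } MemberKind;

/* Returns 1 if a flat struct with these members would be a VFP register
   candidate under AAPCS-VFP: one to four members, all of the same
   floating-point type. Otherwise the base (core-register) standard applies. */
static int is_vfp_candidate(const MemberKind *members, size_t count) {
    assert(members != NULL || count == 0);
    if (count == 0 || count > 4)
        return 0;
    if (members[0] != KIND_FLOAT && members[0] != KIND_DOUBLE)
        return 0;
    for (size_t i = 1; i < count; ++i)
        if (members[i] != members[0])
            return 0;
    return 1;
}
```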

IIUC, the byval attribute on a pointer to a structure means "this struct should really be passed by value." If you ignore the byval attribute, the IR misrepresents what the developer wrote.

The target ISel is supposed to notice the byval attribute and replace the pointer argument with a copy of the structure in the generated code.

I'm not familiar with AAPCS-VFP, but I'd assume that any byval struct should be passed by value, as if byval didn't exist.

The ABI should not be aware that byval is in use.

stuart
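A C-level picture of the byval contract Stuart describes, assuming a hypothetical `struct Point`: the caller makes a private copy and passes its address, so the callee may modify its copy without the caller ever seeing the change. Whether that copy then lives in memory or in registers is exactly the ABI question raised in the rest of the thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct passed by value in the source language. */
struct Point { int x; int y; };

/* What the developer wrote: the callee owns a copy and may modify it. */
static int bump_and_sum(struct Point p) {
    p.x += 100;
    return p.x + p.y;
}

/* Roughly what "pointer + byval" stands for: the callee receives the
   address of a copy that belongs to this call alone. */
static int bump_and_sum_byval(struct Point *copy) {
    assert(copy != NULL);
    copy->x += 100;
    return copy->x + copy->y;
}

/* The caller side of the contract: make the copy, pass its address. */
static int call_byval(const struct Point *orig) {
    struct Point copy;
    memcpy(&copy, orig, sizeof copy);
    return bump_and_sum_byval(&copy);
}
```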

Not if the argument(s) fit into the specified registers. That's
where the ABI comes in.

In a nutshell, the AAPCS-VFP extends the AAPCS to include cases where
the arguments are floating point, either scalars or small vectors,
when they fit in VFP registers. This is similar to the AAPCS extending
the base standard regarding standard registers.

cheers,
--renato
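A simplified sketch of how AAPCS-VFP might hand out the double-precision argument registers d0-d7, assuming every VFP candidate is a homogeneous aggregate of one to four doubles. `assign_vfp_args` and its sentinels are hypothetical, and single-precision back-filling and the core-register side are ignored. As I understand the AAPCS, once a candidate spills to the stack the remaining VFP registers are marked unavailable, so in an argument list of {4, 2, 3, 1} doubles the final one-double argument also goes on the stack rather than into d6.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_D_ARG_REGS 8  /* d0-d7 carry VFP arguments under AAPCS-VFP */
#define ON_STACK (-1)     /* VFP candidate that did not fit in registers */
#define NOT_VFP  (-2)     /* not a VFP candidate: handled by the base standard */

/* doubles_per_arg[i] is the number of doubles in argument i (1-4), or 0 if
   the argument is not a VFP candidate. first_reg[i] receives the index of
   the first d register used, ON_STACK, or NOT_VFP. */
static void assign_vfp_args(const int *doubles_per_arg, size_t nargs, int *first_reg) {
    int next = 0; /* lowest unallocated d register */
    for (size_t i = 0; i < nargs; ++i) {
        int n = doubles_per_arg[i];
        assert(n >= 0 && n <= 4);
        if (n == 0) {
            first_reg[i] = NOT_VFP;
        } else if (next + n <= NUM_D_ARG_REGS) {
            first_reg[i] = next; /* consecutive registers d[next]..d[next+n-1] */
            next += n;
        } else {
            first_reg[i] = ON_STACK;
            next = NUM_D_ARG_REGS; /* remaining VFP registers become unavailable */
        }
    }
}
```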

O.K. I guess I wasn't clear. When I said "by value" above, I meant "by value, whether in memory or registers".

stuart