Our team hasn't decided on a representation yet, so the statements
below are my own thoughts/ideas.
The problem is that aggregates typically need to be materialized
in memory; since that memory is completely abstracted away by
first-class aggregates (FCAs), they're actually a very leaky abstraction. Handling them
well is just a lot of added complexity for every consumer of IR.
That's why we try to minimize their use outside of places where
they're really required.
Yes, but they are not always materialized, for example, when passed in
registers. If the front-end needs to take the address of the
aggregate, it can generate an alloca/store pair. Optimizing out the
alloca/store (in cases where the aggregate was already materialized on
the stack) could be a peephole optimization done in llc.
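To make that concrete, here is a minimal C sketch of the situation (the names `read_a` and `needs_address` are mine, purely for illustration): taking the address of a by-value parameter forces the compiler to materialize it on the stack, which is exactly the alloca/store pair described above.

```c
struct Pair { int a, b; };

/* Hypothetical consumer that only accepts a pointer. */
int read_a(const struct Pair *p) { return p->a; }

/* Under the x86-64 SysV ABI, 's' arrives in a register; taking its
 * address forces a stack slot -- the alloca/store pair a front-end
 * would emit.  If the caller had already materialized the aggregate
 * on the stack, that spill is redundant, which is what the proposed
 * llc peephole would clean up. */
int needs_address(struct Pair s) {
    return read_a(&s);
}
```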
LLVM IR is not a high-level representation. "LL" is right there
in the name. If it's critical that you maintain exact compatibility
with your platform ABI for arbitrary types, you are signing yourselves
up for preserving full C type information through a representation
that does not naturally do so. Go ahead and read the rules for
passing a struct on x86-64 if you don't see the complexity here.
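For a taste of that complexity, here are a few small C structs annotated with the classes the SysV x86-64 ABI assigns them (my summary of a few representative cases, not an exhaustive treatment of the classification algorithm):

```c
#include <stdint.h>

/* 8 bytes, one "eightbyte", both fields class INTEGER:
 * passed in a single general-purpose register. */
struct TwoInts   { int32_t a, b; };

/* 8 bytes, both fields class SSE:
 * passed packed in a single XMM register. */
struct TwoFloats { float x, y; };

/* 24 bytes, larger than two eightbytes: class MEMORY,
 * passed on the stack. */
struct Big       { int64_t a, b, c; };
```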
The IR is already "tainted" with ABI/C type system hints which are not
required for correct execution of the bitcode (e.g. "byval", "sret",
"zext/sext"). I don't see a problem with making this relation more
formal, to include a little extra type information.
X86-64's ABI is daunting, but I'm not convinced it is impossible to
lower this correctly given a few tweaks to the LLVM type system (for
example, by adding union and complex types).
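Sketching the union case: since LLVM IR has no union type today, front-ends lower a C union to raw storage of the appropriate size plus casts on access. The C emulation below mirrors that lowering (all names are mine); an IR-level union type would let this live in LLVM instead of in every front-end:

```c
#include <stdint.h>
#include <string.h>

/* What a front-end does today for 'union { int32_t i; float f; }':
 * raw storage the size of the largest member, with casts on access. */
struct EmulatedUnion { unsigned char storage[4]; };

void put_float(struct EmulatedUnion *u, float f) {
    memcpy(u->storage, &f, sizeof f);
}

int32_t get_int_bits(const struct EmulatedUnion *u) {
    int32_t i;
    memcpy(&i, u->storage, sizeof i);
    return i;
}

/* Usage: type-punning through the emulated union. */
int32_t bits_of(float f) {
    struct EmulatedUnion u;
    put_float(&u, f);
    return get_int_bits(&u);
}
```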
One idea would be to have a bitcode -> bitcode pass which is
responsible for lowering the higher-level representations (structs,
unions, complex) down to simple types, in a target-specific way. This
includes coercing parameters, expanding va_arg, and so on. This would
move the target-specific lowering into LLVM and out of the front-ends.
(This pass would be idempotent.)
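As a sketch of what the parameter-coercion step might look like for one case (x86-64, an 8-byte struct of two ints; `coerce_to_i64` and `coerce_from_i64` are hypothetical names), the pass would rewrite a by-value struct parameter into a single i64, which in C terms amounts to:

```c
#include <stdint.h>
#include <string.h>

struct Pair { int32_t a, b; };

/* Caller side: coerce the 8-byte aggregate into one i64 so it
 * travels in a single general-purpose register. */
uint64_t coerce_to_i64(struct Pair p) {
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);  /* bitwise reinterpretation */
    return bits;
}

/* Callee side: recover the aggregate from the register. */
struct Pair coerce_from_i64(uint64_t bits) {
    struct Pair p;
    memcpy(&p, &bits, sizeof p);
    return p;
}
```

The round trip is lossless, which is what makes the coercion safe for the pass to introduce mechanically.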
To avoid reducing the effectiveness of bitcode optimization, this pass
would be run (in opt) before the optimization passes. For PNaCl, we
would skip this pass during opt, trading some optimization in exchange
for portable bitcode output. We would still need to run this pass
before the bitcode hits SelectionDAGBuilder, but this could be done
inside llc, when the target is known.
I don't understand why it isn't sufficient to only guarantee platform
ABI compatibility for a more restricted set of types. Presumably
native entrypoints are identified in some special way in the
frontend, which you could use to check that the API doesn't
require passing or returning aggregates by value. You could then
teach your frontends to avoid trying to pass structs by value
on interoperation boundaries, e.g. by always passing structs
indirectly, through a pointer.
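In C terms, that restriction amounts to rewriting by-value signatures at the boundary into indirect ones (the `Widget` names below are hypothetical):

```c
struct Widget { int id; int payload; };

/* By-value signature whose lowering is ABI-dependent: */
struct Widget make_widget_byval(int id) {
    struct Widget w = { id, 0 };
    return w;
}

/* Interop-safe alternative: the boundary only ever sees a pointer,
 * so no aggregate is passed or returned by value. */
void make_widget(struct Widget *out, int id) {
    out->id = id;
    out->payload = 0;
}

/* Usage from the caller's side. */
int demo(int id) {
    struct Widget w;
    make_widget(&w, id);
    return w.id;
}
```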
The problem is not interaction with native entry-points, so much as
interaction with existing code generated by other compilers. There are
already long-standing NaCl compilers for X86-32 and X86-64 (based on
gcc) with which we are trying to maintain ABI-compatibility.
Unfortunately, the interaction between the two involves C++ APIs which
use small struct passing. If this turns out to be impossible, then
modifying the other compilers may be our only option. This is our
"nuclear option", since it would involve a lot of organizational pain.
There are other benefits to having a single bitcode ABI for all
targets, at least from our point of view. It would free the front-ends
from having to write target-specific ABI lowering for PNaCl. Anything
that targets 32-bit little-endian bitcode could target PNaCl, and thus
the web, without extensive modifications. It would also give us better
test coverage of the now-obscure bitcode features we are using
(direct aggregate argument passing, returning, and ABI lowering in
llc). Any time PNaCl needs to diverge from the mainstream, it gives
our small team a headache and delays our release.
Hoping I haven't made a fool of myself,