Our team hasn't decided on a representation yet, so the statements
below are my own thoughts/ideas.
The problem is that aggregates typically need to be materialized
in memory; since that memory is completely abstracted away by
first-class aggregates (FCAs), they're actually a very leaky abstraction. Handling them
well is just a lot of added complexity for every consumer of IR.
That's why we try to minimize their use outside of places where
they're really required.
Yes, but they are not always materialized, for example, when passed in
registers. If the front-end needs to take the address of the
aggregate, it can generate an alloca/store pair. Optimizing out the
alloca/store (in cases where the aggregate was already materialized on
the stack) could be a peephole optimization done in llc.
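To make that concrete, here is a minimal C sketch of the situation (the names `read_a` and `needs_address` are mine, purely for illustration): taking the address of a by-value parameter forces the compiler to materialize it on the stack, which is exactly the alloca/store pair described above.

```c
struct Pair { int a, b; };

/* Hypothetical consumer that only accepts a pointer. */
int read_a(const struct Pair *p) { return p->a; }

/* Under the x86-64 SysV ABI, 's' arrives in a register; taking its
 * address forces a stack slot -- the alloca/store pair a front-end
 * would emit.  If the caller had already materialized the aggregate
 * on the stack, that spill is redundant, which is what the proposed
 * llc peephole would clean up. */
int needs_address(struct Pair s) {
    return read_a(&s);
}
```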
LLVM IR is not a high-level representation. "LL" is right there
in the name. If it's critical that you maintain exact compatibility
with your platform ABI for arbitrary types, you are signing yourselves
up for preserving full C type information through a representation
that does not naturally do so. Go ahead and read the rules for
passing a struct on x86-64 if you don't see the complexity here.
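For a taste of that complexity, here are a few small C structs annotated with the classes the SysV x86-64 ABI assigns them (my summary of a few representative cases, not an exhaustive treatment of the classification algorithm):

```c
#include <stdint.h>

/* 8 bytes, one "eightbyte", both fields class INTEGER:
 * passed in a single general-purpose register. */
struct TwoInts   { int32_t a, b; };

/* 8 bytes, both fields class SSE:
 * passed packed in a single XMM register. */
struct TwoFloats { float x, y; };

/* 24 bytes, larger than two eightbytes: class MEMORY,
 * passed on the stack. */
struct Big       { int64_t a, b, c; };
```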
The IR is already "tainted" with ABI/C type system hints which are not
required for correct execution of the bitcode (e.g. "byval", "sret",
"zext/sext"). I don't see a problem with making this relation more
formal, to include a little extra type information.
X86-64's ABI is daunting, but I'm not convinced it is impossible to
lower this correctly given a few tweaks to the LLVM type system (for
example, by adding union and complex types).
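Sketching the union case: since LLVM IR has no union type today, front-ends lower a C union to raw storage of the appropriate size plus casts on access. The C emulation below mirrors that lowering (all names are mine); an IR-level union type would let this live in LLVM instead of in every front-end:

```c
#include <stdint.h>
#include <string.h>

/* What a front-end does today for 'union { int32_t i; float f; }':
 * raw storage the size of the largest member, with casts on access. */
struct EmulatedUnion { unsigned char storage[4]; };

void put_float(struct EmulatedUnion *u, float f) {
    memcpy(u->storage, &f, sizeof f);
}

int32_t get_int_bits(const struct EmulatedUnion *u) {
    int32_t i;
    memcpy(&i, u->storage, sizeof i);
    return i;
}

/* Usage: type-punning through the emulated union. */
int32_t bits_of(float f) {
    struct EmulatedUnion u;
    put_float(&u, f);
    return get_int_bits(&u);
}
```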
One idea would be to have a bitcode -> bitcode pass which is
responsible for lowering the higher-level representations (structs,
unions, complex) down to simple types, in a target-specific way. This
includes coercing parameters, expanding va_arg, and so on. This would
move the target-specific lowering into LLVM and out of the front-ends.
(This pass would be idempotent.)
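As a sketch of what the parameter-coercion step might look like for one case (x86-64, an 8-byte struct of two ints; `coerce_to_i64` and `coerce_from_i64` are hypothetical names), the pass would rewrite a by-value struct parameter into a single i64, which in C terms amounts to:

```c
#include <stdint.h>
#include <string.h>

struct Pair { int32_t a, b; };

/* Caller side: coerce the 8-byte aggregate into one i64 so it
 * travels in a single general-purpose register. */
uint64_t coerce_to_i64(struct Pair p) {
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);  /* bitwise reinterpretation */
    return bits;
}

/* Callee side: recover the aggregate from the register. */
struct Pair coerce_from_i64(uint64_t bits) {
    struct Pair p;
    memcpy(&p, &bits, sizeof p);
    return p;
}
```

The round trip is lossless, which is what makes the coercion safe for the pass to introduce mechanically.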
To avoid reducing the effectiveness of bitcode optimization, this pass
would be run (in opt) before the optimization passes. For PNaCl, we
would skip this pass during opt, trading some optimization in exchange
for portable bitcode output. We would still need to run this pass
before the bitcode hits SelectionDAGBuilder, but this could be done
inside llc, when the target is known.
I don't understand why it isn't sufficient to only guarantee platform
ABI compatibility for a more restricted set of types. Presumably
native entrypoints are identified in some special way in the
frontend, which you could use to check that the API doesn't
require passing or returning aggregates by value. You could then
teach your frontends to avoid trying to pass structs by value
on interoperation boundaries, e.g. by always passing structs
indirectly, through a pointer.
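In C terms, that restriction amounts to rewriting by-value signatures at the boundary into indirect ones (the `Widget` names below are hypothetical):

```c
struct Widget { int id; int payload; };

/* By-value signature whose lowering is ABI-dependent: */
struct Widget make_widget_byval(int id) {
    struct Widget w = { id, 0 };
    return w;
}

/* Interop-safe alternative: the boundary only ever sees a pointer,
 * so no aggregate is passed or returned by value. */
void make_widget(struct Widget *out, int id) {
    out->id = id;
    out->payload = 0;
}

/* Usage from the caller's side. */
int demo(int id) {
    struct Widget w;
    make_widget(&w, id);
    return w.id;
}
```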
The problem is not interaction with native entry-points, so much as
interaction with existing code generated by other compilers. There are
already long-standing NaCl compilers for X86-32 and X86-64 (based on
gcc) with which we are trying to maintain ABI-compatibility.
Unfortunately, the interaction between the two involves C++ APIs which
use small struct passing. If this turns out to be impossible, then
modifying the other compilers may be our only option. This is our
"nuclear option", since it would involve a lot of organizational pain.
There are other benefits to having a single bitcode ABI for all
targets, at least from our point of view. It would free the front-ends
from having to write target-specific ABI lowering for PNaCl. Anything
that targets 32-bit little-endian bitcode could target PNaCl, and thus
the web, without extensive modifications. It would also give us better
test coverage of the now-obscure bitcode features we are using
(direct aggregate argument passing, returning, and ABI lowering in
llc). Any time PNaCl needs to diverge from the mainstream, it gives
our small team a headache and delays our release.
Hoping I haven't made a fool of myself,