First-class aggregate semantics

I think I'm missing something basic about the semantics of returning an
aggregate type (in my case, a structure) from a function. Returning a
structure containing only compile-time constants is simple enough. But
I don't quite get how this works with a struct composed at run-time. If
I constructed it on the stack with alloca, would I be letting a stack
variable escape to to a context where it doesn't exist if I return it?
Or does the return semantics guarantee it will be copied (or space
allocated in the caller) appropriately? Otherwise I should abandon the
idea of returning such a struct and simply pass in a pointer to space
allocated in the caller.

I think my confusion stems from thinking in terms of high-level
languages and not having done nearly enough assembly work to know what
LLVM really wants to do, and I'd be grateful for a clue about the
idiomatic way to do this.

Dustin

The way this works on many targets is that the caller allocates stack
space in its frame for the returned struct and passes a pointer to it
as a first "hidden" argument to the callee. The callee then copies
that data into the space pointed to by the address.

This is all specified by the ABI so it varies by processor and OS.
The target-dependent lowering pieces of LLVM should take care of it.

Long-term, first-class status means that returns of structs should
"just work" and you don't need to worry about getting a pointer to
invalid memory. I believe right now, however, only structs up to a
certain size are supported, perhaps because under some ABIs, small
structs can be returned in registers and one doesn't need to worry
about generating the hidden argument.

Someone working directly on this can answer with more authority.

                              -Dave

The way this works on many targets is that the caller allocates stack
space in its frame for the returned struct and passes a pointer to it
as a first "hidden" argument to the callee. The callee then copies
that data into the space pointed to by the address.

<nod>

Long-term, first-class status means that returns of structs should
"just work" and you don't need to worry about getting a pointer to
invalid memory.

OK, so my thought of constructing the object on the stack was correct?
What I originally wanted to do was roughly

    %Token = type {%c_int, %i8*}

    define %Token @foo()
    {
        ...

        ret %Token {%c_int %token, %i8* %value}
    }

but the compiler complains about the invalid usage of a local name. So
I decided the problem was that I was thinking in terms of languages that
would create a temporary implicitly, and in IR I need to do it
explicitly. So it occurred to me to create the struct on the stack, as
I mentioned.

What bothers me about that is the explicit specification with alloca
that the space is reserved in the callee's frame. Do I just trust the
optimizer to eliminate that and turn the reference to alloca'd memory
into a reference to the space reserved by the caller? Or is that going
to create an unnecessary copy from the alloca'd memory to that reserved
by the caller? From what you said my guess is the former (optimizer
eliminates the pointless temporary), but us premature optimizers like to
be reassured we haven't given up an all-important microsecond. :slight_smile:

...I believe right now, however, only structs up to a
certain size are supported, perhaps because under some ABIs, small
structs can be returned in registers and one doesn't need to worry
about generating the hidden argument.

In the case that prompted the question the struct isn't going to be
bigger than two of whatever the architecture regards as a word, which
surely should be fine, but in principle shouldn't LLVM and not the
front-end programmer be making the decision about whether the struct is
big enough to spill into memory?

Dustin

On x86, the hidden argument is generated automatically at codegen time
if it's needed. As far as I know, other platforms don't yet have that
support.

Hi Dustin-

You'll probably need to use insertvalue to construct your return value.

Alastair

You mean to return a struct in sret form? I thought support was recently added
to return structs in registers?

> The way this works on many targets is that the caller allocates stack
> space in its frame for the returned struct and passes a pointer to it
> as a first "hidden" argument to the callee. The callee then copies
> that data into the space pointed to by the address.

<nod>

> Long-term, first-class status means that returns of structs should
> "just work" and you don't need to worry about getting a pointer to
> invalid memory.

OK, so my thought of constructing the object on the stack was correct?

No. The idea is that you pass the structs around as values and not that you
alloca them and pass by reference/pointer.

What bothers me about that is the explicit specification with alloca
that the space is reserved in the callee's frame.

Yes. Don't do that.

Do I just trust the
optimizer to eliminate that and turn the reference to alloca'd memory
into a reference to the space reserved by the caller?

No. LLVM is trusting you not to return pointers to locals.

Or is that going
to create an unnecessary copy from the alloca'd memory to that reserved
by the caller? From what you said my guess is the former (optimizer
eliminates the pointless temporary), but us premature optimizers like to
be reassured we haven't given up an all-important microsecond. :slight_smile:

I have had great success with my HLVM project by passing around large numbers
of large structs by hand. LLVM has not only survived but actually generated
decent code that beats most languages according to my benchmarks. In
particular, HLVM uses "fat" quadword references (where word = sizeof(void*))
that are passed everywhere by value except when a struct is returned and HLVM
gets the caller to alloca and passes that space by pointer to the callee for
it to fill in.

> ...I believe right now, however, only structs up to a
> certain size are supported, perhaps because under some ABIs, small
> structs can be returned in registers and one doesn't need to worry
> about generating the hidden argument.

In the case that prompted the question the struct isn't going to be
bigger than two of whatever the architecture regards as a word, which
surely should be fine, but in principle shouldn't LLVM and not the
front-end programmer be making the decision about whether the struct is
big enough to spill into memory?

Good question. There was a very interesting discussion about this here a while
ago and everyone coming to LLVM says the same thing: why doesn't LLVM just
handle this for me automatically? The answer is that LLVM cannot make that
decision because it depends upon the ABI. C99 apparently returns user-defined
structs of two doubles by reference but complex numbers in registers. So the
ABI requires knowledge of the front-end and, therefore, LLVM cannot fully
automate this.

Something LLVM could do is spill safely when it knows you don't care about the
foreign ABI (e.g. with fastcc) and that work is underway.

Ah ha!

The fact is I didn't really understand the significance of this part
when I read it, and so didn't remember it when I needed it. OK, so I
have tested it and I can now build up a struct like this

    %s1 = insertvalue {i32, i32} {i32 0, i32 0}, i32 1, 0 ; s1 = {1,0}
    %s2 = insertvalue {i32, i32} %s1, i32 2, 1 ; %s2 == {1,2}

which reminds me of another thing I never understood. I can't make my
code (slightly) more readable by changing that to something like

    %s0 = {i32 0, i32 0}
    %s1 = insertvalue {i32, i32} %s0, i32 1, 0 ; s1 = {1,0}
    %s2 = insertvalue {i32, i32} %s1, i32 2, 1 ; %s2 == {1,2}

because LLVM will complain that it "expected instruction opcode" at the
assignment to %s0. If there is a general way to give names to constants
in that way I didn't find it. In fact, I think I tended not to use
temporaries like I would variables precisely because when I tried the
second alternative as the natural way to hand-code it and it didn't
work, I didn't think how to phrase it so only the results of operations
get named.

Help me understand the underlying logic--why can one only name the
results of operations? I realize that the local temporaries are
notionally register variables for a machine with an infinite number of
registers, but my very dim memory of real assembly was that I not only
could load constants into registers but had to do so. What part of the
picture am I missing here?

You need an IR tutorial. Or, to speak correctly, *I* need a tutorial.
:slight_smile: But I'm learning....

Dustin

No. The idea is that you pass the structs around as values and not that you
alloca them and pass by reference/pointer.

OK, then I need to learn more syntax (which Alistair Lynn got me started
on, it appears :-).

No. LLVM is trusting you not to return pointers to locals.

How naive. :slight_smile:

I have had great success with my HLVM project by passing around large numbers
of large structs by hand. LLVM has not only survived but actually generated
decent code that beats most languages according to my benchmarks.

That's good to know, because I prefer the style of returning structs
rather than passing around pointers or using static data. I'll be happy
to convert my lexer over to returning structs instead of pulling
lex-style tricks.

...LLVM cannot make that
decision because it depends upon the ABI. C99 apparently returns user-defined
structs of two doubles by reference but complex numbers in registers. So the
ABI requires knowledge of the front-end and, therefore, LLVM cannot fully
automate this.

Huh. I'd never have guessed (and would have been quite annoyed if I
had, since numerical code is often at the edge of whatever the computing
budget is (meaning the problem was the largest one the researcher could
afford to solve, not the one he wished he was solving).

Something LLVM could do is spill safely when it knows you don't care about the
foreign ABI (e.g. with fastcc) and that work is underway.

<nod>

Dustin

Hi Dustin,

I think I'm missing something basic about the semantics of returning an
aggregate type (in my case, a structure) from a function. Returning a
structure containing only compile-time constants is simple enough. But
I don't quite get how this works with a struct composed at run-time. If
I constructed it on the stack with alloca, would I be letting a stack
variable escape to to a context where it doesn't exist if I return it?

first class aggregates are basically implemented by sticking the value of each
struct field in a register. For example, returning a struct with two i32
fields amounts to placing each of the fields in a machine register (eg: EAX,
ECX) then returning from the function. The caller gets hold of the values
by reading EAX and ECX. Note that it doesn't return a pointer to the struct,
it returns the value of the struct. Suppose you have stored the struct in
an alloca. Then to get it as a first class aggregate, you first need to do
a load of the alloca (this results in an LLVM register of aggregate type,
equivalent to two machine registers of type i32), then return the loaded value.
At the machine code level, this corresponds to loading the first field from
the stack into EAX, loading the second field from the stack into EDX then
returning.

Or does the return semantics guarantee it will be copied (or space
allocated in the caller) appropriately? Otherwise I should abandon the
idea of returning such a struct and simply pass in a pointer to space
allocated in the caller.

If the struct is so big that there aren't enough machine registers available
to return it, then the code generator will automagically allocate some stack
space in the caller, pass a pointer to it into the called function, and have
the callee return the struct by copying it into the passed stack space [*].

Ciao,

Duncan.

[*] This functionality was implemented after LLVM 2.6 was released. In LLVM
2.6 the code generator will crash if it runs out of registers. In this case
you should pass in a pointer by hand.

Hi Jon,

In the case that prompted the question the struct isn't going to be
bigger than two of whatever the architecture regards as a word, which
surely should be fine, but in principle shouldn't LLVM and not the
front-end programmer be making the decision about whether the struct is
big enough to spill into memory?

Good question. There was a very interesting discussion about this here a while ago and everyone coming to LLVM says the same thing: why doesn't LLVM just handle this for me automatically? The answer is that LLVM cannot make that decision because it depends upon the ABI.

actually LLVM does now handle this for you automatically: if there aren't
enough registers to return the first class aggregate in registers, then it
is automagically returned on the stack. If this is not ABI conformant then
it is up to the front-end to not generate IR that returns such large structs.
In practice front-ends only generate functions returning first class aggregates
when the ABI says the aggregate should be entirely returned in registers. Thus
by definition it is sure not to require more registers than the machine has!
This is why the enhancement to automagically use the stack if there aren't
enough registers has no impact on ABI conformance.

Ciao,

Duncan.

There are small target hooks that need to be implemented for each
target to get this to work. As far as I know, the only hook
implemented was for x86.

It's not a C99 thing, but an ABI thing.

And for the x86-64 ABI, complex double and a struct of two doubles is
returned in exactly the same way. That may not be true for other ABIs.
I'm not as familiar with them.

On some targets it certainly should be possible to do the right thing.

                               -Dave

Dustin Laurence wrote:

  

You'll probably need to use insertvalue to construct your return value.
    
Ah ha!

The fact is I didn't really understand the significance of this part
when I read it, and so didn't remember it when I needed it. OK, so I
have tested it and I can now build up a struct like this

    %s1 = insertvalue {i32, i32} {i32 0, i32 0}, i32 1, 0 ; s1 = {1,0}
    %s2 = insertvalue {i32, i32} %s1, i32 2, 1 ; %s2 == {1,2}
  

As a small refinement, I recommend:

%s1 = insertvalue {i32, i32} undef, i32 1, 0
%s2 = insertvalue {i32, i32} %s1, i32 2, 1

which reminds me of another thing I never understood. I can't make my
code (slightly) more readable by changing that to something like

    %s0 = {i32 0, i32 0}
    %s1 = insertvalue {i32, i32} %s0, i32 1, 0 ; s1 = {1,0}
    %s2 = insertvalue {i32, i32} %s1, i32 2, 1 ; %s2 == {1,2}
  

No, there is no copy or move instruction in LLVM. Recall that the text format is 1:1 with the in-memory model of the program. A copy instruction in the IR would literally mean "go look at my operand instead", leading to logic in every optimization that checks for a CopyInst and chases the pointer.

The astute reader will note that I'm lying again, but it's for your own good. :wink: "%x = bitcast i32 %y to i32" is a legal way to copy, but the intention behind a BitcastInst is that it is used to change the type.

Nick

As a small refinement, I recommend:

%s1 = insertvalue {i32, i32} undef, i32 1, 0
%s2 = insertvalue {i32, i32} %s1, i32 2, 1

Ah, excellent. I hadn't found a use for 'undef' yet, but I like that.
I don't like substituting values into a literal with arbitrary values
because, well, it's unlovely. Saying the values don't matter is much
better.

I don't think there is a place in my lexer that happens to need exactly
that as one or the other members of the token structure always seems to
be a literal, but your example did lead me to use undef for individual
members being replaced in an insertvalue instruction.

No, there is no copy or move instruction in LLVM. Recall that the text
format is 1:1 with the in-memory model of the program. A copy
instruction in the IR would literally mean "go look at my operand
instead", leading to logic in every optimization that checks for a
CopyInst and chases the pointer.

My desire to do such things is a direct consequence of my apparently odd
choice to learn the IR by approaching it as a programming language.
While it clearly wasn't intended for it, my first computer had three
registers, only one of which was a mighty sixteen bits wide, so I've
hurt worse before. :slight_smile:

That said, the need to parameterize code is too great not to do
something, and since LLVM isn't designed to do it I have been using CPP.
Doing without even the modest abilities of CPP is simply not to be
thought of. If I did enough hand-coding it would be well worth writing
a custom preprocessor.

The astute reader will note that I'm lying again, but it's for your own
good. :wink: "%x = bitcast i32 %y to i32" is a legal way to copy, but the
intention behind a BitcastInst is that it is used to change the type.

If I was concerned about my own good I probably wouldn't be hand-coding. :slight_smile:

Dustin