[RFC] Clean up the way we store optional Function data

Function's have three kinds of optional data: prefix data, prologue data, and
personalities. We don't have a consistent way of storing this data, IMO. This
RFC discusses a new way of managing optional data that makes llvm::Function
cleaner, more consistent, and a little smaller.

What do we do currently?

I don't know how prologue and prefix data is used -- is it correct to
say that you're basically trying to give `llvm::Function` s some
"optional" operands, and that you know during construction of an
`llvm::Function` how many optional operands the `llvm::Function` will
need[1]? If so, you might want to consider using the "descriptor"
functionality that was added recently[2]. It allows classes deriving
from `llvm::User` to allocate an arbitrary number of bookkeeping bytes
before the normal `Use &` array. Instructions with operand bundles
use these bytes to remember which where the individual operand bundles
start and end, and an `llvm::Function` could easily use the extra
storage to remember if / where optional operands like the personality
function are in the `Use &` array. And by making the operands part of
the regular `Use &` array, functionality like RAUW that use Def-Use
chains should Just Work(TM).

Keep in mind, I'm biased here, so take with a grain of salt! :slight_smile:

[1]: I see there are methods like `setPrologueData`, but they're only
   used from files that I'd expect to be creating `llvm::Function`
   instances from scratch.

[2]: See `DescBytes` in `User::User`

-- Sanjoy

Vedant Kumar wrote:

Hi Sanjoy,

I don't know how prologue and prefix data is used -- is it correct to
say that you're basically trying to give `llvm::Function` s some
"optional" operands, and that you know during construction of an
`llvm::Function` how many optional operands the `llvm::Function` will
need[1]?

Yep. Though not operands exactly, since they wouldn't be in Function's use list.

If so, you might want to consider using the "descriptor"
functionality that was added recently[2]. It allows classes deriving
from `llvm::User` to allocate an arbitrary number of bookkeeping bytes
before the normal `Use &` array. Instructions with operand bundles
use these bytes to remember which where the individual operand bundles
start and end, and an `llvm::Function` could easily use the extra
storage to remember if / where optional operands like the personality
function are in the `Use &` array. And by making the operands part of
the regular `Use &` array, functionality like RAUW that use Def-Use
chains should Just Work(TM).

That's a neat idea. To summarize: make Function have 3 optional operands. (For context -- Function currently has 1 optional operand, and my proposal is to move to 0.)

Could someone else chime in on what they'd like to see?

vedant

Sanjoy's idea makes sense to me, but only if we never need to add
prefix/prologue data after functions are created. Are there any places
where we need/want to add them after the fact?

That's a neat idea. To summarize: make Function have 3 optional operands. (For context -- Function currently has 1 optional operand, and my proposal is to move to 0.)

Could someone else chime in on what they'd like to see?

Sanjoy's idea makes sense to me, but only if we never need to add
prefix/prologue data after functions are created. Are there any places
where we need/want to add them after the fact?

I think so. I see:

LinkModules.cpp: Dst.setPrefixData(MapValue(Src.getPrefixData(), ValueMap,
BitcodeReader.cpp: FunctionPrologueWorklist.back().first->setPrologueData(C);
InlineFunction.cpp: Caller->setPersonalityFn(CalledPersonality);

Some of these sites could be refactored so that the Functions are created with the prefix/prologue data they need. I don't think that's possible for personality functions (see my third example).

Would we inhibit any future patches which add prefix/prologue data to Functions on the fly by taking this approach?

vedant

Vedant Kumar wrote:

That's a neat idea. To summarize: make Function have 3 optional operands. (For context -- Function currently has 1 optional operand, and my proposal is to move to 0.)

Could someone else chime in on what they'd like to see?

Sanjoy's idea makes sense to me, but only if we never need to add
prefix/prologue data after functions are created. Are there any places
where we need/want to add them after the fact?

I think so. I see:

LinkModules.cpp: Dst.setPrefixData(MapValue(Src.getPrefixData(), ValueMap,
BitcodeReader.cpp: FunctionPrologueWorklist.back().first->setPrologueData(C);
InlineFunction.cpp: Caller->setPersonalityFn(CalledPersonality);

Some of these sites could be refactored so that the Functions are created with the prefix/prologue data they need. I don't think that's possible for personality functions (see my third example).

Would we inhibit any future patches which add prefix/prologue data to Functions on the fly by taking this approach?

You should always be able to create a new `llvm::Function` instance (and RAUW it in) if you want to add prefix/prologue data to functions after they've been created; just like you have to do today for any other `llvm::User`s that do not have hung off uses. Which brings me to -- can you use hung off uses for this? These use lists can be resized on the fly, so you should be able to add and remove prologue data on the fly. If you're using hung off uses, you'll probably still need a descriptor to remember whether / which operands are prologue data etc.

-- Sanjoy

Vedant Kumar wrote:

That's a neat idea. To summarize: make Function have 3 optional operands. (For context -- Function currently has 1 optional operand, and my proposal is to move to 0.)

Could someone else chime in on what they'd like to see?

Sanjoy's idea makes sense to me, but only if we never need to add
prefix/prologue data after functions are created. Are there any places
where we need/want to add them after the fact?

I think so. I see:

LinkModules.cpp: Dst.setPrefixData(MapValue(Src.getPrefixData(), ValueMap,
BitcodeReader.cpp: FunctionPrologueWorklist.back().first->setPrologueData(C);
InlineFunction.cpp: Caller->setPersonalityFn(CalledPersonality);

Some of these sites could be refactored so that the Functions are created with the prefix/prologue data they need. I don't think that's possible for personality functions (see my third example).

Would we inhibit any future patches which add prefix/prologue data to Functions on the fly by taking this approach?

You should always be able to create a new `llvm::Function` instance (and RAUW it in) if you want to add prefix/prologue data to functions after they've been created; just like you have to do today for any other `llvm::User`s that do not have hung off uses.

It's possible, but a lot more involved with `Function`s. Besides
RAUW, you need to transfer over all the basic blocks.

This seems kind of wrong to me, if we expect it to happen.

Which brings me to -- can you use hung off uses for this? These use lists can be resized on the fly, so you should be able to add and remove prologue data on the fly. If you're using hung off uses, you'll probably still need a descriptor to remember whether / which operands are prologue data etc.

Sure, this is another option. It might be simplest. I'd be
tempted to start with a 0/3 choice (if we allocate any hung-off
uses, allocate enough for all three operands) to simplify the
logic. Although I can't remember right now whether that's
legal (having nullptr operands followed by real ones)...

Personalities are stored as ``optional`` Function operands. We actually always
allocate the space for this ``optional`` operand: there's a FIXME in the
destructor for Function about this.

Makes me wonder, why didn't we use hung off uses to begin with?
Do functions "usually" have personality functions, for some
definition of?

Duncan P. N. Exon Smith wrote:

>
>
>
> Vedant Kumar wrote:
>>>> That's a neat idea. To summarize: make Function have 3 optional
operands. (For context -- Function currently has 1 optional operand, and my
proposal is to move to 0.)
>>>>
>>>> Could someone else chime in on what they'd like to see?
>>> Sanjoy's idea makes sense to me, but only if we never need to add
>>> prefix/prologue data after functions are created. Are there any places
>>> where we need/want to add them after the fact?
>>
>> I think so. I see:
>>
>> LinkModules.cpp: Dst.setPrefixData(MapValue(Src.getPrefixData(),
ValueMap,
>> BitcodeReader.cpp:
FunctionPrologueWorklist.back().first->setPrologueData(C);
>> InlineFunction.cpp: Caller->setPersonalityFn(CalledPersonality);
>>
>> Some of these sites could be refactored so that the Functions are
created with the prefix/prologue data they need. I don't think that's
possible for personality functions (see my third example).
>>
>> Would we inhibit any future patches which add prefix/prologue data to
Functions on the fly by taking this approach?
>
> You should always be able to create a new `llvm::Function` instance (and
RAUW it in) if you want to add prefix/prologue data to functions after
they've been created; just like you have to do today for any other
`llvm::User`s that do not have hung off uses.

It's possible, but a lot more involved with `Function`s. Besides
RAUW, you need to transfer over all the basic blocks.

This seems kind of wrong to me, if we expect it to happen.

> Which brings me to -- can you use hung off uses for this? These use
lists can be resized on the fly, so you should be able to add and remove
prologue data on the fly. If you're using hung off uses, you'll probably
still need a descriptor to remember whether / which operands are prologue
data etc.

Sure, this is another option. It might be simplest. I'd be
tempted to start with a 0/3 choice (if we allocate any hung-off
uses, allocate enough for all three operands) to simplify the
logic. Although I can't remember right now whether that's
legal (having nullptr operands followed by real ones)...

>>>>>> Personalities are stored as ``optional`` Function operands. We
actually always
>>>>>> allocate the space for this ``optional`` operand: there's a FIXME
in the
>>>>>> destructor for Function about this.

Makes me wonder, why didn't we use hung off uses to begin with?
Do functions "usually" have personality functions, for some
definition of?

Depends. In C++? It's pretty common to have objects which have non-trivial
destructors on the stack which means calling a function will be an invoke
which will require the function to have a personality. In C? It's pretty
rare. You'd need something like __attribute__((cleanup)) to do it, the most
common source of this will be something like pthread_cleanup_push. If I
recall correctly, Julia sets the personality on functions regardless of
whether or not there are any invokes, they need the AsmPrinter to scribble
something down. I can't say for other languages (Rust, etc.). From what I
understand, Swift doesn't use landingpad for EH so they wouldn't need the
personality set.

David Majnemer wrote:

    >
    > Vedant Kumar wrote:
    > >>> That's a neat idea. To summarize: make Function have 3 optional
    operands. (For context -- Function currently has 1 optional operand,
    and my proposal is to move to 0.)
    > >>>
    > >>> Could someone else chime in on what they'd like to see?
    > >> Sanjoy's idea makes sense to me, but only if we never need to add
    > >> prefix/prologue data after functions are created. Are there any
    places
    > >> where we need/want to add them after the fact?
    > >
    > > I think so. I see:
    > >
    > > LinkModules.cpp:
    Dst.setPrefixData(MapValue(Src.getPrefixData(), ValueMap,
    > > BitcodeReader.cpp:
    FunctionPrologueWorklist.back().first->setPrologueData(C);
    > > InlineFunction.cpp: Caller->setPersonalityFn(CalledPersonality);
    > >
    > > Some of these sites could be refactored so that the Functions are
    created with the prefix/prologue data they need. I don't think
    that's possible for personality functions (see my third example).
    > >
    > > Would we inhibit any future patches which add prefix/prologue
    data to Functions on the fly by taking this approach?
    >
    > You should always be able to create a new `llvm::Function`
    instance (and RAUW it in) if you want to add prefix/prologue data to
    functions after they've been created; just like you have to do today
    for any other `llvm::User`s that do not have hung off uses.

    It's possible, but a lot more involved with `Function`s. Besides
    RAUW, you need to transfer over all the basic blocks.

    This seems kind of wrong to me, if we expect it to happen.

    > Which brings me to -- can you use hung off uses for this? These
    use lists can be resized on the fly, so you should be able to add
    and remove prologue data on the fly. If you're using hung off uses,
    you'll probably still need a descriptor to remember whether / which
    operands are prologue data etc.

    Sure, this is another option. It might be simplest. I'd be
    tempted to start with a 0/3 choice (if we allocate any hung-off
    uses, allocate enough for all three operands) to simplify the
    logic. Although I can't remember right now whether that's
    legal (having nullptr operands followed by real ones)...

    > >>>>> Personalities are stored as ``optional`` Function operands.
    We actually always
    > >>>>> allocate the space for this ``optional`` operand: there's a
    FIXME in the
    > >>>>> destructor for Function about this.

    Makes me wonder, why didn't we use hung off uses to begin with?
    Do functions "usually" have personality functions, for some
    definition of?

Depends. In C++? It's pretty common to have objects which have
non-trivial destructors on the stack which means calling a function will
be an invoke which will require the function to have a personality. In
C? It's pretty rare. You'd need something like __attribute__((cleanup))
to do it, the most common source of this will be something
like pthread_cleanup_push. If I recall correctly, Julia sets the
personality on functions regardless of whether or not there are any
invokes, they need the AsmPrinter to scribble something down. I can't
say for other languages (Rust, etc.). From what I understand, Swift
doesn't use landingpad for EH so they wouldn't need the personality set.

Most functions we emit from our Java frontend have personalities.

-- Sanjoy

I like the idea of using hung off uses.

We can keep using SubclassData to indicate whether or not some optional data is present.

Benefits: zero space overhead until some optional data is set, we can get rid of the DenseMaps in LLVMContextImpl, and RAUW just works (so no clang changes are needed).

I'll have a patch up before the end of the week.

thanks
vedant

Here is a WIP patch as promised:

    http://reviews.llvm.org/D13829

It uses a hungoff uselist to store optional data as needed.

Some early objections from Duncan:

    - An extra one-time malloc() is required to set personality functions.
    - We get and set personality functions frequently. This patch introduces a level of indirection which slows the common case down.

Is this overhead acceptable?

If not, maybe we could leave personality function handling untouched and add a "Constant **OptionalData" array to Function.

vedant

Here is a WIP patch as promised:

   http://reviews.llvm.org/D13829

It uses a hungoff uselist to store optional data as needed.

Some early objections from Duncan:

   - An extra one-time malloc() is required to set personality functions.
   - We get and set personality functions frequently. This patch introduces a level of indirection which slows the common case down.

Is this overhead acceptable?

If the overhead isn't bad, then this SGTM. (I haven't looked at the
patch.)

I guess I was thinking about large applications that use C++ exceptions
and build with LTO, but even there it's a per-function overhead. I
doubt it really matters.

If not, maybe we could leave personality function handling untouched and add a "Constant **OptionalData" array to Function.

The `IndirectUser` approach you outlined initially might be better
than this?

I've done some measurements on this.

The test program I have just calls Function::Create(), F->setPersonalityFn(), and then F->eraseFromParent() in a loop 2^20 times.

Results:

    pre-patch --- min: 1.10s max: 1.13s avg: 1.11s
    post-patch --- min: 1.26s max: 1.35s avg: 1.29s

So we expect to lose 0.2 seconds per 1 million functions (with personality functions) in a project. I've only tested on my machine, and I haven't accounted for performance variations in other parts of the codebase where code was removed.

I'm happy enough with the patch and think that this is a reasonable tradeoff. Should we go with this?

http://reviews.llvm.org/D13829

vedant

I feel OK about this trade-off. The vast majority of C++ TUs have < 65k functions. Sanjoy?

David Majnemer wrote:

I feel OK about this trade-off. The vast majority of C++ TUs have < 65k
functions. Sanjoy?

Most of our "TU"s have at most a handful of (~ 20) functions, so I'm fine with this too.

-- Sanjoy