[RFC] A proposal for byval in a world with opaque pointers

Hi,

In the past months, several options have been presented for making byval
(and similar attributes, such as inalloca or sret) work with opaque pointers.

The main two I've seen were byval(T) and byval(N) where N is the size of T.

They both have their upsides and downsides, for example: byval(T) would be
a type-parametric attribute, which, AFAIK, does not already exist and may
complicate the attribute system significantly, while byval(N) would be hard
to introduce in tests as computing N from T requires LLVM's DataLayout.

Also, this would have to be done for inalloca and sret as well - sret only
needs it when targeting SPARC, although still generally useful in analysis.

To sidestep some of the concerns and allow a smooth transition towards a
byval that works with opaque pointers, I've come up with a new approach:

Reuse dereferenceable(S) and align A for the size and alignment of byval.

That is, a byval dereferenceable(S) align A argument is guaranteed to have
S bytes available to read from, *and only S*, aligned to a multiple of A.
Reading past that size is UB, as LLVM will not copy more than S bytes.

An API can be provided to add the attribute alongside dereferenceable
and align attributes, for a given Type* and DataLayout.

A preliminary implementation (w/o sret) can be found at:
https://github.com/eddyb/llvm/compare/2579466...65ac99b

To maintain compatibility with existing code, dereferenceable and align
attributes are automatically injected as soon as a non-default DataLayout
is available. The "injection" mechanism could potentially be replaced with
a pass, although it was easier to experiment with it being guaranteed.

This works out pretty well in practice, as analysis already understands
dereferenceable and can make decisions based on it.

The verifier checks that for byval & friends, dereferenceable(S) and
align A are present (clang always adds align, but not all tests have it)
and that S is the exact size of the pointee type (while we still know that).

That last bit is very important, because it allows a script to do the following:

1. Find all byval arguments in tests that are missing dereferenceable, e.g.
    ... i32* byval align 4 ...
    .... {i8, i64}* byval ...
2. Add a bogus dereferenceable(unique ID) to each of them, i.e.
    ... i32* byval dereferenceable(123400001) align 4 ...
    .... {i8, i16}* byval dereferenceable(123400002) ...
3. Run the tests and record the errors, which may look like:

Attribute 'byval' expects 'dereferenceable(4)' for type i32*,
    found 'dereferenceable(123400001)'

Attribute 'byval' expects 'dereferenceable(16) align 8' for type {i8, i64}*,
    found 'dereferenceable(123400002)'

4. Use the verifier error messages to replace the bogus attributes
with the proper ones, which include align A when it is missing:
    ... i32* byval dereferenceable(4) align 4 ...
    .... {i8, i16}* byval dereferenceable(16) align 8 ...

For what is worth, the same scheme would also work for byval(N), and
would be entirely unnecessary for byval(T).

I would love to know your thoughts on this, and more specifically:
Which of the 3 (byval(T), byval(N) and byval + dereferenceable + align)
do you think would provide the easiest transition path for front-ends?

Thank you,
- eddyb

+Chandler, Duncan, David M, and Reid (pseudorandom selection of people who might be interested/voiced opinions on prior threads/conversations on the subject)

Firstly, thanks Eddy for doing a lot of work in the opaque pointer area - it’s greatly appreciated. This is a particularly thorny part of it (or, at least I think so at the moment - sounds like you’ve seen thornier parts further down the trail).

We (llvmlite and Numba) don't use byval specifically but, were we to use
it, I think byval(T) would by far be the easiest for us. By contrast,
computing ABI sizes is a bit of a PITA due to llvmlite's architecture
(llvmlite's IR generation side doesn't call into LLVM at all, it
generates textual IR in pure Python).

(as a matter of fact, I plan to keep typed pointers in llvmlite,
because they provide invaluable help discovering bugs early rather than
late - i.e. giving a proper error rather than crashing mysteriously)

Regards

Antoine.

Do all tests need it? Align is just for non-default alignment (> align(1)) so it may be OK for the tests to omit it & it's probably not right to /require/ an align attribute on a byval parameter.

I've had to fix some of the tests which checked that the right
alignment was applied,
by looking at the generated asm, but had no "align A" attributes and
relied on the
target alignment guessing, which I had taken out in my experimental
implementation.

The rest of the tests might be fine, but could behave surprisingly
upon closer inspection.

> Do all tests need it? Align is just for non-default alignment (>
align(1)) so it may be OK for the tests to omit it & it's probably not
right to /require/ an align attribute on a byval parameter.

I've had to fix some of the tests which checked that the right
alignment was applied,

Rough order of magnitude number of tests that needed to be fixed? (10s?
100s?)

by looking at the generated asm, but had no "align A" attributes and
relied on the
target alignment guessing, which I had taken out in my experimental
implementation.

The rest of the tests might be fine, but could behave surprisingly
upon closer inspection.

I'd generally be OK with this ^

>
> I would love to know your thoughts on this, and more specifically:
> Which of the 3 (byval(T), byval(N) and byval + dereferenceable + align)
> do you think would provide the easiest transition path for front-ends?

We (llvmlite and Numba) don't use byval specifically but, were we to use
it, I think byval(T) would by far be the easiest for us. By contrast,
computing ABI sizes is a bit of a PITA due to llvmlite's architecture
(llvmlite's IR generation side doesn't call into LLVM at all, it
generates textual IR in pure Python).

If you have the datalayout string, the ABI sizes become almost trivial to
compute, the algorithms for structs are simple and I'm not aware of any
special-cases that would complicate a front-end's computation.

(as a matter of fact, I plan to keep typed pointers in llvmlite,
because they provide invaluable help discovering bugs early rather than
late - i.e. giving a proper error rather than crashing mysteriously)

I wholeheartedly agree: IMO, all LLVM front-ends should attach their own
typesystem information to LLVM values instead of passing bare Value*
around, or the textual equivalent, in your case.

In rustc we didn't do this from the start and it's become a complete
nightmare to work with those old parts of the codebase.

Relevant commit:
https://github.com/eddyb/llvm/commit/c76e3d6bd6b7008795e16f4422a014d2ff7d48dd
It's just 4 tests, but I had to split each one because they were getting
ran against
multiple targets, and those needed different alignment attributes.

Only casually glanced at that, but do we need to test it with different
alignments on different targets? Or can we just pick one alignment to test
on all of them, do you think? (if you're not sure, that's OK - I/we can
look into it & see what the tests are trying to do, etc)

For us the datalayout string is more of an implementation detail on
LLVM's side, our front-end is not normally aware of it. As for the
algorithm, it may be trivial but is there an outline or specification
of it somewhere? :slight_smile:

Regards

Antoine.

From: "Antoine Pitrou via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Tuesday, January 19, 2016 5:54:27 PM
Subject: Re: [llvm-dev] [RFC] A proposal for byval in a world with opaque pointers

> 2016-01-20 1:11 GMT+02:00 Antoine Pitrou via llvm-dev <
> llvm-dev@lists.llvm.org>:
>
> > >
> > > I would love to know your thoughts on this, and more
> > > specifically:
> > > Which of the 3 (byval(T), byval(N) and byval + dereferenceable
> > > + align)
> > > do you think would provide the easiest transition path for
> > > front-ends?
> >
> > We (llvmlite and Numba) don't use byval specifically but, were we
> > to use
> > it, I think byval(T) would by far be the easiest for us. By
> > contrast,
> > computing ABI sizes is a bit of a PITA due to llvmlite's
> > architecture
> > (llvmlite's IR generation side doesn't call into LLVM at all, it
> > generates textual IR in pure Python).
> >
> If you have the datalayout string, the ABI sizes become almost
> trivial to
> compute, the algorithms for structs are simple and I'm not aware of
> any
> special-cases that would complicate a front-end's computation.

For us the datalayout string is more of an implementation detail on
LLVM's side, our front-end is not normally aware of it. As for the
algorithm, it may be trivial but is there an outline or specification
of it somewhere? :slight_smile:

The format is documented http://llvm.org/docs/LangRef.html#data-layout but you're not generally expected to parse the string directly. Instead, you'd use the DataLayout class (include/llvm/IR/DataLayout.h) which contains functions like 'uint64_t getTypeSizeInBits(Type *Ty) const;'

-Hal

From: "Eddy B. via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Tuesday, January 19, 2016 4:47:56 PM
Subject: [llvm-dev] [RFC] A proposal for byval in a world with opaque pointers

Hi,

In the past months, several options have been presented for making
byval
(and similar attributes, such as inalloca or sret) work with opaque
pointers.

The main two I've seen were byval(T) and byval(N) where N is the size
of T.

They both have their upsides and downsides, for example: byval(T)
would be
a type-parametric attribute, which, AFAIK, does not already exist and
may
complicate the attribute system significantly, while byval(N) would
be hard
to introduce in tests as computing N from T requires LLVM's
DataLayout.

Also, this would have to be done for inalloca and sret as well - sret
only
needs it when targeting SPARC, although still generally useful in
analysis.

To sidestep some of the concerns and allow a smooth transition
towards a
byval that works with opaque pointers, I've come up with a new
approach:

Reuse dereferenceable(S) and align A for the size and alignment of
byval.

That is, a byval dereferenceable(S) align A argument is guaranteed to
have
S bytes available to read from, *and only S*, aligned to a multiple
of A.
Reading past that size is UB, as LLVM will not copy more than S
bytes.

An API can be provided to add the attribute alongside dereferenceable
and align attributes, for a given Type* and DataLayout.

This sounds like a reasonable plan to me.

-Hal

Hi,

As there seems to be some concern that adding type attributes complicates the attributes system, I decided to experiment with implementing type attributes and byval as a type attribute.

Adding type attributes required mostly adding boilerplate code. The only slightly tricky part was updating the types in the linker (IRMover.cpp). This however seems to follow the common pattern in IRMover.cpp. See D16515 for the implementation [1].

Adding the byval attribute on top of this was straightforward as well and probably similar to adding an integer attribute. The tricky bit here was bitcode compatibility. When reading the attribute group, we don't have access to the parameter types, so the current implementation casts `-1` to Type * and fixes the attribute later (when the attribute set is referenced by an actual function). A similar hack would likely also be needed if it was an integer attribute. See D16516 for the implementation [2].

I think we can only go from byval(<type>) or byval(<size>) to dereferenceable(<size>), but not the other way around. In particular, the current definition of the dereferenceable attribute contains "It is legal for the number of bytes to be less than the size of the pointee type.".

-Manuel

[1] http://reviews.llvm.org/D16515
[2] http://reviews.llvm.org/D16516

Hi,

In the past months, several options have been presented for making byval
(and similar attributes, such as inalloca or sret) work with opaque pointers.

The main two I've seen were byval(T) and byval(N) where N is the size of T.

They both have their upsides and downsides, for example: byval(T) would be
a type-parametric attribute, which, AFAIK, does not already exist and may
complicate the attribute system significantly, while byval(N) would be hard
to introduce in tests as computing N from T requires LLVM's DataLayout.

Also, this would have to be done for inalloca and sret as well - sret only
needs it when targeting SPARC, although still generally useful in analysis.

To sidestep some of the concerns and allow a smooth transition towards a
byval that works with opaque pointers, I've come up with a new approach:

Reuse dereferenceable(S) and align A for the size and alignment of byval.

That is, a byval dereferenceable(S) align A argument is guaranteed to have
S bytes available to read from, *and only S*, aligned to a multiple of A.
Reading past that size is UB, as LLVM will not copy more than S bytes.

Just to make sure I understand, you are *not* planning on changing the semantics of the existing attributes right? The current semantics for "dereferenceable(N)" are "N bytes are known to be dereferenceable, more bytes might be". What worried my in your wording was the UB comment. It is not currently UB to have a read beyond the *known* dereferenceable bytes. This is important to me and I would be very hesitant to change it.

The semantics I described were for byval+dereferenceable(N), not
dereferenceable(N) alone,
i.e. LLVM will only guarantee exactly N bytes will be copied, no less, no
more.

Although with mjacob prototyping byval(T) so quickly, I do feel that my
work is obsoleted.

Hi,

In the past months, several options have been presented for making byval
(and similar attributes, such as inalloca or sret) work with opaque
pointers.

The main two I've seen were byval(T) and byval(N) where N is the size of
T.

They both have their upsides and downsides, for example: byval(T) would
be
a type-parametric attribute, which, AFAIK, does not already exist and may
complicate the attribute system significantly, while byval(N) would be
hard
to introduce in tests as computing N from T requires LLVM's DataLayout.

Also, this would have to be done for inalloca and sret as well - sret
only
needs it when targeting SPARC, although still generally useful in
analysis.

To sidestep some of the concerns and allow a smooth transition towards a
byval that works with opaque pointers, I've come up with a new approach:

Reuse dereferenceable(S) and align A for the size and alignment of byval.

That is, a byval dereferenceable(S) align A argument is guaranteed to
have
S bytes available to read from, *and only S*, aligned to a multiple of A.
Reading past that size is UB, as LLVM will not copy more than S bytes.

Just to make sure I understand, you are *not* planning on changing the
semantics of the existing attributes right? The current semantics for
"dereferenceable(N)" are "N bytes are known to be dereferenceable, more
bytes might be". What worried my in your wording was the UB comment. It
is not currently UB to have a read beyond the *known* dereferenceable
bytes. This is important to me and I would be very hesitant to change it.

The semantics I described were for byval+dereferenceable(N), not
dereferenceable(N) alone,
i.e. LLVM will only guarantee exactly N bytes will be copied, no less, no
more.

Although with mjacob prototyping byval(T) so quickly, I do feel that my
work is obsoleted.

Even with the prototype (which I haven't looked at in detail) I think this
thread (& your prototype) is still valuable & would like to continue
it/hear from others (especially those I cc'd earlier) about
preferences/ideas/tradeoffs in the design of this piece.