[RFC] Attributes on Values

Hi everyone,

Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful.

A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies), is that we end up packing things that would be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are generated by loads (not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.).

How this might work:

1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions.

2. For the text IR format: Like with call/invoke currently, we'd optionally parse attributes in between the instruction name ('load', etc.) and the first type parameter. So load would become:
  <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...)
allowing you to write:
  %val = load dereferenceable(32) i32** %ptr

3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves.

Obviously there is some potential functionality overlap here with metadata, and an alternate approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are:
- It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type)
- We already have code that handles these as return attributes on call instructions; extending that to look at all instructions seems straightforward. Adding alternate code to also look at metadata also would require additional development for each attribute.

One thing to keep in mind is that, like with metadata, the attributes can have control-flow dependencies, and we'd generally need to clear them when hoisting loads, etc. We already do this with metadata, at least for loads, so the places where we'd need to do this should hopefully be relatively-easy to find.

What do you think?

Thanks again,
Hal

From a very high level, I wonder if maybe we should have encoded nonnull and dereferenceable as metadata instead of attributes. To me it seems like the dividing line is what is an optimization hint vs. what is a requirement for correctness or functionality. For example, sanitize_address, byval, sret, inreg, and inalloca cannot be dropped without changing program semantics. On the other hand, noalias, nonnull, readonly, readnone, and dereferenceable can all be dropped in an -O0 build.

I think the bar for LangRef changes is sufficiently high that we can suffer a little bit of implementation complexity and use the existing metadata support for values instead of attributes. We could probably add a pass that infers metadata from call site and function prototype attributes.

Just my two cents, you don’t need listen to me, I don’t write optimizations, I just break them. :wink:

Hi everyone,

Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful.

A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies), is that we end up packing things that would be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are generated by loads (not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.).

How this might work:

1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions.

Do you have any idea what impact this would have on memory use? We’ve been seeing a lot of small regressions in unoptimized compile times, and they’re really starting to add up. Adding a member to all instructions will have a cost even when not optimizing. We need to be more careful about that.

2. For the text IR format: Like with call/invoke currently, we'd optionally parse attributes in between the instruction name ('load', etc.) and the first type parameter. So load would become:
<result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...)
allowing you to write:
%val = load dereferenceable(32) i32** %ptr

3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves.

Obviously there is some potential functionality overlap here with metadata, and an alternate approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are:
- It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type)

Metadata is really expensive but we could simply omit it when not optimizing. The tradeoff isn’t obvious to me, but it seems like it would be OK if limited to C++ lambda captures and things like outlined OpenMP loop bodies. If you’re talking about moving the attributes that are currently on call/invoke instructions to metadata, the cost may be too high.

From: "Bob Wilson" <bob.wilson@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM-Dev" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, September 9, 2014 2:34:05 PM
Subject: Re: [LLVMdev] [RFC] Attributes on Values

>
> Hi everyone,
>
> Nick and Philip suggested something yesterday that I'd also thought
> about: supporting attributes on values
> (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html).
> The primary motivation for this is to provide a way of attaching
> pointer information, such as noalias, nonnull and
> dereferenceable(n), to pointers generated by loads. Doing this for
> pointers generated by inttoptr also seems potentially useful.
>
> A problem that we currently have with C++ lambda captures from
> Clang (and will have with other similar constructs, like outlined
> OpenMP loop bodies), is that we end up packing things that would
> be function parameters into structures to pass to the generated
> function. Unfortunately, this means that, in the generated
> anonymous function, these values are generated by loads (not true
> function parameters), and so we lose our ability to assert things
> about them (like noalias, nonnull, etc.).
>
> How this might work:
>
> 1. Instead of only CallInst/InvokeInst having an AttributeSet
> member variable, all instructions would have one. Accessor
> functions like getAttributes would be moved from
> CallInst/InvokeInst to Instruction. Only
> 'AttributeSet::ReturnIndex' would be meaningful on most
> instructions.

Do you have any idea what impact this would have on memory use? We’ve
been seeing a lot of small regressions in unoptimized compile times,
and they’re really starting to add up. Adding a member to all
instructions will have a cost even when not optimizing. We need to
be more careful about that.

I don't know, but I'll see if I can make some reasonable measurements. Another possibility would be to add the AttributeSet only to LoadInst, IntToPtr, etc. corresponding to the use cases I currently understand.

>
> 2. For the text IR format: Like with call/invoke currently, we'd
> optionally parse attributes in between the instruction name
> ('load', etc.) and the first type parameter. So load would become:
> <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align
> <alignment>] (...)
> allowing you to write:
> %val = load dereferenceable(32) i32** %ptr
>
> 3. The bitcode format extension would mirror the setup for
> metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be
> added with a record array of [Instruction #, Attribute Set #].
> This will allow attaching attribute sets to any instruction
> without increasing the bitcode record sizes for the instructions
> themselves.
>
> Obviously there is some potential functionality overlap here with
> metadata, and an alternate approach would be to add a metadata
> type that mirrors each attribute we'd like to model on values.
> Potential downsides to using metadata are:
> - It could be a lot of metadata (consider adding
> nonnull/dereferenceable(n) to every load of a C++ reference type)

Metadata is really expensive but we could simply omit it when not
optimizing. The tradeoff isn’t obvious to me, but it seems like it
would be OK if limited to C++ lambda captures and things like
outlined OpenMP loop bodies. If you’re talking about moving the
attributes that are currently on call/invoke instructions to
metadata, the cost may be too high.

We could omit the attributes also when not optimizing (maybe we should be doing that now for the function parameter attributes we do emit). I'm not proposing to move attributes currently on call/invoke to metadata (although Reid did suggest that, also noting that they could be omitted at -O0), but if we had these attributes (nonnull, dereferenceable, etc.) on loads, for example, we could certainly emit them in more places in Clang's CodeGen -- this would likely provide improvements not just in lambda captures, etc. (although that is a key use case for this functionality, however we implement it).

Thanks again,
Hal

Hi everyone,

Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful.

A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies), is that we end up packing things that would be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are generated by loads (not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.).

How this might work:

  1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions.

From a usability perspective, I want to rename the enum (or maybe provide an alias?). More generally, the interface around attributes is an utter mess to work with. Before attempting to replace metadata (either in part or completely), I'd want to invest in creating an attributes API which was easier to understand and extend.

To address the size question asked in a followup, I'll respond somewhat glibly. We already have metadata on a Value, how is having Attributes in their place any different? (This is assuming we completely merge metadata and attributes. No one has seriously proposed doing that yet. Alternatively, we could merge the storage and preserve the interface separation if we thought that was useful.)

  2. For the text IR format: Like with call/invoke currently, we'd optionally parse attributes in between the instruction name ('load', etc.) and the first type parameter. So load would become:
   <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...)
allowing you to write:
   %val = load dereferenceable(32) i32** %ptr

  3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves.

Obviously there is some potential functionality overlap here with metadata, and an alternate approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are:
  - It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type)

I can't really comment on the implications of this. I don't actually know how wasteful metadata is space wise. So far, I've never cared in practice.

  - We already have code that handles these as return attributes on call instructions; extending that to look at all instructions seems straightforward. Adding alternate code to also look at metadata also would require additional development for each attribute.

I honestly don't see there being that much duplication. I'd prefer to see us taking an approach of getting it working, then generalizing. If we do find there's a lot of duplication over time and that simple functional abstraction doesn't abstract it well, we can revisit.

One thing to keep in mind is that, like with metadata, the attributes can have control-flow dependencies, and we'd generally need to clear them when hoisting loads, etc. We already do this with metadata, at least for loads, so the places where we'd need to do this should hopefully be relatively-easy to find.

Agreed. This is non-trivial.

What do you think?

Overall, I think supporting one unified metadata/attribute model long term is a good idea. We need to support all the various use cases (well known properties, prototyping, easy extensions, required vs dropable, etc..), but having two mechanisms is less than ideal. I don't believe the separation is an urgent problem. I don't believe that work on improving our ability to optimize by proving hints using metadata should be held on this idea goal. (Unless you're volunteering to do the work on this in the near future. In which case, yeah!)

Philip

Attributes and metadata aren't completely interchangeable. Metadata is
semantically unimportant and can always be dropped (TBAA, debug info,
stuff). Attributes can control semantically important things like how to
pass a parameter, and can't necessarily be dropped.

I agree that the semantic separation between “stuff that can be dropped” and “stuff that can’t” needs to be preserved. I would argue that this distinction is not tied to the implementation of metadata and attributes today. As someone else pointed out in this thread, there are some attributes (nonnull for instance) which can be safely dropped without effecting correctness. Utter side note: Has anyone written a utility pass to drop “safe to drop” metadata and attributes? Running the test suite with and without such a pass in the pass ordering might be a good way to identify places where such attributes have accidentally picked up functionality relevant semantics. :slight_smile: Philip