RFC: attribute for a pointer which is dereferenceable xor null

I’d like to propose that we add an attribute which expresses the notion that the specified value is either null or dereferenceable up to a fixed size. (Note the xor.) Our current dereferenceable(n) attribute doesn’t quite fit the bill: it implies that the pointer is non-null. Similarly, our nonnull attribute says nothing about dereferenceability.

There are two syntax proposals below, but let’s start with the motivation.

These semantics arise in a number of common cases:

  • In C, malloc is defined to either return null, or a dereferenceable region of the size requested.
  • In Java, any reference is either null or dereferenceable to the size of the static type.
  • I suspect this will also be useful for Julia, Go, Rust, and others for similar reasons.

With such an attribute available, we can increase the effectiveness of LICM. We can’t move a load outside a loop if it might introduce a fault. Knowing that a pointer is dereferenceable(N) at a location (i.e. the loop preheader) allows us to satisfy this constraint. In the near term, we can simply add a case in the dereferenceability analysis that combines the new attribute and isKnownNonNull. This won’t be too effective out of the box, but will enable testing with llvm.assumes and might catch some cases. I will probably also add a case to look at the controlling branch to the loop preheader, since in practice that tends to be where an unswitched null check would live.
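As a rough illustration of the near-term change described above, here is a minimal C++ sketch of how the analysis could combine the two facts. It is not LLVM code: the PointerFacts struct and the canLoadWithoutFault function are hypothetical names invented for this example, and the KnownNonNull flag merely stands in for whatever an isKnownNonNull-style query would return.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified summary of what an analysis knows about a pointer.
// None of these names are real LLVM APIs.
struct PointerFacts {
  std::uint64_t DerefBytes = 0;       // N from dereferenceable(N), 0 if absent
  std::uint64_t DerefOrNullBytes = 0; // N from the proposed attribute, 0 if absent
  bool KnownNonNull = false;          // stand-in for an isKnownNonNull result
};

// Returns true if loading Size bytes through the pointer cannot fault,
// which is the condition LICM needs before hoisting a load.
inline bool canLoadWithoutFault(const PointerFacts &P, std::uint64_t Size) {
  // Existing case: dereferenceable(N) already implies non-null.
  if (P.DerefBytes >= Size)
    return true;
  // New case: the proposed attribute only helps once null is ruled out.
  return P.KnownNonNull && P.DerefOrNullBytes >= Size;
}

// Small usage example.
inline void example() {
  PointerFacts P;
  P.DerefOrNullBytes = 16;
  assert(!canLoadWithoutFault(P, 8)); // might be null: not safe to hoist
  P.KnownNonNull = true;              // e.g. a null check guards the loop
  assert(canLoadWithoutFault(P, 8));
}
```

The point of the sketch is only that the new attribute contributes nothing by itself; it becomes useful exactly when some other source rules out null.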

Longer term, I plan on introducing a mechanism to have isKnownNonNull consider trivially dominating conditions. This will make the proposed attribute more powerful, but is explicitly not part of this proposal. That’s a lot more work and will need a fair amount of discussion on its own.

Now, on to possible syntax.

Option 1
We could simply redefine our current notion of dereferenceable(N) to allow the pointer to be null. Since we already have the nonnull attribute, this wouldn’t lose any expressibility. Frontends would need to be modified to emit both dereferenceable(N) and nonnull if they want to preserve the same semantics. Most of the existing utility functions for dereferenceability in LLVM would be modified to just check both. There’d need to be a forward migration added to the bitcode reader to enable upgrade from the old semantics to the new.

This is my preferred option, but in offline conversation, Hal objected to this change. I’ll let him describe his objection since I was never quite clear on it.
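To make the Option 1 migration a little more concrete, the following C++ sketch models the upgrade step and the "check both" query. It is only an illustration under stated assumptions: PointerAttrs, upgradeFromOldSemantics, and isDereferenceableUnderOption1 are hypothetical names, not LLVM APIs, and the sketch ignores address spaces entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified attribute record; not an LLVM data structure.
struct PointerAttrs {
  std::uint64_t DerefBytes = 0; // N from dereferenceable(N), 0 if absent
  bool NonNull = false;         // true if nonnull is present
};

// Upgrade attributes read from a module written under the old semantics,
// where dereferenceable(N) also implied non-null. Under Option 1 that
// implication has to be spelled out with an explicit nonnull.
inline PointerAttrs upgradeFromOldSemantics(PointerAttrs Old) {
  if (Old.DerefBytes > 0)
    Old.NonNull = true;
  return Old;
}

// Query under the redefined semantics: the attribute alone no longer
// suffices, so the utility consults both attributes.
inline bool isDereferenceableUnderOption1(const PointerAttrs &A,
                                          std::uint64_t Size) {
  return A.NonNull && A.DerefBytes >= Size;
}

// Small usage example.
inline void example() {
  PointerAttrs Old;
  Old.DerefBytes = 8; // old-style dereferenceable(8)
  PointerAttrs New = upgradeFromOldSemantics(Old);
  assert(New.NonNull);
  assert(isDereferenceableUnderOption1(New, 8));
}
```

A frontend that emitted only dereferenceable(N) after the change would fail the query above, which is why producers would need updating as well.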

Option 2
We introduce a new attribute with the desired semantics. This results in a collection of confusing overlapping attributes, but is otherwise straightforward.

My proposed strawman syntax would be: dereferenceable_or_null(N). (Bikeshedding welcomed.) This would be a legal parameter and return attribute on both function declarations and call sites (i.e. calls and invokes). As with above, we’d extend all the places that currently consider ‘dereferenceable’ to consider the new attribute in combination with isKnownNonNull.

Philip

From: "Philip Reames" <listmail@philipreames.com>
To: llvmdev@cs.uiuc.edu
Sent: Thursday, February 12, 2015 11:59:17 AM
Subject: [LLVMdev] RFC: attribute for a pointer which is dereferenceable xor null

> These semantics arise in a number of common cases:
> - In C, malloc is defined to either return null, or a dereferenceable
> region of the size requested.

I think this is really only useful if we allowed 'n' to be a runtime value.

> Option 1
> We could simply redefine our current notion of dereferenceable(N) to
> allow the pointer to be null. Since we already have the nonnull
> attribute, this wouldn't lose any expressibility. Frontends would
> need to be modified to emit both dereferenceable(N) and nonnull if
> they want to preserve the same semantics. Most of the existing
> utility functions for dereferenceability in LLVM would be modified
> to just check both. There'd need to be a forward migration added to
> the bitcode reader to enable upgrade from the old semantics to the
> new.
>
> This is my preferred option, but in offline conversation, Hal
> objected to this change. I'll let him describe his objection since I
> was never quite clear on it.

I feel this would be all pain and no gain. We already have the dereferenceable attribute, and a fair amount of code now exists which depends on the current semantics. Introducing a silent semantic change now requires, at least, all producers to be updated. Plus it would be confusing; we currently assume that dereferenceable pointers in address-space zero are not null (and optimize based on that). 'dereferenceable' is the terminology we use for that (not 'dereferenceableAndNotNull'), and I don't like the proposed inconsistency with our API. Lastly, it would be inconsistent with its name: a null pointer in address-space zero is not dereferenceable.

> Option 2
> We introduce a new attribute with the desired semantics. This results
> in a collection of confusing overlapping attributes, but is
> otherwise straightforward.
>
> My proposed strawman syntax would be: dereferenceable_or_null(N).
> (Bikeshedding welcomed.) This would be a legal parameter and return
> attribute on both function declarations and call sites (i.e. calls
> and invokes). As with above, we'd extend all the places that
> currently consider 'dereferenceable' to consider the new attribute
> in combination with isKnownNonNull.

Okay; I don't object to this attribute. Just so we're on the same page, what is your use case? Is it like the Java case you mentioned above? Also, I wonder: Are you satisfied with the static size constraint, or do you also want runtime sizes?

-Hal

From: "Philip Reames" <listmail@philipreames.com>
To: llvmdev@cs.uiuc.edu
Sent: Thursday, February 12, 2015 11:59:17 AM
Subject: [LLVMdev] RFC: attribute for a pointer which is dereferenceable xor null

>> These semantics arise in a number of common cases:
>> - In C, malloc is defined to either return null, or a dereferenceable
>> region of the size requested.
>
> I think this is really only useful if we allowed 'n' to be a runtime value.

For malloc, you might have a point. However, I believe that the same is true for operator new, and the size will frequently be a compile-time constant there.

I am not proposing adding 'n' as a runtime value. I am not opposed to it, but it's not part of this proposal.

> I feel this would be all pain and no gain. We already have the dereferenceable attribute, and a fair amount of code now exists which depends on the current semantics. Introducing a silent semantic change now requires, at least, all producers to be updated. Plus it would be confusing; we currently assume that dereferenceable pointers in address-space zero are not null (and optimize based on that). 'dereferenceable' is the terminology we use for that (not 'dereferenceableAndNotNull'), and I don't like the proposed inconsistency with our API. Lastly, it would be inconsistent with its name: a null pointer in address-space zero is not dereferenceable.

I think this is a far smaller change than you're indicating. There's only a handful of places in the code base that directly access the attributes; we'd extend them to check 'new deref' and 'nonnull' at once. As a result, most of the APIs would be semantically unchanged. We might want to rename them, but that's a separate and less risky change.

Your naming point is a reasonable one. I'm more okay with the separation between "this has a dereferenceable attribute (but might still be null)" and "this pointer is dereferenceable". I think that in practice this confusion is likely to be less than that caused by introducing a parallel attribute.

> Okay; I don't object to this attribute. Just so we're on the same page, what is your use case? Is it like the Java case you mentioned above? Also, I wonder: Are you satisfied with the static size constraint, or do you also want runtime sizes?

My use case is the Java object case. I do not need runtime sizes.

Philip

From: "Philip Reames" <listmail@philipreames.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: llvmdev@cs.uiuc.edu
Sent: Friday, February 13, 2015 11:48:12 AM
Subject: Re: [LLVMdev] RFC: attribute for a pointer which is dereferenceable xor null

> My use case is the Java object case. I do not need runtime sizes.

I'm fine with reviewing patches for dereferenceable_or_null(N), with N some constant. I'd like to address the runtime size issue at some point, but we can handle that separately (it might require something other than an attribute anyway).

-Hal