UBSan, StringRef and Allocator.h

Hi all

(No idea if I have the correct audience. Please CC more people as needed).

I have an UBSan failure in BumpPtrAllocatorImpl.Allocate.

The problem is that lld requests that we StringRef::copy an empty string. This passes a length of 0 to a BumpPtrAllocator. The BumpPtrAllocator happened to not have anything allocated yet so the CurPtr is nullptr, but given that we need 0 space we think we have enough space and return an allocation of size 0 at address nullptr. This therefore returns nullptr from Allocate, but that method is marked with LLVM_ATTRIBUTE_RETURNS_NONNULL and LLVM_ATTRIBUTE_RETURNS_NOALIAS, both of which aren’t true in this case.

To put this in code, if I have

BumpPtrAllocator allocator;
StringRef s;
s.copy(allocator);

then i’m going to allocate 0 bytes in the allocator and get a StringRef(nullptr, 0). Its a valid StringRef, but an UBSan failures in the allocator.

Lang and I looked up malloc behaviour online as this is fairly analogous. The answer there is that you are allowed to return nullptr, or not, its implementation defined. So no help there.

So the question is, how do we want this to behave in our code?

Some options:

  • Assert that Allocate never gets a size 0 allocation. So fix StringRef::copy to see this case
  • Remove the attributes from Allocate but continue to return nullptr (or base of the allocator?) in this case
  • Keep the attributes on Allocate and treat size 0 allocations as size 1

Thanks,
Pete

Hi all

(No idea if I have the correct audience. Please CC more people as needed).

I have an UBSan failure in BumpPtrAllocatorImpl.Allocate.

The problem is that lld requests that we StringRef::copy an empty string.
This passes a length of 0 to a BumpPtrAllocator. The BumpPtrAllocator
happened to not have anything allocated yet so the CurPtr is nullptr, but
given that we need 0 space we think we have enough space and return an
allocation of size 0 at address nullptr. This therefore returns nullptr
from Allocate, but that method is marked
with LLVM_ATTRIBUTE_RETURNS_NONNULL and LLVM_ATTRIBUTE_RETURNS_NOALIAS,
both of which aren’t true in this case.

To put this in code, if I have

BumpPtrAllocator allocator;
StringRef s;
s.copy(allocator);

then i’m going to allocate 0 bytes in the allocator and get a
StringRef(nullptr, 0). Its a valid StringRef, but an UBSan failures in the
allocator.

Lang and I looked up malloc behaviour online as this is fairly analogous.
The answer there is that you are allowed to return nullptr, or not, its
implementation defined. So no help there.

So the question is, how do we want this to behave in our code?

Some options:
- Assert that Allocate never gets a size 0 allocation. So fix
StringRef::copy to see this case
- Remove the attributes from Allocate but continue to return nullptr (or
base of the allocator?) in this case
- Keep the attributes on Allocate and treat size 0 allocations as size 1

I believe the last is closer to 'new's behavior - which I think returns a
unique non-null address (well, unique amongst current allocations - can be
recycled once deleted) if I recall correctly. Can check for wording if
that's helpful/desired.

Hi all

(No idea if I have the correct audience. Please CC more people as needed).

I have an UBSan failure in BumpPtrAllocatorImpl.Allocate.

The problem is that lld requests that we StringRef::copy an empty string. This passes a length of 0 to a BumpPtrAllocator. The BumpPtrAllocator happened to not have anything allocated yet so the CurPtr is nullptr, but given that we need 0 space we think we have enough space and return an allocation of size 0 at address nullptr. This therefore returns nullptr from Allocate, but that method is marked with LLVM_ATTRIBUTE_RETURNS_NONNULL and LLVM_ATTRIBUTE_RETURNS_NOALIAS, both of which aren’t true in this case.

To put this in code, if I have

BumpPtrAllocator allocator;
StringRef s;
s.copy(allocator);

then i’m going to allocate 0 bytes in the allocator and get a StringRef(nullptr, 0). Its a valid StringRef, but an UBSan failures in the allocator.

Lang and I looked up malloc behaviour online as this is fairly analogous. The answer there is that you are allowed to return nullptr, or not, its implementation defined. So no help there.

So the question is, how do we want this to behave in our code?

Some options:

  • Assert that Allocate never gets a size 0 allocation. So fix StringRef::copy to see this case
  • Remove the attributes from Allocate but continue to return nullptr (or base of the allocator?) in this case
  • Keep the attributes on Allocate and treat size 0 allocations as size 1

I believe the last is closer to 'new’s behavior - which I think returns a unique non-null address (well, unique amongst current allocations - can be recycled once deleted) if I recall correctly.

That’s what I would have expected too. Its like sizeof(struct {}) which can be a 1 depending on factors we don’t need to get in to here.

Can check for wording if that’s helpful/desired.

No need for my benefit :slight_smile: I’m in agreement that this is a good behavior to go for, but will leave it to others to say if they’d like the extra detail.

One thing I did forget to say is that I’d like to fix StringRef::copy in all of the above cases. I think that this method should always avoid the allocator and return StringRef(nullptr, 0) when length is 0. I’ll get a patch up on llvm-commits if there’s no objections there.

Thanks,
Pete

Hi all

(No idea if I have the correct audience. Please CC more people as
needed).

I have an UBSan failure in BumpPtrAllocatorImpl.Allocate.

The problem is that lld requests that we StringRef::copy an empty
string. This passes a length of 0 to a BumpPtrAllocator. The
BumpPtrAllocator happened to not have anything allocated yet so the CurPtr
is nullptr, but given that we need 0 space we think we have enough space
and return an allocation of size 0 at address nullptr. This therefore
returns nullptr from Allocate, but that method is marked
with LLVM_ATTRIBUTE_RETURNS_NONNULL and LLVM_ATTRIBUTE_RETURNS_NOALIAS,
both of which aren’t true in this case.

To put this in code, if I have

BumpPtrAllocator allocator;
StringRef s;
s.copy(allocator);

then i’m going to allocate 0 bytes in the allocator and get a
StringRef(nullptr, 0). Its a valid StringRef, but an UBSan failures in the
allocator.

Lang and I looked up malloc behaviour online as this is fairly
analogous. The answer there is that you are allowed to return nullptr, or
not, its implementation defined. So no help there.

So the question is, how do we want this to behave in our code?

Some options:
- Assert that Allocate never gets a size 0 allocation. So fix
StringRef::copy to see this case
- Remove the attributes from Allocate but continue to return nullptr (or
base of the allocator?) in this case
- Keep the attributes on Allocate and treat size 0 allocations as size 1

I believe the last is closer to 'new's behavior - which I think returns a
unique non-null address (well, unique amongst current allocations - can be
recycled once deleted) if I recall correctly.

That’s what I would have expected too. Its like sizeof(struct {}) which
can be a 1 depending on factors we don’t need to get in to here.

Can check for wording if that's helpful/desired.

No need for my benefit :slight_smile: I’m in agreement that this is a good behavior
to go for, but will leave it to others to say if they’d like the extra
detail.

One thing I did forget to say is that I’d like to fix StringRef::copy in
all of the above cases. I think that this method should always avoid the
allocator and return StringRef(nullptr, 0) when length is 0. I’ll get a
patch up on llvm-commits if there’s no objections there.

Sounds plausible to me

Hi all

(No idea if I have the correct audience. Please CC more people as needed).

I have an UBSan failure in BumpPtrAllocatorImpl.Allocate.

The problem is that lld requests that we StringRef::copy an empty string.
This passes a length of 0 to a BumpPtrAllocator. The BumpPtrAllocator
happened to not have anything allocated yet so the CurPtr is nullptr, but
given that we need 0 space we think we have enough space and return an
allocation of size 0 at address nullptr. This therefore returns nullptr
from Allocate, but that method is marked with LLVM_ATTRIBUTE_RETURNS_NONNULL
and LLVM_ATTRIBUTE_RETURNS_NOALIAS, both of which aren’t true in this case.

Why is noalias not valid here?

I thought LLVM's notion of noalias was based on data dependence --
ptr_a does not alias ptr_b if a store to ptr_a does not have a data
dependence with a load from ptr_b and vice versa (and for a zero-sized
allocation there can be no data dependence since there cannot legally
be such a load/store). Is the C++ notion of noalias different?

To put this in code, if I have

BumpPtrAllocator allocator;
StringRef s;
s.copy(allocator);

then i’m going to allocate 0 bytes in the allocator and get a
StringRef(nullptr, 0). Its a valid StringRef, but an UBSan failures in the
allocator.

Lang and I looked up malloc behaviour online as this is fairly analogous.
The answer there is that you are allowed to return nullptr, or not, its
implementation defined. So no help there.

So the question is, how do we want this to behave in our code?

Some options:
- Assert that Allocate never gets a size 0 allocation. So fix
StringRef::copy to see this case
- Remove the attributes from Allocate but continue to return nullptr (or
base of the allocator?) in this case
- Keep the attributes on Allocate and treat size 0 allocations as size 1

A fourth option would be return a non-null pointer (maybe the address
of some static char) for allocations of size 0; unless I'm mistaken
about the noalias bit above.

-- Sanjoy

Hi all

(No idea if I have the correct audience. Please CC more people as needed).

I have an UBSan failure in BumpPtrAllocatorImpl.Allocate.

The problem is that lld requests that we StringRef::copy an empty string.
This passes a length of 0 to a BumpPtrAllocator. The BumpPtrAllocator
happened to not have anything allocated yet so the CurPtr is nullptr, but
given that we need 0 space we think we have enough space and return an
allocation of size 0 at address nullptr. This therefore returns nullptr
from Allocate, but that method is marked with LLVM_ATTRIBUTE_RETURNS_NONNULL
and LLVM_ATTRIBUTE_RETURNS_NOALIAS, both of which aren’t true in this case.

Why is noalias not valid here?

I thought LLVM’s notion of noalias was based on data dependence –
ptr_a does not alias ptr_b if a store to ptr_a does not have a data
dependence with a load from ptr_b and vice versa (and for a zero-sized
allocation there can be no data dependence since there cannot legally
be such a load/store). Is the C++ notion of noalias different?

That was an assumption on my part that “allocate(0) != allocate(0)’. That is, I didn’t think you needed to dereference a pointer for no alias, just compare it. I could be wrong as I’m just assuming thats the behavior without any formal knowledge on the subject.

To put this in code, if I have

BumpPtrAllocator allocator;
StringRef s;
s.copy(allocator);

then i’m going to allocate 0 bytes in the allocator and get a
StringRef(nullptr, 0). Its a valid StringRef, but an UBSan failures in the
allocator.

Lang and I looked up malloc behaviour online as this is fairly analogous.
The answer there is that you are allowed to return nullptr, or not, its
implementation defined. So no help there.

So the question is, how do we want this to behave in our code?

Some options:

  • Assert that Allocate never gets a size 0 allocation. So fix
    StringRef::copy to see this case
  • Remove the attributes from Allocate but continue to return nullptr (or
    base of the allocator?) in this case
  • Keep the attributes on Allocate and treat size 0 allocations as size 1

A fourth option would be return a non-null pointer (maybe the address
of some static char) for allocations of size 0; unless I’m mistaken
about the noalias bit above.

Interesting. I didn’t think of that, but there’s no reason that wouldn’t work, assuming the noalias is ok as you say.

Cheers,
Pete

I was quoting from AliasAnalysis.h:

// - NoAlias doesn't imply inequal pointers. The most obvious example of this
// is two pointers to constant memory. Even if they are equal, constant
// memory is never stored to, so there will never be any dependencies.
// In this and other situations, the pointers may be both NoAlias and
// MustAlias at the same time. The current API can only return one result,
// though this is rarely a problem in practice.

but there is the caveat that the C++ level noalias annotation may be different
from LLVM's NoAlias, something I'm not a 100% sure of.

-- Sanjoy

Well except that if sizeof(struct{}) is 1, the allocator is never called with a 0.

I would consider forbidding zero sized allocation in the allocator (assert()) by design (hey we’re controlling every possible uses!), unless there is a real use-case for that.

This would also be in line with the C++ standard requirement for allocator which specifies that the result of “a.allocate(0)” is unspecified (ref: C++14 Table 28 — Allocator requirements).

FWIW, I agree with Mehdi that we should just assert that our types don’t get called with size zero.

That said, I don’t think we can be terribly cavalier with what we expect from standard allocator types, operator new, or malloc. And so I would expect LLVM_ATTRIBUTE_RETURNS_NOALIAS to not imply NONNULL, and while it seems reasonable to put NONNULL on our allocate function because of the assert and the fact that we assume the underlying allocation routine never produces a null, it doesn’t seem reasonable for any old function with NOALIAS to have NONNULL inferred.

-Chandler

FWIW, I agree with Mehdi that we should just assert that our types don’t get called with size zero.

Yeah, I agree. I’ve tried running the tests with the assert in place and there’s about 1000 failures across llvm/clang. I’ll see what I can fix as I would like to get these to behave. There may be actual logic errors in some of these cases.

That said, I don’t think we can be terribly cavalier with what we expect from standard allocator types, operator new, or malloc. And so I would expect LLVM_ATTRIBUTE_RETURNS_NOALIAS to not imply NONNULL, and while it seems reasonable to put NONNULL on our allocate function because of the assert and the fact that we assume the underlying allocation routine never produces a null, it doesn’t seem reasonable for any old function with NOALIAS to have NONNULL inferred.

I agree. I don’t actually know if NOALIAS and NONNULL were put on this function independently, or if the committer assumed one implied the other. But I do agree that they should be independent in general, even if they happen to both apply here.

Cheers,
Pete

FWIW, I agree with Mehdi that we should just assert that our types don’t get called with size zero.

Yeah, I agree. I’ve tried running the tests with the assert in place and there’s about 1000 failures across llvm/clang. I’ll see what I can fix as I would like to get these to behave. There may be actual logic errors in some of these cases.

Well this is fun. There is a long standing bug in the SelectionDAG tablegen matcher this has uncovered. Seems that ComplexPattern’s don’t track MemRefs. It probably (hopefully!) just causes downstream code to be conservative but its possible there’s real issues being hidden by this.

The problematic code is something like ‘xop(load) → X86XOPrm’ where the resulting X86XOP MachineInstr wouldn’t have an MachineMemOperand’s, despite being marked as loading from memory.

Will file a PR so that we can track some of these issues.

Pete

Coming late to this discussion since I just noticed this was talking about MemRefs… We have deep and horrible problems around our handling of MemoryOperands. As it stands, there does not appear to be a single invariant which is documented and accepted throughout the various backends. There are conflicting assumptions in various bits of code. I tried to clean up some of this a while back and made some forward progress, but there’s a lot of work left to do here.

Coming late to this discussion since I just noticed this was talking about MemRefs… We have deep and horrible problems around our handling of MemoryOperands. As it stands, there does not appear to be a single invariant which is documented and accepted throughout the various backends.

Yeah, i’ve been noticing the same thing :frowning:

There are conflicting assumptions in various bits of code. I tried to clean up some of this a while back and made some forward progress, but there’s a lot of work left to do here.

Yeah, the piece I was seeing seemed local to the DAG matcher and complex patterns, until I realized that fixing it probably requires changing all of the complex pattern matchers which need to handle MemRefs to actually return the list of MemRefs they are interested in. This would impact pretty much every backend so isn’t something I want to suggest without thinking of less invasive alternatives.

Cheers,
Pete

If you want to chat about this, I’m happy to take some time on IRC, phone, or in person. I don’t have the time to devote to fixing this, but I’m happy to help strategize.