RFC: Implementing -fno-delete-null-pointer-checks in clang

I’m working with an embedded architecture that could, in some situations, run faster if code or data could be located at address zero. I don’t know whether this applies to other embedded chips.

Thanks Tim and Eli,
I should have put the GCC description for the flag in the email.

Regarding the suggestion to DataLayout, It is an interesting idea. I like it in particular since it will avoid creating a new llvm optimization flag and is auto embedded in IR.

Given that, how do we want to proceed, do we want to add yet another field to the DataLayout string?

e.g.

  1. Add to already overloaded pointer types? e.g. p[n]::::: <NullPointerValid ? 1 : 0>
  2. (My preference) Or create another unique field identifier: e.g. np[n]: <NullPointerValid ? 1 : 0>

Thanks,
Manoj

Modifying the datalayout is not a good idea; it doesn’t interact with LTO correctly, and the frontend and the backend generally need to agree on the datalayout. You could maybe use a module flag, if the address-space thing doesn’t work for some reason, but we don’t like to introduce more of those if we can avoid it. -Eli

Actually, thinking about it a bit more, a function attribute would be better than a module flag. But I’d still like to explore the address-space thing first, since we already have the LLVM infrastructure to make that work. -Eli

Probably a fair point, though I think it has a subtle effect on how
the feature is documented. It becomes more about whether code in this
translation unit is permitted to access an object at 0 than whether
such an object exists.

Cheers.

Tim.

Thanks Eli,

I was looking around for the cases where AddrSpace !=0 are checked. Seems like there are a bunch of optimizations that will fail to apply for non zero address spaces.
So I’ll start with the function attribute approach.

-Manoj

Isn't that exactly what we want? Did you look in enough detail to
determine that these optimizations *should* have applied? I'd be
pretty interested to know what we disable for the others, because the
the null thing is the only guarantee I knew about that we made.

Also, I'm pretty unconvinced optimization should be our first priority
here. That smacks of the untenable (IMO) option 2. Whatever we do I
want a program with well-defined semantics at the end. Starting from a
known good position and enabling desirable optimizations that we have
decided are valid is a significantly better path than starting with
"you might get lucky" and disabling extra optimizations that are
reported to us.

I realise this actually argues against my datalayout suggestion too,
so I'm getting a lot more convinced by Eli (& Duncan).

Cheers.

Tim.

I was looking around for the cases where AddrSpace !=0 are checked. Seems
like there are a bunch of optimizations that will fail to apply for non zero
address spaces.

Isn’t that exactly what we want? Did you look in enough detail to
determine that these optimizations should have applied? I’d be
pretty interested to know what we disable for the others, because the
the null thing is the only guarantee I knew about that we made.

Also, I’m pretty unconvinced optimization should be our first priority
here. That smacks of the untenable (IMO) option 2. Whatever we do I
want a program with well-defined semantics at the end. Starting from a
known good position and enabling desirable optimizations that we have
decided are valid is a significantly better path than starting with
“you might get lucky” and disabling extra optimizations that are
reported to us.

I realise this actually argues against my datalayout suggestion too,
so I’m getting a lot more convinced by Eli (& Duncan).

Cheers.

Tim.

Thanks a lot Tim,

My understanding is we only want to disable the optimizations regarding undefined behavior
related to null pointer deference optimizations. And address space checks will end up
disabling more optimizations than needed.
I did look at some of the optimizations/transforms and there are some that we definitely want to keep.

Just a quick example from grepping: lib/Transforms/Scalar/LoopIdiomRecognize.cpp


// Don’t create memset_pattern16s with address spaces.
StorePtr->getType()->getPointerAddressSpace() == 0 &&
(PatternValue = getMemSetPatternValue(StoredVal, DL))) {
// It looks like we can use PatternValue!
return LegalStoreKind::MemsetPattern;
}

Even worse, Sanitizers do NOT work with address spaces which is a big deal breaker IMO.

Since address spaces and null pointers are really orthogonal issues, I would prefer to
not conflate them.

In addition, It is already not easy to convince Linux Kernel maintainers to accept clang specific patches.
Worse performance when compared to GCC may make it even harder to push more patches.
(There are already many complains about clang not supporting optimizations that Linux kernel is used to.

As a side note: x86 maintainers deliberately broke clang support in upstream (https://lkml.org/lkml/2018/4/2/115)
related to asm-goto support. It would be greatly preferable to avoid another confrontation related to performance).

Starting from a known good position and enabling desirable optimizations that we have
decided are valid is a significantly better path than starting with “you might get lucky”
and disabling extra optimizations that are reported to us.

From Linux kernel maintainers POV, Clang built kernels are already “getting lucky”. So I am not too
worried about missing a few cases.
I’ll be glad to fix the missing cases whenever reported.

I also have some other concerns with address spaces e.g. how to pick a safe address space not used by
any target e.g. many targets (in and out-of-tree) actively use non-zero address spaces.

User code can also specify any address space via attribute((address_space(N))) so mixing object
files will be tricky in such cases.

I hope that I have made the case for not using address spaces. Not that I am against address spaces but
losing performance and sanitizers support is a deal breaker.

Thanks
-Manoj

In addition, It is already not easy to convince Linux Kernel maintainers to
accept clang specific patches.
Worse performance when compared to GCC may make it even harder to push more
patches.
(There are already many complains about clang not supporting optimizations
that Linux kernel is used to.
As a side note: x86 maintainers deliberately broke clang support in upstream
(https://lkml.org/lkml/2018/4/2/115)

Yeah, elements of the Linux community seem actively hostile to Clang.
We shouldn't let their hostility dictate our technical policy.

I hope that I have made the case for not using address spaces.

You've certainly argued against them. You haven't provided adequate
semantics for IR without them though.

Cheers.

Tim.

Hi Tim,

You've certainly argued against them. You haven't provided adequate
semantics for IR without them though.

I am thinking to use the approach suggested by Eli previously.
i.e. to use a function attribute, lets say "null-pointer-is-valid".

Thanks,
Manoj

> In addition, It is already not easy to convince Linux Kernel

maintainers to

> accept clang specific patches.
> Worse performance when compared to GCC may make it even harder to push

more

> patches.
> (There are already many complains about clang not supporting

optimizations

> that Linux kernel is used to.
> As a side note: x86 maintainers deliberately broke clang support in

upstream

My understanding is we only want to disable the optimizations regarding
undefined behavior
related to null pointer deference optimizations. And address space checks
will end up
disabling more optimizations than needed.

[Repeating what others have already mentioned on this thread]

I this it is better to avoid framing this as "disable optimizations that exploit
UB" and instead frame this as "define some behavior that was undefined before".

I did look at some of the optimizations/transforms and there are some that
we definitely want to keep.

Just a quick example from grepping:
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
...........
             // Don't create memset_pattern16s with address spaces.
             StorePtr->getType()->getPointerAddressSpace() == 0 &&
             (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
    // It looks like we can use PatternValue!
    return LegalStoreKind::MemsetPattern;
  }

Even worse, Sanitizers do NOT work with address spaces which is a big deal
breaker IMO.

IMO fixing these seems less engineering overhead in the long term than
introducing
yet another knob to the IR.

More importantly, we should _not_ be doing these optimizations without auditing
them individually. For instance with -fno-delete-null-pointer checks, it isn't
okay to change a memset loop to a call to llvm.memset unless we've ensured llvm
DTRT for llvm.memset and null checks (i.e. the null check in "llvm.memset(ptr,
...); if (!ptr) {}" does not get optimized away).

Since address spaces and null pointers are really orthogonal issues, I would
prefer to
not conflate them.

I'm not sure I agree with this. Address spaces are a mechanism to provide
additional semantics to pointers. In this case the additional property we
want is "address 0 may be dereferenceable", and using address spaces seems
appropriate.

In addition, It is already not easy to convince Linux Kernel maintainers to
accept clang specific patches.

Maybe I'm missing something, but I thought in this address space scheme we'd
still provide a -fno-delete-null-ptr-checks flag -- it is clang that would mark
pointers with address space n in this mode.

From Linux kernel maintainers POV, Clang built kernels are already "getting
lucky". So I am not too
worried about missing a few cases.
I'll be glad to fix the missing cases whenever reported.

It seems to me that adding back missing (but correct) optimizations when
reported is better than removing existing (but incorrect) optimizations when
reported. If I were a kernel developer (which I am not) I'd rather have a
kernel that boots slower than a security vulnerability.

I also have some other concerns with address spaces e.g. how to pick a safe
address space not used by
any target e.g. many targets (in and out-of-tree) actively use non-zero
address spaces.
User code can also specify any address space via
__attribute__((address_space(N))) so mixing object
files will be tricky in such cases.

I think for that we can start a thread on cfe-dev and llvm-dev about reserving
address spaces. While at it, we should reserve a range of address spaces for
LLVM middle-end use so that we can do more things like
-fno-delete-null-pointer-checks without worrying about address space clashes.

-- Sanjoy

The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request. We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0. But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

Such a design would hopefully include reserving ranges of address spaces for different
purposes. Clang is already perfectly capable of remapping address space numbers
from the user-facing address_space attribute when generating IR.

John.

Personally I don’t think adding a new address space to express the idea that loads and stores may successfully dereference address zero is the best design.

The existing code to support the handling of address spaces does not inspire a lot of confidence. I am a simple non-GPU compiler developer. I have no idea how address spaces are supposed to work. When I read code that deals with them, I assume nothing about address spaces. I treat them conservatively. They are a black box, I ignore them.

This may sound bad, but this is normal. We all have different reasons to use and contribute to LLVM, and as much as we benefit by sharing designs and development resources as possible, sometimes we end up creating and playing in our own sandboxes and ignoring possible design space overlap with other users. A small amount of duplication of design and effort is OK if it allows people to make forward progress independently. I think this is an area where we want to do that.

I think we can bear the code complexity cost of having to check for both address spaces and this option, and I think both parties (address space users and -fno-delete-null-checks users) are willing to audit LLVM twice for these transformations.

Thanks everyone for the comments,

I actually have a working patch now for "-fno-delete-null-pointer-checks"
option based on the function attribute.
It is not a *huge* change and I only had to change a few places where IR
decision were based on null pointer references (Though I may have missed
some places).

Will be sending the patches for review after cleanup and testing.

-Manoj

Personally I don't think adding a new address space to express the idea

that loads and stores may successfully dereference address zero is the best
design.

The existing code to support the handling of address spaces does not

inspire a lot of confidence. I am a simple non-GPU compiler developer. I
have no idea how address spaces are supposed to work. When I read code that
deals with them, I assume nothing about address spaces. I treat them
conservatively. They are a black box, I ignore them.

This may sound bad, but this is normal. We all have different reasons to

use and contribute to LLVM, and as much as we benefit by sharing designs
and development resources as possible, sometimes we end up creating and
playing in our own sandboxes and ignoring possible design space overlap
with other users. A small amount of duplication of design and effort is OK
if it allows people to make forward progress independently. I think this is
an area where we want to do that.

I think we can bear the code complexity cost of having to check for both

address spaces and this option, and I think both parties (address space
users and -fno-delete-null-checks users) are willing to audit LLVM twice
for these transformations.

> My understanding is we only want to disable the optimizations

regarding

> undefined behavior
> related to null pointer deference optimizations. And address space

checks

> will end up
> disabling more optimizations than needed.

[Repeating what others have already mentioned on this thread]

I this it is better to avoid framing this as "disable optimizations that

exploit

UB" and instead frame this as "define some behavior that was undefined

before".

> I did look at some of the optimizations/transforms and there are some

that

> we definitely want to keep.
>
> Just a quick example from grepping:
> lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> ...........
> // Don't create memset_pattern16s with address spaces.
> StorePtr->getType()->getPointerAddressSpace() == 0 &&
> (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
> // It looks like we can use PatternValue!
> return LegalStoreKind::MemsetPattern;
> }
>
> Even worse, Sanitizers do NOT work with address spaces which is a big

deal

> breaker IMO.

IMO fixing these seems less engineering overhead in the long term than
introducing
yet another knob to the IR.

More importantly, we should _not_ be doing these optimizations without

auditing

them individually. For instance with -fno-delete-null-pointer checks,

it isn't

okay to change a memset loop to a call to llvm.memset unless we've

ensured llvm

DTRT for llvm.memset and null checks (i.e. the null check in

"llvm.memset(ptr,

...); if (!ptr) {}" does not get optimized away).

> Since address spaces and null pointers are really orthogonal issues, I

would

> prefer to
> not conflate them.

I'm not sure I agree with this. Address spaces are a mechanism to

provide

additional semantics to pointers. In this case the additional property

we

want is "address 0 may be dereferenceable", and using address spaces

seems

appropriate.

> In addition, It is already not easy to convince Linux Kernel

maintainers to

> accept clang specific patches.

Maybe I'm missing something, but I thought in this address space scheme

we'd

still provide a -fno-delete-null-ptr-checks flag -- it is clang that

would mark

pointers with address space n in this mode.

> From Linux kernel maintainers POV, Clang built kernels are already

"getting

> lucky". So I am not too
> worried about missing a few cases.
> I'll be glad to fix the missing cases whenever reported.

It seems to me that adding back missing (but correct) optimizations when
reported is better than removing existing (but incorrect) optimizations

when

reported. If I were a kernel developer (which I am not) I'd rather have

a

kernel that boots slower than a security vulnerability.

> I also have some other concerns with address spaces e.g. how to pick a

safe

> address space not used by
> any target e.g. many targets (in and out-of-tree) actively use

non-zero

> address spaces.
> User code can also specify any address space via
> __attribute__((address_space(N))) so mixing object
> files will be tricky in such cases.

I think for that we can start a thread on cfe-dev and llvm-dev about

reserving

address spaces. While at it, we should reserve a range of address

spaces for

LLVM middle-end use so that we can do more things like
-fno-delete-null-pointer-checks without worrying about address space

clashes.

[I want to preface this by saying that given I was late to this
discussion, I'd be fine if we proceed with a function attribute based
solution despite my objections]

Personally I don't think adding a new address space to express the idea that
loads and stores may successfully dereference address zero is the best
design.

The existing code to support the handling of address spaces does not inspire
a lot of confidence. I am a simple non-GPU compiler developer. I have no
idea how address spaces are supposed to work. When I read code that deals
with them, I assume nothing about address spaces. I treat them
conservatively. They are a black box, I ignore them.

As an aside, there are non-GPU users of address spaces too. For
instance, Azul uses (or at least used to when I was there :slight_smile: )
represent GC-able pointers using address space 1, and we spent a lot
of time ensuring that all of the normal scalar optimizations kicked in
for address space 1 pointers (though I'm not sure if all of those
changes are upstream yet).

In any case, while I agree with your assessment of the problem, I
don't agree with the proposed solution. "not everyone (wants to)
understands x" should not be a reason to re-invent a subset of x in
some other form. Instead we should double down making the existing
thing more clearly and explicitly specified so that understanding it
is easy!

Of course, there are boundaries. E.g. to me it isn't a big deal if a
backend wants to introduce an intrinsic or an attribute for
convenience despite it overlapping with existing things. However,
here we're talking injecting new behavior into core LLVM passes so the
scope is larger.

This may sound bad, but this is normal. We all have different reasons to use
and contribute to LLVM, and as much as we benefit by sharing designs and
development resources as possible, sometimes we end up creating and playing
in our own sandboxes and ignoring possible design space overlap with other
users. A small amount of duplication of design and effort is OK if it allows
people to make forward progress independently. I think this is an area where
we want to do that.

To me, his does not look like a small amount of duplication. For
instance, we have to answer questions like what happens when we inline
a function marked "null-ptr-is-dereferenceable" into a regular
function (my current understanding is that we can't inline them) and
whether malloc() can return null or not. I'd rather answer these
kinds of questions in one place if possible.

-- Sanjoy

I agree. The pattern of bailing out if AddrSpace != 0 is unfortunate.

We also need to cap the amount of extra semantics that can be put on address
spaces. For instance, we should probably never support trapping semantics on
loads/stores, even via address spaces.

-- Sanjoy

I would say instead that address spaces are not the right way to support trapping
semantics on loads/stores.

John.

hardcoded address space numbers entirely? Abstractly it seems like an extra
layer of indirection here could help (eg, we could allow IR to define
address spaces that can be part of pointer types, and those could nominate
which target-specific pointer representation and interpretation they use --
perhaps by name instead of by number -- as well as other properties such as
whether zero is a valid address).

Hi John,

I might be misunderstanding the thread here, but are there architectures other than Intel that support alternative address spaces? I’m asking because x86_64 dropped support for having the code, data, stack, and “extra” segments be different from each other; and the only two remaining segment registers, “FS” and “GS”, are only used in practice to alias the current address space. In fact, user-space instructions were later added to read/write the FS/GS segment bases, thus embracing the fact that these segment registers are used in practice to alias the current address space.[1]

I don’t think LLVM needs to model FS/GS as anything other than aliases into the existing address space.

Dave

[1] – Note, these new user-space instructions require permission from the kernel to execute, and popular kernels haven’t enabled them. Last I knew, the Linux folks seem receptive to the idea of enabling these instructions, but the conversation keeps stalling on implementation details.

The LLVM address space design has pushed well beyond the sensible
boundaries
of less-is-more and really needs some concerted effort to actually define
the expected
properties of different address spaces instead of a dozen different
engineers applying
a "don't do this optimization if the pointer is in a non-zero address
space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of
address-space hack
to support this request. We'd just need a very simple audit of the places
that check
the "are dereferences of the zero address undefined behavior" bit to make
sure that
they honor it even in address space 0. But instead that audit will be
confused by a
thousand places that just bail out for non-zero address spaces without
further
explanation.

I agree. The pattern of bailing out if AddrSpace != 0 is unfortunate.

We also need to cap the amount of extra semantics that can be put on
address
spaces. For instance, we should probably never support trapping semantics
on
loads/stores, even via address spaces.

I would say instead that address spaces are not the right way to support
trapping
semantics on loads/stores.

Hi John,

I might be misunderstanding the thread here, but are there architectures
other than Intel that support alternative address spaces?

Yes, they're also used by GPU targets IIUC.