[RFC] Requiring explicit address space arguments for PointerType

Hello all,

I recently finished merging the last 3.5 months of upstream changes
into our CHERI LLVM fork and would like to suggest something to both
simplify our future merges and also avoid bugs for targets such as AVR
that use non-zero pointer address spaces.

We make extensive use of address spaces: all our pointers (CHERI
capabilities) use address space 200 instead of the default zero to
ensure that we generated CHERI instructions instead of integer
arithmetic.
The problem we have is that almost every use of
PointerType::getUnqual() or ->getPointerTo() without an address space
parameter will result in an assertion failure/"cannot select" or even
a miscompilation due to creating an addrspace(0) pointer.
After every upstream merge I have to fix the new uses that have been
introduced since the last merge and I get lots of merge conflicts if
code that we have modified is moved around.

In our fork I have conditionally removed the default zero value for
address space parameters to catch these issues at compile time instead
of run time. Unfortunately, PointerType::getUnqual() and
getPointerTo()
are used so frequently that I wasn't able to just remove the default
argument, but had to resort to conditionally allowing it and disabling
it on a directory-by-directory granularity by setting an #ifdef
So far I've changed all default-zero calls for clang and some parts of
LLVM, but we haven't touched passes/targets that we don't add CHERI
support for [1].

I have uploaded these changes as ⚙ D78491 Avoid relying on address space zero default parameter in llvm/IR which
effectively replaces
`unsigned AddressSpace = 0` with LLVM_DEFAULT_AS_PARAM(AddressSpace).
This macro expands to an argument with a default of zero unless
LLVM_NO_IMPLICIT_ADDRESS_SPACE is defined, in which case the argument
is required.
This allows incrementally removing the default zero from individual
files/directories by adding
add_definitions(-DLLVM_NO_IMPLICIT_ADDRESS_SPACE=1) to a single
CMakeLists.txt and eventually doing it for the root CMakeLists.txt.

If this sounds like a reasonable approach for avoiding default-zero
address spaces, I'd like to propose the following steps:

1. Clean up and commit ⚙ D78491 Avoid relying on address space zero default parameter in llvm/IR and the
dependency https://reviews.llvm.orgD70947

2. Start migrating more directories by adding more
add_definitions(-DLLVM_NO_IMPLICIT_ADDRESS_SPACE=1) lines.

3. Set -DLLVM_NO_IMPLICIT_ADDRESS_SPACE=1 in the LLVM and Clang root
CMakeLists.txt to prevent future uses of PointerType::getUnqual()
inside LLVM/Clang.

(4?) Change the behaviour from opt-in to opt-out by requiring
something like -DLLVM_IMPLICIT_DEFAULT_ADDRESS_SPACE=0.

(5?) Depending on how we feel about out-of-tree consumers, remove the
LLVM_DEFAULT_AS_PARAM(AddressSpace) macros and require the argument to
always be present. Adding the argument is compatible with older LLVM
versions so it should be possible to do so without version #ifdefs. I
think we should probably reduce the amount of changes required by
out-of-tree consumers and keep the optional default, but it would
obviously be nice to eventually remove the #ifdefs from inside LLVM.

Does this sound like a sensible approach? Should I also attempt to
implement steps 4 and 5 or is 1-3 sufficient?

I would be great if we could land this upstream as it will
significantly reduce the maintenance burden for our CHERI LLVM and
should also prevent AVR issues such as e.g.
https://reviews.llvm.org/rG215dc2e203341f7d1edc4c4a191b048af4ace43d

Thanks,
Alex

[1] CHERI is currently available for MIPS and RISC-V and will soon be
available on the upcoming Arm Morello board
(Morello Program – Arm®).

Hi Alex,

I'd be very much in favor of this, thanks for pushing :wink:

(5?) Depending on how we feel about out-of-tree consumers, remove the
LLVM_DEFAULT_AS_PARAM(AddressSpace) macros and require the argument to
always be present. Adding the argument is compatible with older LLVM
versions so it should be possible to do so without version #ifdefs. I
think we should probably reduce the amount of changes required by
out-of-tree consumers and keep the optional default, but it would
obviously be nice to eventually remove the #ifdefs from inside LLVM.

As you say, since consumers can be changed in a way where the new
consumer version still builds against older LLVM versions, I think
this is reasonable. It would be great to find a way where consumers
get deprecation warnings for some period of time, but I can't think of
an easy way to do that.

Cheers,
Nicolai

I agree, improving this makes sense.

Alexander, I don’t think that “LLVM_DEFAULT_AS_PARAM” is the right way to go, I would just remove the “ = 0” default parameter entirely.

-Chris

I agree, improving this makes sense.

Alexander, I don’t think that “LLVM_DEFAULT_AS_PARAM” is the right way to go, I would just remove the “ = 0” default parameter entirely.

+1 on requiring everyone to provide the AS all the time. Something similar helped to remove alignment problems and confusion not too long ago.

Hi Chris,

While I'd love to simply remove the "= 0" parameter immediately, there
are (after my initial patch to remove if from llvm/IR) currently at
least 275 uses of getPointerTo() and 195 PointerType::getUnqual().
Unfortunately, it's not always obvious what the right address space
should be. Should it be a pointer to code/globals/the stack?

The reason I used the ugly LLVM_DEFAULT_AS_PARAM approach is so that I
can incrementally remove those uses from certain directories and
prevent new uses from being added.
Preventing new uses is extremely important for our downstream fork,
but not so much for most targets currently in upstream LLVM.

Therefore, I will remove the =0 locally and submit incremental patches
to remove those uses instead of using the LLVM_DEFAULT_AS_PARAM
approach.
Hopefully there won't be many new uses of the old APIs being added in
the meantime.
Once this is complete I'll add new deprecated overloads that still
have the =0, to allow gradual migration for downstreams and remove
them after a few weeks/months/when the next release branch is created.
Does that sound like a better approach?

Alex

Hi Chris,

While I'd love to simply remove the "= 0" parameter immediately, there
are (after my initial patch to remove if from llvm/IR) currently at
least 275 uses of getPointerTo() and 195 PointerType::getUnqual().
Unfortunately, it's not always obvious what the right address space
should be. Should it be a pointer to code/globals/the stack?

The reason I used the ugly LLVM_DEFAULT_AS_PARAM approach is so that I
can incrementally remove those uses from certain directories and
prevent new uses from being added.
Preventing new uses is extremely important for our downstream fork,
but not so much for most targets currently in upstream LLVM.

Makes sense. The macro isn’t really appropriate for upstream, but I can see how it is super useful for your purposes.

Therefore, I will remove the =0 locally and submit incremental patches
to remove those uses instead of using the LLVM_DEFAULT_AS_PARAM
approach.

This is a great approach.

Hopefully there won't be many new uses of the old APIs being added in
the meantime.
Once this is complete I'll add new deprecated overloads that still
have the =0, to allow gradual migration for downstreams and remove
them after a few weeks/months/when the next release branch is created.
Does that sound like a better approach?

Yes, this sounds awesome. Thank you Alexander!

-Chris