[RFC] C++17 hardware constructive / destructive interference size

Hi atomic fans :metal::atom_symbol::love_you_gesture: (and non-fans I guess),

C++17 adds support for hardware destructive / constructive interference size constexpr values.

  • cppreference: https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size
  • Paper with motivation: http://wg21.link/P0154

I volunteer to implement the necessary bits to support this in clang and libc++, and to give them proper values for current ARM and x86 processors. I’ve discussed this plan with other libc++ folks as well as libstdc++ / GCC folks, and we plan to implement the same builtins in both toolchains as well as adopt the same constexpr values wherever possible to keep ABIs compatible.

Under this plan, ARM and x86 will properly expose the new values in libc++, and other targets will automagically expose these values in C++ when they’re updated with target-specific values in their target tablegen file. After a while targets that haven’t settled on values will fail that one libc++ C++17 conformance test (for now the test will only check targets which expose the builtin).

FWIW MSVC already exposes this, but since they support fewer targets they were able to pick the value everyone knows is right: 64B. We’re not so fortunate, so bear with me as I propose a plan:

1. Standard library support

Add the following code to header <new>:

#if (__cplusplus >= 201703L) && __has_builtin(__builtin_hardware_destructive_interference_size) && __has_builtin(__builtin_hardware_constructive_interference_size)
inline constexpr std::size_t hardware_destructive_interference_size = __builtin_hardware_destructive_interference_size();
inline constexpr std::size_t hardware_constructive_interference_size = __builtin_hardware_constructive_interference_size();
#endif

Add corresponding tests which ensure that both values are at least alignof(std::max_align_t) and are constexpr. Conditionalize these tests on the same __has_builtin check for now. File a bug and leave a FIXME to move the test to just #if __cplusplus >= 201703L once targets have adopted this. libc++ will keep the __has_builtin check so that the header compiles just fine even if the builtin isn’t defined; it simply won’t expose the values (so user code will only fail if it tries to use them).

2. Compiler support

  1. Teach the target infrastructure that hardware interference size is something they can specify (in tablegen files somewhere).
  2. Allow overriding the value in sub-targets using -march or -mcpu (the sub-target defines the numeric value, and the user gets the overridden one by using -march or -mcpu).
  3. Allow overriding the value (or defining, if the target doesn’t already) on the command line using flags -mhardware-destructive-interference-size and -mhardware-constructive-interference-size. Initially I thought we’d go with -mattr, but those don’t really allow values being passed.
  4. In clang, if these properties are set, expose the builtin. Don’t expose a builtin if the value is not set by the target or on the command-line, such that the STL won’t expose a random value. I’ll expose them even if we’re in pre-C++17 mode, because they’re builtins and libc++ only exposes the constexpr value if we’re in C++17 or later.
  5. For generic le32 / be32 ARM targets expose constructive / destructive as 64B.
  6. For generic le64 / be64 ARM targets expose constructive as 64B and destructive as 128B.
  7. For generic x86 expose constructive / destructive as 64B.
  8. Honor existing sub-target preferences (AFAICT x86 doesn’t have any, ARM has some in AArch64Subtarget::initializeProperties). These override the generic ones above.
  9. Leave other targets as-is for now, since I can’t test them and I don’t know what the appropriate values would be. Hopefully this RFC will elicit feedback as to what the appropriate values are.

What do y’all think?

Thanks,

JF

We can’t change the value based on -mcpu. We generally allow mixing code built with different values of -mcpu. And any code which is linked together must use the same value for hardware_destructive_interference_size, or else we violate ODR. -Eli

What does “as is” mean? I think that we should not define values at all until someone with knowledge of the target sets values. These become part of the target ABI, and I don’t think that we want to accidentally make an ABI choice for a target. -Hal

We can’t change the value based on -mcpu. We generally allow mixing code built with different values of -mcpu. And any code which is linked together must use the same value for hardware_destructive_interference_size, or else we violate ODR.

Interesting point. The case I’d like to cover is one where the developer wants to get the exact right value for their particular CPU, instead of a conservative answer with extra padding. How do you think we should meet this use case?

What does “as is” mean? I think that we should not define values at all until someone with knowledge of the target sets values. These become part of the target ABI, and I don’t think that we want to accidentally make an ABI choice for a target.

Targets other than ARM and x86 would, for now, not define the builtin at all. I think that meets your point exactly; at least, my intent was to do exactly as you suggest.

Go back to the standards committee and ask for a function that isn’t constexpr? I can’t think of any other reasonable solution. -Eli

Unfortunately, to define structure layouts they need to be constant. The best solution I’ve thought of is to extend the abi_tag support to force the mangling of interfaces depending on values of these constructs to be different. -Hal


Why do you propose modeling these as builtin functions rather than as
preprocessor defines? The latter is how we model every other similar
property.


abi_tag is not an effective way of maintaining ABI, because it needs to be
"viral" / transitive, and can't be (at least, not without huge developer
effort).

Perhaps we could add an attribute
to hardware_{con,de}structive_interference_size that produces a warning if
they are used outside the main source file? We'd also need to make them
non-inline, which is an observable conformance break, but seems unlikely to
be important compared to the other issues.

We discussed both, nobody really cared either way, so builtins won. Macro is fine as well.

Interesting. I had thought that abi_tag was transitive. It occurs to me that Transitive ABI Infection Mechanism (TAIM) has a reasonable acronym. :slight_smile: I suspect that’s what we need in this case.

I thought about suggesting this, but didn’t, because I suspect that many/most uses will be in header files, just project-internal header files (because they’ll be defining structure layouts, padding arrays, etc.). I think that such a warning will be pretty noisy, unfortunately.

Good point. Do you think that we should file a DR about this? I imagine that most everyone is going to be in the same boat in this regard. -Hal


MSVC only ever sets it to 64, so they don’t have this issue.

That was the original idea, but halfway through implementing it, they realized that forward declarations are a thing.

Thus it is unfortunately not possible to infer the ABI-name of a struct from its contents.


That's not possible, because classes can be forward-declared, and you need
to know what fields and base classes they have to transitively propagate
the taint.

GCC tries to make it possible to transitively propagate the taint manually:
they have a warning for a type that uses a tainted type and isn't itself
declared with the abi_tag attribute. But in practice (at least for the
"C++11 ABI" abi_tag) the taint ends up affecting sufficiently many classes
as to make using the attribute that way infeasible to all but the most
dedicated.


We'd also need to make them non-inline, which is an observable conformance
break, but seems unlikely to be important compared to the other issues.

Good point. Do you think that we should file a DR about this? I imagine
that most everyone is going to be in the same boat in this regard.

Yes, we probably should. Making them non-inline would presumably mean that
we can tell the user it's their fault if the value differs between
translation units and they use it in an ODR-relevant context. That doesn't
solve the problem, but it does make it not our problem to solve, to a
certain extent. We're still left with this being a dangerous feature, but I
think that's really unavoidable if we want these things to be compile-time
constants.

It’d certainly be unreasonable for a C++ stdlib to promise that the CPU’s cache layout will never change. And, IMO, we’d be better off not implementing these functions, rather than trying to work around the issue with various warnings/restrictions.

ISTM that user requirements may actually be best served by a non-constexpr function which can safely return the correct value on any platform. The situations where it’s not feasible to use dynamically-allocated memory for such an object seem pretty slim. And if you’re dynamically allocating memory, passing a runtime-constant alignment to the allocator is fine.

I disagree, we need a constexpr value. The original paper has an example, from your own employer, which you’ll find copy / pasted all over the various forks of base/. Plenty of other projects just hard-code 64.

For both values on all platforms? -Hal

For both values on all platforms?

Yes: https://twitter.com/MalwareMinigun/status/1000114366883155968


Oh. Right. :wink: -Hal