AArch64 support

Hi

Does anyone have a patch for AArch64 support? Either experimental, complete or notes/suggestions.

Thanks

./C

So here's a 1st draft for review - I don't expect it to be clean on the 1st
pass, but getting some review would be really appreciated.

btw - Is anyone testing this on ARM or PPC64 regularly?

Thanks

openmp-aarch64.diff (39 KB)

+if("${arch}" STREQUAL “32e” OR “${arch}” STREQUAL “32” OR “${arch}” STREQUAL “arm” OR “${arch}” STREQUAL “aarch64”)

set(FEATURE_FLAGS “${FEATURE_FLAGS} -D KMP_USE_ADAPTIVE_LOCKS=1”)

set(FEATURE_FLAGS “${FEATURE_FLAGS} -D KMP_DEBUG_ADAPTIVE_LOCKS=0”)

endif()

Do you really want the adaptive locks for arm and aarch64?

Since they use the Intel specific transactional synchronization extensions (TSX) it seems unlikely to me.

Otherwise LGTM (though I have no testing environment for it).

– Jim

James Cownie james.h.cownie@intel.com
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)

Tel: +44 117 9071438

-if(NOT arch)

  • if("${CMAKE_SIZEOF_VOID_P}" STREQUAL “4”)

  • set(arch 32 CACHE STRING “The architecture to build for (32e/32/arm/ppc64). 32e is Intel(R) 64 architecture, 32 is IA-32 architecture”)

  • else()

  • set(arch 32e CACHE STRING “The architecture to build for (32e/32/arm/ppc64). 32e is Intel(R) 64 architecture, 32 is IA-32 architecture”)

  • endif()

+#if(NOT arch)

+# if("${CMAKE_SIZEOF_VOID_P}" STREQUAL “4”)

+# set(arch 32 CACHE STRING “The architecture to build for (32e/32/arm/ppc64/aarch64). 32e is Intel(R) 64 architecture, 32 is IA-32 architecture”)

+# else()

+# set(arch 32e CACHE STRING “The architecture to build for (32e/32/arm/ppc64/aarch64). 32e is Intel(R) 64 architecture, 32 is IA-32 architecture”)

+# endif()

+#endif()

We can get rid of the if(NOT arch) line, this guard is unnecessary.

When CMake is first run on a build directory, it creates a CMakeCache.txt file inside the build directory which is the configuration of your build. Any variable in the CMakeLists.txt file which is set with the CACHE option (along with a type and docstring) is either set from the command line via –DVARIABLE=value or from this default value in the CMakeLists.txt file. The –DVARIABLE=value takes precedence over what is in the CMakeLists.txt file in the initial CMake run. On subsequent runs of CMake, VARIABLE is set FROM THE CMakeCache.txt file. You can change this value in CMakeCache.txt manually or with:

cmake –DVARIABLE=value …

In summary, the first run of CMake sets the initial value of arch either with –Darch=value, or by the logic in the CMakeLists.txt file. After that initial run, it grabs the value of arch from CMakeCache.txt. An example might help illustrate what I’m saying:

[ This assumes you don’t blast away the build directory between cmake runs ]

  1. cmake –Darch=arm (Rest of your variables) … [ This will override what is in CMakeLists.txt and put arm in CMakeCache.txt and set arch to arm for the initial CMake run)

  1. cmake –Dsome_other_variable=value … [ This will use the arch value stored in CMakeCache.txt (currently arm) ]

  1. cmake –Darch=32e … [ This will set the value in CMakeCache.txt to 32e and use 32e during this third run of CMake ]

  1. vi CMakeCache.txt (edit arch to be ppc64)

  2. make (This call to GNU Make will recall cmake but use value of arch in CMakeCache.txt which is ppc64)

– Johnny

ack - I saw a lot of things I'd like to clean-up, but I didn't touch them
all. I'll include this clean-up in my patch.

Thanks

C Bergström,

// (IA-32 architecture) or 64-bit signed (Intel(R) 64).

//

-#if KMP_ARCH_X86 || KMP_ARCH_ARM

+#if KMP_ARCH_X86 || KMP_ARCH_ARM || KMP_ARCH_AARCH64

define KMP_DISPATCH_INIT __kmp_aux_dispatch_init_4

define KMP_DISPATCH_FINI_CHUNK __kmp_aux_dispatch_fini_chunk_4

define KMP_DISPATCH_NEXT __kmpc_dispatch_next_4

@@ -239,7 +237,6 @@ xexpand(KMP_API_NAME_GOMP_ORDERED_END)(void)

define KMP_DISPATCH_NEXT_ULL __kmpc_dispatch_next_8u

Here, we don’t believe the KMP_ARCH_AARCH64 is correct because those dispatch routines are for 32-bit architectures. It should be using the 64 bit versions (in the #else part).

– Johnny

It’s been a bit since I had time to look at this again, but based on the previous feedback can you take a look at this patch. If it looks good please feel free to push it. I’ll fix any bugs people hit and also follow-up later with some improvements.

openmp.diff.v2 (39 KB)

Sorry. please ignore the last attachment. Here’s the correct one.

openmp.diff.v3 (40.6 KB)

Here’s an updated patch which applies on top of other changes which I presume will be committed 1st or very soon

aarch64_port_v4.patch (37.9 KB)

ping - it would be great if my patch could get reviewed/pushed. I realize it’s the holidays, but I’m trying to use this time to get a buildbot setup and using upstream.

As you say, it is holiday season and many people (including Johnny) are on vacation.

I am nervous about committing a change and then immediately going on holiday myself, (especially when the ball was in your court on these patches that are now so urgent for ~6 weeks).

Having a buildbot is great, especially if it is pushing results somewhere public, but surely your buildbot should only need a one line change to choose which repository to check out from, so you can do most of the work to get it running with it looking at your internal tree, and then switch to the LLVM tree easily once it’s all running.

More generally (and not needed for these patches, but so you can see where we’re going), I’d like to tidy up the way we’re handling the multiple architectures to make the code cleaner and porting easier.

At the moment we’re growing long #if chains like

#if KMP_OS_LINUX && (KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64)

which are hard to read and hard to get right.

I’d like to fix that in two ways

  1. Have #defines for the high level, generic architectures, e.g.

KMP_ARCH_X86_ANY, KMP_ARCH_ARM_ANY, KMP_ARCH_PPC_ANY, …

These would then be defined in terms of the possible architectures they represent, something like

#define KMP_ARCH_X86_ANY (KMP_ARCH_X86 || KMP_ARCH_X86_64)

#define KMP_ARCH_ARM_ANY (KMP_ARCH_ARM || KMP_ARCH_AARCH64)

#define KMP_ARCH_PPC_ANY (KMP_ARCH_PPC || KMP_ARCH_PPC64) // or might be PPC32 || PPC?

For PPC we’d also have the endian options one level down within KMP_ARCH_PPC64, which would be

#define KMP_ARCH_PPC64 (KMP_ARCH_PPC64_LE || KMP_ARCH_PPC64_BE)

That would allow us to shorten the long chains like that above, which would then be

#if KMP_OS_LINUX && (KMP_ARCH_X86_ANY || KMP_ARCH_ARM_ANY)

which is somewhat better, but I’d also like to add the second change…

  1. Define some more functional/property style macros, so, for instance, that big if above is actually being used for the concept of “Does this system support futexes?”. I’d rather have a macro for that concept, which could then be defined in the architecture header, something like

#define KMP_HAVE_FUTEX (KMP_OS_LINUX && (KMP_ARCH_X86_ANY || KMP_ARCH_ARM_ANY))

(or, alternatively, which might be better, have a single #if section for each architecture that defines its capabilities as 1 or 0)

The code in kmp_csupport.c that’s worried about futexes would then just have

#if (KMP_HAVE_FUTEX)

This ought to make porting much simpler, since you wouldn’t need to look at every #if in the code that tests an architecture flag to see if your new architecture needs to be added, but just set the appropriate capability macros for the architecture you have and worry about any really machine specific code (which should get caught by a #error case).

p.s. I am out from this evening (3h30 and counting…) and will be back in the office on 23 February. (Sabbatical, yay! No snow in the Alps, boo!). Andrey Churbanov has commit rights, as does Hal. I trust them in my absence.

– Jim

James Cownie james.h.cownie@intel.com
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)

Tel: +44 117 9071438

As you say, it is holiday season and many people (including Johnny) are
on vacation.

I am nervous about committing a change and then immediately going on
holiday myself, (especially when the ball was in your court on these
patches that are now so urgent for ~6 weeks).

Having a buildbot is great, especially if it is pushing results somewhere
public, but surely your buildbot should only need a one line change to
choose which repository to check out from, so you can do most of the work
to get it running with it looking at your internal tree, and then switch to
the LLVM tree easily once it’s all running.

For anything I can review - I do so pretty quickly (certainly not 6 weeks).
While I realize this patch has been outstanding - my "holiday" activities
and "job" are not the same thing. So I was hoping to take care of this in
my "free" time.

The patch had already been reviewed before and the points which needed to
be fixed should be resolved. So maybe I was wrongly a bit impatient or
thinking it was "ready"

More generally (and not needed for these patches, but so you can see where
we’re going), I’d like to tidy up the way we’re handling the multiple
architectures to make the code cleaner and porting easier.

I agree clean-up is needed, but starting a new thread may get better
visibility and allow more discussion. If you resend it I'll see if I can
come up with any constructive ideas. (I am not a fan of #if hell and
strongly prefer feature tests)

I'm fairly confident my patch adds functionality and doesn't break anything
existing.. thus the risk is "low"..

For anything I can review - I do so pretty quickly (certainly not 6 weeks).

Sure, my point was not that your reviews take a long time, but that you sent a previous version of the pacth back in October, which was reviewed fairly rapidly, and what looked like a bug was pointed out. Then there was silence for ~ 6 weeks before these latest revisions. But, really this is all irrelevant, it’s just that had the latest version appeared a few weeks ago more people would have been around to look at it.

While I realize this patch has been outstanding - my “holiday” activities and “job” are not the same thing. So I was hoping to take care of this in my “free” time.

Sure, I’ve done a startup…

– Jim

James Cownie james.h.cownie@intel.com
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)

Tel: +44 117 9071438