Clang-12 fails to build on macOS

It seems there’s a problem with parallel builds, but I don’t really know the cause of the problem. You can see the screen log and the complete build log here: https://trac.macports.org/ticket/63026#comment:5

iMac Pro (Intel Core i9, 10 cores), MacOS Big Sur 11.4, Xcode-12.5.1, clang-12.0.1.

It would be great if this problem is resolved.

Thanks!

The error is basically:
:info:build CMake Error: failed to create symbolic link '/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_llvm-12/clang-12/work/build/projects/compiler-rt/lib/builtins/outline_atomic_helpers.dir/outline_atomic_ldadd2_1.S': file already exists

Looks like this is caused by https://reviews.llvm.org/D93178. It seems that when you build multiple arm64 slices (in this case, arm64 and arm64e), the LSE builtins are generated twice. In the rare case where two identical `cmake -E create_symlink` commands run at the same time, you get that error. Before the change, it was compiled using clang, so you didn't get this weird race condition.

You can probably work around this by manually setting `DARWIN_osx_BUILTIN_ARCHS` to remove `arm64e`, since you don't really need that. In the meantime, can you file a bug report at bugs.llvm.org <http://bugs.llvm.org/>?
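
One way to express that workaround is a CMake initial-cache script, sketched below. It is not taken from the MacPorts port: the file name is hypothetical, the architecture list is only an illustration, and it assumes compiler-rt uses a pre-set `DARWIN_osx_BUILTIN_ARCHS` cache entry instead of probing the SDK.

```cmake
# no-arm64e.cmake (hypothetical file name)
# Load it before the first configure run with:
#   cmake -C no-arm64e.cmake <llvm-source-dir>
#
# Assumption: compiler-rt keeps a pre-set DARWIN_osx_BUILTIN_ARCHS cache entry
# rather than detecting the slices from the SDK. The list below is
# illustrative; the point is only that arm64e is left out.
set(DARWIN_osx_BUILTIN_ARCHS "x86_64;arm64" CACHE STRING
    "macOS slices to build the compiler-rt builtins for")
```

Passing `-DDARWIN_osx_BUILTIN_ARCHS="x86_64;arm64"` on the `cmake` command line should have the same effect under the same assumption.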

Steven

The error is basically:

:info:build CMake Error: failed to create symbolic link ‘/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_llvm-12/clang-12/work/build/projects/compiler-rt/lib/builtins/outline_atomic_helpers.dir/outline_atomic_ldadd2_1.S’: file already exists

Understood, thanks!

Looks like this is caused by https://reviews.llvm.org/D93178. It seems that when you build multiple arm64 slices (in this case, arm64 and arm64e), the LSE builtins are generated twice. In the rare case where two identical cmake -E create_symlink commands run at the same time, you get that error. Before the change, it was compiled using clang, so you didn’t get this weird race condition.

The problem is – even when I am forcing Macports to not build “fat” binaries (aka, limit the build to x86_64 only), I’m still getting an error.

You can probably work around this by manually setting DARWIN_osx_BUILTIN_ARCHS to remove arm64e, since you don’t really need that.

Please see above. I don’t think it helped – at least when I’m building via Macports (where what I’m doing should be the same or equivalent to what you’re suggesting). :frowning:

In the meantime, can you file a bug report at bugs.llvm.org?

I was told this reporting mechanism is going away…?

But sure, if you think it makes sense, I’ll file a report there.

Thanks!

The error is basically:
:info:build CMake Error: failed to create symbolic link '/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_llvm-12/clang-12/work/build/projects/compiler-rt/lib/builtins/outline_atomic_helpers.dir/outline_atomic_ldadd2_1.S': file already exists

Understood, thanks!

Looks like this is caused by https://reviews.llvm.org/D93178. It seems that when you build multiple arm64 slices (in this case, arm64 and arm64e), the LSE builtins are generated twice. In the rare case where two identical `cmake -E create_symlink` commands run at the same time, you get that error. Before the change, it was compiled using clang, so you didn't get this weird race condition.

The problem is – even when I am forcing Macports to not build “fat” binaries (aka, limit the build to x86_64 only), I’m still getting an error.

This is not related to the architecture you build for but to the architectures the compiler supports. Even when you build for x86_64 only, the x86_64 compiler supports building binaries for arm64 as well, so it needs to build compiler-rt fat no matter what CMAKE_OSX_ARCHITECTURES you set. The supported architectures are inferred from the SDK you build against, which has arm64(e) support.

Steven

How does the build end up with two commands symlinking to the same location?
Different targets through `LLVM_RUNTIME_TARGETS` should end up with separate build directories. arm64 and arm64e are different targets, right?

Creating the symlinks during CMake configuration should be possible as an alternative.
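
A minimal sketch of what that could look like, assuming CMake 3.14 or newer for `file(CREATE_LINK)`. The project and file names are hypothetical, and this is not taken from any existing compiler-rt patch:

```cmake
cmake_minimum_required(VERSION 3.14)
project(configure_time_symlink_sketch NONE)

# Hypothetical paths, standing in for an outline-atomics helper source
# and its per-variant name in the build tree.
set(helper_src  "${CMAKE_CURRENT_SOURCE_DIR}/helper.S")
set(helper_link "${CMAKE_CURRENT_BINARY_DIR}/helper_copy.S")

# This runs while CMake configures the project, not as a build rule,
# so no make job is involved in creating the link.
file(CREATE_LINK "${helper_src}" "${helper_link}" SYMBOLIC RESULT link_result)
if(NOT link_result STREQUAL "0")
  message(FATAL_ERROR "Could not create ${helper_link}: ${link_result}")
endif()
```

The idea is that the configure step is a single process, so two parallel build jobs can no longer try to create the same link at the same moment.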

Ken researched this problem, discovered its cause and suggested a workaround:

It appears that when running a custom_command, cmake does not use dependency tracking and collisions can happen.

“Do not list the output in more than one independent target that may build in parallel or the two instances of the rule may conflict (instead use the add_custom_target() command to drive the command and make the other targets depend on that one).”

https://cmake.org/cmake/help/v3.19/command/add_custom_command.html

The workaround is to use add_custom_target() to create a target that depends on the output of the custom command, and to make the other targets depend on that target.

https://gist.github.com/socantre/7ee63133a0a3a08f3990
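
For reference, the pattern described in the CMake documentation looks roughly like the following. This is a minimal standalone sketch, not a patch against compiler-rt: the project, file, and target names (`helper.c`, `generate_helper`, `rt_a`, `rt_b`) are hypothetical, and `helper.c` is assumed to exist next to the CMakeLists.txt.

```cmake
cmake_minimum_required(VERSION 3.10)
project(custom_target_sketch C)

# Hypothetical input file and the link that the build rule creates.
set(helper_src  "${CMAKE_CURRENT_SOURCE_DIR}/helper.c")
set(helper_link "${CMAKE_CURRENT_BINARY_DIR}/helper_copy.c")

# The rule that produces the link at build time.
add_custom_command(
  OUTPUT  "${helper_link}"
  COMMAND ${CMAKE_COMMAND} -E create_symlink "${helper_src}" "${helper_link}"
  DEPENDS "${helper_src}"
  COMMENT "Linking helper_copy.c")

# A single target that drives the rule.
add_custom_target(generate_helper DEPENDS "${helper_link}")

# Two libraries that both consume the generated file. Without the
# add_dependencies() calls they would be independent targets, and a
# parallel make could run the rule above once for each of them.
add_library(rt_a STATIC "${helper_link}")
add_library(rt_b STATIC "${helper_link}")
add_dependencies(rt_a generate_helper)
add_dependencies(rt_b generate_helper)
```

With the dependency in place, `generate_helper` is built before either library, so the link should already be up to date by the time `rt_a` and `rt_b` are built.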

It would be great if LLVM maintainers incorporated this workaround, or, at the very least, gave it a try.

Thanks for the info; however, I don't think we understand the root cause yet. Why do we end up with two instances trying to create the symlink at the same location?

As I understand it, different targets end up with separate build directories, so this shouldn't happen. And since the different runtime builds are sub-builds, your proposed dependency-tracking solution wouldn't work.

It appears that the proposed workaround had been tested and proven to work.

Which is why I'm asking to give it a try.

Perhaps doing that would help understanding the root cause too.

Regards,
Uri

The Darwin build is a bit different from other builds. I would look for a fix that avoids the race condition rather than hard-coding targets.

Also I don’t know what you mean by giving the workaround a try. The initial workaround I provided is about altering the build configuration, which is mostly up to the user unless you are using the CMake cache in the repo. We also never hit that problem on our side since we always build with ninja, and it appears ninja doesn’t schedule the copies close to each other.

Steven

The Darwin build is a bit different from other builds.

Yes. But it is still defined in what Macports considers “upstream”, aka – you guys. Which is why I’m bringing this issue here.

I would look for a fix that avoids the race condition rather than hard-coding targets.

Adding an explicit separate target via “add_custom_target()” eliminates the race conditions.

Also I don’t know what you mean by giving the workaround a try.

I mean – incorporate that solution and confirm for yourself that it works.

The initial workaround I provide is about altering build configuration which is mostly on the user unless you are using the cmake cache in the repo. We also never hit that problem on our side since we always build with ninja and it appears ninja doesn’t schedule the copy close to each other.

Well, with Macports we do not have the luxury of building with ninja. So, for us it has to stay with CMake-generated Makefiles.

I rather doubt that the workaround you suggested (which kills Apple M1 builds, if I recall correctly) would be acceptable for Macports, whose goal is to successfully build and run for Intel and M1 platforms.

Hi Uri

You misunderstood my message. I never intended to leave it in a broken state for MacPorts, but I am leaning towards finding a correct solution while suggesting a feasible workaround for you. The workaround I provided earlier will not break M1 Mac support, since the arm64e architecture is experimental and not ABI stable, and it is only meant for security researchers to evaluate the implementation.

After a closer look at the build, it seems that the problem only occurs in the makefile build, while ninja doesn't even have the race condition since the create_symlink command only runs once. This might be a limitation/bug in the makefile generator, which lists the create_symlink command in both projects/compiler-rt/lib/builtins/CMakeFiles/clang_rt.builtins_arm64_osx.dir/build.make and projects/compiler-rt/lib/builtins/CMakeFiles/clang_rt.builtins_arm64e_osx.dir/build.make. I don't know enough about CMake to comment on that.

We also do not have a CI system to test makefile support on Darwin at all, so we might not catch problems of this kind in the future. It might be better to investigate switching to the ninja build to avoid trouble down the road.

Steven

You misunderstood my message.

My apologies, and thank you for clarifying.

I never intended to leave it at the broken state for MacPort but I am leaning towards finding a correct solution while I suggest some feasible workaround for you.

Thank you!

But it seems that adding a custom target in CMake via add_custom_target() would fix the MacPort version, while staying harmless on the other platforms? Or am I missing something…?

The workaround I provided earlier will not break M1 Mac support, since the arm64e architecture is experimental and not ABI stable, and it is only meant for security researchers to evaluate the implementation.

It looks like Macports team would prefer a different workaround.

After a closer look at the build, it seems that the problem only occurs in the makefile build, while ninja doesn’t even have the race condition since the create_symlink command only runs once.

Yes, I concur (though the Macports maintainers of the Clang port are the “authoritative” source here, not me :wink:).

This might be a limitation/bug in the makefile generator, which lists the create_symlink command in both projects/compiler-rt/lib/builtins/CMakeFiles/clang_rt.builtins_arm64_osx.dir/build.make and projects/compiler-rt/lib/builtins/CMakeFiles/clang_rt.builtins_arm64e_osx.dir/build.make. I don’t know enough about CMake to comment on that.

Neither do I. But it looks like that bug (or limitation? Whatever it might be…) gets side-stepped by add_custom_target(). Which is why I’m advocating for applying that workaround now, until (and if!) a better one arrives.

We also do not have a CI system to test makefile support on Darwin at all, so we might not catch problems of this kind in the future. It might be better to investigate switching to the ninja build to avoid trouble down the road.

Understood. Unfortunately, moving to ninja is not my call at all. :frowning:

I suspect that the maintainers are concerned that there already are many dependencies (e.g., full Perl) required to build Clang. Not sure if they’d be happy about pulling ninja in as well – plus, ninja may require other python packages installed to run the build…?

Thanks!

You misunderstood my message.

My apologies, and thank you for clarifying.

I never intended to leave it at the broken state for MacPort but I am leaning towards finding a correct solution while I suggest some feasible workaround for you.

Thank you!
But it seems that adding a custom target in CMake via add_custom_target() would fix the MacPort version, while staying harmless on the other platforms? Or am I missing something…?

Can you propose a patch for that? I will happily review and commit it for you.

Steven

I never intended to leave it at the broken state for MacPort but I am leaning towards finding a correct solution while I suggest some feasible workaround for you.

Thank you!

But it seems that adding a custom target in CMake via add_custom_target() would fix the MacPort version, while staying harmless on the other platforms? Or am I missing something…?

Can you propose a patch for that? I will happily review and commit it for you.

Let me converse with Ken, who has the honor of suggesting this fix, and reply.

Offhand, it seems that somewhere (immediately) after this line

https://github.com/llvm/llvm-project/blob/4c92e31dd0f1bd152eda883af20ff7fbcaa14945/compiler-rt/lib/builtins/CMakeLists.txt#L537

you’d need (something like)

add_custom_target(compiler-rt-asm ALL DEPENDS ${helper_asm})

Thanks!

I am not sure how that `add_custom_target` can help. Let's try running it at configuration time instead. Patch here: https://reviews.llvm.org/D106305

Steven

Thank you! Am I correct to assume that this patch will run through the CI?

Regards,
Uri