Fixing CRT Allocator Support on Windows?

(Double posting from Discord’s #windows channel)
Our downstream LLVM build is recently trying to switch to third party crt allocators too, which brought me to D71786 which raises more questions than it solved.

The patch attempted to support three implementations:

  • rpmalloc
    Probably works, the hard-coded path still works on today’s ToT of rpmalloc, not sure about future.

  • snmalloc
    The hard-coded path no longer works since a long time ago.

  • mimalloc
    Requires manually applying some third-party patch in the issues section from 3 years that remains unmerged by upstream.

On top of that, LLVM_USE_CRT_{target}added by this patch seems to conflict with CMake 3.15’s MSVC_RUNTIME_LIBRARY and maybe could be deprecated / discarded.

Ideas top of my head:

  • rpmalloc
    leaves it there as is.
  • snmalloc
    instead, use snmalloc/CMakeLists.txt to build the library as part of LLVM and link to the library instead of hard-coded paths?
  • mimalloc
    discard its support.

LLVM_USE_CRT_{target} : replace all usages with CMake’s variant, now that LLVM’s minimum CMake version required is way beyond 3.15

Unfortunately I can’t find either the reviewer or the author of the original patch either here or on Discord. So I’ll leave this as-is.

@aganea is the original author.

I think we should revisit the notion that we should re-implement external allocators build systems in llvm. I support vendoring one implementation in that case.

@aganea is the original author.

Thanks. I tried to email his discussing first but my email was rejected :frowning:

I support vendoring one implementation in that case.

In that case, I feel like we should prob choose snmalloc as the integration would be just:

add_subdirectory(/PATH/TO/SOURCE snmalloc-build)
target_link_libraries(LLVMSupport PUBLIC snmallocshim-static)

Waiting for @aganea as I’m nowhere near an expert on this…

We would have to see how that works and performs across platforms we support before making any such decision. We also need to investigate the license.

I think we can first enable it only on Windows, like the original diff did.

IIRC us at hoyoverse saw quite some improvement in our use case with ThinLTO. Though Alexandre probably has more data.

If we stick to the original diff’s design, we don’t ship snmalloc with LLVM itself, the user must explicitly opt-in and build by themselves. Also it has MIT license

We use rpmalloc in Chromium’s llvm build on Windows, and would rather not lose the support for that.

1 Like

yeah the op’s intention was more about fixing the snmalloc branch that requires on manual maintaining and remove the cumbersome mimalloc support :smiley:

I think I was not as clear as I wanted to be. I blame writing comments on my phone :slight_smile:

We also use rpmalloc for our toolchain, so I am not keen on losing that as well.

I think patches to fix what’s broken is fine, but I think we should revisit if this logic, where we need to add custom code for each allocator is the right way forward.

I wonder if there is a way to handle this where the user have to build the allocator code outside LLVM and have a flag to point to the static library. I haven’t investigated deeply enough.

I started to work on vendoring rpmalloc and enable it for more platforms than windows at one point - but it requires some amount of effort that I didn’t have time to do that.

Hello!

We can say with certainty that rpmalloc will be supported in LLVM. It’s the path of least resistance, since there are only a few .c files to build, and the alloc APIs won’t change in the foreseeable future. There are several clients using it, so even if the rpmalloc repo was to change, someone will fix the LLVM integration.

snmalloc and mimalloc were experimental integrations. I am trying to maintain them occasionally to compare the performance and memory usage. Although like @Naville pointed out, they require more maintenance. If someone is willing to contribute with a better integration into LLVM for these allocators, they would be very welcome!

You don’t need to use LLVM_USE_CRT_RELEASE=MT anymore – one can use CMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded instead. We should most likely remove LLVM_USE_CRT_* from LLVM.

As for vendoring rpmalloc, we could do it such as [Support] Vendor rpmalloc · aganea/llvm-project@ebc5764 · GitHub
The only benefit I see is that it would allow compiling LLVM everywhere on any platform with a more optimal allocator, without having to dig in the docs, and understand how to achieve that (using rpmalloc would still remain an option). Several years ago, I had this discussion with the LLVM foundation, regarding vendoring & the rpmalloc licence, and even though it doesn’t fit the LLVM licence, it seems that we could vendor it since the author explicitly said that they were willing to donate the code to LLVM. The only thing is that we should go through the RFC process. If anyone is in favor I could do that.

My only reluctance was that LLVM already contains the SCUDO allocator. However folks from SN-Systems have tried in the past to use it, in place of rpmalloc, and we never really got it to work/match the performance of rpmalloc, on Windows at least. I don’t know if the situation changed.

As for performance, I’ve extensively tested all the popular allocators out there, and rpmalloc was the best above all. Even on recent Ubuntu, it was still able to outperform glibc-alloc by a few percent (last time I tried, a few years ago). On Windows, ThinLTO link is definitely not usable unless you go through one of the lock-free allocators supported in LLVM.

2 Likes

ty. I’ll try to submit a patch this week to remove LLVM_USE_CRT_RELEASE first and see how it goes from there. :smiley:
On a side note, on my recent attempt to build with snmalloc that contains my lastest patch, the build was greeted with a duplicate symbol error. So that’s one more thing worth investigating

@mstorsjo did a bunch of recent work in this area, so he’d make a good reviewer.

Trivial draft PR : [CMake]Remove LLVM_USE_CRT* by Naville · Pull Request #66850 · llvm/llvm-project · GitHub

I fail to see what you’re trying to say here.

A patch from 2019 did add a mention and use of something (which already was in place), which later was deprecated with more recent CMake. When we deprecated LLVM_USE_CRT_{target} we also took care of most of our own internal uses, in e.g. [CMake] Switch the CMP0091 policy (MSVC_RUNTIME_LIBRARY) to the new b… · llvm/llvm-project@c6bd873 · GitHub. After that, there’s nothing relating to the custom allocators that use the deprecated LLVM_USE_CRT_{target} options after that.

This has already been done, in [CMake] Switch the CMP0091 policy (MSVC_RUNTIME_LIBRARY) to the new b… · llvm/llvm-project@c6bd873 · GitHub, no?

That said, as LLVM_USE_CRT_{target} is deprecated and was marked that way in the 17.x branch (which is released now as well), so I guess we could consider removing it at some point. Perhaps that point is now?

But I fail to see the connection to what’s discussed here - the deprecation and switch to MSVC_RUNTIME_LIBRARY (and the associated work to make sure things didn’t break) is already done and complete?

Right, before actually push that pr I was reading the LLVM 16 version of llvm/lib/Support/CMakeLists.txt, which still uses LLVM_USE_CRT for detection. So the initial plan was to remove it and simplify related code before continuing.

After began to actually work on that patch I realized the offending cmake code has been fixed in main and 17+ release. Thought it would still be a good idea to continue the deprecation

Right, I see, that explains the mismatch between statements and actual state of git.

I guess it might make sense to start removing the cruft now already; I’ll comment on the PR.

Reposting my discussion on the other side here Maybe missing _base variants in overrides · Issue #638 · microsoft/snmalloc · GitHub

Apparently windows heap API is a mess and we need a lot more work to fix one specific implementation and vendor that as the default…