Alternatives to the implementation of std modules

(I know libcxx folks prefer Discord. I just feel discourse may be the place for longer text. I’ll send this to Discord later)

Sorry in ahead since this is indeed frustrating.

I found the current implementation of the std module compiles slower than the non-module codes in small projects. Here is the example:

import std;

int main(int, char**) {
  std::vector<int> v;
  v.push_back(5);
  v.push_back(7);
  v.push_back(2);
  v.push_back(7);
  v.push_back(4);
  v.push_back(1);

  int t{3};
  std::optional<int> op{3};
  v.push_back((op <=> t) == std::strong_ordering::equal);

  std::sort(v.begin(), v.end());
  for (int i : v)
    std::cout << i << " ";
  std::cout << "\n";
  return 0;
}

It takes 3.395s to compile. If we apply the optimization described in [Modules] Faster compilation speed or better diagnostic messages?, it will still take 2.455s to compile.

And for the non-modular version:

// import std;
#include <vector>
#include <optional>
#include <algorithm>
#include <iostream>

int main(int, char**) {
  
  std::vector<int> v;
  v.push_back(5);
  v.push_back(7);
  v.push_back(2);
  v.push_back(7);
  v.push_back(4);
  v.push_back(1);

  int t{3};
  std::optional<int> op{3};
  v.push_back((op <=> t) == std::strong_ordering::equal);

  std::sort(v.begin(), v.end());
  for (int i : v)
    std::cout << i << " ";
  std::cout << "\n";
  return 0;
}

It will only take 1.861s to compile. So the std module is slower at least in this case.

The root reason is that the compiler has to deal with many redeclarations within different module units to make things correct. The compiler has tried really hard to make things happen correctly. And personally, I feel it is hard to fix the issue (slow to deal with the redeclarations in different modules) fundamentally.

(I’ll try to document the practice to avoid other users to fall into this)

Currently we implemented the std module by declaring the visibilities of each declarations in each partitions. The structure is clear. But I didn’t foreseen the performance penalty. Then here is the problem.

While the current implementation should be correct and we’re still busying with testing and distributing std modules, I think we can continue in the current direction. Since what we’re doing is orthogonal with the implementation of the std module. We can go back and refactor the implementation of the std modules.


Then for the alternatives of the implementation of std modules, we roughly have 2 solutions.

First is similar with the MSVC’s style:

  1. Include every standard header in the module purview.
  2. Wrap every thing with language linkage (extern "C"/"C++") to avoid breaking ABI.
  3. Export the declarations we want.

Roughly it should be:

// config
#ifdef IN_MODULES
#define LANGUAGE_LINKAGE_IN_MODULES extern "C++"
#else
#define LANGUAGE_LINKAGE_IN_MODULES
#endif

#ifdef IN_MODULES
#define EXPORT export
#else
#define EXPORT
#endif

// vector
...
LANGUAGE_LINKAGE_IN_MODULES
template<...>
class vector_base { ... };
 
EXPORT LANGUAGE_LINKAGE_IN_MODULES
template<...>
class vector : public vector_base<...> { ... }

// std.cppm
module;
#define IN_MODULES
// include headers from C libraries to avoid the C libraries appear in the module purview
export module std;
#include <vector>
...

Note: this is similar but not identical with MSVC’s style since I don’t see a lot of extern "C++" in MSSTL. I am not sure if MSVC treat std module differently. Maybe it is worth to ask MSVC team about this. But at least for clang now, the declarations in the module purview without the language linkage will get a different mangle name. This is problematic.

The second solution is a simple trick to convert existing code bases to:

// std-vector.export.list
export namespace std {
    using std::vector;
    ...
}

// std.cppm
module;
// include all the standard headers
export module std;
#include <std-vector.export.list>
#include <std-list.export.list>
...

In the method, we will only have one module unit. And all the other export using declaration will be included into the primary module interface. So that we can keep the current structure and remain exactly one module unit.

1 Like

Thanks a lot for investigating this, it’s really important to get this right!

So if I understand correctly, the only issue is that we have a bunch of module partitions and we’re doing export import :foo; right now? In that case I think the second approach you suggested (where we basically #include <std-vector.export.list> into std.cppm is the easiest way to keep our current organization (which I think we all like) while solving this problem. I’m curious to know what @mordante thinks about this.

Have you tried the same example with approach #2 to compare the performance? It would be important to confirm that it’ll fix the perf issues if we are to do this re-organization.

So if I understand correctly, the only issue is that we have a bunch of module partitions and we’re doing export import :foo; right now?

Yes. The current style is correct. But there is some compiler limitations.

Have you tried the same example with approach #2 to compare the performance? It would be important to confirm that it’ll fix the perf issues if we are to do this re-organization.

In our downstream implementations, we implement the std module by including every standard headers into the single module unit without specifying which declarations are visible. (I didn’t contribute this since this is not standard conforming.) And the performance numbers show that it is indeed faster than the non-modular versions.

Thanks a lot for the information!

No problem, that’s always the risk when using experimental features.

My initial gut feeling is also to go for option #2.

At the moment I’m quite busy preparing for the LLVM 17 release. After that work slows down again I will have a closer look at the possible changes for the modules in libc++. Since modules are experimental in libc++17 and not installed at all I am fine to have a non-optimal solution in libc++17.

I’ve done some quick testing in libc++. I’ve converted the tests in libcxx/test/std/time/ to use modules instead of headers.

I changed the module partitions chrono, format, string, string_view, and vector to use method #2 and left the other partitions untouched.

I tested with the Clang build we currently use in the libc++ CI (Ubuntu clang version 17.0.0 (++20230713042327+020cdaff615a-1~exp1~20230713042436.1054)).
The directory contains 339 lit tests and I took the average of 3 runs
This reduced the time lit spend on these test from 264 to 222 seconds, so roughly 15%.

The numbers can’t be really compared to @ChuanqiXu’s numbers; lit executes the tests too. Still it gives me confidence it’s worth the effort to make this change. It will break libc++'s module partition test and I want to look how we can keep testing this. (I’ve some ideas.) I’m working on some other patches at the moment and I hope to have time to look at this next week.

2 Likes

Great to hear that! I am also curious if it is possible to merge the backport the change to 17.x? I know generally we only backport bug fixes and very important features. And this change is primarily an optimization. But I am wondering it may be OK since the std module in libcxx is still experimental. How do you think about this?

Based on the reasons you mention I’m somewhat reluctant to backport this change. The other reason not to do it is simply due to the state of the std module in libc++ 17 is not ready for production usage. But before making up my mind I first have to have a patch and see how intrusive the changes are and how soon it’s approved.

Slightly off-topic. I’ve finished most of the format related work which was high on my priority list. I’ve just a few small loose ends left. This means I will soon shift my focus on modules (and time_zones). So there will be more module improvements that will not make it in LLVM 17.

1 Like

Yes, fully understood. I don’t think it is bad too. I just hope that the fans can be happy when they give it a try instead of complaining and being frustrating.

Because, (I guess you already know this), while the WG21 enjoys multiple benefits from modules, many users only hope it can speedup the compilation further.

Also (in my mind) the fact that std module is not ready is also a good reason that we can backport this since we don’t need to worry about breaking anything.

1 Like

I think at the moment using modules is still frustrating, but things are improving. My personal experience of a few weeks ago was a lot better than a year ago. I really appreciate the amount of bugs you fixed and other improvements you and others made. My biggest issue at the moment is the lack of libc++ support :wink:

I agree slow compilation speed of C++ is an issue and modules are a great benefit. Still I see modules as a lot more than just “fix compilation” speed.

Backporting takes some work also for the release managers so I want to weight the pros and cons, but I first need a patch. Having better compilation speed of libc++ will not help its not so great user experience at the moment.

2 Likes

I have created ⚙ D156907 [libc++][modules] Removes the module partitions. using method 2. The results look very promising. Now import std; is faster than #include <print> for a hello world program.
When using the module using FetchContent in CMake the size of teh build files is greatly reduced. Look at the patch’s commit message for more details.

The removal of the partitions is observable and might break tools using the current experimental module code. I know @aaronmondal did work to get this working in Bazel. The patch is not small, so I’m not convinced we should backport it to LLVM-17. But I’ll see what the reviewers of the patch think and what the responses in this thread are.

3 Likes

I got reminded of this thanks to llvmweekly. I believe I understand what is being changed to the module implementation, namely using oldstyle #include to have a single cppm file instead of using module partitions and export-import them to construct the std module.

I don’t understand the why this is needed. It is explained this is a compiler issue:

I’m not familiar with the implementation at all, though as a user, I don’t expect the compilation of my code being influenced by the implementation of a module which is compiled separately. Very naively, I expect the import std; statement to do some kind of lookup, finding a file with some data into it, which contains all function/… declarations and template definitions. Regardless of how that file is created, I would expect that with the same API, the file would have the same data (structured in a different order). Here only the way this file was created was different as the include variant creates the file in one go, while the import variant has some intermediates. With the import variant, it see opportunities to have the top level file referring to the partition info files, though I don’t expect a 7-8x slowdown because of that.

I do see why redeclarations would slow down the creation of the std module, as that is where I would expect this.

As such, my model of how the compiler would deal with modules seems to be wrong. Can you explain where I’m going wrong?

I am not sure where I shall start. It is pretty complex. I think the key point here is that the compiler will see many redeclarations with the partitions style. And the compiler can’t discard the redeclaration simply since there are other declarations pointing to them. So the compiler needs to maintain the redeclaration chain. Then there are a lot of places where we need to iterate the redeclarations (e.g., the ADL). So it will slow down the compilation performance significantly.

The key point to understand the take-away as a user should notice that the problem actually is the redeclarations in many module units. And the partitions are not the issue.

In another word, if we were living in c++ world without headers at the very beginning, this may not be a problem.

1 Like

That makes some sense. I’m not 100% following on why those redeclarations can’t be discarded as they should have the same symbol name and as such the same implementation as they are exposed together.

My main takeaway: if you start using modules, use a bottom-up approach.

That makes some sense. I’m not 100% following on why those redeclarations can’t be discarded as they should have the same symbol name and as such the same implementation as they are exposed together.

One of the reasons is: since there are other entities referring to the redeclarations in their own module units. So it is wrong to discard the declaration directly and it is clearly bad to iterate the AST to change the values. Another reason may be some internal data structure designs. But they now are hard to refactor for sure.

My main takeaway: if you start using modules, use a bottom-up approach.

Yes.

1 Like

They do not necessarily have the same implementation. Elements are exported as named declarations. Several module partitions had one or more overloads of operator==(T, U). In the new implementations all overloads are directly available in the std module.