I am trying to build my companies compiler using –std=c++23. Unfortunately some header files in MLIR are not compatible.
To be clear, I am not asking about/for any c++23 features in MLIR. Just that downstream projects that use MLIR and want to compile their codebases with c++23 can.
My 2 concrete questions are:
Is there any active work on making MLIR compatible with c++23?
Would the community be receptive to upstream changes that made the MLIR header files compatible with c++23?
Testing strategy
To ensure we don’t regress after fixing the header files my plan was to add a test like the one below, and compile with –std=c++23.
<mlir/…/header1> … <mlir/…/headern> int main() { return 0; }
This should assure that none of the header files have any features that aren’t compatible with c++23.
Any feedback on the compatibility test outlined above would also be greatly appreciated!
If it could be done in a stand-alone build bot that would be easiest. In tree probably with some cmake optional compilation & testing too (given LLVM repo is still on and required to compile with C++17 only compilers, restricted in test requiring more for all consumers). Although here, couldn’t one just set the standard during cmake configuration/bazel compilation option rather than needing a new file/test?
I’m not opposed to testing if it will work for C++23 and perhaps having a tracking issue for them.
Although here, couldn’t one just set the standard during cmake configuration/bazel compilation option rather than needing a new file/test?
Adding a standard cmake configuration instead of adding a new test would mean that I would have to change all the source files to be compatible with C++23 compilation. Where as all I need for my personal objective is the header files to be compatible with C++23 compilation.
That being said doing what you suggesting would have more upside to the community. Especially if somebody wanted to compile the MLIR depedency with C++23 for whatever reason.
error: invalid application of 'sizeof' to an incomplete type 'mlir::AnalysisState'
97 | static_assert(sizeof(_Tp)>0,
...
include/mlir/Analysis/DataFlowFramework.h:370:43: note: in defaulted move assignment operator for 'std::unique_ptr<mlir::AnalysisState>' first required here
370 | analysisStates[*leaderIt][TypeId] =
I’m curious why is this a C++23-only failure? Needed the full definition for sizeof is a problem in all C++ versions. The message refers to a defaulted move assignment operator though, something changed in C++23 in this area?
For what its worth, this was only an issue in clang and not gcc.
I was able to get the mlir/Analysis/DataFlowFramework.h part to compile with C++23 with the following change
diff --git a/mlir/include/mlir/Analysis/DataFlowFramework.h b/mlir/include/mlir/Analysis/DataFlowFramework.h
index 49862927caff..d1320b0d1b4a 100644
--- a/mlir/include/mlir/Analysis/DataFlowFramework.h
+++ b/mlir/include/mlir/Analysis/DataFlowFramework.h
@@ -52,6 +52,11 @@ inline ChangeResult operator&(ChangeResult lhs, ChangeResult rhs) {
/// Forward declare the analysis state class.
class AnalysisState;
+/// Custom deleter for AnalysisState to handle C++23 incomplete type requirements
+struct AnalysisStateDeleter {
+ void operator()(AnalysisState *p);
+};
+
/// Program point represents a specific location in the execution of a program.
/// A sequence of program points can be combined into a control flow graph.
struct ProgramPoint : public StorageUniquer::BaseStorage {
@@ -476,7 +481,7 @@ private:
/// A type-erased map of lattice anchors to associated analysis states for
/// first-class lattice anchors.
- DenseMap<LatticeAnchor, DenseMap<TypeID, std::unique_ptr<AnalysisState>>>
+ DenseMap<LatticeAnchor, DenseMap<TypeID, std::unique_ptr<AnalysisState, AnalysisStateDeleter>>>
analysisStates;
/// A map of Ananlysis state type to the equivalent lattice anchors.
@@ -733,10 +738,10 @@ StateT *DataFlowSolver::getOrCreateState(AnchorT anchor) {
// Replace to leader anchor if found.
LatticeAnchor latticeAnchor(anchor);
latticeAnchor = getLeaderAnchorOrSelf<StateT>(latticeAnchor);
- std::unique_ptr<AnalysisState> &state =
+ std::unique_ptr<AnalysisState, AnalysisStateDeleter> &state =
analysisStates[latticeAnchor][TypeID::get<StateT>()];
if (!state) {
- state = std::unique_ptr<StateT>(new StateT(anchor));
+ state = std::unique_ptr<AnalysisState, AnalysisStateDeleter>(new StateT(anchor));
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
state->debugName = llvm::getTypeName<StateT>();
#endif // LLVM_ENABLE_ABI_BREAKING_CHECKS
diff --git a/mlir/lib/Analysis/DataFlowFramework.cpp b/mlir/lib/Analysis/DataFlowFram
ework.cpp
index 16f7033da7ed..0546a26fd737 100644
--- a/mlir/lib/Analysis/DataFlowFramework.cpp
+++ b/mlir/lib/Analysis/DataFlowFramework.cpp
@@ -26,6 +26,11 @@
using namespace mlir;
+// Implementation for AnalysisStateDeleter
+void AnalysisStateDeleter::operator()(AnalysisState *p) {
+ delete p;
+}
+
IMO it’s perfectly fine to land forward-compat changes for C++, but I wouldn’t want this to become a (blocking) CI check until we actually decide on making the switch to C++23 and the new minimum toolchains required. We are going to switch to C++23 eventually, but until this becomes a thing, the interested parties should cover the maintenance cost.
the interested parties should cover the maintenance cost.
I struggle to see what the maintenance cost is. The vast majority of all code passes with C++23 out of the box. For context to build LLVM/MLIR with C++ I had to make the following changes
I can’t justify using C++23 in my companies compiler if MLIR can regress at any moment. I am willing to do the up front work to get MLIR built with C++23 and put a stand-alone build bot like @jpienaar suggested.
I think the community would appreciate knowing that they can use C++23 in their projects. And the only way to guarantee that is with a full blocking build. Frankly I’m a bit surprised that this wasn’t already supported.
Could you give an example of the maintenance cost you are envisioning?
Is it that we spend more time in CI with an extra build configuration.
Or that someone adds code that can’t compile with C++23 and needs to figure out why but doesn’t have access to a clang version with support for C++23.
I imagine it will be exceedingly rare that the latter happens.
If anything I imagine that the maintenance burden is greater without the blocking CI checks. Every time someone adds code to the codebase that isn’t compatible with C++23, somebody will later have to go in and make a patch. They will need to get a reviewer from that specific part of the codebase and explain why the previous code didn’t work in C++23, thereby increasing the total number of reviews needed.
I think it should be similar to how we approach bazel support – it lives inside the repo but contributors are not required to update the bazel build files. Many folks do it anyway because most cmake changes map 1:1 to bazel changes, but everything else that was not handled is patched by the bazel maintainers or community members who use bazel.
I’d think that the maintenance cost of being compatible with C++23 will become very low once we get it to compile for the first time. Later on, we can fix forward as we identify new regressions, which I expect to be rare.
I tend to separate the minimum C++ version (which is the default one we use) with the supported version: we supported C++17 much earlier than the migration of the default (and minimum) from C++14 to C++17.
I see it in the same way as compilers: we support a minimal versions and any new compiler gets automatically added to the set. That is anyone should be welcome to not only send patched but also add CI for this IMO (that I don’t see this the same as Bazel).
We probably don’t have alignment on this and the LLVM policy is likely not explicit enough on this, should we open another global RFC on this?
Seems like this shouldn’t be a drag on the development then?
The issue is that clang is checking the template’s body before being instantiated, and it appears it’s now checking them on mores instances, like constexprstd::unique_ptr<T,Deleter>::operator= - cppreference.com. A minimal reproducer is:
// clang++-19 test.cpp -std=c++23 -c out.o
#include <memory>
#include <vector>
class A;
struct Container {
template <typename T>
void foo(std::vector<std::unique_ptr<A>> &foo) {
x = std::move(foo);
}
std::vector<std::unique_ptr<A>> x;
};
class A {
int x;
};
The solution is not necessarily adding a deleter to the unique ptr, but moving the definition of the template after the class declaration:
// clang++-19 test.cpp -std=c++23 -c out.o
#include <memory>
#include <vector>
class A;
struct Container {
template <typename T>
void foo(std::vector<std::unique_ptr<A>> &foo);
std::vector<std::unique_ptr<A>> x;
};
class A {
int x;
};
template <typename T>
void Container::foo(std::vector<std::unique_ptr<A>> &foo) {
x = std::move(foo);
}
My issue with that is that when we bump the language version, we are careful to select toolchain versions that support it without major known issues… If we say that we now want to support multiple newer C++ versions without specifying the minimum version of each toolchain, we risk running into compiler bugs / language defects. This is probably less of an issue than actually using new language constructs, but I’m still cautious because I want to avoid having to spend my time checking out multiple toolchains and investigating issues that may end up being simple (host) compiler bugs. These are often surfaced by ADT code that I maintain.