What is the status of C++23 Support in MLIR

I am trying to build my companies compiler using –std=c++23. Unfortunately some header files in MLIR are not compatible.

To be clear, I am not asking about/for any c++23 features in MLIR. Just that downstream projects that use MLIR and want to compile their codebases with c++23 can.

My 2 concrete questions are:

  1. Is there any active work on making MLIR compatible with c++23?

  2. Would the community be receptive to upstream changes that made the MLIR header files compatible with c++23?

Testing strategy

To ensure we don’t regress after fixing the header files my plan was to add a test like the one below, and compile with –std=c++23.

<mlir/…/header1>

<mlir/…/headern>
int main() {
return 0;
}

This should assure that none of the header files have any features that aren’t compatible with c++23.

Any feedback on the compatibility test outlined above would also be greatly appreciated!

2 Likes

If it could be done in a stand-alone build bot that would be easiest. In tree probably with some cmake optional compilation & testing too (given LLVM repo is still on and required to compile with C++17 only compilers, restricted in test requiring more for all consumers). Although here, couldn’t one just set the standard during cmake configuration/bazel compilation option rather than needing a new file/test?

I’m not opposed to testing if it will work for C++23 and perhaps having a tracking issue for them.

What kind of failures are you seeing?

Isn’t C++17, (C++20?) a subset of C++=23 ?

Although here, couldn’t one just set the standard during cmake configuration/bazel compilation option rather than needing a new file/test?

Adding a standard cmake configuration instead of adding a new test would mean that I would have to change all the source files to be compatible with C++23 compilation. Where as all I need for my personal objective is the header files to be compatible with C++23 compilation.

That being said doing what you suggesting would have more upside to the community. Especially if somebody wanted to compile the MLIR depedency with C++23 for whatever reason.

What kind of failures are you seeing?

I believe we may only have 1 or two places where compilation fails with C++23. They seem to be related to AnalysisState’s full definition not being available to mlir/Analysis/DataFlowFramework.h. It is only forward declared. llvm-project/mlir/include/mlir/Analysis/DataFlowFramework.h at main · llvm/llvm-project · GitHub

The error looks like this

error: invalid application of 'sizeof' to an incomplete type 'mlir::AnalysisState'
   97 |         static_assert(sizeof(_Tp)>0,
...
include/mlir/Analysis/DataFlowFramework.h:370:43: note: in defaulted move assignment operator for 'std::unique_ptr<mlir::AnalysisState>' first required here
  370 |         analysisStates[*leaderIt][TypeId] =

Mostly. But things can be deprecated. And example being std::aligned_union : https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1413r3.pdf

I’m curious why is this a C++23-only failure? Needed the full definition for sizeof is a problem in all C++ versions. The message refers to a defaulted move assignment operator though, something changed in C++23 in this area?

I believe C++ may have tightened the rules. P0722R3: Efficient sized delete for variable sized classes

For what its worth, this was only an issue in clang and not gcc.

I was able to get the mlir/Analysis/DataFlowFramework.h part to compile with C++23 with the following change

diff --git a/mlir/include/mlir/Analysis/DataFlowFramework.h b/mlir/include/mlir/Analysis/DataFlowFramework.h
index 49862927caff..d1320b0d1b4a 100644
--- a/mlir/include/mlir/Analysis/DataFlowFramework.h
+++ b/mlir/include/mlir/Analysis/DataFlowFramework.h
@@ -52,6 +52,11 @@ inline ChangeResult operator&(ChangeResult lhs, ChangeResult rhs) {
 /// Forward declare the analysis state class.
 class AnalysisState;

+/// Custom deleter for AnalysisState to handle C++23 incomplete type requirements
+struct AnalysisStateDeleter {
+  void operator()(AnalysisState *p);
+};
+
 /// Program point represents a specific location in the execution of a program.
 /// A sequence of program points can be combined into a control flow graph.
 struct ProgramPoint : public StorageUniquer::BaseStorage {
@@ -476,7 +481,7 @@ private:

   /// A type-erased map of lattice anchors to associated analysis states for
   /// first-class lattice anchors.
-  DenseMap<LatticeAnchor, DenseMap<TypeID, std::unique_ptr<AnalysisState>>>
+  DenseMap<LatticeAnchor, DenseMap<TypeID, std::unique_ptr<AnalysisState, AnalysisStateDeleter>>>
       analysisStates;

   /// A map of Ananlysis state type to the equivalent lattice anchors.
@@ -733,10 +738,10 @@ StateT *DataFlowSolver::getOrCreateState(AnchorT anchor) {
   // Replace to leader anchor if found.
   LatticeAnchor latticeAnchor(anchor);
   latticeAnchor = getLeaderAnchorOrSelf<StateT>(latticeAnchor);
-  std::unique_ptr<AnalysisState> &state =
+  std::unique_ptr<AnalysisState, AnalysisStateDeleter> &state =
       analysisStates[latticeAnchor][TypeID::get<StateT>()];
   if (!state) {
-    state = std::unique_ptr<StateT>(new StateT(anchor));
+    state = std::unique_ptr<AnalysisState, AnalysisStateDeleter>(new StateT(anchor));
 #if LLVM_ENABLE_ABI_BREAKING_CHECKS
     state->debugName = llvm::getTypeName<StateT>();
 #endif // LLVM_ENABLE_ABI_BREAKING_CHECKS
diff --git a/mlir/lib/Analysis/DataFlowFramework.cpp b/mlir/lib/Analysis/DataFlowFram
ework.cpp
index 16f7033da7ed..0546a26fd737 100644
--- a/mlir/lib/Analysis/DataFlowFramework.cpp
+++ b/mlir/lib/Analysis/DataFlowFramework.cpp
@@ -26,6 +26,11 @@

 using namespace mlir;

+// Implementation for AnalysisStateDeleter
+void AnalysisStateDeleter::operator()(AnalysisState *p) {
+  delete p;
+}
+
1 Like

IMO it’s perfectly fine to land forward-compat changes for C++, but I wouldn’t want this to become a (blocking) CI check until we actually decide on making the switch to C++23 and the new minimum toolchains required. We are going to switch to C++23 eventually, but until this becomes a thing, the interested parties should cover the maintenance cost.

the interested parties should cover the maintenance cost.

I struggle to see what the maintenance cost is. The vast majority of all code passes with C++23 out of the box. For context to build LLVM/MLIR with C++ I had to make the following changes

git diff --stat
 llvm/include/llvm/CodeGen/MachineFunctionAnalysis.h                  |  8 ++++--
 llvm/include/llvm/CodeGen/ResourcePriorityQueue.h                    |  1 +
 llvm/include/llvm/DebugInfo/GSYM/GsymContext.h                       |  1 +
 llvm/include/llvm/DebugInfo/PDB/Native/InputFile.h                   | 16 +++++++++---
 llvm/include/llvm/DebugInfo/PDB/PDBSymbolFunc.h                      |  1 +
 llvm/include/llvm/DebugInfo/PDB/PDBSymbolTypeEnum.h                  |  2 ++
 llvm/include/llvm/DebugInfo/PDB/PDBSymbolTypeUDT.h                   |  1 +
 llvm/include/llvm/MC/MCGOFFStreamer.h                                |  4 +--
 llvm/include/llvm/Passes/TargetPassRegistry.inc                      | 14 +++++------
 llvm/lib/CodeGen/LiveDebugVariables.cpp                              |  8 +++---
 llvm/lib/CodeGen/MachineFunctionAnalysis.cpp                         |  7 +++++-
 llvm/lib/CodeGen/SelectionDAG/ResourcePriorityQueue.cpp              |  2 ++
 llvm/lib/DebugInfo/BTF/BTFParser.cpp                                 |  2 +-
 llvm/lib/DebugInfo/GSYM/GsymContext.cpp                              |  2 ++
 llvm/lib/DebugInfo/PDB/Native/InputFile.cpp                          | 30 +++++++++++++++++++---
 llvm/lib/MC/MCGOFFStreamer.cpp                                       |  5 ++++
 llvm/lib/Remarks/BitstreamRemarkParser.cpp                           |  3 +++
 llvm/lib/Remarks/BitstreamRemarkParser.h                             |  3 +--
 llvm/lib/TableGen/TGParser.cpp                                       | 19 ++++++++------
 llvm/lib/TableGen/TGParser.h                                         | 11 +++++---
 llvm/lib/Transforms/Utils/SimplifyCFG.cpp                            |  2 +-
 llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp       | 39 ++++++++++++++++++++---------
 llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h         | 55 +++++++++++++++++++++++-----------------
 mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h |  4 +--
 mlir/include/mlir/Query/Matcher/MatchersInternal.h                   | 62 +++++++++++++++++++++++++++-------------------
 mlir/lib/Pass/PassRegistry.cpp                                       |  2 +-
 26 files changed, 201 insertions(+), 103 deletions(-)

I can’t justify using C++23 in my companies compiler if MLIR can regress at any moment. I am willing to do the up front work to get MLIR built with C++23 and put a stand-alone build bot like @jpienaar suggested.

I think the community would appreciate knowing that they can use C++23 in their projects. And the only way to guarantee that is with a full blocking build. Frankly I’m a bit surprised that this wasn’t already supported.

Could you give an example of the maintenance cost you are envisioning?

  • Is it that we spend more time in CI with an extra build configuration.
  • Or that someone adds code that can’t compile with C++23 and needs to figure out why but doesn’t have access to a clang version with support for C++23.

I imagine it will be exceedingly rare that the latter happens.

If anything I imagine that the maintenance burden is greater without the blocking CI checks. Every time someone adds code to the codebase that isn’t compatible with C++23, somebody will later have to go in and make a patch. They will need to get a reviewer from that specific part of the codebase and explain why the previous code didn’t work in C++23, thereby increasing the total number of reviews needed.

1 Like

I think it should be similar to how we approach bazel support – it lives inside the repo but contributors are not required to update the bazel build files. Many folks do it anyway because most cmake changes map 1:1 to bazel changes, but everything else that was not handled is patched by the bazel maintainers or community members who use bazel.

I’d think that the maintenance cost of being compatible with C++23 will become very low once we get it to compile for the first time. Later on, we can fix forward as we identify new regressions, which I expect to be rare.

I tend to separate the minimum C++ version (which is the default one we use) with the supported version: we supported C++17 much earlier than the migration of the default (and minimum) from C++14 to C++17.
I see it in the same way as compilers: we support a minimal versions and any new compiler gets automatically added to the set. That is anyone should be welcome to not only send patched but also add CI for this IMO (that I don’t see this the same as Bazel).
We probably don’t have alignment on this and the LLVM policy is likely not explicit enough on this, should we open another global RFC on this?

Seems like this shouldn’t be a drag on the development then?

2 Likes

Thanks! Let’s just add the missing include though. I wasn’t pushing back on this, that was a curiosity.

The issue is that clang is checking the template’s body before being instantiated, and it appears it’s now checking them on mores instances, like constexpr std::unique_ptr<T,Deleter>::operator= - cppreference.com. A minimal reproducer is:

// clang++-19 test.cpp -std=c++23 -c out.o
#include <memory>
#include <vector>

class A;

struct Container {
  template <typename T>
  void foo(std::vector<std::unique_ptr<A>> &foo) {
    x = std::move(foo);
  }

  std::vector<std::unique_ptr<A>> x;
};

class A {
  int x;
};

The solution is not necessarily adding a deleter to the unique ptr, but moving the definition of the template after the class declaration:

// clang++-19 test.cpp -std=c++23 -c out.o
#include <memory>
#include <vector>

class A;

struct Container {
  template <typename T>
  void foo(std::vector<std::unique_ptr<A>> &foo);

  std::vector<std::unique_ptr<A>> x;
};

class A {
  int x;
};

template <typename T>
void Container::foo(std::vector<std::unique_ptr<A>> &foo) {
  x = std::move(foo);
}
2 Likes

Thanks for the fix!

I made it into a PR if anybody has a couple of cycles to review

should we open another global RFC on this?

I’m strongly in favor of this. Happy to help, however I can.

My issue with that is that when we bump the language version, we are careful to select toolchain versions that support it without major known issues… If we say that we now want to support multiple newer C++ versions without specifying the minimum version of each toolchain, we risk running into compiler bugs / language defects. This is probably less of an issue than actually using new language constructs, but I’m still cautious because I want to avoid having to spend my time checking out multiple toolchains and investigating issues that may end up being simple (host) compiler bugs. These are often surfaced by ADT code that I maintain.

1 Like

FYI, trying to encode this: Policy on supporting newer C++ standard in LLVM codebase

1 Like