[RFC] [C++20] [Modules] Introduce Thin BMI and Decls hash

Some updates: I’ve sent [Serialization] No transitive identifier change by ChuanqiXu9 · Pull Request #92085 · llvm/llvm-project. Once this patch lands, the contents of a module file will no longer be affected by adding or removing declarations/identifiers in an unused module file (the same still needs to be done for types). See the link for examples.

Here the term “unused” is meant to be implementation-defined. That is, we won’t tell users the specific rules; we will only say that “it is the responsibility of the compiler to make sure the BMI gathers all the information needed”. This is the convention.

In yesterday’s meeting, @ben.boeckel raised some concerns about ADL, so I tried to make an example:

// RUN: rm -rf %t
// RUN: split-file %s %t
//
// RUN: %clang_cc1 -std=c++20 %t/Common.cppm -emit-reduced-module-interface -o %t/Common.pcm
//
// RUN: %clang_cc1 -std=c++20 %t/m-partA.cppm -emit-reduced-module-interface -o %t/m-partA.pcm \
// RUN:     -fmodule-file=Common=%t/Common.pcm
// RUN: %clang_cc1 -std=c++20 %t/m-partA.v1.cppm -emit-reduced-module-interface -o \
// RUN:     %t/m-partA.v1.pcm -fmodule-file=Common=%t/Common.pcm
// RUN: %clang_cc1 -std=c++20 %t/m-partB.cppm -emit-reduced-module-interface -o %t/m-partB.pcm
// RUN: %clang_cc1 -std=c++20 %t/m.cppm -emit-reduced-module-interface -o %t/m.pcm \
// RUN:     -fmodule-file=m:partA=%t/m-partA.pcm -fmodule-file=m:partB=%t/m-partB.pcm \
// RUN:     -fmodule-file=Common=%t/Common.pcm
// RUN: %clang_cc1 -std=c++20 %t/m.cppm -emit-reduced-module-interface -o %t/m.v1.pcm \
// RUN:     -fmodule-file=m:partA=%t/m-partA.v1.pcm -fmodule-file=m:partB=%t/m-partB.pcm \
// RUN:     -fmodule-file=Common=%t/Common.pcm
//
// Produce B.pcm and B.v1.pcm
// RUN: %clang_cc1 -std=c++20 %t/B.cppm -emit-reduced-module-interface -o %t/B.pcm \
// RUN:     -fmodule-file=m=%t/m.pcm -fmodule-file=m:partA=%t/m-partA.pcm \
// RUN:     -fmodule-file=m:partB=%t/m-partB.pcm -fmodule-file=Common=%t/Common.pcm
// RUN: %clang_cc1 -std=c++20 %t/B.cppm -emit-reduced-module-interface -o %t/B.v1.pcm \
// RUN:     -fmodule-file=m=%t/m.v1.pcm -fmodule-file=m:partA=%t/m-partA.v1.pcm \
// RUN:     -fmodule-file=m:partB=%t/m-partB.pcm -fmodule-file=Common=%t/Common.pcm
//
// Verify that both B.pcm and B.v1.pcm can work as expected.
// RUN: %clang_cc1 -std=c++20 %t/use.cpp -fsyntax-only -verify -fmodule-file=m=%t/m.pcm \
// RUN:     -fmodule-file=m:partA=%t/m-partA.pcm -fmodule-file=m:partB=%t/m-partB.pcm \
// RUN:     -fmodule-file=B=%t/B.pcm -fmodule-file=Common=%t/Common.pcm \
// RUN:     -DEXPECTED_VALUE=false
// RUN: %clang_cc1 -std=c++20 %t/use.cpp -fsyntax-only -verify -fmodule-file=m=%t/m.v1.pcm \
// RUN:     -fmodule-file=m:partA=%t/m-partA.v1.pcm -fmodule-file=m:partB=%t/m-partB.pcm \
// RUN:     -fmodule-file=B=%t/B.v1.pcm -fmodule-file=Common=%t/Common.pcm \
// RUN:     -DEXPECTED_VALUE=true
//
// Since we add a new ADL candidate in m-partA.v1.cppm, B.v1.pcm is expected to differ from
// B.pcm.
// RUN: not diff %t/B.pcm %t/B.v1.pcm &> /dev/null

// Test that the BMI won't differ if it doesn't refer to adl.
// RUN: %clang_cc1 -std=c++20 %t/C.cppm -emit-reduced-module-interface -o %t/C.pcm \
// RUN:     -fmodule-file=m=%t/m.pcm -fmodule-file=m:partA=%t/m-partA.pcm \
// RUN:     -fmodule-file=m:partB=%t/m-partB.pcm -fmodule-file=Common=%t/Common.pcm
// RUN: %clang_cc1 -std=c++20 %t/C.cppm -emit-reduced-module-interface -o %t/C.v1.pcm \
// RUN:     -fmodule-file=m=%t/m.v1.pcm -fmodule-file=m:partA=%t/m-partA.v1.pcm \
// RUN:     -fmodule-file=m:partB=%t/m-partB.pcm -fmodule-file=Common=%t/Common.pcm
// RUN: diff %t/C.pcm %t/C.v1.pcm &> /dev/null

//--- Common.cppm
export module Common;

export namespace N {
    struct A {
        constexpr operator int() {
            return 43;
        }
    };
}

//--- m-partA.cppm
export module m:partA;
import Common;

export namespace N {
// A placeholder to introduce the type `bool (N::A)`.
constexpr bool placeholder(A) { return false; }
}

//--- m-partA.v1.cppm
export module m:partA;
import Common;

export namespace N {
constexpr bool placeholder(A) { return false; }
constexpr bool adl(A) { return true; }
}

//--- m-partB.cppm
export module m:partB;

export constexpr bool adl(int) { return false; }

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- B.cppm
export module B;
import m;

export template <class C>
constexpr bool test_adl(C c) {
    return adl(c);
}

//--- use.cpp
// expected-no-diagnostics
import B;
import Common;

void test() {
    N::A a;
    static_assert(test_adl(a) == EXPECTED_VALUE);
}

//--- C.cppm
export module B;
import m;

export template <class C>
constexpr bool not_test_adl(C c) {
    return false;
}

Currently the line not diff %t/B.pcm %t/B.v1.pcm &> /dev/null fails, i.e. B.pcm and B.v1.pcm come out identical even though the result of the ADL call changes.

I believe we can solve problems like this when writing an UnresolvedLookupExpr.
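To make the ADL concern concrete without any module machinery, here is a minimal single-translation-unit sketch. The namespaces v0 and v1 are hypothetical stand-ins for building against m-partA.cppm and m-partA.v1.cppm respectively. The template body is textually identical in both, yet the dependent call adl(c) picks a different overload once N::adl(A) exists. It only illustrates the lookup behavior, not how Clang serializes the call.

```cpp
// Hypothetical stand-in for the build that uses m-partA.cppm.
namespace v0 {
namespace N {
struct A {
    constexpr operator int() { return 43; }
};
constexpr bool placeholder(A) { return false; }
}  // namespace N

constexpr bool adl(int) { return false; }

// Same text as test_adl in B.cppm.
template <class C>
constexpr bool test_adl(C c) { return adl(c); }
}  // namespace v0

// Hypothetical stand-in for the build that uses m-partA.v1.cppm.
namespace v1 {
namespace N {
struct A {
    constexpr operator int() { return 43; }
};
constexpr bool placeholder(A) { return false; }
constexpr bool adl(A) { return true; }  // the only difference from v0
}  // namespace N

constexpr bool adl(int) { return false; }

// Unchanged template, but ADL now also finds N::adl(A) at instantiation.
template <class C>
constexpr bool test_adl(C c) { return adl(c); }
}  // namespace v1

// v0: only adl(int) is a candidate, reached through A's conversion to int.
static_assert(!v0::test_adl(v0::N::A{}), "v0 resolves to adl(int)");
// v1: N::adl(A) is an exact match and wins overload resolution.
static_assert(v1::test_adl(v1::N::A{}), "v1 resolves to N::adl(A)");
```

The two static_asserts correspond to the EXPECTED_VALUE=false and EXPECTED_VALUE=true runs of use.cpp in the test above.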

The theory is that a consumer’s translation units can only access reachable-but-not-visible contents via their directly imported modules. In other words, the directly imported modules can record which contents are reachable to their consumers, so this should be implementable.

Of course, there may be defects like the ADL example above, but I think we can fix them iteratively. That should be acceptable since this is meant to be experimental. Until the work is complete, build systems are required to treat all (indirectly) imported modules as dependencies.
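For build system authors, treating all (indirectly) imported modules as dependencies amounts to taking the transitive closure of the direct import graph. Here is a rough sketch of that computation. ImportGraph, all_imported_modules and example_graph are hypothetical names invented for illustration and are not part of Clang or any real build tool; the edges are the direct imports from the example above.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: unit name -> names of directly imported modules.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Collect every module reachable from `unit` through one or more imports.
std::set<std::string> all_imported_modules(const ImportGraph& direct,
                                           const std::string& unit) {
    std::set<std::string> seen;
    std::vector<std::string> worklist{unit};
    while (!worklist.empty()) {
        std::string current = worklist.back();
        worklist.pop_back();
        auto it = direct.find(current);
        if (it == direct.end())
            continue;  // `current` imports nothing
        for (const std::string& dep : it->second) {
            // Visit each module once, even if it is imported along several paths.
            if (seen.insert(dep).second)
                worklist.push_back(dep);
        }
    }
    return seen;
}

// Direct imports as written in the example sources of this post.
ImportGraph example_graph() {
    return {
        {"use.cpp", {"B", "Common"}},
        {"B", {"m"}},
        {"m", {"m:partA", "m:partB"}},
        {"m:partA", {"Common"}},
    };
}
```

Under these assumptions, all_imported_modules(example_graph(), "B") contains m, m:partA, m:partB and Common, so a change to m-partA.cppm would still trigger a rebuild of B even though B only imports m directly.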