Running unit tests with different options than clang defaults

Hi, I’m looking for some advice on how to handle an interesting scenario. This is related to running lit on the z/OS platform. On z/OS we will have support for 64-bit and 32-bit compilation, with the default being 32-bit. Even though the default is 32-bit, we find the best way to run lit is to use 64-bit mode with ASCII & IEEE floats. This causes a mismatch within some of the unit tests because the unit tests are built with 64-bit/ASCII/IEEE while the compilations run by the unit tests are in 32-bit/EBCDIC/hexfloat. A few unit tests fail on checks such as verifying the spelling of identifier names or string literals in the AST: the compile done by the unit test (using the default compilation mode) produces EBCDIC encoding, but the expected values are in ASCII because the unit test program itself is compiled with ASCII enabled.

Running lit in 32-bit/EBCDIC/hexfloat isn’t really an option as that would cause an even larger number of failures and lots of special casing for z/OS in the lit tests.

What is the best way to approach this problem?

When you say, “run lit … with ASCII”, I’m not sure quite what you mean. Do you mean that lit was compiled such that it uses ASCII for the ordinary literal encoding? Or do you mean it was compiled with “Enhanced ASCII support”? Or something else?

Are you using _BPXK_AUTOCVT and/or file tagging to help with encoding related issues?

In the long term, my intuition from having worked on z/OS in the past is that you’ll want to solve for running most everything in the native EBCDIC environment, with UTF-8 source files that do not cleanly transcode to EBCDIC tagged appropriately so that they are interpreted as UTF-8.

At any rate, I think I would need to have a more detailed description of how you are setting up the environment before I can really offer any advice.

By “running lit” I’m referring to building the check-clang target and, for this question, the check-clang-unit target.

This isn’t an issue with file tagging or auto conversion (by the way, we’ve put everything in the compiler to handle that). The problem we’re seeing has to do with in-process memory and the encoding of the values held there. The unit test programs compare expected values inside the unit test (encoded in ASCII & IEEE) with the actual values from the AST produced when compiling the program segments with clang-tool (encoded in EBCDIC & hexfloat). I’ll give a couple of examples to help explain what is happening.

In the first example, see clang/unittests/AST/ASTImporterTest.cpp. One test that is failing is:

TEST_P(ImportExpr, ImportDesignatedInitExpr) {
  MatchVerifier<Decl> Verifier;
  testImport(
      "void declToImport() {"
      "  struct point { double x; double y; };"
      "  struct point ptarray[10] = "
      "{ [2].y = 1.0, [2].x = 2.0, [0].x = 1.0 }; }",
      Lang_C99, "", Lang_C99, Verifier,
      functionDecl(hasDescendant(initListExpr(
          has(designatedInitExpr(designatorCountIs(2),
                                 hasDescendant(floatLiteral(equals(1.0))),
                                 hasDescendant(integerLiteral(equals(2))))),
          has(designatedInitExpr(designatorCountIs(2),
                                 hasDescendant(floatLiteral(equals(2.0))),
                                 hasDescendant(integerLiteral(equals(2))))),
          has(designatedInitExpr(designatorCountIs(2),
                                 hasDescendant(floatLiteral(equals(1.0))),
                                 hasDescendant(integerLiteral(equals(0)))))))));
}

In this test the floatLiteral() match fails because ASTImporterTest.cpp is compiled in 64-bit mode with IEEE floats and the invocation of clang-tool via the unit test harness is compiling in 32-bit mode with hexfloat. The binary value of 1.0 differs between the ASTImporterTest.cpp program and the AST generated by the compile done within the unit test.
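
To make the mismatch concrete, here is a small standalone C++ sketch (my own illustration, not code from the test or from clang) that prints the raw bit pattern of 1.0 on an IEEE-754 host; the IBM hexadecimal floating-point encoding of the same value is shown alongside for comparison:

// Standalone illustration only; not part of ASTImporterTest.cpp.
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  double D = 1.0;                       // host value, IEEE-754 binary64
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits)); // view the raw bit pattern
  std::printf("IEEE-754 1.0 : 0x%016llX\n",
              static_cast<unsigned long long>(Bits)); // 0x3FF0000000000000
  // IBM hexadecimal floating point encodes 1.0 as 0x4110000000000000
  // (base-16 exponent 1, normalized fraction 1/16).
  std::printf("IBM HFP  1.0 : 0x4110000000000000\n");
  return 0;
}

The expected value in the matcher comes from the first representation and the literal in the imported AST from the second, hence the mismatch.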

Another example is clang/unittests/AST/Interp/Descriptor.cpp (around line 170).

TEST(Descriptor, Primitives) {
...
    ASSERT_TRUE(E1.deref<char>() == 'f');

...

It is similar here, except that in this case E1.deref<char>() returns the EBCDIC encoding for ‘f’, since the clang-tool command that generated the AST ran in 32-bit mode with EBCDIC encoding while Descriptor.cpp was compiled as a 64-bit ASCII application.
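
For reference, a tiny standalone sketch (again my own illustration, not from the test) of the two encodings involved; 0x86 is ‘f’ in the IBM-1047 EBCDIC code page:

// Standalone illustration only; not part of Descriptor.cpp.
#include <cstdio>

int main() {
  const unsigned char AsciiF  = 'f';  // 0x66 when this file is built with ASCII literals
  const unsigned char EbcdicF = 0x86; // 'f' in IBM-1047 EBCDIC, as produced by the 32-bit compile
  std::printf("ASCII  'f' = 0x%02X\n", AsciiF);
  std::printf("EBCDIC 'f' = 0x%02X\n", EbcdicF);
  // The test's ASSERT_TRUE(E1.deref<char>() == 'f') is effectively comparing
  // these two values, so it fails.
  return 0;
}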

Thank you, that explanation made things more clear.

Is it feasible to invoke the 32-bit compiler such that it uses ASCII & IEEE floats when run as part of these unit tests? I’m making an assumption that options like -fexec-charset and -mfloat-abi= are available to change the default behavior. This would mean that the default would have to be overridden for any tests that are explicitly intended to exercise EBCDIC and hex floats, but you’ll presumably need that for testing those features of the 64-bit compiler anyway.
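
To sketch what that could look like inside a unit test (the flag spellings below are my assumptions about what a z/OS clang might accept, not confirmed options), the tooling entry points that take an argument vector could be handed the overrides explicitly:

// Sketch only: the charset/float flags here are hypothetical placeholders;
// substitute whatever the real z/OS options turn out to be.
#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/ADT/StringRef.h"
#include <memory>
#include <string>
#include <vector>

static bool parseWithHostMatchingDefaults(llvm::StringRef Code) {
  // Force the compile done by the test to match how the test binary itself
  // was built (64-bit, ASCII literals, IEEE floats).
  std::vector<std::string> Args = {
      "-m64",                      // match the 64-bit test build
      "-fexec-charset=ISO8859-1",  // hypothetical spelling for ASCII literals
      "-ffloat-format=ieee",       // hypothetical spelling for IEEE floats
  };
  return clang::tooling::runToolOnCodeWithArgs(
      std::make_unique<clang::SyntaxOnlyAction>(), Code, Args);
}

Tests that are specifically about EBCDIC or hex floats would then pass the opposite options explicitly instead of relying on the platform default.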

Are you asking about LLVM_LIT_ARGS as described in the Building LLVM with CMake (LLVM 19.0.0git) documentation? Or is this a different problem? You can also just run llvm-lit manually with whatever arguments you need, but it won’t handle dependencies the way CMake does.