Missing build cmake tblgen dependency?

I am trying to add a Flang buildbot using msvc in production (Buildbot). One of the conditions is that the buildbot is stable.

I noticed two builds that failed after changes to LinalgNamedStructuredOps.yaml and then got “resolved” by unrelated commits.

Is it possible that there is a target-level dependency missing?

The buildbot is currently in staging, but if it is going to production, such failures will send a spurious blame email.

A good way to find missing dependencies is to delete the build directory to start fresh, and then build only the target that broke non-deterministically: ninja tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalg.dir/LinalgOps.cpp.obj

I can reproduce the error on a Linux system when going from 25bb61649085c0a6e66630bbffe7faa54cd67829^ to 25bb61649085c0a6e66630bbffe7faa54cd67829:

$ ninja obj.MLIRLinalg
[100%/8.022s :: 0->1->4 (of 4)] Building CXX object tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalg.dir/LinalgOps.cpp.o
FAILED: tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalg.dir/LinalgOps.cpp.o
/soft/compilers/gcc/7.4.0/linux-rhel7-x86_64/bin/g++  -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=1 -DMLIR_ROCM_CONVERSIONS_ENABLED=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/mlir/lib/Dialect/Linalg/IR -I/home/meinersbur/src/llvm-project/mlir/lib/Dialect/Linalg/IR -Iinclude -I/home/meinersbur/src/llvm-project/llvm/include -I/home/meinersbur/src/llvm-project/mlir/include -Itools/mlir/include -fmax-errors=1 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalg.dir/LinalgOps.cpp.o -MF tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalg.dir/LinalgOps.cpp.o.d -o tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalg.dir/LinalgOps.cpp.o -c /home/meinersbur/src/llvm-project/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
In file included from /home/meinersbur/src/llvm-project/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp:2674:0:
tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yamlgen.cpp.inc:348:11: error: ‘DepthwiseConv2DInputNhwcFilterHwcPolyOp’ has not been declared
 ArrayAttr DepthwiseConv2DInputNhwcFilterHwcPolyOp::iterator_types() {
compilation terminated due to -fmax-errors=1.
ninja: build stopped: subcommand failed.

but not from a clean build.

This might be due to a missing file-level dependency. From the cmake documentation:

This target-level dependency does NOT add a file-level dependency that would cause the custom command to re-run whenever the executable is recompiled.
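To illustrate the distinction in that quote, here is a minimal sketch. It is not taken from the LLVM build; `my_gen`, `gen.cpp`, `input.yaml`, and `generated.inc` are hypothetical names. Naming an executable target in `COMMAND` only ensures the tool is built first, whereas listing it (and the input files) in `DEPENDS` also makes the command re-run when they change.

```cmake
cmake_minimum_required(VERSION 3.13)
project(dep_sketch CXX)

# Hypothetical generator tool; gen.cpp is a placeholder source file.
add_executable(my_gen gen.cpp)

set(GEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/generated.inc)

add_custom_command(
  OUTPUT ${GEN_OUT}
  # Using the target name in COMMAND only adds a target-level dependency:
  # my_gen is built before this command runs, but rebuilding my_gen alone
  # would not cause the command to re-run.
  COMMAND my_gen ${CMAKE_CURRENT_SOURCE_DIR}/input.yaml ${GEN_OUT}
  # Listing the executable target and the input file in DEPENDS adds
  # file-level dependencies, so the output is regenerated when either changes.
  DEPENDS my_gen ${CMAKE_CURRENT_SOURCE_DIR}/input.yaml
  COMMENT "Generating generated.inc"
)

# A target that drives the custom command.
add_custom_target(generate_inc DEPENDS ${GEN_OUT})
```

If `DEPENDS` were dropped from this sketch, a clean build would still succeed, but an incremental build after editing `input.yaml` could keep using a stale `generated.inc`.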

This is a bit strange, I would expect a missing include file instead of this error. If you can get the build directory in this state, I’m interested if you can look up tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yamlgen.cpp.inc and tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.h.inc.

Because the DepthwiseConv2DInputNhwcFilterHwcPolyOp should be declared through:

#include "mlir/Dialect/Linalg/IR/LinalgStructuredOps.h.inc"

in mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.h, which is included in mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp.

Something else may be going on with the generator.

This is very strange and seems like the kind of thing that triggers when a multi-level tool dependency is declared incorrectly in CMake, in combination with an incremental rebuild (it is easy to misconfigure CMake so that dependent-of-dependent rules do not trigger on change).

I’m not familiar with buildbot setups: do they reuse the build directory in some way?

I suspect that I wrote the relevant CMake rules. It’s late here now, but I can look into it tomorrow. Thanks for the repro instructions.

Oh, I didn’t understand originally that it was about an incremental build: it seems a bit fragile to me to set up a bot this way. CMake isn’t bulletproof on incremental builds, in particular across revisions (stale generated files could be left behind, and they won’t be cleaned up, but can still be picked up by header includes).

I spent some time on this today, and it seems the problem is indeed that CMake also requires a file dependency, not just a target dependency, once you have dependent custom_commands (we run yaml-gen and then tablegen).
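A minimal sketch of such a two-stage chain, under stated assumptions: `yaml_gen` and `td_gen` are hypothetical stand-ins for the real generators, their command-line arguments are invented for illustration, and the file names are placeholders. The second command lists both the wrapping custom target (for build ordering) and the generated file itself (so that it re-runs when the file changes).

```cmake
cmake_minimum_required(VERSION 3.13)
project(chain_sketch CXX)

# Hypothetical generator tools; the .cpp files are placeholders.
add_executable(yaml_gen yaml_gen.cpp)
add_executable(td_gen td_gen.cpp)

set(STAGE1_OUT ${CMAKE_CURRENT_BINARY_DIR}/ops.yamlgen.td)
set(STAGE2_OUT ${CMAKE_CURRENT_BINARY_DIR}/ops.h.inc)

# Stage 1: YAML -> generated .td file.
add_custom_command(
  OUTPUT ${STAGE1_OUT}
  COMMAND yaml_gen ${CMAKE_CURRENT_SOURCE_DIR}/ops.yaml ${STAGE1_OUT}
  DEPENDS yaml_gen ${CMAKE_CURRENT_SOURCE_DIR}/ops.yaml
)
add_custom_target(OpsYamlGen DEPENDS ${STAGE1_OUT})

# Stage 2: .td -> header. Depending on OpsYamlGen alone would only order
# the builds; also listing ${STAGE1_OUT} adds the file-level dependency
# that makes this step re-run when the generated .td file changes.
add_custom_command(
  OUTPUT ${STAGE2_OUT}
  COMMAND td_gen ${CMAKE_CURRENT_SOURCE_DIR}/ops.td ${STAGE1_OUT} ${STAGE2_OUT}
  DEPENDS td_gen ${CMAKE_CURRENT_SOURCE_DIR}/ops.td OpsYamlGen ${STAGE1_OUT}
)
add_custom_target(OpsIncGen DEPENDS ${STAGE2_OUT})
```

With only `OpsYamlGen` in the second `DEPENDS`, editing `ops.yaml` would regenerate the `.td` file but could leave a stale `ops.h.inc` in an incremental build, which matches the kind of failure described above.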

Section five in the following blog explains the problem:

The second custom_command is well hidden in LLVM in our case. I implemented the following hack which seems to work:

  1. Update llvm/cmake/modules/TableGen.cmake to add LINALG_DEPS to the dependencies:
     DEPENDS ${${project}_TABLEGEN_TARGET} ${${project}_TABLEGEN_EXE}
       ${local_tds} ${global_tds}
     COMMENT "Building ${ofn}..."
  2. Update mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt to set the dependencies:
 add_dependencies(LinalgOpsDocGen LinalgOdsGen)

+set(LINALG_DEPS MLIRLinalgNamedStructuredOpsYamlIncGen ${CMAKE_CURRENT_BINARY_DIR}/LinalgNamedStructuredOps.yamlgen.td) 

 set(LLVM_TARGET_DEFINITIONS LinalgStructuredOps.td)
 mlir_tablegen(LinalgStructuredOps.h.inc -gen-op-decls)
 mlir_tablegen(LinalgStructuredOps.cpp.inc -gen-op-defs)

I wonder if there is a less intrusive way than touching TableGen.cmake?

Buildbots can be configured either way: clean before every build or not. The build is always cleaned if any CMakeLists.txt is changed, so there should be no problem with stale files.

I always set up my buildbots with incremental builds. This reduces the typical build time from more than an hour to minutes. It means the bot does not have to coalesce as many commits, you get faster responses, and honestly, always recompiling everything feels like a waste of resources. It also helps identify problems like this.

IMO: ccache is just more principled and robust from this point of view, and very efficient. This is what we use for Buildbot and the build is frequently <5 min.

ccache does not support msvc.

Independently of how buildbots should be configured, don’t you agree that incremental builds should work as well? That’s what every developer is using, and occasional failures will cost a lot of developer time.

Yes, within the limits of the tools. I typically blow away my cmake build dir and start over in O(month). It is not always glitch free – I suspect that people are just running ninja again when this issue happens for them locally. Thanks for raising it as a real concern.

@gysit has sent out a fix for review: D105272 [CMake][MLIR][Linalg] Adding variable to specify tablegen file dependencies.

Right, I’m building incrementally all the time, but CMake is intrinsically limited in terms of incremental builds: you should just keep these limitations in mind, because there isn’t much we can do about them. I guess bots may not break frequently because we don’t modify the CMake structure too often: I just don’t want to be on the receiving end of debugging such issues when they occur :)

And of course we should fix bugs when we find them, thanks @gysit :)