Crash when running update_core_linalg_named_ops.sh

Hi!

We’re working on a downstream fork of llvm-project and are trying to update to a more recent LLVM green commit. We have some additional ops in core_named_ops.py and we need to update LinalgNamedStructuredOps.yaml.

When trying to run update_core_linalg_named_ops.sh however, we run into the following crash:

$ ./bin/update_core_linalg_named_ops.sh
Updating ops in /workspace/llvm-project/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.8/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/dialects/linalg/__init__.py", line 6, in <module>
    from ..._mlir_libs._mlirDialectsLinalg import *
  File "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs/__init__.py", line 106, in <module>
    _site_initialize()
  File "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs/__init__.py", line 56, in _site_initialize
    from ._mlir import ir
ImportError: /workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs/libMLIRPythonCAPI.so.16git: symbol _ZN4llvm15SmallVectorBaseIjE13mallocForGrowEmmRm version LLVM_16 not defined in file libLLVM-16git.so with link time reference

Can anybody provide us with more information on how we could resolve this error?

That last line looks suspect to me. It indicates that you are using dynamic linking, and it looks like a different shared library is being found at runtime than at link time. I personally haven’t used this tool with dylib linking enabled. Can you confirm your cmake options? Is there anywhere on your system that a different version of that libLLVM library could be coming from?

While not a fix, I am pretty confident that building statically would work.
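One way to check whether a second copy of libLLVM is visible on the system is to list every matching file in the directories the loader is likely to search. The following is a minimal sketch, not part of the MLIR tooling: the function name `find_lib_candidates` and the directory list are assumptions for illustration, and the list should be extended with the build tree's own lib directory.

```python
import glob
import os


def find_lib_candidates(pattern, search_dirs):
    """Return sorted real paths under search_dirs whose file name matches pattern."""
    matches = []
    for directory in search_dirs:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    # Resolve symlinks so that several aliases of one file collapse to one entry.
    return sorted({os.path.realpath(path) for path in matches})


if __name__ == "__main__":
    # These directories are assumptions; adjust them for your system.
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    dirs += ["/usr/lib", "/usr/lib/x86_64-linux-gnu", "/usr/local/lib"]
    for path in find_lib_candidates("libLLVM*.so*", [d for d in dirs if d]):
        print(path)
```

If more than one distinct file shows up, running `ldd` on the failing libMLIRPythonCAPI.so should show which of them the loader actually resolves.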

We are indeed building with dynamic libraries / dynamic linking. Here are the CMake flags we use:

cmake -GNinja \
      -DLLVM_ENABLE_PROJECTS=mlir \
      -DLLVM_TARGETS_TO_BUILD='host' \
      -DLLVM_ENABLE_ASSERTIONS=ON \
      -DLLVM_ENABLE_TERMINFO=OFF \
      -DLLVM_BUILD_LLVM_DYLIB=ON \
      -DLLVM_LINK_LLVM_DYLIB=ON \
      -DMLIR_LINK_MLIR_DYLIB=ON \
      -DLLVM_BUILD_TEST=OFF \
      -DLLVM_ENABLE_EH=ON \
      -DLLVM_ENABLE_RTTI=ON \
      -DLLVM_INCLUDE_TESTS=OFF \
      -DMLIR_INCLUDE_TESTS=ON \
      -DMLIR_ENABLE_BINDINGS_PYTHON=ON

Not sure where the other libLLVM library could be coming from. It might be coming from our clang install, as we use clang-15 for building.

I set up a fresh worktree outside of our downstream project and used the following CMake flags to link statically to update the yaml file:

cmake -GNinja \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_TARGETS_TO_BUILD='host' \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_TERMINFO=OFF \
    -DLLVM_BUILD_TEST=OFF \
    -DLLVM_ENABLE_EH=ON \
    -DLLVM_ENABLE_RTTI=ON \
    -DLLVM_INCLUDE_TESTS=OFF \
    -DMLIR_INCLUDE_TESTS=ON \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON

Still doesn’t work, but at least I hit a different error now:

$ ./bin/update_core_linalg_named_ops.sh 
Updating ops in /workspace/llvm-project/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.8/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/dialects/linalg/__init__.py", line 6, in <module>
    from ..._mlir_libs._mlirDialectsLinalg import *
  File "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs/__init__.py", line 106, in <module>
    _site_initialize()
  File "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs/__init__.py", line 56, in _site_initialize
    from ._mlir import ir
ImportError: cannot import name 'ir' from 'mlir._mlir_libs._mlir' (unknown location)

I have python bindings enabled, as you can see. And I also checked the build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs/_mlir folder. ir.pyi does exist there. So I really don’t know why it cannot find the package. Modifying the script to not set its own PYTHONPATH and setting it manually instead also doesn’t work.
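A possible reading of the "(unknown location)" part of that error is that Python found only the `_mlir` stub directory and treated it as a namespace package, because no compiled `_mlir` extension module sits next to it. The sketch below checks for such a file. The helper name `find_extension_files` is hypothetical, and the diagnosis itself is an assumption rather than a confirmed cause.

```python
import importlib.machinery
import os


def find_extension_files(package_dir, module_name):
    """List compiled extension files for module_name directly inside package_dir."""
    found = []
    # EXTENSION_SUFFIXES holds the suffixes this interpreter accepts for
    # compiled modules (for example ".so" variants on Linux).
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = os.path.join(package_dir, module_name + suffix)
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


# Example call, using the directory from the traceback above:
# find_extension_files(
#     "/workspace/llvm-project/build/tools/mlir/python_packages/mlir_core/mlir/_mlir_libs",
#     "_mlir",
# )
```

An empty result would mean the extension was never built for this interpreter, which would fit a build that stopped before the Python bindings were linked.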

I’ll try to reproduce. Can you confirm that the unit tests pass (ninja check-mlir)? Those run extensive tests of the python bindings, and I’m trying to figure out if we are fighting a general issue or a tool issue.

That’s the thing: I cannot even build yet, until the yaml file is updated. So maybe I’m in some chicken and egg situation, where I need the file to get updated before I can build, but I need to build before I can update the file.

Ok - that is the problem. I know it is awkward but you have to be able to do at least a minimal build in order to have the tooling to update the file. There is a target that builds just what is needed but I can’t find it now (power is out here just this minute). There are a variety of ways you can do this if you have changes that can’t build (typically, ops are updated in a pristine work tree).

I see, that makes sense. Our problem basically boils down to the op interface having changed from things like foo_bar() to getFooBar(). So if I set the dialect to generate both the new and old form, I should be able to build, update the yaml file, and then set the dialect to only generate the new form again.

If I were you, I would make a stash with your changes to the op definition, switch to head, apply the stash, build/update, stash again, and switch back to your branch to get unstuck.

First of all, building main, switching back to our branch, and then updating the file worked like a charm. Thanks for the suggestion!

Unfortunately, it didn’t have the effect we were hoping for. There seems to be a bug in the code generator from yaml with regard to underscores in attribute names. The custom ops we have in core_named_ops.py have some attributes with underscores in them, e.g. kernel_size. The code that is generated from yaml turns this into getKernel_size, which of course doesn’t exist.
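To make the mismatch concrete, here is a small sketch of the two naming schemes. It is not the generator's actual code. Both function names are hypothetical, and `naive_getter` only mimics the output reported above.

```python
def naive_getter(attr_name):
    """Capitalize only the first character, as the reported output suggests."""
    return "get" + attr_name[0].upper() + attr_name[1:]


def camel_case_getter(attr_name):
    """Capitalize every underscore-separated part and drop the underscores."""
    parts = [part for part in attr_name.split("_") if part]
    return "get" + "".join(part[0].upper() + part[1:] for part in parts)


print(naive_getter("kernel_size"))       # getKernel_size
print(camel_case_getter("kernel_size"))  # getKernelSize
```

For attribute names without underscores, both functions give the same result, which may be why the problem only shows up with names like kernel_size.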

I can either rename the attributes to use camel case, like kernelSize, or I can update the yaml file manually until the bug is fixed (if it is indeed a bug).

What do you reckon? Are underscores in core_named_ops.py allowed and this is a bug, or should we not use underscores for attribute names in that python file?

I haven’t been following the whole camel casing saga closely enough to advise you concretely, but it sounds like the code generator needs to be updated – so a bug.

Should I file a bug somewhere?

Yes - we use GitHub issues on the repo. Not sure who it should be assigned to but should definitely be filed.

I filed "[Linalg] Incorrect code generated when using underscore in attribute names in core_named_ops.py" (Issue #58619 in llvm/llvm-project on GitHub) for now.