MLIR Python bindings test problem

Hi all,

I implemented MLIR Python bindings for simple inspection of Operation, Region, and Block. You can find the code in my GitHub repo. So far, I have completed the following bindings:

Operation: getName, isRegistered, getBlock, getContext, getParentRegion, getParentOp, isProperAncestor, isAncestor, isBeforeInBlock, dump, getNumOperands, getNumRegions, getRegion, hasSuccessors, getNumSuccessors, isCommutative, isKnownTerminator, isKnownNonTerminator, isKnownIsolatedFromAbove, hasOneUse, use_empty, isUsedOutsideOfBlock

Region: getContext, getParentRegion, getParentOp, getRegionNumber, isProperAncestor, isAncestor, findAncestorBlockInRegion, iteration over the blocks in the region

Block: getParent, getParentOp, isEntryBlock, args_empty, getNumArguments, findAncestorOpInBlock, isOpOrderValid, verifyOpOrder, getTerminator, hasNoPredecessors, getSinglePredecessor, getUniquePredecessor, getNumSuccessors, getSuccessor, dump, iteration over the operations in the block

Now I’d like to send a patch to Phabricator, but before that I think I should improve the test part. Currently my test code is very simple: I use the Python “unittest” testing framework, and there are five test parts so far:

  • test_input
  • test_module
  • test_operation
  • test_region
  • test_block

Each part tests the bindings of the corresponding class’s members. For example:

import unittest
import sys
import os
sys.path.append(os.environ['MLIR_LIBRARY_PATH'])
import mlir

import test_input
import test_module
import test_operation
import test_region
import test_block

class TestMLIR(unittest.TestCase):
  ......
  def test_operation(self):
    self.assertEqual(test_operation.test_getName(), 'module')
    self.assertTrue(test_operation.test_isRegistered())
    self.assertIsInstance(test_operation.test_getBlock(), mlir.Block)
    self.assertIsInstance(test_operation.test_getContext(), mlir.Context)
    ......

if __name__ == '__main__':
  unittest.main()

I’m not sure whether this is a good way to test, and I’d like to ask whether there is a better approach to testing the bindings. I’d also like to collect further feedback and suggestions on the project.

Thanks!

Hi @zhanghb97 - thanks for working on this. I’m going to break my response in two: the first being some things I’ve learned about the testing question you have and the second being some general comments about design direction.

I’ve fiddled with a lot of different ways to test things at this level, and I’d say that the general philosophy I’ve landed on is:

  1. For this low-level boiler-plate testing (i.e. does each method in my binding trivially work, not core dump, etc), I do prefer an actual Python unit test either by way of unittest.TestCase or doctest.
  2. For anything more advanced (that actually involves generating IR), using a lit+FileCheck based test is the best way to go.
  3. Both can be integrated into lit in a fairly straightforward way.
  4. For both, I try to keep the sometest.py file hermetic so that I can just run it to see the results, possibly piping it through FileCheck and/or opt for further validation.

Regarding point #1, a couple of observations regarding how you’ve written it:

  1. It is nice to have the data under test in the file so I can glance at it, know what it is doing and not have to manage dependencies of the test itself (which can be surprisingly hard to ensure land in the filesystem exactly where you expect them to). For your example, I would pull your example MLIR text into the python itself by using a (raw) multiline string and either setting the module you are loading as a constant or having a helper method to generate it. Again, we shouldn’t have many of these low-level boilerplate tests, and I’ve found that embedding inline makes a lot of things easier. Here is an example of such a test.

  2. I’m not quite sure what you are getting from the indirection through your “test_*” modules. They seem to just encapsulate the loading of some IR and then provide accessors to it? Just load the IR in your actual test and don’t go through test-only helpers. Aside from being clearer, there is a tendency to hide details in such test-only helpers, and actually making the logic visible at the leaf test level has a way of revealing API warts in a way that allows them to get fixed.

  3. For this kind of low level testing on an IR project like this, which is all about representation, I strongly prefer doctest for the very low level tests and FileCheck for anything generative. Given that this is a native module, you may not actually want to embed your doctest annotations directly in the C++ code (which would evoke a very language-in-language feel), but having a side file that is just doctest+boilerplate can be pretty readable. Admittedly, this is a measure of personal style, but I find the unittest assert style to be quite bad for things that are simple API or representation tests. Using doctest forces a level of care on the ergonomics of the output that is often ignored in other testing setups (i.e. make sure everything has a designed string representation, that you have appropriate methods for printing/parsing, etc).

Sample doctest (for your operation tests)

# RUN: %PYTHON %s

"""
  >>> m = load_test_module()
Test basics:
  >>> m.getOperation().getName()
  "module"
  >>> m.getOperation().isRegistered()
  True
  >>> ... etc ...

Verify that repr prints (I've found this to be a good way to enforce nice hygiene in terms of stable, printable string representations for everything):
  >>> m.getOperation()
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  "test.two_region_op"()(
    {"work"() : () -> ()},
    {"work"() : () -> ()}
  ) : () -> ()
  return
}

func @test_operation_ssa(i64, i1) -> i64 {
^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
  cond_br %cond, ^bb1, ^bb2

^bb1:
  br ^bb3(%a: i64)    // Branch passes %a as the argument

^bb2:
  %b = addi %a, %a : i64
  br ^bb3(%b: i64)    // Branch passes %b as the argument

// ^bb3 receives an argument, named %c, from predecessors
// and passes it on to bb4 twice.
^bb3(%c: i64):
  br ^bb5(%c, %c : i64, i64)

^bb4:
  br ^bb3(%b: i64)

^bb5(%d : i64, %e : i64):
  %0 = addi %d, %e : i64
  return %0 : i64
}
"""

def load_test_module():
  ctx = mlir.Context()
  ctx.allowUnregisteredDialects(True)
  # NOTE: If testing the actual *file* parsing code, I strongly prefer that the driving python test writes a tempfile and attempts to load it versus having a static file dependency on disk somewhere.
  module = ctx.parse_asm(TEST_MLIR_ASM)
  return module


if __name__ == "__main__":
  mlir.registerAllDialects()
  import doctest
  doctest.testmod()

FileCheck for generative tests

Anything that generates MLIR asm should be checked with FileCheck. It looks like most of your API so far is about reading, so this is unlikely to be helpful immediately, but I’ll drop some examples anyway. Here is an example of such a test. Note a few things about it:

  1. It passes through npcomp-opt (would be mlir-opt in the core repo), which validates and can be used for other checks.
  2. There is a function decorator that does the main work. I used to just have this decorator open coded in each test but eventually generalized it and added some fanciness for further error checking, target selection, etc. For most of its life, though, it was a simple 3 line function in every test (until I could see the pattern and generalized it).
  3. Having tried several fancy ways of collecting test functions and running them unittest style in some fashion, I eventually settled on this really simple style: the module just runs in linear order, compiling each decorated function and immediately printing it. This is really nice because at any point in the development flow, I can just execute this file interactively and get a full dump of MLIR asm, ready to be hacked on/fed to opt/etc.

Let’s say you frequently had the pattern where a test function generates a new MLIR module, and you wanted to make that testable. A simple decorator like this would do it:

# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

def print_module(f):
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# NOTE: This checks the label emitted above, which is ad-hoc. In more compiler-based flows, I usually have the name of the python function emitted into the module and check the artifact of that instead of an ad-hoc label.
# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.Module()
  builder = mlir.OpBuilder(m)
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m

Integrating with lit

It would need some discussion regarding where in the tree these go, but I’ve found it effective to have an isolated lit test suite for all such python-based lit tests. This could be something like mlir/binding_test/python in the main repo. In my repo, I made this test suite a top-level “pytest” directory and gave lit enough configuration to let it run things. There is a lot of magic boiler-plate involved in getting CMake+lit setup right, and it may be easiest to get it right once in the context of the main repo. At its core, though, the lit tests, when used in this way, just run little unix-y pipelines of commands with substitutions. You can easily write a shell script minus all of the cmake goo to invoke them in a side repo if you can’t be bothered to build-integrate it yet.

The load-bearing bits:

  • Make the %PYTHON substitution match the python executable detected at CMake configure time. (The reference is made here and there may be some further CMake goo to detect, but I suspect that the pybind11 package is already invoking that).
  • Tell lit to include py files in its test suite.
  • You want to export a PYTHONPATH env var that can find both your python sources (if any) and native libs. Note that I make a hard split between top-level packages that are pure python and backing native libraries. Nesting the latter in the former is possible but causes a lot of pain and slows down incremental development (because you then likely need a build step between every python change, since your src and built files have to live in the same physical directory).

Note that as long as lit can run it, it can live in the test suite directory. Practically, this means that whether you use unittest, doctest or a FileCheck pipeline, you put a # RUN line at the top of the file and put it in your test directory.

There are definitely some points about the above that could be open to debate. While I’ve done this for a while and think my approaches are sound, there may be other opinions when it comes to actually upstreaming these parts that we should listen to.

Hope this helps.

  • Stella

Thanks again for starting work in this area. I’ve done a review in passing of your GitHub repo and found that I had a number of points of feedback/discussion that I wanted to capture. Pretty much all of them fall into the category of “can be worked out as we go”, so please don’t take them as blocking or critical of the work! (They are more in the vein of “lessons I’ve learned the hard way”.)

I’ll separate my response into Integration Issues and Style. The first will likely be forced in your first upstream submission anyway, and the latter can be elaborated over time.

Integration Issues

Composable modules

There are likely two use cases for the MLIR python bindings:

  1. Support users who expect that an installed version of LLVM/MLIR will yield the ability to import mlir and use the API in a pure way out of the box.

  2. Downstream integrations will likely want to include parts of the API in their private namespace or specially built libraries, probably mixing it with other python native bits.

What you have is sufficient for #1 but it gets tricky with #2. As a half measure to supporting #2 without indexing heavily on it, the following design points get you most of the way there:

  • Separate the construction/populating of a py::module from the PYBIND11_MODULE global constructor.

  • Introduce headers for your C++-only wrapper classes (PythonModule is the one you have now), as other related C++ modules will need to interop with them.

  • Separate any infectious global initialization into its own module/dependency that can be optionally linked (currently registerAllDialects falls into this category).

As a relatively old example (that has other issues, such as not being in LLVM style), consider how we have laid out the pyiree.compiler module by providing C++-side declarations in a header and a SetupCommonCompilerBindings(pybind11::module) global function that is called from a binary-level PYBIND11_MODULE initializer. We actually embed this module into several different binaries that, for reasons, have to be built individually and have complicated dependencies.

There are a lot of related issues around shared-library linkage, distribution concerns, etc. that affect such things. Suffice it to say that this flexibility will both be needed and used. Also, it is well known that compilation time for all of the template meta-programming in pybind scales with the number of things you define in a translation unit. Breaking into multiple translation units can significantly aid compile times for APIs with a large surface area.
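
As a minimal sketch of the first two points, with hypothetical names (populateIrSubmodule, PyContext, and the file names are illustrative, not an existing API), the split might look like:

// MlirIrBindings.h -- visible to other C++ modules that need to interop.
#include <pybind11/pybind11.h>
namespace py = pybind11;

// Populates an existing py::module; deliberately contains no global constructor.
void populateIrSubmodule(py::module &m);

// MlirIrBindings.cpp
void populateIrSubmodule(py::module &m) {
  py::class_<PyContext>(m, "Context")
      .def(py::init<>());
  // ... bind the Operation, Region, Block wrappers here ...
}

// MlirPythonModule.cpp -- the only translation unit with the PYBIND11_MODULE
// global constructor; downstream integrations can instead call
// populateIrSubmodule from their own module initializers.
PYBIND11_MODULE(_mlir, m) {
  populateIrSubmodule(m);
}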

Consider submodules

This would need more discussion, but I have found that having modules that roughly correspond to the MLIR directory structure makes good semantic sense:

  • mlir.ir

  • mlir.passes (pass is a reserved word :frowning: )

  • mlir.dialect

  • mlir.execution_engine (definitely want “bulky” things like this isolated)

You’re also going to want a dedicated _init module or something where you stash various global init kind of things (such as dialect registration, etc).

I’ve gone back and forth on how fine-grained to make the correspondence. There is something nice about having an ir or core package that is just about the core constructs in the IR directory: that is fairly tightly scoped and controlled. Ditto for Pass.

Leave room for a “loader”

Any non-trivial python-native project (and a dependency on LLVM/MLIR and the related ecosystem definitely puts this in that category) will eventually grow the need for some kind of “loader” that is responsible for actually finding/loading the native part and patching it into the more “public” module namespace. This gets arbitrarily complex. To start, I would only make a small concession in this direction: separate the native modules into a namespace that is different from what you get when you run import mlir. This lets arbitrary python-goo be inserted in the mlir/__init__.py that can grow to handle the environmental issues that arise with this kind of project. A convention that has worked for other things is to make the native part of the API something like _mlir or even something more esoteric like _bundled_llvm10_python_mlir. The key is to create the separation point so it can be elaborated when needed, and to avoid nesting the native modules in the same top-level module as the python loader shim.

You can start with an mlir/__init__.py loader shim that is something like this:


from _mlir import *

Consider having very few load-bearing globals

You’ll run into this more when generating new IR (versus mostly parsing/inspecting). Consider these two code fragments:


op = build_my_op()
region = mlir.Region(op)

vs


op = build_my_op()
region = op.new_region()

For tightly coupled data structures like Operation, I generally prefer the latter because:

  • It is syntactically not possible to create something that is going to access illegal memory (less error handling in the bindings, less testing, etc).

  • It reduces the global API surface area for creating related entities. It also means that if I am constructing IR and just get an Operation instance from somewhere I know nothing about, I can call methods on it directly to do what I want, versus needing to reach back into the global namespace and find the right Region class.

  • It leaks fewer things that are in place for C++ convenience (i.e. default constructors to invalid instances).
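
A minimal binding sketch of the second form, assuming hypothetical wrapper types PyOperation and PyRegion:

// Sketch only: Region creation anchored on the Operation wrapper, rather
// than a free-standing mlir.Region(...) constructor.
py::class_<PyOperation>(m, "Operation")
    .def("new_region",
         [](PyOperation &self) { return PyRegion(self); },
         // Keep the owning operation alive while the region handle exists.
         py::keep_alive<0, 1>());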

Be mindful of MLIR evolution

If I read this properly, the way you have it carefully avoids “the RTTI problem” because you are directly binding only non-polymorphic classes. If any of those become polymorphic in the future, the bindings would fail to link on non-RTTI builds of LLVM. For some of these low-level parts of MLIR, I think that is probably safe, if for no other reason than that they were designed to be non-polymorphic and it would be a fairly substantial change of design principles to make them so. However, outside of this very closely held core, there are virtual methods, and no one is generally asking “is it ok to add a virtual method?” When I’ve done similar things in the past, I’ve coded the bindings very defensively, relying on trampoline wrappers (probably in excess). I’m not sure what the right answer is, but I’m wary of such action-at-a-distance design constraints being imposed on the rest of the system by a binding layer.

Style feedback

Properties vs get*() methods

I generally favor converting trivial methods like getContext(), getName(), isEntryBlock(), etc to read-only Python properties (i.e. context). It is primarily a matter of calling def_property_readonly vs def in your binding code, and imo, makes things feel much nicer to the Python side.
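
In pybind11 terms, a sketch (PyOperation and its getters are assumed wrapper names, not an existing API) might be:

py::class_<PyOperation>(m, "Operation")
    // op.name / op.context / op.is_registered instead of op.getName(), etc.
    .def_property_readonly("name", &PyOperation::getName)
    .def_property_readonly("context", &PyOperation::getContext)
    .def_property_readonly("is_registered", &PyOperation::isRegistered);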

repr methods

I like things that print :slight_smile: If there is a reasonable printed form, I often find it to be a significant productivity boost to wire that to the __repr__ method (and verify it with a doctest).
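
A minimal sketch, using the same assumed PyOperation wrapper and assuming getName() returns a std::string:

py::class_<PyOperation>(m, "Operation")
    // e.g. repr(op) -> <operation 'module'>, verifiable with a doctest.
    .def("__repr__", [](PyOperation &self) {
      return "<operation '" + self.getName() + "'>";
    });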

Snake case?

Open to debate, but just spelling things as snake_case instead of CamelCase makes it feel a lot less like a C++ mirror. I’m generally pro keeping a fairly close 1:1 API correspondence, but this kind of spelling transformation is systematic enough that I don’t personally consider it a deviation.

Pseudo-containers

Right now iterating over blocks is something like:


region = ...
for block in region:
  pass

I have found it somewhat more readable/pythonic to make this more like:


region = ...
for block in region.blocks:
  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])

The way it is spelled now leaks things that are perfectly natural in STL-derived things (front, back, etc) but don’t translate well, imo.

Most of this can be dealt with by constructing a couple of “magic” container templates that provide the methods that make things appear as more native python sequences. Partial example.
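
A sketch of such a container wrapper for a Region’s blocks, assuming a hypothetical PyBlockList helper that simply holds a reference to the underlying mlir::Region (and assuming the pybind11, <iterator>, and MLIR headers are included):

class PyBlockList {
public:
  PyBlockList(mlir::Region &region) : region(region) {}
  mlir::Region &region;
};

void bindBlockList(py::module &m) {
  py::class_<PyBlockList>(m, "BlockList")
      .def("__len__",
           [](PyBlockList &self) {
             return std::distance(self.region.begin(), self.region.end());
           })
      .def("__getitem__",
           [](PyBlockList &self, intptr_t index) -> mlir::Block & {
             intptr_t size =
                 std::distance(self.region.begin(), self.region.end());
             if (index < 0)
               index += size;  // Python-style negative indexing.
             if (index < 0 || index >= size)
               throw py::index_error();
             return *std::next(self.region.begin(), index);
           },
           py::return_value_policy::reference)
      .def("__iter__",
           [](PyBlockList &self) {
             return py::make_iterator(self.region.begin(), self.region.end());
           },
           // Keep the list (and via it, the region) alive during iteration.
           py::keep_alive<0, 1>());
}

The Region binding would then expose a read-only blocks property returning this wrapper, with py::keep_alive tying its lifetime to the region.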

This becomes more pronounced with parts of the data structure that model multiple containers (i.e. Block contains both operations and arguments). There are often a variety of quick shortcuts on the C++ side that aid conciseness in that language, but just mirroring these sugar methods vs exposing the underlying structure doesn’t necessarily translate well.

Provide one stop helpers for common things

Make Context have a parse_asm or equivalent that avoids needing to explicitly construct a SourceMgr. Also, don’t make heavy-weight things like parsing a file part of a constructor invocation. Prefer to anchor as many such things on the context as possible. This will help you in one of your next steps, when you have to map the DiagnosticEngine in some way that makes sense to Python (routing exceptions, etc).

It is fine to expose more of the guts of the low-level parts of source parsing/printing, but one-stop high-level APIs for the common things help immensely.

Hi @stellaraccident

Thanks for your response; all your suggestions for testing and your feedback on integration & style are helpful to me. I think I need to digest them step by step. I will start by modifying my code for the testing part and the downstream-integration issues, which seem to need solutions first, and I will try out the style suggestions and discuss them during the process.

I’m not quite clear about the “low-level boilerplate tests”: does it mean I shouldn’t test the methods one by one, but should instead group some of them into a unit to test?

I used the “test_*” modules to separate the test structure into an “interface call” part and a “test return value” part. I thought this strategy could lead to a clear structure, but then I found the base.py lacks logic. This is what I want to modify, and as you suggested, doctest will help improve the logic in the test file.

As for the load_test_module in your sample:

def load_test_module():
  ctx = mlir.Context()
  ctx.allowUnregisteredDialects(True)
  # NOTE: If testing the actual *file* parsing code, I strongly prefer that the driving python test writes a tempfile and attempts to load it versus having a static file dependency on disk somewhere.
  module = ctx.parse_asm(TEST_MLIR_ASM)
  return module

The lifetime of ctx ends when the function returns, which means the context lives for a shorter time than the module. I am not sure whether this will cause a crash?

Ideally the ctx.parse_asm would use the py::keep_alive annotation to make sure that ctx survives module.

Thanks, I get it. I bind parse_asm as a method of Context; it is actually a wrapper around the PythonModule constructor:

PythonModule parse_asm(MLIRContext *ctx, std::string moduleStr) {
  return PythonModule(ctx, moduleStr); 
}

And I use py::keep_alive to keep the context alive:

py::class_<MLIRContext>(
  m, "Context", "MLIR Context")
  .def(py::init<>())
  .def("parse_asm", &parse_asm, py::keep_alive<0, 1>())
  ......