API changes on tiling routines

Hello all,

One week ago I was told to update and recompile LLVM/MLIR due to changes in the tiling interface. I did so. The main change was the introduction of mlir::tileCodeGen instead of (and with a different interface from) mlir::tile. Suddenly, my code no longer compiled (and required significant changes).

Today I modified my code so that it compiles and works on simple cases, but I discovered some issues (weird heisenbugs, possibly memory-related).

In order to file a bug report, I needed to be on the latest version, so I updated again (git pull). My code does not compile (again), as the functions that appeared last week (mlir::tileCodeGen) are now gone. There are also changes in the libraries: for instance, libMLIRAffine.a no longer exists…

My question is simple: how do you track rather major API changes such as these, and separate them from minor bugfixes? I really don’t want to recompile LLVM/MLIR every morning, nor refactor my code and change my Makefile (to add/remove libraries) every time I do so.

How do you do it?

Best regards,
Dumitru

PS: I assume the replacement of tileCodeGen is tilePerfectlyNested.

Yes, that’s right. It was a rename.

mlir::tileCodeGen had already been there - it predates mlir::tile.
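
For anyone updating call sites: the call shape is essentially unchanged across the rename. A minimal sketch of the new entry point (assuming the LoopUtils declaration from around this time; header paths vary by revision):

    // Sketch: tile a perfectly nested affine band, outermost loop first.
    #include "mlir/Dialect/Affine/IR/AffineOps.h" // AffineForOp
    #include "mlir/Transforms/LoopUtils.h"        // tilePerfectlyNested

    using namespace mlir;

    static LogicalResult tileBand(AffineForOp outer, AffineForOp middle,
                                  AffineForOp inner) {
      SmallVector<AffineForOp, 4> band = {outer, middle, inner};
      SmallVector<unsigned, 4> tileSizes = {64, 64, 64};
      SmallVector<AffineForOp, 8> tiledNest;
      // Check the result rather than wrapping the call in assert():
      // assert bodies vanish in NDEBUG builds, taking the call with them.
      if (failed(tilePerfectlyNested(band, tileSizes, &tiledNest)))
        return failure();
      return success();
    }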


For our large/supported projects, my team integrates with upstream ~daily and triages all breakages as they happen. For my smaller/personal projects, I find myself merging from head ~weekly and rolling up fixes. Bisecting across breaking changes is going to be painful no matter what, especially if there is a large gap of incompatibility.

I suspect it is still too early in the project’s life cycle to expect something like major-version API compatibility. Curious what others think regarding the out-of-tree tax.


I’m continuing the exchange here, in order to avoid polluting the forum.
I get an error that looks like a memory-related heisenbug. The bug is triggered by the following tiling code (the matrices are all 2048x2048xf64):

    {
      // Tiling for the 3 loops:
    #define M_C    64
    #define M_R     4
    #define N_C  2048
    #define N_R     8
    #define K_C   256
    #define K_U     4

      // BLIS-like loop tiling, performed in several phases
      {
        // Phase 1: tile all 3 initial loops
        SmallVector<AffineForOp, 8> band;
        band.push_back(outer_loop);
        band.push_back(middle_loop);
        band.push_back(inner_loop);
        SmallVector<unsigned, 8> tile_sizes;
        tile_sizes.push_back(M_C);
        tile_sizes.push_back(N_C);
        tile_sizes.push_back(K_C);

        SmallVector<AffineForOp, 8> tiled_nest;
        assert(succeeded(tilePerfectlyNested(band, tile_sizes, &tiled_nest))
               && "Outer tiling failed.\n");

        {
          // Phase 2: tile the iR and jR loops
          SmallVector<AffineForOp, 8> band;
          band.push_back(tiled_nest[3]);
          band.push_back(tiled_nest[4]);
          SmallVector<unsigned, 8> tile_sizes;
          tile_sizes.push_back(M_R);
          tile_sizes.push_back(N_R);
          SmallVector<AffineForOp, 8> tiled_nest2;
          assert(succeeded(tilePerfectlyNested(band, tile_sizes, &tiled_nest2))
                 && "Inner tiling failed.\n");
        }
      }
    }

If the “Phase 2” inner block is commented out, the code works as expected, with no error, and produces 6 nested loops.

If the “Phase 2” inner block is not commented out, execution results in a segmentation fault:

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ./main --lower-mydialect examples/abc2.mlir 
0  main                     0x000000010d11c5d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  main                     0x000000010d11b348 llvm::sys::RunSignalHandlers() + 248
2  main                     0x000000010d11cbfd SignalHandler(int) + 285
3  libsystem_platform.dylib 0x00007fff69cfe42d _sigtramp + 29
4  libsystem_platform.dylib 0x0000000000000000 _sigtramp + 18446603338740931568
5  main                     0x000000010cf48e98 mlir::detail::ConversionPatternRewriterImpl::isOpIgnored(mlir::Operation*) const + 120
6  main                     0x000000010cf53f3d (anonymous namespace)::OperationLegalizer::legalize(mlir::Operation*, mlir::ConversionPatternRewriter&) + 1405
7  main                     0x000000010cf551c9 (anonymous namespace)::OperationLegalizer::legalize(mlir::Operation*, mlir::ConversionPatternRewriter&) + 6153
8  main                     0x000000010cf4eded (anonymous namespace)::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>, mlir::TypeConverter*) + 1389
9  main                     0x000000010cf4d078 mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget&, mlir::OwningRewritePatternList const&, mlir::TypeConverter*) + 72
10 main                     0x000000010cf4f4e3 mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget&, mlir::OwningRewritePatternList const&, mlir::TypeConverter*) + 35
11 main                     0x000000010cc099e6 mlir::mydialect::MyDialectToAffineLoweringPass::runOnFunction() + 678
12 main                     0x000000010cc0954b mlir::FunctionPass::runOnOperation() + 155
13 main                     0x000000010ce708cf mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) + 239
14 main                     0x000000010ce789d4 mlir::detail::OpToOpPassAdaptorParallel::runOnOperation()::$_7 std::__1::for_each<llvm::SmallVector<mlir::OpPassManager, 1u>*, mlir::detail::OpToOpPassAdaptorParallel::runOnOperation()::$_7>(llvm::SmallVector<mlir::OpPassManager, 1u>*, llvm::SmallVector<mlir::OpPassManager, 1u>*, mlir::detail::OpToOpPassAdaptorParallel::runOnOperation()::$_7) + 436
15 main                     0x000000010ce73815 mlir::detail::OpToOpPassAdaptorParallel::runOnOperation() + 1717
16 main                     0x000000010ce708cf mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) + 239
17 main                     0x000000010ce73c51 mlir::PassManager::run(mlir::ModuleOp) + 193
18 main                     0x000000010cc021a0 main + 1616
19 libdyld.dylib            0x00007fff69b057fd start + 1
20 libdyld.dylib            0x0000000000000003 start + 18446603338742999047
Segmentation fault: 11

I don’t really know how to debug this code, for several reasons:

  • The trace does not include the function containing the code above (MatmulOpLowering::matchAndRewrite). I don’t understand how this is possible - does the segmentation fault occur in a verification phase after the actual transformation?
  • I can’t figure out how to print individual operations from within a function. AsmPrinter.cpp exports few functions, and they only seem to cover the printing of modules, not of operations.

I suspect there’s a memory handling bug somewhere, but I can’t even be sure if it’s in my code, or in MLIR.

Please help.

BTW: this is the first time I have had memory issues. Tiling with mlir::tile used to work (but produced iterations with max/min on the bounds, which seems to interfere with unrolling).

Please consider using a debug build to help with these - the crash stack trace will have line number info. Try dumping the nest right after your tiling calls. If that doesn’t provide a clue, using valgrind could tell you where things went wrong first.
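
For instance, a sketch of that instrumentation (reusing the band/tile_sizes names from your snippet; not a complete program):

    SmallVector<AffineForOp, 8> tiled_nest;
    if (failed(tilePerfectlyNested(band, tile_sizes, &tiled_nest)))
      llvm::report_fatal_error("tiling failed");
    // Print the freshly tiled nest to stderr before anything else runs,
    // so a mis-tiled nest can be told apart from a later crash.
    tiled_nest[0].dump();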

@bondhugula It seems there’s something wrong with the tiling algorithm. I applied what you said. The code is now the following:

    {
      llvm::raw_fd_ostream err(2, false);

      // BLIS-like loop tiling, performed in several phases
      {
        // Phase 1: tile all 3 initial loops
        SmallVector<AffineForOp, 3> band;
        band.push_back(outer_loop);
        band.push_back(middle_loop);
        band.push_back(inner_loop);
        SmallVector<unsigned, 3> tile_sizes;
        tile_sizes.push_back(2); // M_C
        tile_sizes.push_back(4); // N_C
        tile_sizes.push_back(8); // K_C

        SmallVector<AffineForOp, 8> tiled_nest;
        assert(succeeded(tilePerfectlyNested(band, tile_sizes, &tiled_nest))
               && "Outer tiling failed.\n");

        err << "\nLoop nest after first tiling phase\n";
        tiled_nest[0].getOperation()->print(err, llvm::None);

        {
          // Phase 2: tile the iR and jR loops
          SmallVector<AffineForOp, 2> band;
          band.push_back(tiled_nest[3]);
          band.push_back(tiled_nest[4]);
          SmallVector<unsigned, 2> tile_sizes;
          tile_sizes.push_back(16); // M_R
          tile_sizes.push_back(32); // N_R
          SmallVector<AffineForOp, 8> tiled_nest2;
          assert(succeeded(tilePerfectlyNested(band, tile_sizes, &tiled_nest2))
                 && "Inner tiling failed.\n");

          err << "\n\nLoop nest after second tiling phase:\n";
          tiled_nest[0].getOperation()->print(err, llvm::None);
          err << "\n\n";
        }
      }
    }

Prior to the error message, this code prints the following:

Loop nest after first tiling phase
    affine.for %arg0 = 0 to 2048 step 2 {
      affine.for %arg1 = 0 to 2048 step 4 {
        affine.for %arg2 = 0 to 2048 step 8 {
          affine.for %arg3 = affine_map<(d0) -> (d0)>(%arg0) to affine_map<(d0) -> (d0 + 2)>(%arg0) {
            affine.for %arg4 = affine_map<(d0) -> (d0)>(%arg1) to affine_map<(d0) -> (d0 + 4)>(%arg1) {
              affine.for %arg5 = affine_map<(d0) -> (d0)>(%arg2) to affine_map<(d0) -> (d0 + 8)>(%arg2) {
                %6 = affine.load %1[%arg3, %arg5] : memref<2048x2048xf64>
                %7 = affine.load %2[%arg5, %arg4] : memref<2048x2048xf64>
                %8 = affine.load %0[%arg3, %arg4] : memref<2048x2048xf64>
                %9 = mulf %6, %7 : f64
                %10 = addf %8, %9 : f64
                affine.store %10, %0[%arg3, %arg4] : memref<2048x2048xf64>
              }
            }
          }
        }
      }
    }

Loop nest after second tiling phase:
affine.for %arg0 = 0 to 2048 step 2 {
  affine.for %arg1 = 0 to 2048 step 4 {
    affine.for %arg2 = 0 to 2048 step 8 {
      affine.for %arg3 = affine_map<(d0) -> (d0)>(%arg0) to affine_map<(d0) -> (d0 + 2)>(%arg0) step 16 {
        affine.for %arg4 = affine_map<(d0) -> (d0)>(%arg1) to affine_map<(d0) -> (d0 + 4)>(%arg1) step 32 {
          affine.for %arg5 = affine_map<(d0) -> (d0)>(%arg3) to 2 {
            affine.for %arg6 = affine_map<(d0) -> (d0)>(%arg4) to 4 {
              affine.for %arg7 = affine_map<(d0) -> (d0)>(%arg2) to affine_map<(d0) -> (d0 + 8)>(%arg2) {
                %6 = affine.load %1[%arg5, %arg7] : memref<2048x2048xf64>
                %7 = affine.load %2[%arg7, %arg6] : memref<2048x2048xf64>
                %8 = affine.load %0[%arg5, %arg6] : memref<2048x2048xf64>
                %9 = mulf %6, %7 : f64
                %10 = addf %8, %9 : f64
                affine.store %10, %0[%arg5, %arg6] : memref<2048x2048xf64>
              }
            }
          }
        }
      }
    }
  }
}

The first loop nest is correct. The second seems wrong: the loops with induction variables %arg5 and %arg6 have incorrect upper bounds, which are fixed constants (2 and 4) rather than affine maps.

Let’s get the crash you reported out of the way first. From your output, it looks like tiling does produce a valid loop nest - the print doesn’t fail. (You could run verify() after the calls to the tiling utility to check.) So is the crash you reported coming from something else later in your local copy?

That output does look wrong - could you please file a bug? This should be reproducible with a single test case (just including your 4th and 5th loops). Thanks.

Thank you again for reporting. This patch fixes it: https://reviews.llvm.org/D78505


Btw, you can just dump with tiled_nest[0].dump().

Thanks for helping.

  1. The patch works - the output is no longer obviously flawed (but I did not check it thoroughly).

  2. The other problem is a nasty memory problem that remains active. The funny aspect is that every now and then, after a major modification, the code will run just once and then revert to crashing. My assumption is that an uninitialized pointer is involved: if the allocated memory happens to be 0 (NULL), the code runs just fine; if not, it crashes.

The crash is outside my code, so I have to instrument MLIR itself… Now I’m trying to compile with the Debug option.

ADDENDUM: same behavior inside lldb - the first run completes untroubled, the second crashes. Funny behavior (on macOS Catalina).

@bondhugula Hello again. There is either something obviously incorrect in my manipulations, or a nasty memory bug in MLIR. Please take a look at my code, which I have simplified; this is the lowering routine for the only op of the dialect. My intuition is that either I have to do something more after tiling, or the tiling routine fails to set up the parent fields of the tiled loops correctly (or a table access with an incorrect size corrupts something at the wrong place in memory).

> LogicalResult
> MatmulOpLowering::matchAndRewrite(Operation *op,
>                                   ArrayRef<Value> operands,
>                                   ConversionPatternRewriter &rewriter) const final {
>   llvm::raw_fd_ostream err(2, false);
>   err << "\nSTART MatmulOpLowering::matchAndRewrite on operation:\n";
>   op->print(err, llvm::None);
>
>   OpResult result = op->getResult(0);
>   MemRefType memRefType = result.getType().cast<MemRefType>();
>   Location loc = op->getLoc();
>
>   // The allocated matrix where the result will be placed.
>   Value alloc = insertAlloc(memRefType, loc, rewriter);
>
>   // Create an empty affine loop for each of the dimensions within the shape.
>   AffineForOp outer_loop =
>     rewriter.create<AffineForOp>(loc,
>                                  0,                        // lower bound
>                                  memRefType.getShape()[0], // upper bound on dim. 0
>                                  1                         // step
>                                  );
>   outer_loop.getBody()->clear();
>   // Terminate the loop body and update the rewriter insertion point to the
>   // beginning of the loop.
>   rewriter.setInsertionPointToStart(outer_loop.getBody());
>   rewriter.create<AffineTerminatorOp>(loc);
>   rewriter.setInsertionPointToStart(outer_loop.getBody());
>
>   // Create the context (adaptor) in which the previous code is executed.
>   // I only need it now because I need the input value.
>   mydialect::MatmulOpOperandAdaptor matmulAdaptor(operands);
>
>   // Replace the original upper dialect operation with the first operation
>   // generated here.
>   // The code up to here works OK - if I return here, everything works.
>   rewriter.replaceOp(op, alloc);
>
>   {
>     // Tiling - which does not work!
>     SmallVector<AffineForOp, 1> band;
>     band.push_back(outer_loop);
>     SmallVector<unsigned, 1> tile_sizes;
>     tile_sizes.push_back(64);
>     SmallVector<AffineForOp, 8> tiled_nest;
>     assert(succeeded(tilePerfectlyNested(band, tile_sizes, &tiled_nest))
>            && "Outer tiling failed.\n");
>
>     err << "\nLoop nest after first tiling phase\n";
>     tiled_nest[0].getOperation()->print(err, llvm::None);
>   }
>
>   err << "END MatmulOpLowering::matchAndRewrite on operation:\n";
>   return success();
> }

Tiling now poses problems even for a single loop, but this may be due to the fact that I added lots of prints inside two MLIR files. No bug occurs if the tiling routine is commented out, or if I use the traditional mlir::tile interface (which has the other problems we discussed).

When I print things out and use lldb, I can see that the memory access error happens inside Operation::print, at the line:

Operation *parentOp = printedOp->getParentOp();

Should you want to try, I can send the full source (greatly simplified).
Unfortunately, I lack the skill to use lldb efficiently (my experience with gdb helps only partially, as printing seems to be far lower-level in lldb).

Please advise.

Please call verify() on the function right after you do the tiling and check whether the verification passes. Also dump the entire function at that point and exit there (with, say, assert(false)). Then check whether you still see any of the memory sanity issues from the print that you mentioned above.
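
A sketch of that check (one caveat: depending on your revision, verify() is declared in mlir/Analysis/Verifier.h or mlir/IR/Verifier.h):

    // Right after the tiling calls: dump and verify the enclosing function,
    // then stop so nothing later can mask the state of the IR at this point.
    FuncOp func = tiled_nest[0].getOperation()->getParentOfType<FuncOp>();
    func.dump();
    if (failed(verify(func.getOperation())))
      llvm::errs() << "verification failed right after tiling\n";
    assert(false && "stopping here for inspection");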

Why are you doing this (the getBody()->clear() followed by re-creating the terminator)? The body of outer_loop is already empty right after its creation (it contains just the affine terminator). Just use outer_loop.getBodyBuilder() to get a builder and start adding things to it.

@bondhugula Ok, thanks. My code is so verbose because I was following the examples provided with MLIR. The function getBodyBuilder is not used there (only the sequence I used above), and I still don’t know how to use it.

However, by removing all of those calls except rewriter.setInsertionPointToStart(outer_loop.getBody());, I seem to get the effect I want, and my small program no longer crashes.

Is there some documentation on all these functions?

Also - why did the old code give such errors? Is it simply verbose, or incorrect?

Actually, since you are inside a rewrite method, you’ll just have to use the rewriter. You can’t/shouldn’t create a builder using getBodyBuilder() - sorry.
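
To illustrate, a minimal sketch of the rewriter-only sequence (the AffineForOp builder already creates the body block together with its terminator, so no clear()/re-terminate dance is needed):

    // Create the loop through the rewriter; its body already ends with the
    // affine terminator.
    AffineForOp outer_loop =
        rewriter.create<AffineForOp>(loc, /*lowerBound=*/0,
                                     /*upperBound=*/memRefType.getShape()[0],
                                     /*step=*/1);
    // Ops created from here on land at the start of the body, in front of
    // that terminator.
    rewriter.setInsertionPointToStart(outer_loop.getBody());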

Looking at it now, your code crashed because of the ->clear(). You can’t erase ops inside a rewrite rule using erase() or clear() - it has to be done only via the rewriter, either via its eraseOp or via replaceOp. The documentation on rewrite patterns should have made this clear - @River707 is working on updating that documentation I believe and has a pending revision. I tend to agree that a number of users are going to have a bad debugging/dev experience (and many already have) because these rules were never documented. I had a post in Sep 2019 detailing the issues due to the missing doc, and had an experience similar to yours here.

Erasing ops the wrong way from a rewrite pattern, either accidentally or unknowingly (of the rewrite rule model), is a common source of a really painful debugging experience (especially for new users). Unless one is aware of the rewrite rule model, it’s actually impossible to say what went wrong: the greedy rewrite driver would wind up with a worklist in which some of the ops have been erased without its knowledge, and processing those leads to all kinds of IR and memory corruption. It’d be extremely hard for a user to track down what’s going wrong because it’s an iterating, worklist-driven approach.

All MLIR functions are documented in the generated doxygen pages: https://mlir.llvm.org/doxygen/


@bondhugula

  1. Taking your remarks into account helped, but not with tiling, which still crashes regularly (though not in all configurations).
  2. I’m pretty certain now that I’m facing a memory error which is unfortunately a heisenbug. I activated AddressSanitizer and my tiling troubles are gone - furthermore, no memory access error is reported. However, the same code crashes as soon as I compile without AddressSanitizer.
  3. This bug is related to the new tiling routines. The old ones give a suboptimal result (with max/min) but there is no memory issue.
  4. The problem may be related to the fact that inside a lowering routine I do the whole synthesis and optimization process: create the initial loop nest with 3 levels, then tile it twice. It may just be that performing the lowering and the tiling in separate passes would work (it’s really the traversal of createdOps inside the rewriter that raises the memory error, as one item seems to be corrupted).
  5. In any case, I don’t understand the mechanics of MLIR well enough to debug this. I can help, though - I have a small example.
  6. Should I report a bug? I can provide the faulty example, hoping it crashes in other installations, too.

This obviously is the problem, as I explained above. You can’t erase things outside of the rewriter, and the tiling utility erases the original loop nest. I’m not sure what there is to debug here - the pattern-matching rewriter can’t be used that way, by design. This is not a bug. The other utility may not be erasing loops but just doing an in-place move, like permuteLoops.
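
For what it’s worth, one way to restructure (a sketch under assumptions: MatmulTilingPass is a made-up name, the pass base class spelling varies by revision, and getPerfectlyNestedLoops/tilePerfectlyNested are the LoopUtils entry points used above) is to only lower in the pattern and tile in a later, dedicated pass:

    // Hypothetical follow-up pass: tile affine bands after the dialect
    // conversion has finished, so loops get erased outside any rewriter.
    struct MatmulTilingPass
        : public PassWrapper<MatmulTilingPass, FunctionPass> {
      void runOnFunction() override {
        // Collect outermost affine loops first: tiling replaces the band,
        // so don't mutate the IR while walking it.
        SmallVector<AffineForOp, 4> roots;
        getFunction().walk([&](AffineForOp forOp) {
          if (!forOp.getOperation()->getParentOfType<AffineForOp>())
            roots.push_back(forOp);
        });
        for (AffineForOp root : roots) {
          SmallVector<AffineForOp, 6> band;
          getPerfectlyNestedLoops(band, root);
          SmallVector<unsigned, 8> tile_sizes(band.size(), 64);
          if (failed(tilePerfectlyNested(band, tile_sizes)))
            root.emitRemark("band could not be tiled");
        }
      }
    };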

My code no longer erases anything itself. Here it is:

> LogicalResult
> matchAndRewrite(Operation *op,
>                 ArrayRef<Value> operands,
>                 ConversionPatternRewriter &rewriter) const final {
>   llvm::raw_fd_ostream err(2, false);
>   err << "\nSTART MatmulOpLowering::matchAndRewrite on operation:\n";
>   op->print(err, llvm::None);
>
>   Value operand0 = op->getOperand(0);
>   MemRefType operand0_type = operand0.getType().cast<MemRefType>();
>   OpResult result = op->getResult(0);
>   MemRefType memRefType = result.getType().cast<MemRefType>();
>   Location loc = op->getLoc();
>
>   // The allocated matrix where the result will be placed.
>   Value alloc = insertAlloc(memRefType, loc, rewriter);
>
>   // Create an empty affine loop for each of the dimensions within the shape.
>   AffineForOp outer_loop =
>     rewriter.create<AffineForOp>(loc,
>                                  0,                        // lower bound
>                                  memRefType.getShape()[0], // upper bound on dim. 0
>                                  1                         // step
>                                  );
>   rewriter.setInsertionPointToStart(outer_loop.getBody());
>   AffineForOp middle_loop =
>     rewriter.create<AffineForOp>(loc,
>                                  0,                        // lower bound
>                                  memRefType.getShape()[0], // upper bound on dim. 0
>                                  1                         // step
>                                  );
>   rewriter.setInsertionPointToStart(middle_loop.getBody());
>   AffineForOp inner_loop =
>     rewriter.create<AffineForOp>(loc,
>                                  0,                           // lower bound
>                                  operand0_type.getShape()[1], // upper bound on dim. 0
>                                  1                            // step
>                                  );
>   rewriter.setInsertionPointToStart(inner_loop.getBody());
>
>   rewriter.replaceOp(op, alloc);
>
>   {
>     // Tiling - which does not work!
>     SmallVector<AffineForOp, 3> band;
>     band.push_back(outer_loop);
>     band.push_back(middle_loop);
>     band.push_back(inner_loop);
>     SmallVector<unsigned, 3> tile_sizes;
>     tile_sizes.push_back(64);
>     tile_sizes.push_back(64);
>     tile_sizes.push_back(64);
>     SmallVector<AffineForOp, 8> tiled_nest;
>     assert(succeeded(tilePerfectlyNested(band, tile_sizes, &tiled_nest))
>            && "Outer tiling failed.\n");
>
>     err << "\nLoop nest after first tiling phase\n";
>     tiled_nest[0].getOperation()->print(err, llvm::None);
>   }
>
>   err << "END MatmulOpLowering::matchAndRewrite on operation:\n";
>   return success();
> }

What is funny is that working with only 2 loops in the nest (e.g. outer and middle) leads to no error…

The utility you are calling is erasing loops.