Memref-to-llvm: llvm.alloca operations emitted for memref.copy cause segfault if embedded in loop

andidr · October 10, 2022, 12:54pm

Hi,

the MemRef to LLVM conversion pass emits llvm.alloca operations when
lowering memref.copy operations. The original stack position is
never restored after the allocations, which creates an issue when the operation
is embedded into a loop with a high trip count, ultimately resulting
in a segmentation fault due to the stack growing too large.

The problem is exacerbated when the copy is performed on a memref with
a mapping resulting in non-contiguous memory, since the associated
lowering path involving the invocation of memrefCopy emits even more
llvm.alloca operations.

Below is a minimal example illustrating the issue:

#map = affine_map<(d0, d1) -> (d0 * 64 + d1 + 1056)>

module {
  func.func @main() {
    %arg0 = memref.alloc() : memref<32x64xi64>
    %arg1 = memref.alloc() : memref<16x32xi64>
    %lb = arith.constant 0 : index
    %ub = arith.constant 100000 : index
    %step = arith.constant 1 : index
    %slice = memref.subview %arg0[16,32][16,32][1,1] : memref<32x64xi64> to memref<16x32xi64, #map>

    scf.for %i = %lb to %ub step %step {
       memref.copy %slice, %arg1 : memref<16x32xi64, #map> to memref<16x32xi64>
    }

    return
  }
}

When running the code above, e.g., with mlir-cpu-runner, the
execution crashes with a segmentation fault:

$ mlir-opt --convert-memref-to-llvm --convert-scf-to-cf --convert-func-to-llvm --convert-cf-to-llvm -reconcile-unrealized-casts <file> | mlir-cpu-runner -e main -entry-point-result=void \
--shared-libs=$PWD/build/lib/libmlir_c_runner_utils.so
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./build-Release/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/usr/src/homomorphizer-master/compiler/build-Release/lib/libmlir_c_runner_utils.so
 #0 0x0000558358e962cf PrintStackTraceSignalHandler(void*) (./build-Release/bin/mlir-cpu-runner+0x29c2cf)
 #1 0x0000558358e93cec SignalHandler(int) (./build-Release/bin/mlir-cpu-runner+0x299cec)
 #2 0x00007f072f269730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730)
 #3 0x00007f072d40fa69 memrefCopy (/usr/src/homomorphizer-master/compiler/build-Release/lib/libmlir_c_runner_utils.so+0x18a69)
 #4 0x00007f072f2930ec
 #5 0x00007f072f29311d
 #6 0x000055835930185f compileAndExecute((anonymous namespace)::Options&, mlir::ModuleOp, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**) (./build-Release/bin/mlir-cpu-runner+0x70785f)
 #7 0x0000558359301c9c compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::ModuleOp, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig) (./build-Release/bin/mlir-cpu-runner+0x707c9c)
 #8 0x0000558359305902 mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) (./build-Release/bin/mlir-cpu-runner+0x70b902)
 #9 0x0000558358e19a46 main (./build-Release/bin/mlir-cpu-runner+0x21fa46)
#10 0x00007f072ed3d09b __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409b)
#11 0x0000558358e7dbca _start (./build-Release/bin/mlir-cpu-runner+0x283bca)
Segmentation fault

The execution only succeeds if the trip count of the scf.for loop is
sufficiently low, e.g., by setting %ub = arith.constant 100 : index.

Is the allocation issue supposed to be fixed by applying a subsequent
optimization pass? If yes, what pass should be run?

I played around a bit with llvm.lifetime.start and
llvm.lifetime.end annotations for the stack allocations, but
couldn’t find a pass exploiting this information and optimizing the
allocations.

Thanks,
Andi

ftynse · October 10, 2022, 1:48pm

Try wrapping the loop body in an additional memref.alloca_scope operation. It lowers to a pair of llvm.stacksave/llvm.stackrestore intrinsics and should mitigate the problem.
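
Applied to the example from the first post, this suggestion might look like the sketch below. It assumes that memref.alloca_scope can be written without an explicit terminator when it returns no values, and that the same pass pipeline from the original command also lowers the scope op. Only the loop body differs from the original reproducer.

```mlir
#map = affine_map<(d0, d1) -> (d0 * 64 + d1 + 1056)>

module {
  func.func @main() {
    %arg0 = memref.alloc() : memref<32x64xi64>
    %arg1 = memref.alloc() : memref<16x32xi64>
    %lb = arith.constant 0 : index
    %ub = arith.constant 100000 : index
    %step = arith.constant 1 : index
    %slice = memref.subview %arg0[16,32][16,32][1,1] : memref<32x64xi64> to memref<16x32xi64, #map>

    scf.for %i = %lb to %ub step %step {
      // Stack allocations emitted while lowering the body are meant to be
      // released when the scope is exited, i.e. once per iteration.
      memref.alloca_scope {
        memref.copy %slice, %arg1 : memref<16x32xi64, #map> to memref<16x32xi64>
      }
    }

    return
  }
}
```

With the scope inside the loop, the stack pointer should be saved on entry and restored on exit in every iteration, so the stack no longer grows with the trip count.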

andidr · October 10, 2022, 2:58pm

Thanks @ftynse, that worked! Is there anything that speaks against inserting that op (or rather the intrinsics) automatically when lowering memref.copy?

If not, I’d be happy to submit a patch.

ftynse · October 10, 2022, 3:54pm

I don’t remember why the lowering allocates, so make sure the allocation isn’t necessary after the op.

andidr · October 12, 2022, 9:15am

The patch is here: D135756.
