Affine fusion legality for vector types

dragon007 · November 12, 2024, 9:28am

Hi, I was experimenting with affine fusion on loops containing affine.vector_load/affine.vector_store operations and came across this particular case:

func.func @main(%a: memref<64x512xf32>, %b: memref<64x512xf32>, %c: memref<64x512xf32>, %d: memref<64x4096xf32>, %e: memref<64x4096xf32>) {

    affine.for %j = 0 to 8 {
        %lhs = affine.vector_load %a[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
        %rhs = affine.vector_load %b[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
        %res = arith.addf %lhs, %rhs : vector<64x64xf32>
        affine.vector_store %res, %c[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
    }

    affine.for %j = 0 to 8 {
        %lhs = affine.vector_load %c[0, 0] : memref<64x512xf32>, vector<64x512xf32>
        %rhs = affine.vector_load %d[0, %j * 512] : memref<64x4096xf32>, vector<64x512xf32>
        %res = arith.subf %lhs, %rhs : vector<64x512xf32>
        affine.vector_store %res, %d[0, %j * 512] : memref<64x4096xf32>, vector<64x512xf32>
    }

    func.return
}

Upon running affine-loop-fusion on this IR with the following command:

mlir-opt --pass-pipeline='builtin.module(affine-loop-fusion)' test.mlir

I see that the loops are getting fused as follows:

 func.func @main(%arg0: memref<64x512xf32>, %arg1: memref<64x512xf32>, %arg2: memref<64x512xf32>, %arg3: memref<64x4096xf32>, %arg4: memref<64x4096xf32>) {
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() : memref<1x1xf32>
    %c0_0 = arith.constant 0 : index
    affine.for %arg5 = 0 to 8 {
      %0 = affine.vector_load %arg0[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %1 = affine.vector_load %arg1[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %2 = arith.addf %0, %1 : vector<64x64xf32>
      affine.vector_store %2, %arg2[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %3 = affine.vector_load %arg0[0, %c0_0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %4 = affine.vector_load %arg1[0, %c0_0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %5 = arith.addf %3, %4 : vector<64x64xf32>
      affine.vector_store %5, %alloc[0, 0] : memref<1x1xf32>, vector<64x64xf32>
      %6 = affine.vector_load %alloc[0, 0] : memref<1x1xf32>, vector<64x512xf32>
      %7 = affine.vector_load %arg3[0, %arg5 * 512] : memref<64x4096xf32>, vector<64x512xf32>
      %8 = arith.subf %6, %7 : vector<64x512xf32>
      affine.vector_store %8, %arg3[0, %arg5 * 512] : memref<64x4096xf32>, vector<64x512xf32>
    }
    return
  }

Isn’t this an invalid transformation, since the 2nd loop can only execute once the 1st loop has finished all of its iterations and produced the result that the 2nd loop consumes? Are there any specific flags for affine-loop-fusion that enable analysis in the context of vector types?

Note: My LLVM source is based on a commit from Nov 12th, 2024.

bondhugula · November 12, 2024, 9:32am

But the second loop nest is only reading [0, 0] of %c IIUC. Why does it need to wait for all iterations of the 1st loop to be executed? The fusion is valid.

dragon007 · November 12, 2024, 9:55am

The first loop stores a vector<64x64xf32> into a memref<64x512xf32> on each iteration:

affine.vector_store %res, %c[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>

whereas the second loop consumes a whole vector<64x512xf32> in a single load:

%lhs = affine.vector_load %c[0, 0] : memref<64x512xf32>, vector<64x512xf32>

Hence all iterations of loop 1 need to be executed first, right?
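
To make the dependence concrete, here is a minimal Python sketch that models the two accesses to %c as sets of elements. It assumes a vector access covers a rectangular region starting at its subscripts; the helper name `footprint` is made up for illustration and is not part of MLIR.

```python
def footprint(start, shape):
    """Set of (row, col) elements touched by a vector access of `shape` at `start`."""
    r0, c0 = start
    rows, cols = shape
    return {(r, c) for r in range(r0, r0 + rows) for c in range(c0, c0 + cols)}

# Loop 1: affine.vector_store %res, %c[0, %j * 64] : ..., vector<64x64xf32>
stores = {j: footprint((0, j * 64), (64, 64)) for j in range(8)}

# Loop 2: affine.vector_load %c[0, 0] : ..., vector<64x512xf32>
load = footprint((0, 0), (64, 512))

# Producer iterations whose stored elements overlap the consumer's load.
needed = sorted(j for j, fp in stores.items() if fp & load)
print(needed)  # [0, 1, 2, 3, 4, 5, 6, 7]

# Comparing subscripts only (ignoring vector shapes) sees a single match.
subscript_only = sorted(j for j in range(8) if (0, j * 64) == (0, 0))
print(subscript_only)  # [0]
```

With the vector shapes taken into account, the load overlaps the stores of all eight iterations; with subscripts alone, only iteration 0 appears to matter.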

bondhugula · November 12, 2024, 10:13am

Sorry, I wasn’t looking at the vector widths. This is indeed a bug: the fusion pass (or its underlying analysis) doesn’t look at the widths of the elements being accessed, only at the subscripts of the memref. The same bug would therefore exist between pairs of affine.store and affine.vector_load, or vice versa. Can you please file an issue on GitHub and mark it as a good starter/beginner issue? Thanks.

When the fusion pass was originally introduced, there were no affine.vector_load/store operations in MLIR, so this case was overlooked when they were added later. A bailout in the presence of different-sized element types in the producer/consumer validity checking is a reasonable fix to start with. If fusion after such vectorization turns out to be necessary, we could consider handling more cases.
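
As a rough illustration of such a bailout, the following Python sketch models the check outside of MLIR. The `Access` class and `can_consider_fusion` function are hypothetical names invented here, not the pass's actual data structures; a scalar affine.load/affine.store is modeled as an empty vector shape.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Access:
    memref: str                          # name of the accessed memref
    vector_shape: Tuple[int, ...] = ()   # () models a scalar affine.load/store

def can_consider_fusion(producer_store: Access, consumer_load: Access) -> bool:
    """Conservative bailout: refuse when the accessed element types differ."""
    if producer_store.memref != consumer_load.memref:
        return True  # this pair carries no dependence through a shared memref
    return producer_store.vector_shape == consumer_load.vector_shape

# The pair from this thread: 64x64 store to %c, 64x512 load from %c.
store_c = Access("c", (64, 64))
load_c = Access("c", (64, 512))
print(can_consider_fusion(store_c, load_c))  # False
```

The check is deliberately conservative: it also rejects a scalar store paired with a vector load, and leaves the existing subscript-based analysis to decide the remaining cases.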

dragon007 · November 12, 2024, 11:04am

@bondhugula thanks for the feedback. I have created a GitHub issue for this: [MLIR][affine] Illegal affine loop fusion with vector types · Issue #115849 · llvm/llvm-project · GitHub

dragon007 · November 13, 2024, 3:50am

@bondhugula Further exploration shows that the transformation also produces invalid IR like the lines below, where a vector is stored to (and loaded from) a memref of type memref<1x1xf32>:

affine.vector_store %5, %alloc[0, 0] : memref<1x1xf32>, vector<64x64xf32>

%6 = affine.vector_load %alloc[0, 0] : memref<1x1xf32>, vector<64x512xf32>

I think the fix is not only about legality checking; the transformation itself also has to produce valid IR.
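
A small Python sketch of why the 1x1 private buffer cannot hold these accesses, again assuming rectangular vector accesses. The function names `in_bounds` and `bounding_shape` are hypothetical and only model the shapes involved.

```python
def in_bounds(memref_shape, start, vector_shape):
    """True if a vector access of `vector_shape` at `start` stays inside the memref."""
    return all(s >= 0 and s + v <= m
               for m, s, v in zip(memref_shape, start, vector_shape))

def bounding_shape(accesses):
    """Smallest origin-anchored buffer shape covering every (start, vector_shape) access."""
    rank = len(accesses[0][0])
    return tuple(max(start[d] + shape[d] for start, shape in accesses)
                 for d in range(rank))

# The two accesses to %alloc in the fused loop: a 64x64 store and a 64x512 load at [0, 0].
accesses = [((0, 0), (64, 64)), ((0, 0), (64, 512))]

ok = [in_bounds((1, 1), start, shape) for start, shape in accesses]
needed = bounding_shape(accesses)
print(ok, needed)  # [False, False] (64, 512)
```

Neither access fits in memref<1x1xf32>; under this model a buffer covering both would need shape 64x512, which is what sizing the private memref by subscripts alone misses.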

bondhugula · November 13, 2024, 4:25am

That’s correct, and it’s a related issue. The passes could end up generating invalid IR due to the same oversight. One would run into a similar issue with affine-scalrep as well if it is (or has already been) extended to work with AffineRead/WriteOpInterfaces.

The fixes should be similarly straightforward: the elemental type can’t be ignored when looking at pairs of affine read/write interface ops, even though it could be ignored for plain affine.store/affine.load.
