Issues loading a vector<3xi1> from a memref<3xi1> (data layout)

I can’t figure out what I’m doing wrong when using the vector_load operation from the Affine dialect (affine.vector_load).
This works fine and prints ( 10, 23, 57 ), which are the elements I store in the i32 memref:

// RUN: mlir-opt %s -lower-affine -convert-vector-to-llvm | \
// RUN: mlir-cpu-runner -e main -entry-point-result=void \
// RUN:   -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext
func @main() -> () {
    %mem = alloca() : memref<3xi32>

    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %c2 = constant 2 : index

    %c10 = constant 10 : i32
    %c23 = constant 23 : i32
    %c57 = constant 57 : i32

    store %c10, %mem[%c0] : memref<3xi32>
    store %c23, %mem[%c1] : memref<3xi32>
    store %c57, %mem[%c2] : memref<3xi32>

    %v = affine.vector_load %mem[0] : memref<3xi32>, vector<3xi32>
    vector.print %v : vector<3xi32>

    return
}

This prints ( 0, 0, 0 ) instead of the expected ( 0, 1, 1 ). The only difference is the element type, which is i1 instead of i32:

// RUN: mlir-opt %s -lower-affine -convert-vector-to-llvm | \
// RUN: mlir-cpu-runner -e main -entry-point-result=void \
// RUN:   -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext
func @main() -> () {
    %mem = alloca() : memref<3xi1>

    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %c2 = constant 2 : index

    %true = constant true
    %false = constant false

    store %false, %mem[%c0] : memref<3xi1>
    store %true, %mem[%c1] : memref<3xi1>
    store %true, %mem[%c2] : memref<3xi1>

    %v = affine.vector_load %mem[0] : memref<3xi1>, vector<3xi1>
    vector.print %v : vector<3xi1>

    return
}

Is there something wrong with my code, or is this a bug?

You haven’t mentioned what target you are running on.

    store %false, %mem[%c0] : memref<3xi1>
    store %true, %mem[%c1] : memref<3xi1>
    store %true, %mem[%c2] : memref<3xi1>

IR like this isn’t meant to lower successfully to LLVM for x86; it looks like the lowering is making assumptions. You can’t directly address (load/store) anything smaller than 8 bits.

I’m actually targeting x86. Do you have any suggestions on how to deal with this? I don’t have any similar problem when using scalar loads/stores on a memref of i1s.

This is likely another instance of the data layout shortcomings in MLIR discussed in the “MemRef type and data layout” thread.

In this particular case, it is likely because the minimal addressable unit of memory on x86 is 1 byte.

I speculate the following happens:

  • memref<3xi1> likely consists of 3 bytes, one per element, each holding a single meaningful bit.
  • vector<3xi1> is likely 1 byte in size under your machine’s data layout.
  • %v = affine.vector_load %mem[0] : memref<3xi1>, vector<3xi1> actually loads only 1 byte, and that byte holds only a zero-padded copy of your “bit 0” (false), so all three lanes read as 0.

This is one of the cases for which we need the proper data layout support that @ftynse is working on.

Here is a repro in which I also print the addresses of each bit in the memref<3xi1>: https://reviews.llvm.org/D95475.

That definitely makes sense. I’ll try to find a workaround while the problem is being addressed. Thanks for the clarification!

You can still use vector-level programming (memrefs whose element type is a vector).
What you can’t do is load a vector from a scalar memref whose element bitwidth is not a multiple of 8.