[RFC] Distinguish between data and non-data in FIR Alias Analysis

Because we are having difficulty distinguishing between the address of a box and the address it wraps (its data) (PR #87723), I would like to propose a more formal model of data and non-data in the FIR Alias Analysis. This should clear up the current ambiguities, help clean up our tests, and simplify the code base.

To recap the issue: while following the source of a memory reference through the use-def chain, it is possible to arrive at the same origin from two starting points that are known not to alias.
Example

fir.global @_QMtopEa : !fir.box<!fir.ptr<!fir.array<?xf32>>> 

func.func @_QPtest() {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %0 = fir.address_of(@_QMtopEa) : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
  %1 = fir.declare %0 {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QMtopEa"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
  %2 = fir.load %1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
  ...
  %5 = fir.array_coor %2 %c1 : (!fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.shift<1>, index) -> !fir.ref<f32>
  fir.store %cst to %5 : !fir.ref<f32>
  return
}

With high-level operations such as fir.array_coor, it is possible to reach into the data wrapped by the box (the descriptor). Therefore, when asking about the memory source of %5, we are really asking about the source of the data of box %2.

When asking about the source of %0, which is the address of the box, we reach the same source as in the first case: the global @_QMtopEa. Yet one source refers to the data, while the other refers to the address of the box itself.

Currently, to distinguish between the two, a new source kind was introduced: SourceKind::Direct. This leads to issues when handling box function arguments, which can be passed by value or by reference, because we would like to retain that they are of SourceKind::Argument.

I propose that we encode in fir::AliasAnalysis::Source both the MLIR object and a flag indicating whether the source refers to data or to a box reference. As hinted, data is defined as any memory reference that is not a box reference. Additionally, because HLFIR lowering relies on it, we allow querying on a box SSA value, which is interpreted as querying on its data.

So in the above example, !fir.ref<f32> and !fir.box<!fir.ptr<!fir.array<?xf32>>> are data, while !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>> is not data.

This generally applies to function arguments. In the example below, %arg0 is data, %arg1 is not data but a load of %arg1 is.

func.func @_QFPtest2(%arg0: !fir.ref<f32>, %arg1: !fir.ref<!fir.box<!fir.ptr<f32>>>) {
  %0 = fir.load %arg1 : !fir.ref<!fir.box<!fir.ptr<f32>>>
...
}
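
To make the idea concrete, here is a minimal C++ sketch of a source that carries both its origin and a data flag. It is only an illustration under stated assumptions: the `Source` struct, the string `origin` field, the reduced `SourceKind` enum, and the `alias` and `demo` functions are all hypothetical stand-ins, not the actual fir::AliasAnalysis types or the code in the PR. The only rule it models is the one discussed here: the same origin reached once as data and once as a box reference does not alias. Everything else is conservatively answered with MayAlias.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for illustration only.
enum class SourceKind { Global, Argument, Unknown };
enum class AliasResult { NoAlias, MayAlias };

struct Source {
  std::string origin; // stand-in for the MLIR object reached via the use-def chain
  SourceKind kind;    // stays Global or Argument; no separate "Direct" kind needed
  bool isData;        // true: the data; false: the address of the box itself
};

AliasResult alias(const Source &lhs, const Source &rhs) {
  // Unknown sources are handled conservatively.
  if (lhs.kind == SourceKind::Unknown || rhs.kind == SourceKind::Unknown)
    return AliasResult::MayAlias;
  // Same origin, but one side is the box address and the other is the
  // data wrapped by the box: these cannot alias.
  if (lhs.origin == rhs.origin && lhs.kind == rhs.kind &&
      lhs.isData != rhs.isData)
    return AliasResult::NoAlias;
  // All other rules are out of scope for this sketch.
  return AliasResult::MayAlias;
}

void demo() {
  // First example: %0 is the box address, %5 points into the box's data.
  Source boxAddr{"@_QMtopEa", SourceKind::Global, /*isData=*/false};
  Source element{"@_QMtopEa", SourceKind::Global, /*isData=*/true};
  assert(alias(boxAddr, element) == AliasResult::NoAlias);

  // Second example: %arg1 is a box reference, a load of %arg1 is data.
  // Both keep SourceKind::Argument.
  Source arg1{"%arg1", SourceKind::Argument, /*isData=*/false};
  Source loadOfArg1{"%arg1", SourceKind::Argument, /*isData=*/true};
  assert(alias(arg1, loadOfArg1) == AliasResult::NoAlias);
}
```

The point of the sketch is that the flag, rather than a dedicated source kind, carries the data versus non-data distinction, so an argument stays an argument whether we look at the box reference or at what it wraps.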

The proposed changes can be seen in llvm/llvm-project Pull Request #91020 by Renaud-K: "[flang] AliasAnalysis: More formally define and distinguish between data and non-data".


This looks good to me. Loading the address of a box cannot alias with access to data pointed to by the box. I also find the naming of data vs non-data more intuitive than SourceKind::Direct etc. Thanks!