[RFC] Add operandIndex to sideeffect instance

cxy · May 29, 2024, 1:28pm

Background
MLIR has a flexible side-effect mechanism that allows users to track the read and write behavior of each op without explicitly doing so. It can be achieved simply by calling side-effects to query the read and write behavior of ops. However, there is a current drawback to this mechanism: the class EffectInstance, which represents side effects, can only describe the values produced by the side effects, which is not precise enough:

    foo.op(%1, %1)

If the side effect of the “foo.op” operation is to read the first operand and write to the second operand, in this scenario, we cannot determine which operands are being read from or written to.

Motivation example
This issue becomes particularly severe when we are optimizing simd memory operations:

    copy(A, B) // copy A -> B
    add(B, B, C) // first read B and C,  then add and write to B

We want to optimize ir above to

add(B, A, C) // first read A and C,  then add and write to B

Due to the reasons described in the previous section, we currently lack the means to optimize the program above.

Proposal
My plan is to add a field named operandIndex to the EffectInstance class, with a data type of unsigned, and allow users to query them. This change involves a many of analysis and optimization, so I propose submitting an RFC to gather opinions from everyone.

Any suggestion are welcome.

cxy · June 3, 2024, 12:52pm

I have implemented a version, corresponding PR: [mlir][side effect] refactor(*): Include more precise side effects by cxy-1993 · Pull Request #94213 · llvm/llvm-project · GitHub

To ensure compatibility, I have revised the above solution. The current approach is fully compatible with the original implementation, with the only change being the addition of an OpOperand* value to the union type of effect instance values.

Additionally, in this PR, I have modified the effect of the memory operations (excluding transform operations) that are currently marked with an effect to make the described information more precise.

zero9178 · June 3, 2024, 1:00pm

Generally in favour of the goal being achieved here. Specifying the effect on the operand rather than a value used as some operand is more precise.

Looking at your PR, is it possible to get rid of the Value case entirely as a next step or part of this change? I don’t think its desireable to be able to encode two things the same way (someone querying “what are the side effects of the value” and both an OpOperand and Value constructed case being applicable). You may also need to add OpResult to the pointer union as side effects may also be specified on results of the operation.

The API being kept as compatible to existing code as possible sounds great.

cxy · June 3, 2024, 4:18pm

It’s a good question, and I’ve considered this issue during implementation. It does seem inelegant to encode the same information in two ways. As you pointed out, if we remove Value , we may need to add OpResult , and possibly even BlockArg (block arg within the op region maybe?). In fact, if we implement it this way, we are still encoding redundant information – opResult and OpOperand also contain the same Value information. In the current situation, I actually think using Value is a more appropriate result, at least when using Value , the Value itself can be self-explanatory of both OpResult and BlockArg .

I think that we might be missing a data structure specifically designed to represent values and their corresponding locations. This data structure could potentially incorporate a pointUnion to encompass elements like OpOperand , OpResult , and BlockArg . In this particular context, the introduction of such a data structure could prove to be a more suitable approach.

zero9178 · June 3, 2024, 4:45pm

Redundant might have been the wrong word to use on my side. My thought was the following:
With OpOperand and Value (assuming the case where it applies on the operand), encoding it as OpOperand gives us a strict superset of the information of the Value case. Therefore the Value case is redundant as providing an OpOperand instead is strictly better. While the OpOperand and OpResult cases intersect, neither is a superset of the other.

Now as you pointed out, a Value can actually also be a BlockArgument. I personally think it’d be fine to only support OpResult and OpOperand as that also corresponds to what TableGen supports with its Arg and Res annotations. I am at least not sure what applying an effect to a block argument of an ops own region would mean. If no one is using it and it can be removed, then doing so would be beneficial IMO.

Worst case I think the pointer union of OpOperand, BlockArgument and OpResult (and reflecting this in the constructors) would be amazing as it’d avoid accidetly being able to pass in a Value for what should be an OpOperand or OpResult instead. That is the situation I’d like to avoid happening.

cxy · June 3, 2024, 5:20pm

Thank you for your explanation. I agree with you that OpResult is semantically more accurate than value , and using OpResult can also prevent value from being passed in incorrectly. As you pointed out, there are currently no examples of op has side effects on BlockArgument. I will revise the code according to your suggestions。

I would like to loop more to discuss the scheme of adding a data structure to represent the exact location of value . @matthias-springer @mehdi_amini @ftynse

matthias-springer · June 4, 2024, 8:18am

cxy:

    foo.op(%1, %1)
If the side effect of the “foo.op” operation is to read the first operand and write to the second operand, in this scenario, we cannot determine which operands are being read from or written to.

The side effect is still on the SSA value though. E.g., %1 could be a memref and that is the value that is being both read/written. So I don’t see yet why an OpOperand * would make sense here.

cxy:

    copy(A, B) // copy A -> B
    add(B, B, C) // first read B and C,  then add and write to B

We want to optimize ir above to

add(B, A, C) // first read A and C,  then add and write to B

It sounds like you just need a way to encode the order of the side effects, but don’t actually care about the OpOperand *. I.e.:

Read A
Write B
Read B
Read C
Write B

Isn’t that already encoded in the order of the side effects in the getEffects implementation? (SmallVectorImpl<MemoryEffects::EffectInstance> &effects is a vector not a set.)

cxy · June 4, 2024, 8:29am

I think we’re not aligned on what the problem is. The order of memory reads and writes is already in effect. The question now is, if I want to complete the above optimization (i.e., delete copy A->B, replace B with A in add), then we need to know how to modify the add op.

Here is the side effect information provided for the add op:

read B at 0
read A at 0
write B at 1

However, there are two operands in the add operation that uses the same value B. We only want to replace the opoperand that reads the value, but this information is not currently available through the side effect instance.

matthias-springer · June 4, 2024, 7:15pm

OK, this makes sense. And it reminds me of the BufferizableOpInterface where bufferizesToMemoryRead and bufferizesToMemoryWrite are properties of the OpOperand * (and not Value).

ftynse · June 5, 2024, 9:32am

I don’t have a strong opinion here, that’s why I haven’t commented so far. The arguments for adding the position look reasonable, we actually had a similar need in downstream aliasing/points-to analyses where we needed to know which operand is being written into not just the fact of writing.

For the sake of generality, it should remain possible to have side effects on BlockArgument. It may not make sense to have “memory read/write” effects on those, but the infrastructure is intended to support other kinds of effects, not only memory effects. Even in the memory effect domain, block argument can have e.g. the “allocate” effect in ops that represent scoped allocations (block arguments aren’t always just forwarded operands).

Code-wise, I don’t see a good solution. Storing a union of OpOperand and Value creates room to duplicate the value itself given that OpOperand is a pair (Value, int). Storing the position separately creates room to duplicate the position with that of OpResult/BlockArgument. Though the latter seems potentially a tiny bit more efficient because it avoids the indirection accessing the Value for the operand case, which is the most common in theory.

jeanPerier · June 18, 2024, 2:18pm

Is it really the intention that the effects of an operation happen in the order they appear in the vector from getEffects?

I understand this is not the question of your RFC, but are you relying on that?

I do not see that documented in the interface llvm-project/mlir/include/mlir/Interfaces/SideEffectInterfaceBase.td at 3417ff6b1c6b165fce92e7c647c65466092cf638 · llvm/llvm-project · GitHub

I may be fixing my own misconception of getEffects here, and with your add op, everything is fine, but what about operations where you cannot precisely describe the read/write order between the operands.

Take for instance some theoretical “transpose_and_assign(%A, %B)” operation that takes memref matrices and specifically document that it is not meant to process overlapping operands (i.e., the implementation could be a dumb a(i,j) = b(j, i) loop). It seems valid to me for getEffects to return of this operation to return a read effect on A and write effect on B without committing to any order. This info is already quite useful and sufficient for a lot of generic passes.

I think it it would be wrong for a pass to transform just because getEffects placed the read effect on %A before the write effect on %B in getEffects (how could it do differently?):

copy(%B, %A)                 // copy B -> A.        Read effect on B, Write effect on A
transpose_and_assign(%A, %B) // transpose A into B. Read effect on A. Write effect on B.

into

transpose_and_assign(%B, %B)

The only generic execution order should be baked in the operation order IMHO. If one need more granularity, I think some areEffectsOrdered() member should be added to the EffectOpInterfaceBase (would default to false, and could set to true when getEffects is guaranteed to give the effects in order for an operation).

If you are not relying on that effect order, then never mind and the question goes to @matthias-springer.

cxy · June 19, 2024, 2:05am

The order of occurrence of effects should be represented by the stage member variable. This is also explained in the code comments. The same stage indicates that the order of occurrence is not distinguished IMO.

jeanPerier · June 19, 2024, 8:03am

Thanks a lot for the reply! I missed that addition (just found [RFC] Add effect index in memroy effect), makes sense.

Topic		Replies	Views
[RFC] Add effect index in memroy effect MLIR	8	535	September 19, 2023
Best way to DCE ops with side effects MLIR	5	405	August 16, 2021
MLIR : Define Operation and Access C++ API Beginners	6	475	April 4, 2023
Multiple memories (address spaces) and MLIR side effects MLIR	0	159	March 30, 2023
Semantics modeling: Undefined Behavior and Side Effects MLIR	11	1255	October 6, 2022

[RFC] Add operandIndex to sideeffect instance

Related Topics