We need a placement representation in the IR for the design of DISC; this is likely a common issue for other host-device joint compiler stacks as well. A PlacerPass is usually more feasible in the tensor world for a number of reasons, so we need the placement representation in both the tensor world and the buffer world.
For tensor world:
Our current proposal is to add a custom attribute to the mhlo dialect. An example:
%209 = "xla_hlo.d_reshape"(%arg1, %208) {xla_dhlo.device = "cpu"} : (tensor<?xi32>, tensor<2xi32>) -> tensor<?x1xi32>
Another example, for a multi-output op:
%5387 = "xla_hlo.d_topk"(%5384, %5386, %73) {dimension = 1 : i64, xla_dhlo.device = ["gpu", "gpu"]} : (tensor<?x22605xf32>, tensor<?x22605xi32>, tensor) -> tuple<tensor<?x6xf32>, tensor<?x6xi32>>
This works fine in our current codebase. However, there is a risk that other mhlo passes may not handle a custom attribute properly, for example mistakenly dropping it, or that replacement ops may not correctly inherit the placement attribute during mhlo-level optimizations. I can think of two solutions at the moment:
1, Add an "official" attribute to the mhlo dialect, and hope that all mhlo-layer optimization passes properly inherit it.
2, Add a memory space property in TensorType.
I personally prefer option 1, since it is not intuitive for a tensor to have a memory space. But I still have one concern with option 1: how can we actually make an attribute 'official'?
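For contrast, a sketch of what solution 2 might look like. Note that TensorType has no memory-space parameter today, so the syntax below is purely hypothetical and only meant to show why it feels unnatural:

```mlir
// Hypothetical syntax: "cpu" stands in for a memory-space
// property that TensorType does not currently have.
%209 = "xla_hlo.d_reshape"(%arg1, %208)
    : (tensor<?xi32, "cpu">, tensor<2xi32, "cpu">) -> tensor<?x1xi32, "cpu">
```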
Please let me know if you have any better ideas.
For buffer world:
It should be OK for us to simply use the MemorySpace attribute in MemRefType, but I have a few understandings that need to be confirmed:
1, The 'MemorySpace' may contain two levels of information: the memory hierarchy (alloc vs. alloca) and the memory type (host0/host1/device0/device1/device2). How to interpret it is user-defined.
2, The 'MemorySpace' of MemRefType will not be strictly associated with LLVM's 'address space'.
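To illustrate the buffer-world scheme, a small sketch using integer memory spaces in MemRefType (written in today's memref-dialect spelling). The mapping 0 = host, 1 = device is a user-defined convention assumed for this example, not something MLIR prescribes:

```mlir
// User-defined convention for this sketch: space 0 = host, space 1 = device.
%host_buf = memref.alloc(%n) : memref<?xi32>        // default space 0: host
%dev_buf  = memref.alloc(%n) : memref<?xi32, 1>     // space 1: device
// A cross-space copy makes the host-to-device transfer explicit in the IR:
memref.copy %host_buf, %dev_buf : memref<?xi32> to memref<?xi32, 1>
```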