Is synchronization missed for RAW dependent ops during thread distribution inside iree?

chongxing · June 6, 2022, 6:09am

@ThomasRaoux Thanks for the explanation. It’s very clear.

Could I ask another question? You have done a lot of work on optimizing matmul for CUDA. Have you thought about implementing an efficient conv2d for CUDA inside IREE? (This is what drove me to learn a bit more about linalg, which in turn led to the question above.)

As I believe you know, implicit GEMM is one efficient way to implement convolution on GPUs, but it seems non-trivial to express in linalg. I once asked a related question and got some answers in "Is it possible to add parameter for indexing_maps of linalg.generic?". Roughly: mod and div in index mappings are not supported, because the subview / tensor_insert / tensor_extract operations require hyper-rectangular regions.
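
To make the div/mod issue concrete, here is a minimal pure-Python sketch of the implicit GEMM index mapping I have in mind. It is not how IREE or any library implements it; the function names, the NHWC/HWCF layouts, and the stride-1, no-padding setup are all assumptions chosen for illustration. The point is only that the flat GEMM indices `m` and `k` have to be decomposed with div and mod to address the input tensor.

```python
# Illustrative sketch only: hypothetical names, NHWC input, HWCF filter,
# stride 1, no padding. Not taken from IREE or linalg.

def zeros_output(N, OH, OW, F):
    """Allocate an N x OH x OW x F nested list filled with zeros."""
    return [[[[0] * F for _ in range(OW)] for _ in range(OH)] for _ in range(N)]

def conv2d_direct(x, w, N, H, W, C, KH, KW, F):
    """Reference convolution written as a plain 7-deep loop nest."""
    OH, OW = H - KH + 1, W - KW + 1
    out = zeros_output(N, OH, OW, F)
    for n in range(N):
        for oh in range(OH):
            for ow in range(OW):
                for f in range(F):
                    acc = 0
                    for kh in range(KH):
                        for kw in range(KW):
                            for c in range(C):
                                acc += x[n][oh + kh][ow + kw][c] * w[kh][kw][c][f]
                    out[n][oh][ow][f] = acc
    return out

def conv2d_implicit_gemm(x, w, N, H, W, C, KH, KW, F):
    """Same convolution viewed as a GEMM of shape (M x K) * (K x F).

    M = N*OH*OW and K = KH*KW*C. The im2col matrix is never materialized;
    each GEMM index is mapped back to tensor coordinates with div/mod.
    """
    OH, OW = H - KH + 1, W - KW + 1
    M, K = N * OH * OW, KH * KW * C
    out = zeros_output(N, OH, OW, F)
    for m in range(M):
        # m -> (n, oh, ow): this is the div/mod part of the index mapping.
        n, rem_m = divmod(m, OH * OW)
        oh, ow = divmod(rem_m, OW)
        for f in range(F):
            acc = 0
            for k in range(K):
                # k -> (kh, kw, c): again div/mod on the reduction index.
                kh, rem_k = divmod(k, KW * C)
                kw, c = divmod(rem_k, C)
                acc += x[n][oh + kh][ow + kw][c] * w[kh][kw][c][f]
            out[n][oh][ow][f] = acc
    return out
```

The input element touched for a given `(m, k)` is `x[n][oh + kh][ow + kw][c]`, so a contiguous tile of the GEMM operand does not in general correspond to a hyper-rectangular slice of the input tensor, which is where the restriction mentioned above gets in the way.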

Since IREE has several attractive features, I would like to reconsider whether it is feasible to implement implicit GEMM for GPUs inside IREE. I have no answer yet: could it be extended to support non-hyper-rectangular accesses, or is there a solution that does not break the hyper-rectangular requirement?
Thanks.
