@stellaraccident … if it were me, I would just define unranked tensors as out of scope for this kind of thing.
Thank you for your feedback. I agree with your suggestion to remove unranked tensors from the scope of this proposal. Additionally, this allows us to remove the `quantization_dimensions` field from the type definition.

Currently, the `quantization_dimensions` field is necessary to identify the axes along which sub-channel quantization is applied, especially for unranked tensors where the rank is unknown at compile time. By excluding unranked tensors, we always know the tensor’s rank and can infer the quantization dimensions directly from the shape of the scales or zero-points. For example, with a tensor of rank 4 and scales of shape `[1, 2, 1, 4]`, we can deduce that sub-channel quantization is applied along axes `[1, 3]`.
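For illustration, a minimal sketch of that inference, assuming (as above) that the scales carry one entry per tensor axis; the function name and signature are mine, not part of the proposal:

```python
def infer_quantization_dimensions(scales_shape):
    """Return the axes along which sub-channel quantization is applied.

    Assumes len(scales_shape) == rank(tensor), which always holds once
    unranked tensors are out of scope.
    """
    # Any axis whose scales extent is greater than one holds more than one
    # block, and is therefore a quantization dimension.
    return [axis for axis, num_blocks in enumerate(scales_shape) if num_blocks > 1]


# Example from the discussion: rank-4 tensor, scales of shape [1, 2, 1, 4].
assert infer_quantization_dimensions([1, 2, 1, 4]) == [1, 3]
```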
I’m in favor of excluding the consideration of unranked-ness from the current proposal, but would want to confirm with a few others who may be impacted by the decision – cc @rafaelubal @zwei @sirakiin
@stellaraccident If there were concrete feedback, it would be to at least compare/contrast the approach to what ONNX does in this area …
That makes sense. I have added a section highlighting the comparison.
@sjarus Consistency between “In the absence of a block size for a specific axis `i`, we assume its value to be equal to `dim(tensor, i)`” and `size(block_sizes) = rank(tensor)`: is that a forward-looking comment related to the subsequent explanation on handling unranked tensors?
You’re right to point out the need for clarity regarding `size(block_sizes) = rank(tensor)`. This constraint applies only to ranked tensors and should be explicitly stated as such in the RFC; that said, if we do end up removing unranked tensors from the scope of the proposal, the constraint will apply unconditionally.

We can still keep “In the absence of a block size for a specific axis `i`, we assume its value to be equal to `dim(tensor, i)`” as syntactic sugar for ranked cases, simplifying the type definition by allowing users to omit block sizes for axes where the block size equals the dimension size.
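As an illustration of that sugar, here is a small sketch of how omitted block sizes could be filled in for a ranked tensor; the dict-based spelling and the function name are illustrative only, not the RFC’s syntax:

```python
def expand_block_sizes(tensor_shape, explicit_block_sizes):
    """Expand a sparse {axis: block_size} spec into one block size per axis.

    An omitted axis i defaults to dim(tensor, i), i.e. a single block
    spanning the whole axis, so the result satisfies
    len(block_sizes) == rank(tensor) for every ranked tensor.
    """
    return [explicit_block_sizes.get(axis, dim)
            for axis, dim in enumerate(tensor_shape)]


# Example: rank-4 tensor, sub-channel quantization only along axes 1 and 3.
assert expand_block_sizes([8, 6, 5, 8], {1: 3, 3: 2}) == [8, 3, 5, 2]
```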
@sjarus This proposal could generalize quantization such that all three could use a common infrastructure, though this may be potentially invasive.
I agree that this proposal offers an opportunity to unify the quantization infrastructure, but we are unclear on a few side effects of that decision, especially how invasive the change would be, e.g. whether having these datatypes all share a common TypeID base would cause issues in existing quantized-type logic that uses `isa`.

I propose to leave consolidation out of the scope of this proposal, but I fully intend to try consolidating to get a sense of how invasive the change is, and, if it proves feasible, will send a PR for it. This would not be a semantic change, but it would likely break existing C++ code as the scale/zero-point types change.