Complex number division (complex.div) seems to be converted using either ComplexToStandard or ComplexToLLVM. ComplexToStandard uses Smith’s algorithm, while ComplexToLLVM uses an algebraic method. (A brief explanation of the two calculation methods is here.) What is the reason for this difference in computation methods? Should they ideally be converted to the same algorithm?
Proposal
As mentioned above, complex number division can be computed using either Smith’s algorithm or an algebraic method. I propose making it possible to select between these methods. This is because Smith’s algorithm is less prone to overflow but is slower, while the algebraic method is faster but more susceptible to overflow.
My proposed approach is to add a new attribute (complexRange) to the complex dialect. The conversion process would then use this attribute to select the algorithm. Following the implementation in clang, the values for complexRange would be CX_Full, CX_Improved, CX_Promoted, CX_Basic, and CX_None. As an initial implementation, I propose that if CX_Basic is selected, the algebraic calculation is used for conversion; otherwise, Smith’s algorithm is used. I anticipate that other intermediate computations involving complex numbers may also have multiple methods. I intend to extend the use of the complexRange attribute to select the computation method for these operations in the future.
Likely done by different people at different times without communicating with each other. Find the people that modified the code recently and historically (via git blame) and @ them here for visibility.
Definitely!
This sounds reasonable, especially because of the different tradeoffs. However, I’d wait for more people to reply here to make sure we’d use all of those variants.
Without further information, we can start with only CX_Full and CX_Basic to switch between the two, and let future uses justify their existence on a case-by-case basis.
Question: What is the purpose of CX_None, and how should we handle it here?
Regardless of the outcome, this proposal will change the current behaviour upstream. We want to make it simple (but extendable) and only really change the behaviour once.
It’s unlikely that picking one of the two methods is the right thing to do. The most likely outcome is to implement both, as you suggest, and at most have one of them as the default behaviour (the safest, but slower method), and apply it to both lowering.
Due to its behaviour breaking nature, we need to announce it, implement it, deprecate, remove. The speed that this happens depends on the users’ ability to shift. It may be that all are happy with a single switch, but we still need to consider such cases and announce it beforehand.
But, let’s reach consensus first, then discuss next steps.
According to git blame, the conversion algorithm part seems to have remained unchanged since the file was initially added three years ago.
Understood, thank you!
CX_None means that there are no specifications for intermediate calculations. Therefore, I think we can convert it to a fast but unstable algebraic representation. However, I think this will behave the same as CX_Basic, so it might be unnecessary.
CX_Improved (Algorithm is the same as the current ComplexToStandard, but does not handle NaN and Inf.)
Converts to the Smith algorithm. Does not handle NaN and Inf.
CX_Promoted
Converts to algebraic using higher precision. Does not handle NaN and Inf.
CX_Basic (Algorith is the same as current ComplexToLLVM)
Converts to algebraic representation. Does not handle NaN and Inf.
CX_None
Means there are no specifications for intermediate calculations. Therefore, it converts to a fast algebraic representation (resulting in the same conversion as CX_Basic).
Understood. I will wait to get agreement from many people that a change is needed before proceeding.
@pifon2a , @akuegel
I believe you’ve implemented transformations for complex number division. Could you comment on why the calculation algorithm for complex number division is different?
As you noted, the primary difference between Smith (improved) and algebraic (basic) is that the Smith algorithm reduces the likelihood of overflow with intermediate results. There are still cases where the Smith algorithm produces incorrect results. The “promoted” implementation attempts to perform intermediate calculations at the next higher precision available (assuming you know what types the target supports). This can be faster than the Smith algorithm and handles the corner cases that Smith doesn’t.
I’m sure you saw all this in the clang link, but having been involved in the clang implementation, I wanted to mention it in this thread and offer to discuss those motivations further. I can’t speak to why complex.div was implemented as it has been, but I like the idea of adding an attribute to make the various options available.
If I remember correctly, the CX_None option was related to the clang command-line handling and merely offered a distinction between default behavior and explicitly chosen behavior.
Not all frameworks and domains are equally concerned with numerics and the recent change in the lowering of complex division was motivated by need in an GPU & ML focussed framework (and did result in breaking tests). So enabling different options for different environments makes sense to me.
Is this an actual attribute on a per op basis? Or option to conversion pass? Or could be both too, one could have on per op with fullback to pass line set default.
I was wondering as I would have expected it to be rather consistent for all the ops in a pipeline (e.g., for a given domain, the tool used would fix it rather than each op having an option). The benefit of having it as pass option (or at least starting with it) would be that it would be easy to add with default matching existing without any users being affected (opt in). While an attribute per op could affect many patterns.
Or dealers choice. I think it’s useful to differentiate as clang does to and I can see it useful for cases where (say) one has backends and targets with different capabilities that if the user hasn’t set one, the backend pipeline will choose. And one could even randomly choose between any of the others if unset (as this is meant to indicate “I don’t care”). For the first id probably do exactly what you suggest but may just not document it as being fixed.
I’m not sure what full represents. The others define behavior of intermediate value computations. But would seem that one would have option to match to runtime with different accumulate behavior independently of this. So seems to be “don’t expand I’ll match later”, so as pass option it would be “don’t add the pattern to rewrite set”. If this is intent though, one could also match to a runtime call before this pass.
I planned to add the attribute per operation. I considered this because, similar to the C99 CX_LIMITED_RANGE pragma, there’s a possibility of specifying the intermediate calculation method for each operation. However, neither gcc nor clang implement operation-level control of intermediate calculations via pragmas; instead, it’s controlled at the translation unit level. If that implementation is acceptable, it would be better to allow setting the intermediate calculation method per translation unit by passing options to conversion pass.
Thank you for the explanation. Regarding complex number division, it seems there’s no process to replace complex.div with a runtime call (it should probably be in LLVMToLibm). For now, instead of adding the CX_Full attribute value, would it be better to implement only CX_None (Smith algorithm) and CX_Basic (algebraic algorithm), and switch between them per operation or per translation unit?
I think so yes. I was leaving space for others to chime in. But IMHO this implements what is done/is needed today, would be used by different groups immediately and I don’t see it restricting space later on (and can easily be extended). So I think it’s plain goodness based on known needs/uses.
Thank you for your reply! Since it seems fine to not implement CX_Full this time, I’ll skip it.
While I may not have received sufficient responses, there’s no significant opposition, so could I proceed with the implementation as described below? Any further comments on how to achieve broader consensus or on the implementation details would be greatly appreciated.
Current Implementation of Conversion
complex.div is expanded to algebraic in ComplexToLLVM and to smith with handling NaNs and infinities in ComplexToStandard. I want to implement both calculation methods in both passes and allow for selection between them.
Proposal
Add a new attribute, complexRange<value>, to the complex dialect. value will have three options: CX_Improved, CX_Basic, CX_None.
CX_Full: Not to be implemented. Requires conversion to runtime, which currently doesn’t exist, and would likely cause ABI issues.
CX_Improved: To be implemented. Converts to smith.
CX_Promoted: Not to be implemented. No corresponding conversion currently exists in current MLIR.
CX_Basic: To be implemented. Converts to algebraic.
CX_None: To be implemented. Converts to algebraic. This distinguishes cases where the user explicitly specified none.
Allow ComplexArithmeticOp, the parent operation of complex.div, to have the complexRange<value> attribute. Absence of the attribute will also be allowed.
Implement both the smith algorithm and the algebraic algorithm in both ComplexToLLVM and ComplexToStandard passes. The conversion will be selected based on the value of complexRange<value> attached to complex.div. The default (when no attribute is present) and CX_Improved will use the smith algorithm; otherwise (CX_Basic, CX_None) it will use the algebraic algorithm.
Concerns
This change doesn’t alter the default behavior of ComplexToStandard, but it changes the default behavior of ComplexToLLVM from algebraic to smith, making it a breaking change. How should I notify users about this?
Future Tasks
Implementation of CX_Full, CX_Promoted, and their corresponding division calculation methods.
Selection of intermediate calculations using complexRange<value> for other complex operations.
Implement the two lowerings (well they are implemented, so perhaps just move to a utility function), add ComplexRange option to those two passes and then call either of the utility functions internally in the passes.
If needed later per op, then you can introduce an attribute. One can have the fallback behavior (check if attribute is set on op else query on pass) and so its just a refinement with little changes. But we can get feedback from folks to see if needed.
Given the current state, a per pass option seems sufficient & can be added in backwards compatible way (e.g., you could start with having each pass have a different default behavior, mark one of them’s default behavior as deprecated, make a post here and then flip the defaults to be the same - this allow users to update, its like a 1 line change later and so little churn or cost for anyone).
Thank you for your suggetion!
As you said, I think it’s better to add an option to the passes considering later modifications. I’ll implement it by adding a complexRange option to the passes and passing a value to it.
I plan to implement the following. Any comments would be appreciated.
Add a complex-range option to both ComplexToStandard and ComplexToLLVM.
The option will have three values:
CX_Improved : default for ComplexToStandard. Converts to Smith algorithm.
CX_Basic : default for ComplexToLLVM. Converts to algebraic algorithm.
CX_None : Converts to algebraic algorithm.
Implement two algorithms in each pass.
While the conversion algorithms are common, extracting them as utility functions might be difficult because the target dialects are LLVM and Arith. I plan to implement them within each DivOpConversion respectively.
Next step
After notifying users that the default for ComplexToLLVM will change from CX_Basic to CX_Improved, I will change the default. Would posting in the sub-category “Deprecation & Important Refactoring” be appropriate? If so, how long should I wait after posting?
I am not sure how much effort we save on not adding the attributes to the operations themselves, though. I think we will end up with them, eventually, when we will support cross module function inlining at MLIR level, and this is where we may get a mix of complex dialect operations with different complex range settings. I am not against adding the pass options, but I think it may become a throw away code.
Thank you for all the comments. I’ve created and submitted a patch adding the option to the passes.
I’m not sure how much this patch will reduce the additional modifications needed if we end up setting attributes per operation, but is it alright to leave it as is?