I put most of my reasoning in the commit message on the revision, but, to summarize: both ArithToLLVM and ArithToSPIRV have their own conversion patterns for the various arith.min* and arith.max* operations, and those patterns actually differ between the two converters (the LLVM lowering doesn’t try to implement any NaN semantics, while the SPIR-V lowering does). So, since all of the conversions can handle the arith.{min,max}* operations directly, we don’t need the expansion code anymore. All it’s doing right now is providing interesting footgun opportunities, IMO.
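For context, the expansion reduces to a compare-and-select sequence. A simplified sketch of the shape (illustrative only, not the exact pattern -arith-expand-ops emits):

```mlir
// Simplified sketch of a compare-and-select expansion of arith.maxf
// (illustrative; not the exact -arith-expand-ops output).
func.func @maxf_expanded(%a: f32, %b: f32) -> f32 {
  %cmp = arith.cmpf ugt, %a, %b : f32     // unordered greater-than
  %max = arith.select %cmp, %a, %b : f32  // NaN behavior falls out of the predicate choice
  return %max : f32
}
```

Once the op has been rewritten into this form, a later conversion can no longer apply its own NaN handling, which is the footgun I mean.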
Am I missing something? Is anyone relying on that code?
(Tangentially, having gone and looked, it appears that our lowering to LLVM doesn’t preserve the NaN semantics we specified in the docs, since LLVM’s maximum follows C and says max(nan, X) = max(X, nan) = X. But since that’s an existing bug, making a decision on it is future work, I’d think.)
Did I misread the LLVM documentation and get the NaN behavior backwards, or does the IEEE operation have the opposite semantics to what LLVM implements?
Thanks for bringing this up! I think part of the problem is that when we introduced the arith.minf and arith.maxf ops it wasn’t clear what behavior we wanted to model in MLIR regarding NaN, ±0.0 and friends. I think we just propagated the documentation from LLVM. In addition to the NaN misalignment, for example, I don’t think we ever honored the “treat -0.0 as less than +0.0” behavior.
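To make the ±0.0 point concrete, here is a small example of what honoring the documented “-0.0 is less than +0.0” rule would require:

```mlir
// Under "-0.0 is less than +0.0" semantics this max must yield +0.0
// regardless of operand order. A plain compare-and-select lowering
// returns whichever operand sits in the "false" slot of the select,
// because -0.0 compares equal to +0.0 under ordinary fp comparison.
%a = arith.constant -0.0 : f32
%b = arith.constant 0.0 : f32
%m = arith.maxf %a, %b : f32
```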
IMO, the first step to disentangle this would be to make a call on the variants we want to represent in MLIR, which is not easy at all.
In addition to the variants above, we should also consider the variants from the ML frameworks, which may or may not fall into any of the above. IIRC, there were some subtleties and description gaps in some of the framework specs about the handling of the special floating-point values, which makes this problem even more challenging.
We should also take into account that whatever we decide about standalone fp min and max ops should be propagated to the reduction ops in some MLIR dialects (e.g., vector).
Perhaps aligning MLIR with the variants represented in LLVM would be a good way to go. I’m not familiar with the SPIR-V ones. Are they significantly different?
The OpenCL ones match C99 (signaling NaNs may behave as quiet NaNs); the OpenGL variants are similar but less specific about NaNs. -0.0/+0.0 handling is not specified either.
So we may not be able to lower better-defined min/max ops to SPIR-V.
We currently lower the min and max ops to SPIR-V by adding the NaN checks ourselves … except when -arith-expand-ops is run first, which causes that lowering to be skipped.
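Roughly, the shape of that SPIR-V lowering is a NaN-guarded FMax along these lines (a sketch only, not the exact output of the conversion; %lhs and %rhs stand for the already-converted operands):

```mlir
// Sketch: build a NaN-propagating max around spirv.GL.FMax, whose own
// NaN behavior is unspecified. Illustrative, not the exact emitted pattern.
%lhs_nan = spirv.IsNan %lhs : f32
%rhs_nan = spirv.IsNan %rhs : f32
%fmax    = spirv.GL.FMax %lhs, %rhs : f32
%tmp     = spirv.Select %lhs_nan, %lhs, %fmax : i1, f32
%result  = spirv.Select %rhs_nan, %rhs, %tmp : i1, f32
```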
(Thinking out loud) I’m wondering if adding attributes to arith.minf and arith.maxf to describe the NaN and 0.0 handling would be reasonable here, versus having independent ops for each variant. I know this approach is usually discouraged, but I have the impression that NaN/Inf/0.0 handling is not a property that is commonly reasoned about during high-level transformations/optimizations (with obvious exceptions, of course); it’s more a property used at lowering time. If that is the case, I see more value in representing the core fp min/max semantics with single ops than in having many variants of them precisely modeling semantic aspects that are perhaps not so valuable at this level of abstraction. To some extent, this is not so different from the ordered/unordered behavior in fp compare operations…
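To make the idea concrete, hypothetical syntax (these attributes don’t exist today; the names are purely illustrative) could look like:

```mlir
// Hypothetical attributes describing the special-value handling.
%0 = arith.maxf %a, %b {nan = "propagate", neg_zero = "less_than_pos_zero"} : f32
%1 = arith.maxf %a, %b {nan = "any"} : f32  // lowering may pick the cheapest NaN behavior
```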
I think having the right coverage here is important because the NaN/Inf/0.0 handling can be more expensive than the min/max operation itself, and it wouldn’t make sense to enforce a specific handling for specs that don’t require it.
I think the right answer is to just have all the operations and pick which ones you want. They’re distinct operations; part of the problem is that we keep trying to stuff different operations under the same name.
Would folks be OK with changing arith.maxf and arith.minf from “NaN inputs produce NaN” to “what happens to NaN inputs depends on the lowering”?
I think this wouldn’t work. The semantics of an operation shouldn’t be defined based on its lowering. They should be properly defined in the spec of the operation.
I’m OK with introducing a distinct operation for each variant if people feel that’s the best way to go. My only concern with this approach is that we may end up with a combinatorial number of variants that are all treated as a single op in 99% of the code at this level of abstraction. We could also introduce a MaxFOpInterface and MinFOpInterface, implemented by each variant, for those cases where we want to treat them all as a single op; but again, if those interfaces are used 99% of the time, I’m not sure about the value of having distinct operations for each variant.
Using attributes seems reasonable to me here, if the attributes don’t change fundamental properties of the operation (does it stay commutative/associative? etc.)
Agreed. OK, so this probably needs to be a new RFC. … but while we’re here, are there any float operations other than min and max that have all these variants?
Potentially any reduction operation that can have an fp min/max combiner (e.g., vector.reduction, vector.multi_reduction). Not sure if we have more in other dialects.
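For example (op syntax from memory, so take it as approximate):

```mlir
// A reduction with an fp max combiner needs the same decision on
// NaN/-0.0 behavior as the standalone op.
%r = vector.reduction <maxf>, %v : vector<16xf32> into f32
```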