That’s correct. If you want to use !my_research.custom_float inside an MLIR vector type, or with certain ops such as those from the arith dialect, you need (3), because those types/ops only accept types that upstream MLIR considers float types (i.e., FloatType).
Yes, that’s what I implemented in the above-mentioned PR. It comes at a compilation time cost though.
Thanks for the analysis. I think this gives a reasonably derived worst-case bound on the cost.
I don’t know how to make this decision, but the first thing I would want to know is the typical cost for a moderately complicated real-world program. My guess is that it is negligible in such cases, and if that is true, it is a far easier position from which to argue for the benefit.
The “heaviest” integration test that we have in MLIR is the sparse compiler test suite. But that’s still a pretty small test.
Benchmark 1: mlir-opt sparse_binary.mlir --sparsifier="enable-runtime-library=false enable-buffer-initialization=true vl=2 reassociate-fp-reductions=true enable-index-optimizations=true"
BEFORE
Time (mean ± σ): 229.5 ms ± 4.4 ms [User: 331.0 ms, System: 103.6 ms]
Range (min … max): 222.7 ms … 241.3 ms 50 runs
AFTER
Time (mean ± σ): 230.8 ms ± 3.6 ms [User: 332.2 ms, System: 105.8 ms]
Range (min … max): 225.1 ms … 247.4 ms 50 runs
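To put the two runs side by side, here is a small sketch that computes the relative change in mean wall-clock time and Welch's t statistic from the summary numbers above. It assumes the 50 runs per configuration are independent and roughly normally distributed, which benchmark timings only approximate; the helper name `welch_t` is just an illustration.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_b - mean_a) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# (mean ms, sigma ms, runs) copied from the BEFORE/AFTER output above.
before = (229.5, 4.4, 50)
after = (230.8, 3.6, 50)

delta_ms = after[0] - before[0]
rel_pct = 100.0 * delta_ms / before[0]
t, df = welch_t(*before, *after)
print(f"delta = {delta_ms:.1f} ms ({rel_pct:.2f} %), Welch t = {t:.2f}, df = {df:.0f}")
```

With these numbers the difference is about 1.3 ms (roughly 0.6 %) and t is about 1.6, below the usual two-sided 5 % threshold of roughly 2, so this particular test does not show a statistically significant slowdown.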
Do you have some larger examples/models in IREE that I can try (and instructions on how to compile them)? If I remember correctly, the IREE CI can also measure compilation time. Would that be a good indicator, or is the data too noisy?
In case it is tangentially relevant as a proof of existence that the family of FP types can be parametrized, here are some reference (slow) conversion helpers between f32 and {f16,bf16,f8e4m3,f8e5m2,f8e4m3fnuz,f8e5m2fnuz}, just to give a taste of what the parameter space might look like.
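As a rough illustration of such a parameter space (this is not the linked helpers, which are not reproduced here), the sketch below decodes the bit pattern of a small binary float format into an exact Python float, given four hypothetical parameters: exponent width, mantissa width, exponent bias, and how NaN/Inf are encoded. The three `Special` conventions and the `SmallFloat` class are my own labels, loosely modeled on the non-finite-behavior and NaN-encoding knobs in LLVM's APFloat semantics. Only the decode direction is shown; going from f32 to a narrower format additionally needs a rounding mode.

```python
import math
import struct
from dataclasses import dataclass
from enum import Enum


class Special(Enum):
    """How a format encodes non-finite values (hypothetical labels)."""
    IEEE = "ieee"  # all-ones exponent: Inf if mantissa == 0, else NaN
    FN = "fn"      # no Inf; only all-ones exponent AND all-ones mantissa is NaN
    FNUZ = "fnuz"  # no Inf, no -0; the negative-zero bit pattern is the only NaN


@dataclass(frozen=True)
class SmallFloat:
    exp_bits: int
    man_bits: int
    bias: int
    special: Special

    def decode(self, bits: int) -> float:
        """Return the exact value of a (1 + exp_bits + man_bits)-bit pattern."""
        width = 1 + self.exp_bits + self.man_bits
        if not 0 <= bits < (1 << width):
            raise ValueError(f"{bits:#x} does not fit in {width} bits")
        sign = bits >> (width - 1)
        exp = (bits >> self.man_bits) & ((1 << self.exp_bits) - 1)
        man = bits & ((1 << self.man_bits) - 1)
        exp_max = (1 << self.exp_bits) - 1
        man_max = (1 << self.man_bits) - 1

        # Non-finite encodings differ between the format families.
        if self.special is Special.IEEE and exp == exp_max:
            if man == 0:
                return -math.inf if sign else math.inf
            return math.nan
        if self.special is Special.FN and exp == exp_max and man == man_max:
            return math.nan
        if self.special is Special.FNUZ and sign and exp == 0 and man == 0:
            return math.nan

        if exp == 0:
            # Subnormal (or zero): 0.man * 2^(1 - bias)
            mag = math.ldexp(man, 1 - self.bias - self.man_bits)
        else:
            # Normal: 1.man * 2^(exp - bias)
            mag = math.ldexp((1 << self.man_bits) | man, exp - self.bias - self.man_bits)
        return -mag if sign else mag


# One point in the parameter space per format.
F32 = SmallFloat(8, 23, 127, Special.IEEE)
F16 = SmallFloat(5, 10, 15, Special.IEEE)
BF16 = SmallFloat(8, 7, 127, Special.IEEE)
F8E5M2 = SmallFloat(5, 2, 15, Special.IEEE)
F8E4M3FN = SmallFloat(4, 3, 7, Special.FN)
F8E4M3FNUZ = SmallFloat(4, 3, 8, Special.FNUZ)
F8E5M2FNUZ = SmallFloat(5, 2, 16, Special.FNUZ)

if __name__ == "__main__":
    # Cross-check the f32 instance against the host's own binary32 decoding.
    (bits,) = struct.unpack("<I", struct.pack("<f", 3.5))
    print(F32.decode(bits))       # 3.5
    print(F16.decode(0x7BFF))     # 65504.0 (largest finite f16)
    print(F8E4M3FN.decode(0x7E))  # 448.0 (largest finite e4m3fn)
```

E4M3 is shown here in its finite-only "fn" flavor; an IEEE-style E4M3 with infinities would simply be another point, `SmallFloat(4, 3, 7, Special.IEEE)`. Which variant the helpers' f8e4m3 refers to is not stated above.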
I did a few runs on large Fortran applications that use floating point, mainly WRF from SPEC FP 2017 (a couple hundred Fortran files that take ~1100 s to compile on an AMD EPYC 9334) and aermod_11 from Polyhedron (about 50 kLOC that take about 80 s to compile). I saw no measurable compile-time impact with @matthias-springer’s patch.
Instrumenting the code shows that the current number of getFloatSemantics calls in flang (at -O3) is roughly linear in the number of lines in these apps, so the extra cost is not visible compared with all the other compilation costs.
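A back-of-the-envelope version of that argument, under simplifying assumptions that were not measured here: suppose the number of calls grows as aL in the number of source lines L, each call costs an extra δ, and total compile time grows at least linearly, T(L) ≥ bL (the constants a, b, δ are hypothetical). Then

$$
\frac{\text{extra time}}{T(L)} = \frac{a\,L\,\delta}{T(L)} \le \frac{a\,L\,\delta}{b\,L} = \frac{a\,\delta}{b}
$$

so the relative overhead is bounded by a constant that does not depend on program size, and it only shrinks if compile time grows superlinearly. As long as the per-call cost is small compared with the per-line compile work (aδ ≪ b), it stays invisible even on large inputs.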
So I do not see problems with this change on our side.