Rethink on approach to low precision FP types

That’s correct. If you want to use !my_research.custom_float inside an MLIR vector type, or with certain ops such as those from the arith dialect, we need (3), because those types/ops only accept types that upstream MLIR considers to be float types (i.e., FloatType).
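To make that constraint concrete, here is a rough sketch (the helper name is hypothetical, not copied from upstream) of the kind of element-type check such verifiers perform; a custom type that is not a FloatType is rejected no matter how float-like it behaves:

```cpp
// Sketch only: why a non-FloatType custom type is rejected by float-oriented
// verifiers. The helper name is made up for illustration.
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// arith-style ops typically constrain operands to float (or vector-of-float)
// types; a !my_research.custom_float that is not a FloatType fails this check.
static bool isFloatLikeForArith(Type type) {
  if (auto vectorType = dyn_cast<VectorType>(type))
    type = vectorType.getElementType();
  return isa<FloatType>(type);
}
```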

Yes, that’s what I implemented in the above-mentioned PR. It comes at a compilation time cost though.
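For readers who haven't followed the PR, here is a plain C++ analogy (deliberately not MLIR API) of where the cost of an "open" float type comes from: a closed set of builtin float types can resolve semantics with a hard-coded switch, whereas an open design pays an indirect/interface dispatch on every getFloatSemantics-style query.

```cpp
// Plain C++ analogy (not MLIR API): closed, hard-coded float-semantics lookup
// vs. an open query that any downstream "custom float" type can implement.
#include <cstdio>

struct FloatSemantics {            // stand-in for llvm::fltSemantics
  int exponentBits;
  int mantissaBits;
};

// Closed world: a fixed set and a switch; cheap, but downstream projects
// cannot add new float types without patching this function.
enum class BuiltinFloatKind { F16, BF16, F32 };

FloatSemantics getSemanticsClosed(BuiltinFloatKind kind) {
  switch (kind) {
  case BuiltinFloatKind::F16:  return {5, 10};
  case BuiltinFloatKind::BF16: return {8, 7};
  case BuiltinFloatKind::F32:  return {8, 23};
  }
  return {0, 0};
}

// Open world: any type can answer the query, at the price of an indirect
// call (in MLIR terms, an interface lookup) on every query.
struct FloatLikeType {
  virtual ~FloatLikeType() = default;
  virtual FloatSemantics getFloatSemantics() const = 0;
};

struct MyCustomFloat : FloatLikeType {   // e.g. !my_research.custom_float
  FloatSemantics getFloatSemantics() const override { return {4, 3}; }
};

int main() {
  MyCustomFloat custom;
  const FloatLikeType &asFloat = custom;
  FloatSemantics s = asFloat.getFloatSemantics();
  std::printf("custom float: %d exponent bits, %d mantissa bits\n",
              s.exponentBits, s.mantissaBits);
  return 0;
}
```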

Thanks for the analysis. I think this provides a reasonably derived worst-case bound on the cost.

I don’t know how to make this decision, but the first thing I would want to ask is what the typical cost is for a moderately complicated real-world program. My guess is that the cost is probably negligible in such cases, and if that is true, it is a far easier position from which to argue the benefit.


The “heaviest” integration test that we have in MLIR is the sparse compiler test suite. But that’s still a pretty small test.

Benchmark 1: mlir-opt sparse_binary.mlir --sparsifier="enable-runtime-library=false enable-buffer-initialization=true vl=2 reassociate-fp-reductions=true enable-index-optimizations=true"
  BEFORE
  Time (mean ± σ):     229.5 ms ±   4.4 ms    [User: 331.0 ms, System: 103.6 ms]
  Range (min … max):   222.7 ms … 241.3 ms    50 runs

  AFTER
  Time (mean ± σ):     230.8 ms ±   3.6 ms    [User: 332.2 ms, System: 105.8 ms]
  Range (min … max):   225.1 ms … 247.4 ms    50 runs

Do you have some larger examples/models in IREE that I could try (and instructions for how to compile them)? If I remember correctly, the IREE CI can also measure compilation time. Would that be a good indicator, or is the data too noisy?

@marbre anything come to mind to test?

May also want to ask the flang guys.

Just in case it might be tangentially relevant as an existence proof of how parameterizable the family of FP types is, here are some reference (slow) conversion helpers between f32 and {f16, bf16, f8e4m3, f8e5m2, f8e4m3fnuz, f8e5m2fnuz}, just to give a taste of what the parameter space might look like.
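(Not the helpers referenced above, but a minimal sketch of what such a parameterized conversion could look like, assuming the parameter space is roughly exponent bits, mantissa bits, bias, and whether the format has infinities; rounding, subnormals, and NaN inputs are deliberately not handled.)

```cpp
// Simplified sketch of an f32 -> small-float truncation parameterized over
// the target format. Round-toward-zero only; subnormals and NaN inputs are
// not handled, and the no-infinity saturation is approximate.
#include <cstdint>
#include <cstring>

struct SmallFloatParams {
  unsigned exponentBits;  // e.g. 5 for f16/f8e5m2, 8 for bf16, 4 for f8e4m3
  unsigned mantissaBits;  // e.g. 10 for f16, 7 for bf16, 3 for f8e4m3
  int bias;               // e.g. 15, 127, 7; the FNUZ variants shift this
  bool hasInfinity;       // the f8e4m3fn/FNUZ variants have no infinities
};

// Returns the bit pattern of `value` in the target format.
uint32_t truncateF32(float value, const SmallFloatParams &p) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  uint32_t sign = bits >> 31;
  int exp = static_cast<int>((bits >> 23) & 0xFF) - 127;        // unbias f32
  uint32_t mant = (bits & 0x7FFFFF) >> (23 - p.mantissaBits);   // truncate

  int targetExp = exp + p.bias;
  int maxExp = (1 << p.exponentBits) - 1;
  if (targetExp >= maxExp) {            // overflow: Inf or saturate
    targetExp = p.hasInfinity ? maxExp : maxExp - 1;
    mant = p.hasInfinity ? 0 : (1u << p.mantissaBits) - 1;
  } else if (targetExp <= 0) {          // underflow: flush to zero
    targetExp = 0;
    mant = 0;
  }
  return (sign << (p.exponentBits + p.mantissaBits)) |
         (static_cast<uint32_t>(targetExp) << p.mantissaBits) | mant;
}
```

For example, {5, 10, 15, true} describes f16, and truncateF32(1.5f, {5, 10, 15, true}) yields 0x3E00, the f16 encoding of 1.5.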

May also want to ask the flang guys.

I did a few runs on big Fortran applications that use floating point heavily (mainly WRF from SPEC FP 2017, a couple hundred Fortran files that take ~1100 s to compile on an AMD EPYC 9334, and aermod_11 from Polyhedron, about 50 kloc that takes about 80 s to compile), and I saw no measurable compile-time impact with @matthias-springer's patch.

Instrumenting the code shows that the current number of getFloatSemantics calls in flang (at -O3) grows roughly linearly with the number of lines in these apps, so the extra cost there is not visible compared to all the other compilation costs.
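(For context, a counter like the following is one way such a call count could be collected; this is an assumption about the methodology, not the actual instrumentation used.)

```cpp
// Hypothetical instrumentation sketch: count calls to a hot query and dump
// the total at process exit.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static std::atomic<uint64_t> floatSemanticsQueries{0};

// Call at the top of getFloatSemantics(); registers the report on first use.
void countFloatSemanticsQuery() {
  if (floatSemanticsQueries.fetch_add(1) == 0)
    std::atexit([] {
      std::fprintf(stderr, "getFloatSemantics calls: %llu\n",
                   static_cast<unsigned long long>(floatSemanticsQueries.load()));
    });
}
```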

So I do not see problems with this change on our side.


That evidence is sufficient for me. +1 on opening up the type as @matthias-springer recommends.