Rethink on approach to low precision FP types

matthias-springer · December 6, 2024, 7:55pm

That’s correct. If you want to use #my_research.custom_float inside an MLIR vector type or use certain ops such as those from the arith dialect, we need (3) because those types/ops only accept types that upstream MLIR considers to be float types (i.e., FloatType).

Yes, that’s what I implemented in the above-mentioned PR. It comes at a compilation time cost though.

stellaraccident · December 10, 2024, 6:43pm

Thanks for the analysis. I think this provides a reasonably derived worst-case bound on the cost.

I don’t know how to make this decision, but the first thing I would want to ask is what the typical cost is for a moderately complicated real-world program. My guess is that the cost is negligible in such cases, and if that is true, it is a far easier position from which to argue for the benefit.

matthias-springer · December 12, 2024, 11:06am

The “heaviest” integration test that we have in MLIR is the sparse compiler test suite. But that’s still a pretty small test.

Benchmark 1: mlir-opt sparse_binary.mlir --sparsifier="enable-runtime-library=false enable-buffer-initialization=true vl=2 reassociate-fp-reductions=true enable-index-optimizations=true"
  BEFORE
  Time (mean ± σ):     229.5 ms ±   4.4 ms    [User: 331.0 ms, System: 103.6 ms]
  Range (min … max):   222.7 ms … 241.3 ms    50 runs

  AFTER
  Time (mean ± σ):     230.8 ms ±   3.6 ms    [User: 332.2 ms, System: 105.8 ms]
  Range (min … max):   225.1 ms … 247.4 ms    50 runs
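
As a rough way to read the two summaries above, the sketch below computes the relative change of the mean and checks whether the difference falls within two standard errors. The struct, the function names and the two-standard-error criterion are assumptions made for illustration; they are not part of the benchmark tool's output.

```c
#include <assert.h>

// Summary of one benchmark configuration (hypothetical struct for this sketch).
typedef struct {
  double mean_ms;
  double stddev_ms;
  int runs;
} bench_summary_t;

// Numbers copied from the BEFORE/AFTER output above.
const bench_summary_t SPARSE_BEFORE = {229.5, 4.4, 50};
const bench_summary_t SPARSE_AFTER = {230.8, 3.6, 50};

// Relative change of the mean run time, in percent.
double relative_change_percent(bench_summary_t before, bench_summary_t after) {
  return 100.0 * (after.mean_ms - before.mean_ms) / before.mean_ms;
}

// Returns 1 if the difference of the means is within two standard errors of
// that difference, i.e. not distinguishable from noise by this rough rule.
// The comparison is done on squared values to avoid needing sqrt().
int within_two_standard_errors(bench_summary_t before, bench_summary_t after) {
  assert(before.runs > 0 && after.runs > 0);
  double diff = after.mean_ms - before.mean_ms;
  double var_of_diff = before.stddev_ms * before.stddev_ms / before.runs +
                       after.stddev_ms * after.stddev_ms / after.runs;
  return diff * diff <= 4.0 * var_of_diff;
}
```

With these inputs the mean grows by 1.3 ms (roughly 0.6%), and the difference stays within two standard errors, which is consistent with the change being hard to separate from run-to-run noise on a test of this size.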

Do you have some larger examples/models in IREE that I can try? (And instructions on how to compile them?) If I remember correctly, the IREE CI can also measure compilation time. Would that be a good indicator or is the data too noisy?

stellaraccident · December 12, 2024, 8:22pm

@marbre anything come to mind to test?

May also want to ask the flang guys.

bjacob · December 13, 2024, 2:22am

Just in case it might be tangentially relevant as a proof of existence for the parametrizability of the family of FP types, here are some reference (slow) conversion helpers between f32 and {f16,bf16,f8e4m3,f8e5m2,f8e4m3fnuz,f8e5m2fnuz}. Just to give a taste of what a parameter space might look like.

https://github.com/iree-org/iree/blob/900ef1dda1a16d1d8f2c404bfd0dbb006f81eced/runtime/src/iree/base/internal/math.h#L266-L517

          //==============================================================================
          // FP16, BFloat16 and FP8 support
          //==============================================================================
          
          // NOTE: We used to have code here using built-in _Float16 type support.
          // It worked well (https://godbolt.org/z/3a6WM39M1) until it didn't for
          // some people (#14549). It's not worth the hassle, this is only used
          // in slow generic fallbacks or test code, and we weren't able to use
          // a builtin for bf16 anyway.
          
          // Define some helper constants for working with a floating-point format with
          // the given number of {exponent,mantissa} bits.
          #define IREE_MATH_FP_FORMAT_CONSTANTS(prefix, ebits, mbits, bias_tweak)      \
            const int prefix##exp_bits IREE_ATTRIBUTE_UNUSED = ebits;                  \
            const int prefix##mantissa_bits IREE_ATTRIBUTE_UNUSED = mbits;             \
            const int prefix##sign_shift IREE_ATTRIBUTE_UNUSED = ebits + mbits;        \
            const int prefix##exp_shift IREE_ATTRIBUTE_UNUSED = prefix##mantissa_bits; \
            const int prefix##sign_mask IREE_ATTRIBUTE_UNUSED = 1u                     \
                                                                << prefix##sign_shift; \
            const int prefix##mantissa_mask IREE_ATTRIBUTE_UNUSED =                    \

This file has been truncated.
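
To make the idea of a parameter space a little more concrete, here is a minimal sketch in C of decoding an 8-bit float from three parameters: exponent bits, mantissa bits and bias. This is not the IREE implementation. The struct, the function names and the format constants are hypothetical, and the biases are assumed to be the usual ones (15 for e5m2, 7 for e4m3, and one more for the fnuz variants). It only handles finite encodings; the Inf/NaN encodings, which differ between these formats, are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical parameter bundle for an 8-bit float: 1 sign bit, then
// `exp_bits` exponent bits, then `mantissa_bits` mantissa bits.
typedef struct {
  int exp_bits;
  int mantissa_bits;
  int bias;
} fp8_format_t;

// Assumed parameters for a few of the formats named above.
const fp8_format_t FP8_E5M2 = {5, 2, 15};
const fp8_format_t FP8_E4M3 = {4, 3, 7};
const fp8_format_t FP8_E5M2_FNUZ = {5, 2, 16};
const fp8_format_t FP8_E4M3_FNUZ = {4, 3, 8};

// Multiplies x by 2^e using repeated exact scaling (slow, but needs no libm).
float scale_by_pow2(float x, int e) {
  for (; e > 0; --e) x *= 2.0f;
  for (; e < 0; ++e) x *= 0.5f;
  return x;
}

// Decodes a finite encoding to f32. Encodings that a given format reserves
// for Inf or NaN are NOT recognized here and come back as ordinary numbers.
float fp8_decode_finite(uint8_t bits, fp8_format_t fmt) {
  assert(fmt.exp_bits + fmt.mantissa_bits == 7);
  const int sign = bits >> 7;
  const int exp_field = (bits >> fmt.mantissa_bits) & ((1 << fmt.exp_bits) - 1);
  const int mantissa = bits & ((1 << fmt.mantissa_bits) - 1);
  int significand;
  int exponent;
  if (exp_field == 0) {
    // Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias.
    significand = mantissa;
    exponent = 1 - fmt.bias - fmt.mantissa_bits;
  } else {
    // Normal: implicit leading 1 in front of the mantissa bits.
    significand = mantissa | (1 << fmt.mantissa_bits);
    exponent = exp_field - fmt.bias - fmt.mantissa_bits;
  }
  const float magnitude = scale_by_pow2((float)significand, exponent);
  return sign ? -magnitude : magnitude;
}
```

For example, under these assumptions the byte 0x38 decodes to 1.0 with the e4m3 parameters and to 0.5 with the e4m3fnuz parameters, since only the bias differs between the two.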

jeanPerier · December 13, 2024, 5:12pm

Quoting stellaraccident: "May also want to ask the flang guys."

I did a few runs on big Fortran applications that use floating point: mainly WRF from SPEC FP 2017 (a couple hundred Fortran files that take ~1100s to compile on an AMD EPYC 9334) and aermod_11 from Polyhedron (about 50kloc that takes about 80s to compile). I saw no measurable compile time impact with @matthias-springer's patch.

Instrumenting the code shows that the current number of getFloatSemantics calls in flang (at -O3) is roughly linear in the number of lines in these apps, so the extra cost there is not visible when compared to all the other compilation costs.

So I do not see problems with this change on our side.

stellaraccident · December 13, 2024, 7:44pm

That evidence is sufficient for me. +1 on opening up the type as @matthias-springer recommends.
