Hello,
I would like to discuss options for attributing MLIR operations produced by Flang with the arith::FastMathAttr attribute. As you might be aware, there are several patches merged or in review related to this ([1], [2], [3], [4]). So now we need to start attaching the attribute to arith, math, etc. operations in Flang.
Here is a list of options that I’ve collected so far:
1. @jfurtek has a pass in review [5] that would recursively attach the specified attribute value to all operations that support arith::FastMathAttr (a conceptual sketch of this approach follows the list).
2. Have a special property in fir::FirOpBuilder that identifies the fastmath flags that need to be attached to operations created through this builder.
3. Same as above, but do it in mlir::OpBuilder.
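For reference, here is a rough conceptual sketch of an option-(1) style pass. This is not the code from [5]; the attribute name "fastmath", the class spellings, and the header path are assumptions:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h" // header path may differ across LLVM versions
#include "mlir/IR/BuiltinOps.h"

// Conceptual sketch only: walk a module and attach the given fastmath value
// to every operation that implements ArithFastMathInterface.
static void setFastMathOnModule(mlir::ModuleOp module,
                                mlir::arith::FastMathFlags flags) {
  module.walk([&](mlir::Operation *op) {
    if (mlir::isa<mlir::arith::ArithFastMathInterface>(op))
      op->setAttr("fastmath", // assumed attribute name
                  mlir::arith::FastMathAttr::get(op->getContext(), flags));
  });
}
```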
While (1) seems to be a convenient option, I think we may want to have places in Flang (e.g. at some point in lowering) where we would diverge from the options specified by the user and override the fastmath behavior by turning off fastmath flags the user allowed or turning on more flags than the user allowed. An example of the latter is setting reassoc for expressions where it is allowed by the Fortran standard (10.1.5.2.4). For the former I do not have a good example, so something artificial would be disabling contract and reassoc under some conditions for operations inlined to implement DOT_PRODUCT (e.g. under some made-up option -fordered-reductions applied on top of -ffast-math). So having just a single pass does not give us much flexibility.
(2) and (3) just follow the path of llvm::IRBuilder::setFastMathFlags(), which defines the fastmath flags for the next instructions created through the builder. We may also support something like llvm::IRBuilder::FastMathFlagGuard to manage local overrides of fastmath flags. The builder's create methods would attach the current fastmath flags after checking whether the created operation supports arith::ArithFastMathInterface.
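To make option (2) a bit more concrete, here is a rough sketch of what such a builder property, guard, and create wrapper could look like. All names below (setFastMathFlags, FastMathFlagsGuard, the "fastmath" attribute spelling, the header paths) are placeholders rather than an existing FIR API:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h" // header path may differ across LLVM versions
#include "mlir/IR/Builders.h"
#include <utility>

// Rough sketch of option (2): the builder carries the current fastmath flags
// and attaches them to every created operation that supports them.
class FirOpBuilderSketch : public mlir::OpBuilder {
public:
  using mlir::OpBuilder::OpBuilder;

  // Analogue of llvm::IRBuilder<>::setFastMathFlags().
  void setFastMathFlags(mlir::arith::FastMathFlags flags) { fmf = flags; }
  mlir::arith::FastMathFlags getFastMathFlags() const { return fmf; }

  // RAII helper for local overrides, analogous to
  // llvm::IRBuilder<>::FastMathFlagGuard.
  struct FastMathFlagsGuard {
    FastMathFlagsGuard(FirOpBuilderSketch &b) : builder(b), saved(b.fmf) {}
    ~FastMathFlagsGuard() { builder.fmf = saved; }
    FirOpBuilderSketch &builder;
    mlir::arith::FastMathFlags saved;
  };

  // Create an operation and, if it implements ArithFastMathInterface,
  // attach the builder's current fastmath flags.
  template <typename OpTy, typename... Args>
  OpTy create(mlir::Location loc, Args &&...args) {
    OpTy op = mlir::OpBuilder::create<OpTy>(loc, std::forward<Args>(args)...);
    if (mlir::isa<mlir::arith::ArithFastMathInterface>(op.getOperation()))
      op->setAttr("fastmath", // assumed attribute name
                  mlir::arith::FastMathAttr::get(op.getContext(), fmf));
    return op;
  }

private:
  mlir::arith::FastMathFlags fmf = mlir::arith::FastMathFlags::none;
};
```

A lowering site that wants the reassoc override from 10.1.5.2.4 could then create a FastMathFlagsGuard, call setFastMathFlags with the extra flag added, and rely on the guard to restore the previous flags when it goes out of scope.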
(2) seems like a good start to me because it is local to Flang (so it does not interfere with other parts of LLVM and can be implemented relatively fast), and later, after figuring out all the caveats with the fir::FirOpBuilder prototype, we can propose the same change for mlir::OpBuilder. For example, I expect it to be a problem for MLIRIR (the component providing mlir::OpBuilder) to depend on MLIRArithDialect, while the FIRBuilder component already depends on all ${dialect_libs} (for whatever reason).
Please let me know if you want to consider other options or if you prefer any of the listed ones.
[1] D126305: [mlir][arith] Initial support for fastmath flag attributes in the Arithmetic dialect (v2)
[2] D136312: [mlir][math] Initial support for fastmath flag attributes for Math dialect
[3] D136080: [flang] Add -ffp-contract option processing
[4] D137072: [flang] Add -f[no-]honor-infinities and -menable-no-infs
[5] D137114: [mlir][arith] Add pass to globally set fastmath attributes for a module
I agree that Option 2 is the best way forward.
For transformation/conversion passes in Flang that operate on floating-point numbers, I guess it will be the responsibility of the pass to honour and propagate the fastmath attributes. I believe we do not have any of the former, but the conversion to LLVM has several uses or creations of floating-point operations (e.g. from CodeGen.cpp below):
auto rrn = rewriter.create<mlir::LLVM::FAddOp>(loc, eleTy, xx, yy);
auto rin = rewriter.create<mlir::LLVM::FSubOp>(loc, eleTy, yx, xy);
BTW, would you like us (@DavidTruby) to make similar changes in the complex dialect and passes, as was done for the math dialect?
Thank you for the reply, Kiran! I will make the changes for FirOpBuilder and the Bridge (where we create the main builders). FWIW, I am going to propagate the LangOptions to the FirConverter via LoweringOptions.
Yes, the transformation passes should take into account the fastmath attribute of an existing operation that they try to transform and propagate it further. To be able to do this in CodeGen, we need to support the fastmath attribute for FIR complex arithmetic operations (fir::AddcOp, fir::MulcOp, etc.).
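As a rough illustration of that propagation step, a conversion pattern could read the flags off the source operation before building the LLVM dialect ops. The helper below is only a sketch: the attribute spelling is an assumption, and translating the flags onto the LLVM dialect's own fastmath attribute is left out:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h" // header path may differ across LLVM versions

// Sketch: read the fastmath flags carried by a source FIR operation so a
// conversion pattern can forward them to the operations it creates.
static mlir::arith::FastMathFlags getSourceFastMath(mlir::Operation *op) {
  if (auto attr = op->getAttrOfType<mlir::arith::FastMathAttr>("fastmath"))
    return attr.getValue();
  return mlir::arith::FastMathFlags::none;
}
```

For example, the fir::AddcOp conversion could query this before creating the mlir::LLVM::FAddOp/FSubOp pair shown above and map the flags onto them.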
Of course, your help is welcome! Since there are multiple things to do, I think we need to prioritize the work based on benchmark analysis.
Here is the data that I have so far (I made the analysis on x86):
1. The biggest gainer from fastmath flags is CPU2006/454.calculix: it gets ~2x speed-up from fastmath flags added for the loop nest at line 675 in the e_c3d routine. The FP arithmetic operations are produced during lowering, so my FirOpBuilder work should cover this.
   1.1. There should be a slight speed-up in CPU2017/503.bwaves and CPU2006/410.bwaves as well.
   1.2. Polyhedron/induct2_11 should gain from marking mlir::math::SqrtOp with fastmath.
2. Polyhedron/induct2_11 and test_fpu2_11 both gain from adding fastmath flags to operations of the simplified DOT_PRODUCT code.
3. I did not see hot cases where having fastmath for complex arithmetic provides improvement. I guess it may slightly improve performance here and there. I believe @jfurtek has already started adding fastmath support to the complex dialect, so we need to make sure to coordinate the changes with him. On our side, we can add fastmath support for FIR complex arithmetic as noted above.
With this said, it will be great if you can work on (2) (of course, this is just my suggestion). Off the top of my head:
- We may add arith::FastMathAttr support to fir.call operations so that we can mark Fortran runtime DOT_PRODUCT calls.
- Then in SimplifyIntrinsicsPass we will be able to propagate the attribute to the inline implementations by configuring FirOpBuilder accordingly (see the sketch after this list).
- We will have to resolve an issue with separate Fortran modules containing DOT_PRODUCT calls and compiled with different fastmath options: if we name the simplified functions the same and keep using linkonce_odr linkage, the linker will take one of the versions, so if it takes the "slow" version, we lose performance, and if it takes the "fast" version, we may produce inaccurate results in the module compiled with stricter fastmath settings. I do not think we should resolve this with function naming, because there may be too many versions for all combinations of fastmath flags. We can try to just inline the simplified code instead of keeping it in functions, or there may be other options.
- Note that with HLFIR lowering we are planning to actually inline the transformational operations (ref and ref), so starting to inline them now in SimplifyIntrinsicsPass agrees with the future handling.
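As a sketch of how the SimplifyIntrinsicsPass step could be wired up, building on the hypothetical FirOpBuilderSketch API from earlier in the thread (the attribute spelling and the header path are again assumptions):

```cpp
#include "flang/Optimizer/Dialect/FIROps.h" // header path may differ across versions

// Sketch: when the pass decides to inline a simplified DOT_PRODUCT, pick up
// the fastmath flags from the original runtime call and configure the
// builder so the inlined code inherits them.
static void configureBuilderFromCall(FirOpBuilderSketch &builder,
                                     fir::CallOp call) {
  if (auto fmf = call->getAttrOfType<mlir::arith::FastMathAttr>("fastmath"))
    builder.setFastMathFlags(fmf.getValue());
}
```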
Another thing that we can do is to see whether it is profitable to apply Math::PolynomialApproximation under -ffast-math (e.g. under afn in particular). If it is, then we will have to modify the pass to trigger rewrites only for operations with the appropriate fastmath flag and add the pass into our optimization pipeline (e.g. under some aggressive optimization level).
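The gating condition could be as simple as something like the following; as before, the attribute and flag spellings are assumptions:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h" // header path may differ across LLVM versions

// Sketch: only apply a polynomial-approximation rewrite when the operation's
// fastmath flags include `afn` (approximate functions).
static bool allowsApproximation(mlir::Operation *op) {
  auto attr = op->getAttrOfType<mlir::arith::FastMathAttr>("fastmath");
  return attr && (attr.getValue() & mlir::arith::FastMathFlags::afn) ==
                     mlir::arith::FastMathFlags::afn;
}
```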
Thanks @szakharin for the detailed information.
I have seen that wrf (12%) and fotonik (9%) also benefit from fast-math in both the gfortran and classic-flang compilers. So I hope we will get the benefits with llvm/flang as well.
We can do the work for the SimplifyIntrinsicsPass. But @Leporacanthicus is away for a couple of weeks, so it might have to wait till then.
Right, I missed fotonik! My estimate is a 3.3% speed-up, but we may benefit more later, when aliasing information is present to enable vectorization.
I did not look at wrf yet.
With the latest changes, wrf gained 7.99% and fotonik only 1.56% on x86. The hot fotonik loops are not vectorized, so the fastmath gain seems to be limited.
I will make changes in SimplifyIntrinsicsPass that should help bwaves a little bit.