Unlikely branches can have expensive contents hoisted

Hey all - me again,

So I’m looking at llvm.expect specifically for branch hints. In the following example LLVM will hoist the pow/cos calls into the entry block even though I’ve used the llvm.expect intrinsic to make it clear that one of the calls is unlikely to occur.

target datalayout = “e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-pc-windows-msvc-coff”

define dllexport double @foo(i32 %val) {
entry:
%0 = icmp slt i32 %val, 42
%1 = call i1 @llvm.expect.i1(i1 %0, i1 false)
%2 = sitofp i32 %val to double
br i1 %1, label %true, label %false

true:
%3 = call double @llvm.cos.f64(double %2)
br label %merge

false:
%4 = call double @llvm.pow.f64(double %2, double 4.242000e+01)
br label %merge

merge:
%var.1.0 = phi double [ %4, %false ], [ %3, %true ]
ret double %var.1.0
}

declare i1 @llvm.expect.i1(i1, i1)
declare double @llvm.cos.f64(double)
declare double @llvm.pow.f64(double, double)

This seems counter intuitive to me - I’ve told LLVM that one of the calls will probably not happen, and I expected it to preserve the call in the unlikely branch so we don’t pay the cost for something unlikely to be used.

I also injected a pass locally that adds, for branches whose condition is llvm.expect, the branch weight metadata - but LLVM will still always fold the branch away ensuring that the expensive call is always called.

The part of SimplifyCFG that does this is FoldTwoEntryPHINode from what I can tell.

So is there anything I can do here without modifying LLVM? Have I missed something?

Otherwise I guess I’d have to change FoldTwoEntryPHINode to not do this in the presence of branch weights / expect?

Thanks for any help,

Cheers,
-Neil.

Hey all - me again,

So I’m looking at llvm.expect specifically for branch hints. In the following example LLVM will hoist the pow/cos calls into the entry block even though I’ve used the llvm.expect intrinsic to make it clear that one of the calls is unlikely to occur.

target datalayout = “e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-pc-windows-msvc-coff”

define dllexport double @foo(i32 %val) {
entry:
%0 = icmp slt i32 %val, 42
%1 = call i1 @llvm.expect.i1(i1 %0, i1 false)
%2 = sitofp i32 %val to double
br i1 %1, label %true, label %false

true:
%3 = call double @llvm.cos.f64(double %2)
br label %merge

false:
%4 = call double @llvm.pow.f64(double %2, double 4.242000e+01)
br label %merge

merge:
%var.1.0 = phi double [ %4, %false ], [ %3, %true ]
ret double %var.1.0
}

declare i1 @llvm.expect.i1(i1, i1)
declare double @llvm.cos.f64(double)
declare double @llvm.pow.f64(double, double)

This seems counter intuitive to me - I’ve told LLVM that one of the calls will probably not happen, and I expected it to preserve the call in the unlikely branch so we don’t pay the cost for something unlikely to be used.

I also injected a pass locally that adds, for branches whose condition is llvm.expect, the branch weight metadata - but LLVM will still always fold the branch away ensuring that the expensive call is always called.

The part of SimplifyCFG that does this is FoldTwoEntryPHINode from what I can tell.

So is there anything I can do here without modifying LLVM? Have I missed something?

Otherwise I guess I’d have to change FoldTwoEntryPHINode to not do this in the presence of branch weights / expect?

Passing -two-entry-phi-node-folding-threshold=1 seems to prevent this folding, but that may not be what you need.

As it doesn’t look like FoldTwoEntryPHINode checks for branch hints, it may make sense to change FoldTwoEntryPHINode.

I would think that it is, again, a cost model issue.
https://godbolt.org/z/cTKnfK (latency)

Cost Model: Found an estimated cost of 3 for instruction: %3 = call double @llvm.cos.f64(double %2)

Cost Model: Found an estimated cost of 3 for instruction: %4 = call double @llvm.pow.f64(double %2, double 4.242000e+01)

Is that actually correct? I’d expect it to be somewhat larger…

Roman.

Beyond implementing my own TargetTransformInfo is there a way for me to alter the cost here? Just to see whether that’d be enough to prevent FoldTwoEntryPHINode from doing the wrong thing.

Also I think Hiroshi is probably right - even if the cost of the very unlikely branch is low, I’ve specifically told the optimizer that the code will nearly never be executed, so it seems wrong that it’d hoist even a genuinely low cost call.

So perhaps FoldTwoEntryPHINode needs fixed and the cost model needs altered. Gulp!

Cheers,
-Neil.