My previous test was with opt 14.0.6.
I moved to trunk, and now the number of printed passes differs as well, at least for the (different) file I am testing.
I reduced the file so that both the resulting IR and the passes (their number, or which passes run) still differ between dumped_Oz and Oz.
This is the reduced file for one case:
define void @a(i64 %0, i64 %1, i64* %2, i64* %3, i64 %4, i64 %5, i32 %6) {
  %8 = alloca i64, i32 0, align 8
  %9 = alloca i64, i32 0, align 8
  store i64 0, i64* %2, align 8
  store i64 %0, i64* %2, align 8
  %10 = load i64, i64* null, align 8
  %11 = sub i64 0, 1
  %12 = load i64, i64* %2, align 8
  %13 = mul i64 1, %0
  %14 = trunc i64 %0 to i32
  call void @b(i32 0, i32 %6)
  ret void
}

define internal void @b(i32 %0, i32 %1) {
  %3 = alloca i32, i32 0, align 4
  %4 = alloca i32, i32 0, align 4
  %5 = alloca i32, i32 0, align 4
  store i32 %0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  %6 = load i32, i32* undef, align 4
  %7 = load i32, i32* %3, align 4
  %8 = icmp sgt i32 1, %0
  br i1 %8, label %9, label %14

9: ; preds = %2
  %10 = load i32, i32* %4, align 4
  store i32 0, i32* %5, align 4
  br label %11

11: ; preds = %9
  br label %12

12: ; preds = %11
  call void @c()
  call void @b(i32 1, i32 1)
  %13 = load i32, i32* %5, align 4
  call void @b(i32 %1, i32 undef)
  br label %14

14: ; preds = %12, %2
  ret void
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #0

declare void @c()

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
Resulting IR for Oz:
define void @a(i64 %0, i64 %1, ptr nocapture writeonly %2, ptr nocapture readnone %3, i64 %4, i64 %5, i32 %6) local_unnamed_addr {
  store i64 %0, ptr %2, align 8
  tail call void @c()
  %8 = icmp slt i32 %6, 1
  br i1 %8, label %tailrecurse.i, label %b.exit

tailrecurse.i: ; preds = %7, %tailrecurse.i
  tail call void @c()
  br label %tailrecurse.i, !llvm.loop !0

b.exit: ; preds = %7
  ret void
}

declare void @c() local_unnamed_addr

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.peeled.count", i32 1}
Resulting IR for dumped_Oz:
define void @a(i64 %0, i64 %1, ptr nocapture writeonly %2, ptr nocapture readnone %3, i64 %4, i64 %5, i32 %6) local_unnamed_addr {
  store i64 %0, ptr %2, align 8
  %8 = icmp slt i32 %6, 1
  br label %tailrecurse.i

tailrecurse.i: ; preds = %tailrecurse.i, %7
  tail call void @c()
  br i1 %8, label %tailrecurse.i, label %b.exit

b.exit: ; preds = %tailrecurse.i
  ret void
}

declare void @c() local_unnamed_addr
In the diff of the two pass traces, after ~45 printed passes, Oz does nothing in a loop-rotate pass but dumped_Oz does.
The additional passes that Oz runs are loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,licm<allowspeculation>.
I think at this point Oz still has a loop left that dumped_Oz does not. Since these additional passes are already part of the pipeline, I don't think Oz schedules any extra passes dynamically in this case, unless some passes print nothing when they are executed.
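To make the "after ~45 printed passes" observation reproducible, here is a minimal sketch in Python that finds the first position where two pass traces disagree. It assumes both traces were captured with opt's -debug-pass-manager and that every executed pass appears as a line of the form "Running pass: <Name> on <IR unit>"; the function names and the sample traces in the usage note are made up for illustration and are not taken from my actual runs.

```python
import re

# Assumed line shape of a pass trace (e.g. from `opt -debug-pass-manager`):
#   Running pass: LoopRotatePass on <IR unit>
# "Running analysis: ..." lines and anything else are ignored.
RUNNING_PASS = re.compile(r"^Running pass: (\S+) on (.*)$")


def parse_trace(text):
    """Return [(pass_name, ir_unit), ...] in execution order."""
    entries = []
    for line in text.splitlines():
        match = RUNNING_PASS.match(line.strip())
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


def first_divergence(trace_a, trace_b):
    """Return (index, entry_a, entry_b) for the first differing pass name.

    Only pass names are compared; the IR unit is carried along for context.
    If one trace is a strict prefix of the other, the missing side is None.
    Returns None when both traces list the same passes in the same order.
    """
    a, b = parse_trace(trace_a), parse_trace(trace_b)
    for i, (entry_a, entry_b) in enumerate(zip(a, b)):
        if entry_a[0] != entry_b[0]:
            return i, entry_a, entry_b
    if len(a) != len(b):
        i = min(len(a), len(b))
        return i, (a[i] if i < len(a) else None), (b[i] if i < len(b) else None)
    return None
```

Usage with two hypothetical traces: `first_divergence(open("oz.log").read(), open("dumped_oz.log").read())` returns the index of the first mismatching pass together with both entries, which is the point where I would then compare the IR (for example with -print-after-all or -print-changed).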
Is there a way to dump or inspect the state of opt when doing an optimization?
Or print the configurations/settings/heuristics of individual passes?
I’ve heard of remarks and opt-viewer.py but have not used them so far. I’ll check them next.