Instrumented BB in PGO

Hello,

I have a question regarding PGO instrumented BBs (I use IR-level instrumentation).

It seems that instrumented BBs do not match between the two compilations for profile-gen and profile-use for some cases. Here is an example from SPECcpu 2006 lbm (a simple case consisting of just two modules).
In the first compilation, we have 5 instrumentation points for the main function as follows:

$ opt -pgo-instr-gen -instrprof _all_combined.bc -o _all_combined_inst.bc -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021 after CFGMST
Number of Basic Blocks: 10
BB: FakeNode Index=0
BB: if.then Index=5
BB: for.body Index=4
BB: for.body.lr.ph Index=3
BB: entry Index=1
BB: for.inc Index=8
BB: if.then5 Index=7
BB: if.end Index=6
BB: for.end Index=2
BB: for.end.loopexit Index=9
Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
Edge 0: 8-->4 c W=247031
Edge 1: 6-->8 c W=159375
Edge 2: 4-->6 *c W=127500
Edge 3: 1-->2 c W=4500
Edge 4: 4-->5 W=127
Edge 5: 5-->6 * W=127
Edge 6: 6-->7 W=95
Edge 7: 7-->8 * W=95
Edge 8: 0-->1 W=12
Edge 9: 2-->0 * W=12
Edge 10: 3-->4 W=8
Edge 11: 9-->2 W=8
Edge 12: 1-->3 W=7
Edge 13: 8-->9 * W=7
Split critical edge: 4 --> 6
Adding Instrumentation in BB Name=for.body.if.end_crit_edge
Adding Instrumentation in BB Name=if.then
Adding Instrumentation in BB Name=if.then5
Adding Instrumentation in BB Name=for.end
Adding Instrumentation in BB Name=for.end.loopexit

After a training run, we get profile data for the main function as follows, but these count values are put into incorrect BBs in the second compilation.
Block counts: [0, 300, 4, 1, 1]

$ opt -analyze -pgo-instr-use _all_combined.bc -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021 after CFGMST
Number of Basic Blocks: 10
BB: FakeNode Index=0
BB: for.body.lr.ph Index=3
BB: if.end Index=6
BB: entry Index=1
BB: if.then Index=5
BB: for.body Index=4
BB: for.end.loopexit Index=9
BB: for.inc Index=8
BB: if.then5 Index=7
BB: for.end Index=2
Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
Edge 0: 8-->4 c W=247031
Edge 1: 6-->8 c W=159375
Edge 2: 4-->6 *c W=127500
Edge 3: 1-->2 c W=127058
Edge 4: 0-->1 W=135
Edge 5: 2-->0 * W=135
Edge 6: 4-->5 W=127
Edge 7: 5-->6 * W=127
Edge 8: 6-->7 W=95
Edge 9: 7-->8 * W=95
Edge 10: 3-->4 W=8
Edge 11: 9-->2 W=8
Edge 12: 1-->3 W=7
Edge 13: 8-->9 * W=7
5 counts
0: 0
1: 300
2: 4
3: 1
4: 1
SUM = 306
Split critical edge: 4 --> 6
Setting BB Name=for.body.if.end_crit_edge with CountValue=0
Setting BB Name=for.end with CountValue=300
Setting BB Name=if.then with CountValue=4
Setting BB Name=if.then5 with CountValue=1
Setting BB Name=for.end.loopexit with CountValue=1

The CountValue 300 should go to BB if.then (Index 5), not for.end (Index 2). Because of this incorrect assignment, the entry count of the main function is set to 300 instead of 1 (after populating the count values).
The reason for this problem is that the CFGMST edges are ordered differently because of different weight values (edges 0-->1 and 2-->0 get W=12 in the first compilation but W=135 in the second; edge 1-->2 also changes, from W=4500 to W=127058). The weights are computed from block frequency info and branch probability info, but somehow these produce different values between the two compilations.
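
To make the mismatch concrete, here is a small Python sketch of my understanding of the ordering problem. It is not the actual LLVM code: `instrumented_edges` and the other names are made up for illustration. The edge list and the block names are copied from the two dumps above, and I assume that edges are stably sorted by descending weight, that an edge closing a cycle in the spanning tree is the one that gets a counter, and that counters are numbered in that order.

```python
# Edges as (src index, dst index, weight), in the order of the profile-gen dump.
GEN_EDGES = [
    (8, 4, 247031), (6, 8, 159375), (4, 6, 127500), (1, 2, 4500),
    (4, 5, 127), (5, 6, 127), (6, 7, 95), (7, 8, 95),
    (0, 1, 12), (2, 0, 12), (3, 4, 8), (9, 2, 8), (1, 3, 7), (8, 9, 7),
]

# Weights that differ in the profile-use dump (taken from the second log).
USE_OVERRIDES = {(1, 2): 127058, (0, 1): 135, (2, 0): 135}
USE_EDGES = [(s, d, USE_OVERRIDES.get((s, d), w)) for s, d, w in GEN_EDGES]

# Block that receives the counter for each instrumented edge (from the logs).
BLOCK_FOR_EDGE = {
    (4, 6): "for.body.if.end_crit_edge",
    (5, 6): "if.then",
    (7, 8): "if.then5",
    (2, 0): "for.end",
    (8, 9): "for.end.loopexit",
}


def instrumented_edges(edges):
    """Return the non-tree edges in the order counters would be assigned.

    Hypothetical helper: heaviest edges are added to the spanning tree
    first; an edge whose endpoints are already connected is instrumented.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    result = []
    # sorted() is stable, so equal weights keep their input order.
    for src, dst, _weight in sorted(edges, key=lambda e: -e[2]):
        root_src, root_dst = find(src), find(dst)
        if root_src == root_dst:
            result.append((src, dst))
        else:
            parent[root_src] = root_dst
    return result


counts = [0, 300, 4, 1, 1]  # "Block counts" from the training run

gen_order = [BLOCK_FOR_EDGE[e] for e in instrumented_edges(GEN_EDGES)]
use_order = [BLOCK_FOR_EDGE[e] for e in instrumented_edges(USE_EDGES)]

intended = dict(zip(gen_order, counts))  # what the counters measured
applied = dict(zip(use_order, counts))   # what profile-use assigns

print("intended:", intended)
print("applied: ", applied)
```

With these inputs, `intended` gives 300 to if.then while `applied` gives 300 to for.end, which is the same mix-up as in the second dump. The set of instrumented edges is the same in both cases; only their order changes.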

How can we be sure that the CFGMST is constructed in the same way in both compilations, so that profile counts are always assigned to the correct basic blocks?

Thank you,
–Toshio

Different BFI produced for otherwise identical compilation is a bug we
should fix (can cause other problems too). Can you file a bug about it?

thanks,

David

Hi David,

Thank you.
I just submitted bug report 27024 (PGO instrumentation profile data is not reflected in correct basic blocks).

Thank you,
–Toshio

thank you. I have assigned the bug to xur@.

David
