Right now, all profile information is funneled through two analysis passes prior to any part of the optimizer using it.

First, we have BranchProbabilityInfo, which provides a simple interface to the simplest form of profile information: local and relative branch probabilities. These merely express the likelihood of taking one of a mutually exclusive set of exit paths from a basic block. They are very simple, and the foundation of the profile information. Even the other analysis is merely built on top of this one.

Second we have BlockFrequencyInfo which attempts to provide a more “global” (function-wide, not actually program wide) view of the statistical frequency with which any particular basic block is executed. This is nicely principled analysis that just computes the probabilistic flow of control through the various branches according to their probabilities established in the first analysis.

However, I think that BlockFrequencyInfo provides the wrong set of information. There is one critical reason why. Let’s take a totally uninteresting CFG:

A → B1, B2

B1 → C1, C2

B2 → C3, C4

C1 → D1, D2

C2 → ret

C3 → D3, D4

C4 → ret

D1 → E1, E2

D2 → ret

D3 → E3, E4

D4 → ret

You can imagine this repeating on for as many levels as you like. This isn’t an uncommon situation with real code. BlockFrequencyInfo computes for this a very logical answer:

---- Block Freqs ----

a = 1.0

a → b1 = 0.5

a → b2 = 0.5

b1 = 0.5

b1 → c1 = 0.25

b1 → c2 = 0.25

b2 = 0.5

b2 → c3 = 0.25

b2 → c4 = 0.25

c1 = 0.25

c1 → d1 = 0.125

c1 → d2 = 0.125

c2 = 0.25

c3 = 0.25

c3 → d3 = 0.125

c3 → d4 = 0.125

c4 = 0.25

d1 = 0.125

d1 → e1 = 0.0625

d1 → e2 = 0.0625

d2 = 0.125

d3 = 0.125

d3 → e3 = 0.0625

d3 → e4 = 0.0625

d4 = 0.125

One way of thinking about this is that for any basic block X which is predicated by N branches of unbiased probability (50/50), the frequency computed for that block is 2^(-N).

The problem is that this doesn’t represent anything close to reality. Processors’ branch prediction works precisely because very, *very* few branches in programs are 50/50. Most programs do not systematically explore breadth first the full diversity of paths through the program. And yet, in the absense of better information our heuristics would lead us to believe (and act!) as though this were true.

Now, I’m not saying that the computation of block frequencies is wrong. Merely that it cannot possibly be used for at least one of it purposes – it’s relative frequency (to the entry block “basis” frequency) is completely useless for detecting hot or cold regions of a function – it will simply claim that all regions of the function are cold. What it *is* useful for is establishing a total ordering over the basic blocks of a function. So it works well for some things like code layout, but is grossly misleading for others.

There are several possible solutions here. I’ll outline my proposal as well as some other ideas.

BlockWeights instead of BlockFrequencies. My idea is that we don’t really care about the depth of the control dependence for a particular basic block. We care about the accumulated *bias* toward or away from a basic block. This is predicated on the idea that branches are overwhelmingly predictable. As a consequence, evenly distributed probabilities are really just *uncertainties*. My proposed way of implementing this would take the exact algorithm already used in BlockFrequency, but instead of computing a frequency for an edge based on the probability of the branch times the frequency of the predecessor, instead compute it as the frequency of the predecessor times how “biased” the branch is relative to other branches in the predecessor. Essentially, this would make a branch probability set of {0.5, 0.5} produce edge frequencies equal to the predecessor’s edge frequency.

The result of such a system would produce weights for every block in the above CFG as ‘1.0’, or equivalent to the entry block weight. This to me is a really useful metric – it indicates that no block in the CFG is really more or less likely than any other. Only *biases* in a specific direction would cause the block weights to go up or down significantly. This would immediately make the analysis useful to consumers such as the vectorizer, unroller, or when we have the capability, the inliner and an outliner which respect cold regions of functions.

One alternative proposed by Nick was using the average of block frequencies across a domtree as the denominator to which a particular block’s frequency is compared. I haven’t really thought as much about this because it seems quite expensive to compute and is less intuitive to me, but I wanted to mention it. Nick might be able to provide an explanation of it that would help folks.

Related to the idea of using the domtree, I can think of many layers on top of the existing block frequencies that could be used to accurately do “cold region” detection, but they would all be significantly more analysis on top of the existing system and so are somewhat less appealing to me.

Perhaps others have still more ideas? Also comments or suggestions on mine are very welcome.

I have a really simple patch that implements my suggestion using the existing frequency infrastructure (as it is almost exactly the same). I would also want to rename everything to avoid confusion as these would no longer be “frequencies” in any strict sense. I can provide test data as needed on the result of this change if people like the direction.

-Chandler