[RFC] Control-Dependent Function Returns and Redefining Control Dependencies

Hi all,

For a project in the context of the Linux kernel, I have been working on tracking address, control, and data dependencies at the IR level, using two module passes.

To cut a long story short, I came across the following example of a control dependency. The pseudo code is based on some optimised IR. Think of READ_ONCE() and WRITE_ONCE() as atomic reads/writes:

int *x, *y;

int foo()
{
	/* More code */

loop:
	/* More code */

	if (READ_ONCE(x)) {
		WRITE_ONCE(y, 42);
		return 0;
	}

	/* More code */

	goto loop;

	/* More code */
}

I have seen control dependencies defined in terms of the first basic block that post-dominates the basic block containing the if-condition, that is, the first basic block control flow must pass through to reach the function return regardless of the outcome of the if-condition.

E.g. [1] defines control dependencies as follows:

A statement y is said to be control dependent on another statement x
if (1) there exists a nontrivial path from x to y such that every
statement z != x in the path is post-dominated by y, and (2) x is not
post-dominated by y.

However, assuming that foo() will return, the above definition of control dependencies cannot be applied here: the WRITE_ONCE() is in the same basic block as the return statement, so that block trivially post-dominates all other basic blocks, and no control dependency is found.

Here is an alternative definition I came up with:

A basic block B is control-dependent on a basic block A if
B is reachable from A, but control flow can take a path through A
which avoids B. The scope of a control dependency ends at the first
basic block where all control flow paths running through A meet.

I tried using LLVM’s PostDominatorTree for the above case, but without luck, which is why I adapted the definition.

What do you think? I might be missing something here. Comments and pointers very welcome :slight_smile:

I have also shared the above on the Linux kernel mailing list in case anyone is interested [2].

Many thanks,
Paul


[1]: Optimizing Compilers for Modern Architectures: A Dependence-Based
Approach, Randy Allen, Ken Kennedy, 2002, p. 350
[2]: [PATCH RFC] tools/memory-model: Adjust ctrl dependency definition - Paul Heidekrüger

Can you explain a bit more about what you’re trying to do? Maybe it’s easier to start there than attempting to redefine dominators.
If you are looking to track which reads may depend on which writes, we have MemorySSA.

The basic block containing WRITE_ONCE() (‘y’ in the textbook definition) is control dependent on the basic block that terminates with if(READ_ONCE(x)) ( ‘x’ in the textbook definition).

Assuming you structure/simplify your CFG so that all functions have a single exit block, there would be an edge from the basic block containing WRITE_ONCE() (‘y’) to that exit block. The exit block post-dominates every other basic block in the function, including the basic block containing if(READ_ONCE(x)) (‘x’), but the basic block containing WRITE_ONCE() itself does not, so condition (2), “x is not post-dominated by y”, holds here.

As for condition (1) in the textbook definition, I believe “nontrivial path” means “x” is not the same basic block as “y”, and since there is just one edge from x to y, there are no other (‘z’) basic blocks in the path to check for post-domination by ‘y’, so this condition holds as well.

Redefining a term to mean something subtly but importantly different from the standard definition is a recipe for disaster.

Control-dependence is defined in terms of the postdominator tree. If you look at algorithms that rely on control-dependence, this strict reliance on the postdominator tree is actually necessary. Note that where control-dependence is discussed, it’s also necessary to consider that there is a pseudo-entry block which dominates the entry block, a pseudo-exit block which post-dominates every return statement in the function and every loop with no exit blocks, and an additional pseudo-edge between the pseudo-entry and pseudo-exit blocks that bypasses the entire function. (I don’t have examples to hand of where this matters, but I know from experience that if you neglect this, your resulting control-dependence graph will lead you to erroneously conclude that some basic blocks have the same control dependencies in a few cases.)

This definition does sometimes produce counterintuitive results in loops. But it’s important to remember that the standard control-flow-based definitions already produce counterintuitive results. Consider the following code:

while (1) {
  // ....
  if (test) {
    // Syntactically, this block is in the loop.
    // Query LoopInfo, and you'll find that this is *not* part of the loop, however.
    break;
  }
}

When a loop has a single exiting block, that block post-dominates every basic block inside the natural loop. This causes the resulting control-dependence graph to be somewhat inverted compared to what you would naively expect from the corresponding if statements: everything in the loop is control-dependent on the loop’s exiting block, and things get extra spicy when the exiting block is not executed on every loop iteration. (I’d include diagrams here, but to my knowledge, Discourse still chokes on HTML emails, so there’s no way for me to do so.)

While this does look weird, when it comes time to rely on the resulting graph to make assertions about blocks, for example, that if any one of a set of basic blocks sharing the same control dependencies is executed N times, all of them are executed exactly N times, the definition you get is correct, and trying to tweak it to make it look more “normal” leads to incorrect results.

In short, then, control dependencies end up being counterintuitive when the condition you’re asking about is one that controls whether or not the loop will finish executing.