Working on the rest of PR10063: destructors and the CFG are causing issues with -Wreturn-type

So it appears that the remainder of the problems in PR10063 stem from the following pattern in a CFG:

int f5(int i) {
  other o;
  switch (i) {
  default:
    my_abort_struct();
  }
}
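(The snippet leaves the types undeclared, so here is my reconstruction; the essential property is that my_abort_struct's destructor is attributed noreturn, mirroring the assert-style pattern mentioned at the end of this mail.)

struct other {
  other();
  ~other();
};

struct my_abort_struct {
  my_abort_struct();
  ~my_abort_struct() __attribute__((noreturn));
};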

This function produces the following CFG:

[ B4 (ENTRY) ]
Predecessors (0):
Successors (1): B2

[ B1 ]
1: [B2.2].~other() (Implicit destructor)
Predecessors (1): B3
Successors (1): B0

[ B2 ]
1:
2: other o;
3: i
4: [B2.3]
T: switch [B2.4]
Predecessors (1): B4
Successors (1): B3

[ B3 ]
default:
1: my_abort_struct()
2: [B3.1] (BindTemporary)
3: ~my_abort_struct() (Temporary object destructor)
Predecessors (1): B2
Successors (1): B1

[ B0 (EXIT) ]
Predecessors (1): B1
Successors (0):

Everything we need is here, but there is an unfortunate structure: the block (B3) with the no-return destructor is not an immediate predecessor of the exit block. As a result, we never look within B3 for no-return destructors, and fail to recognize that B1 and B0 are dead and that there is no possibility of reaching the exit block without returning.

There is a comment in AnalysisBasedWarnings which indicates that the correct solution may be to actually prevent B1 (or any other block) from being added as a successor of B3, and to explicitly add the exit block as a successor so we don’t end up with an orphaned exit block. Is this the right approach to take? Can you provide any tips for how to implement it? I looked into it for a while, and it didn’t seem straightforward. I can find all of the places where we add implicit destructors to the CFG and, when one is added that is attributed no-return, clear the successors of the current block and add the exit block as a successor. However, it looks like in some cases we process implicit destructors before actually adding successors to a block; in that case, clearing won’t help. What do you think is the cleanest place to put this in the CFGBuilder, and what is the cleanest strategy? Maybe we should track on the CFGBlock itself whether it is a ‘no-return’ block, and when adding successors check that bit and refuse to add any. Or we could catch the lazy creation of a new block and, if the previous one contains any no-return elements, clear its successors?
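To make the ‘no-return bit’ idea concrete, here is a minimal sketch with invented names rather than the real CFGBlock interface: once a block is known to contain a no-return element, it keeps only an explicit edge to the exit block and refuses any further successors.

#include <vector>

class SketchBlock {
  std::vector<SketchBlock *> Succs;
  bool HasNoReturnElement = false;

public:
  void markNoReturnElement(SketchBlock *Exit) {
    HasNoReturnElement = true;
    Succs.clear();         // drop any successors wired up earlier
    Succs.push_back(Exit); // the only place control can go now
  }

  void addSuccessor(SketchBlock *B) {
    if (HasNoReturnElement)
      return;              // refuse successors added after the element
    Succs.push_back(B);
  }
};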

I also looked into an alternate solution: rather than only inspecting the immediate predecessors of the exit block in -Wreturn-type, we could walk the CFG upward from the exit block, stopping along each edge when we reach a dead block, a return, or a no-return construct. This is more expensive, but it is a more isolated fix.
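As a sketch of that walk, over a toy block type rather than the real CFGBlock (the field names are mine): -Wreturn-type would only need to fire if some backward path from the exit reaches the entry without being cut off.

#include <set>
#include <vector>

struct Node {
  std::vector<Node *> Preds;
  bool IsDead = false;            // unreachable in a forward walk
  bool EndsWithReturn = false;
  bool HasNoReturnElement = false;
};

bool mayFallOffEnd(Node *Exit, Node *Entry) {
  std::set<Node *> Visited;
  std::vector<Node *> Worklist(Exit->Preds.begin(), Exit->Preds.end());
  while (!Worklist.empty()) {
    Node *B = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(B).second)
      continue;                   // already walked this block
    if (B->IsDead || B->EndsWithReturn || B->HasNoReturnElement)
      continue;                   // this edge is accounted for; stop here
    if (B == Entry)
      return true;                // a live path falls off the end
    Worklist.insert(Worklist.end(), B->Preds.begin(), B->Preds.end());
  }
  return false;
}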

Thoughts? I’d really like to get these issues with the CFG fixed; we are running into more and more problems with this as we start to use CFG-based warnings and analyses more heavily. How I wish the dominant assertion mechanism in our codebase didn’t use a no-return destructor… ;]

Hey Ted (and any other analyzer hackers out there),

> Everything we need is here, but there is an unfortunate structure: the block (B3) with the no-return destructor is not an immediate predecessor of the exit block. As a result, we never look within B3 for no-return destructors, and fail to recognize that B1 and B0 are dead and that there is no possibility of reaching the exit block without returning.

> There is a comment in AnalysisBasedWarnings which indicates that the correct solution may be to actually prevent B1 (or any other block) from being added as a successor of B3, and to explicitly add the exit block as a successor so we don’t end up with an orphaned exit block. Is this the right approach to take? Can you provide any tips for how to implement it? I looked into it for a while, and it didn’t seem straightforward. I can find all of the places where we add implicit destructors to the CFG and, when one is added that is attributed no-return, clear the successors of the current block and add the exit block as a successor. However, it looks like in some cases we process implicit destructors before actually adding successors to a block; in that case, clearing won’t help. What do you think is the cleanest place to put this in the CFGBuilder, and what is the cleanest strategy? Maybe we should track on the CFGBlock itself whether it is a ‘no-return’ block, and when adding successors check that bit and refuse to add any. Or we could catch the lazy creation of a new block and, if the previous one contains any no-return elements, clear its successors?

So I dived into this with fresh eyes and I think I’ve figured out the various bits of it. Now I understand better both what was going wrong and what the FIXME about severing edges is really talking about.

The fundamental problem I’m hitting is that we special-case the formation of CFGBlocks when a no-return construct is encountered. When such a construct is hit, we ignore any successor sequence already built and instead create a block whose direct and immediate successor is the exit block. This allows the analysis to quickly find the no-return CFGBlocks (and elements), and also ensures that the CFGBlocks that would have succeeded a no-return construct are correctly modeled as unreachable blocks.

In order to handle this situation when working with implicit destructors, we have to form a new block whenever we encounter one of these destructors, and correctly set up its successors. I’ve attached a patch which I think mostly does this. It adds a FIXME to the only part of the CFG builder where we don’t do this, because that part injects destructors into the beginning of a block. I need to think a bit more about how the block data structures work before I’ll feel comfortable fixing that, as it’ll be a different form of split from the others. That code path only comes up when patching up gotos, so I’m not particularly concerned if the no-return modeling there is imperfect. =]
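As a toy model of the splitting (invented names; the real CFGBuilder code is structured quite differently, not least because it builds blocks backwards): appending a no-return destructor seals the current block with a lone edge to the exit block, and anything that would have followed lands in a fresh block that keeps no incoming edge, i.e. is modeled as unreachable.

#include <string>
#include <vector>

struct TBlock {
  std::vector<std::string> Elements; // destructor entries, simplified
  std::vector<TBlock *> Succs;
};

TBlock *appendDtor(TBlock *Current, TBlock *Exit, const std::string &Dtor,
                   bool NoReturn, std::vector<TBlock *> &AllBlocks) {
  Current->Elements.push_back(Dtor);
  if (!NoReturn)
    return Current;                // keep filling the same block
  Current->Succs.assign(1, Exit);  // direct, immediate edge to Exit
  AllBlocks.push_back(new TBlock); // fresh block, deliberately orphaned
  return AllBlocks.back();         // later elements land here
}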

With this patch I get the following CFG for the code snippet I posted originally:

[ B4 (ENTRY) ]
Predecessors (0):
Successors (1): B2

[ B1 ]
1: [B2.2].~other() (Implicit destructor)
Predecessors (0):
Successors (1): B0

[ B2 ]
1:
2: other o;
3: i
4: [B2.3]
T: switch [B2.4]
Predecessors (1): B4
Successors (1): B3

[ B3 ]
default:
1: my_abort_struct()
2: [B3.1] (BindTemporary)
3: ~my_abort_struct() (Temporary object destructor)
Predecessors (1): B2
Successors (1): B0

[ B0 (EXIT) ]
Predecessors (2): B1 B3
Successors (0):

Here we correctly model that B1 cannot be reached, and B3 has the immediate edge to B0. Next, I changed the code snippet to declare a variable of my_abort_struct type instead of creating a temporary, thus exercising the other CFG-building path I changed in this patch.
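The modified snippet would look roughly like this (my reconstruction; the variable name s comes from the dump below, while the function name f6 is mine):

int f6(int i) {
  other o;
  switch (i) {
  default:
    my_abort_struct s;
  }
}

This produces the following CFG: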

[ B4 (ENTRY) ]
Predecessors (0):
Successors (1): B2

[ B1 ]
1: [B2.2].~other() (Implicit destructor)
Predecessors (0):
Successors (1): B0

[ B2 ]
1:
2: other o;
3: i
4: [B2.3]
T: switch [B2.4]
Predecessors (1): B4
Successors (1): B3

[ B3 ]
default:
1:
2: my_abort_struct s;
3: [B3.2].~my_abort_struct() (Implicit destructor)
Predecessors (1): B2
Successors (1): B0

[ B0 (EXIT) ]
Predecessors (2): B1 B3
Successors (0):

Same story: the CFG correctly models the no-return destructor. For a somewhat trickier case with three variables, where the middle one has the no-return destructor, you can even see that this correctly marks which destructor is unreachable!

Code:

int f7(int i) {
  other o;
  switch (i) {
  default:
    other o1;
    my_abort_struct s;
    other o2;
  }
}

CFG:

[ B5 (ENTRY) ]
Predecessors (0):
Successors (1): B2

[ B1 ]
1: [B2.2].~other() (Implicit destructor)
Predecessors (1): B3
Successors (1): B0

[ B2 ]
1:
2: other o;
3: i
4: [B2.3]
T: switch [B2.4]
Predecessors (1): B5
Successors (1): B4

[ B3 ]
1: [B4.2].~other() (Implicit destructor)
Predecessors (0):
Successors (1): B1

[ B4 ]
default:
1:
2: other o1;
3:
4: my_abort_struct s;
5:
6: other o2;
7: [B4.6].~other() (Implicit destructor)
8: [B4.4].~my_abort_struct() (Implicit destructor)
Predecessors (1): B2
Successors (1): B0

[ B0 (EXIT) ]
Predecessors (2): B1 B4
Successors (0):

I think this patch is a really useful improvement in accuracy, but it does add some non-trivial overhead to the process of adding implicit destructors into the CFG for automatic variables… Does it seem acceptable? I don’t see particularly cleaner ways of doing this… =/

If this patch is OK (modulo any performance tuning you’d like to see here), I think I also better understand the FIXME in the analysis file. Do you actually want to switch the CFG to have no successors for blocks which contain a no-return element? That seems like it would be a very clean way of modeling this and would simplify the analysis for -Wreturn-type significantly. Let me know your thoughts; I see a clean way to implement this after a couple of refactorings to consolidate the code I’m changing in this patch. (Currently there is duplicated code; I left the refactoring and consolidation for a follow-up to make reviewing this one easier.)

Thanks!

fix-implicit-dtor-noreturn.patch (8.77 KB)

> I think this patch is a really useful improvement in accuracy, but it does add some non-trivial overhead to the process of adding implicit destructors into the CFG for automatic variables… Does it seem acceptable? I don’t see particularly cleaner ways of doing this… =/

I think this is a reasonable approach.

By non-trivial overhead, are you talking about something that is actually measurable in terms of -fsyntax-only time? Do you have any numbers?

> If this patch is OK (modulo any performance tuning you’d like to see here), I think I also better understand the FIXME in the analysis file. Do you actually want to switch the CFG to have no successors for blocks which contain a no-return element?

Not at all. That’s definitely not the invariant we are going for. We want all blocks to be reverse-reachable from the exit block. This is needed to support reverse dataflow analyses.

One possibility is to create a special empty block, “noreturn”, that all blocks ending with a noreturn destructor/call jump to. That block would then have “Exit” as its successor. -Wreturn-type could just walk the predecessors of Exit, and skip this block.
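A sketch of how that could look, over a toy block type (invented names): the check stays a shallow walk over Exit’s predecessors, with just the one marker block to skip.

#include <vector>

struct PBlock {
  std::vector<PBlock *> Preds;
  bool IsNoReturnMarker = false; // true only for the shared empty block
  bool EndsWithReturn = false;
};

bool mayFallOffEnd(const PBlock &Exit) {
  for (PBlock *P : Exit.Preds) {
    if (P->IsNoReturnMarker)
      continue;                  // paths through here never fall off
    if (!P->EndsWithReturn)
      return true;               // reaches Exit without a return
  }
  return false;
}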

>> I think this patch is a really useful improvement in accuracy, but it does add some non-trivial overhead to the process of adding implicit destructors into the CFG for automatic variables… Does it seem acceptable? I don’t see particularly cleaner ways of doing this… =/

> I think this is a reasonable approach.

> By non-trivial overhead, are you talking about something that is actually measurable in terms of -fsyntax-only time? Do you have any numbers?

Not yet, but easy to get. The overhead is the SmallVector<…, 10> which I use to reverse the VarDecl*s prior to appending them. That shouldn’t have an observable impact, but I’ll measure it. =] I just wanted to see if I was on the right track at all.
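For reference, the pattern in question is roughly this (a sketch of the shape, not the actual patch): the scope hands out the VarDecl*s in declaration order, destructors must be appended in the opposite order, so they are buffered and walked backwards.

#include "llvm/ADT/SmallVector.h"

namespace clang { class VarDecl; }

template <typename Iterator, typename AppendDtorFn>
void appendDtorsReversed(Iterator B, Iterator E, AppendDtorFn AppendDtor) {
  llvm::SmallVector<clang::VarDecl *, 10> Decls(B, E);
  for (auto I = Decls.rbegin(), RE = Decls.rend(); I != RE; ++I)
    AppendDtor(*I); // may split the block on a no-return destructor
}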

>> If this patch is OK (modulo any performance tuning you’d like to see here), I think I also better understand the FIXME in the analysis file. Do you actually want to switch the CFG to have no successors for blocks which contain a no-return element?

> Not at all. That’s definitely not the invariant we are going for. We want all blocks to be reverse-reachable from the exit block. This is needed to support reverse dataflow analyses.

Cool, it seemed a bit odd to me as well. I’ll follow up on this point in your other email.

> One possibility is to create a special empty block, “noreturn”, that all blocks ending with a noreturn destructor/call jump to. That block would then have “Exit” as its successor. -Wreturn-type could just walk the predecessors of Exit, and skip this block.

That would definitely work.

The other idea I had was to sink an ‘isNoReturn’ bit into the CFGBlock itself and mark the appropriate blocks. Then -Wreturn-type would just walk the predecessors and skip all blocks with the bit set. It would also avoid having an extra block in the CFG. This might be no more costly than the added CFGBlock, as we’d need some way to distinguish that block from any other CFGBlock… I’d have to look at the implementation details, though.

I’m happy with a special bit as well instead of a dedicated block.

I think you are on the right track.

Keep in mind that we can also change the internal representation of a CFGBlock if that makes it easier to do the splitting, etc., while still maintaining good performance. For example, we could possibly remove operator[] from CFGBlock, if removing the random-access feature makes it easier to implement such changes with good performance.

>> Not yet, but easy to get. The overhead is the SmallVector<…, 10> which I use to reverse the VarDecl*s prior to appending them. That shouldn’t have an observable impact, but I’ll measure it. =] I just wanted to see if I was on the right track at all.

> I think you are on the right track.

Cool. I ran some performance numbers. Here is my methodology:

First I created a new CFG stress test, which I will check in. It’s much like the others, but this one creates over 32k variable declarations spread throughout nested scopes so that they have overlapping lifetimes, clustered in groups of 32 variable declarations per scope. These are a mixture of types with no-return and normal destructors. This is a worst case scenario!!!
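The generated input has roughly this shape (my reconstruction, not the checked-in file):

struct normal { ~normal(); };
struct aborting { ~aborting() __attribute__((noreturn)); };

int stress(int i) {
  normal n00; aborting a00; normal n01; aborting a01; // ... 32 per scope
  {
    normal n10; aborting a10; // ... another cluster of 32
    {
      normal n20; aborting a20; // ... nesting continues many levels deep
      if (i) return 1;
    }
  }
  return 0;
}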

That explodes the number of CFG blocks from 2152 blocks to 33850!!! This is due to the correctness change of actually modeling that the block terminates after each noreturn destructor.

Despite this explosion in the number of CFG blocks, I can measure no regression in -fsyntax-only performance between the two… In fact, with my patch it appears to be faster for some reason! I don’t really understand this, other than that my patch causes push_back-style growth of the BumpVector instead of insert-and-assignment… Even then, I suspect that we’re just well below the measurement sensitivity:

% perf stat -r5 ./bin/old_clang -fsyntax-only -Wreturn-type …/tools/clang/INPUTS/cfg-nested-var-scopes.cpp

Performance counter stats for ‘./bin/old_clang -fsyntax-only -Wreturn-type …/tools/clang/INPUTS/cfg-nested-var-scopes.cpp’ (5 runs):

1083.346099 task-clock-msecs # 0.996 CPUs ( ± 0.194% )
112 context-switches # 0.000 M/sec ( ± 0.356% )
1 CPU-migrations # 0.000 M/sec ( ± 28.571% )
15025 page-faults # 0.014 M/sec ( ± 0.003% )
2739266060 cycles # 2528.523 M/sec ( ± 0.170% )
1781584392 instructions # 0.650 IPC ( ± 0.061% )
332083620 branches # 306.535 M/sec ( ± 0.061% )
22089492 branch-misses # 6.652 % ( ± 0.160% )
48828985 cache-references # 45.072 M/sec ( ± 0.564% )
936082 cache-misses # 0.864 M/sec ( ± 0.327% )

1.087202120 seconds time elapsed ( ± 0.201% )

% perf stat -r5 ./bin/clang -fsyntax-only -Wreturn-type …/tools/clang/INPUTS/cfg-nested-var-scopes.cpp

Performance counter stats for ‘./bin/clang -fsyntax-only -Wreturn-type …/tools/clang/INPUTS/cfg-nested-var-scopes.cpp’ (5 runs):

1066.387627 task-clock-msecs # 0.997 CPUs ( ± 0.245% )
110 context-switches # 0.000 M/sec ( ± 0.407% )
1 CPU-migrations # 0.000 M/sec ( ± 16.667% )
16428 page-faults # 0.015 M/sec ( ± 0.004% )
2696143767 cycles # 2528.296 M/sec ( ± 0.213% )
1842263749 instructions # 0.683 IPC ( ± 0.062% )
343370993 branches # 321.995 M/sec ( ± 0.068% )
22275654 branch-misses # 6.487 % ( ± 0.269% )
46956180 cache-references # 44.033 M/sec ( ± 0.245% )
1126887 cache-misses # 1.057 M/sec ( ± 0.339% )

1.069884336 seconds time elapsed ( ± 0.247% )

Unless you see something fishy, I’ll plan on committing this and starting on some of the cleanups.

> Keep in mind that we can also change the internal representation of a CFGBlock if that makes it easier to do the splitting, etc., while still maintaining good performance. For example, we could possibly remove operator[] from CFGBlock, if removing the random-access feature makes it easier to implement such changes with good performance.

Yeah, this might be interesting long term… however, before we go that route I want to have a benchmark that actually slows down. Building the CFG is fast right now… ridiculously fast… so my focus will be elsewhere. =]

Looks great. Please charge ahead!