Before doing that tidyup I decided to have a go at the "simple" thing
- doing unreachable code on templates instead of their instantiations.
No doubt it's not the tidiest implementation (suppressing all the
other analysis based warnings when 'dependent' is true is a bit
intrusive/repetitive) but it seems to show the concept fairly well.
I don't think we'd discussed in detail what problems might arise from
this approach - certainly we know cases that we can't prove (such as a
dependent function call which may or may not be noreturn) but
non-dependent calls to noreturn functions, for example, seem to work
correctly (without regressing your test cases that cover the dependent
noreturn case).
Are there other cases I've missed where this approach could be problematic?
Hi David,
Thank you for being persistent.
I think this approach is worth pursuing further only if there is an
algorithmic justification that it is doing the right thing. Since each
instantiation of a template can have slightly different control-flow (mainly
due to noreturn, but there may be other cases we haven't considered), we'd
need to be able to say that the "control-flow" of the uninstantiated
template represents some kind of superset of the possible control-flow when
the template is instantiated (or at least a superset of the statements that
are reachable).
If 'noreturn' is the only issue, then this may actually work. The
uninstantiated template may have 'noreturn' calls, but by definition those
will appear in the instantiated template as well. Ab instantiation may have
additional 'noreturn' calls, but my understanding is that this will only
cause more code to become unreachable in the particular instantiation. Thus
as far as 'noreturn' is concerned, the uninstantiated template represents a
correct under-approximation of the unreachable code in all instantiation.
Is this logic sound?
I think the generalization goes something like this:
To get a false positive with this approach it would be necessary for
edges to be added to the CFG during the transformation from template
pattern to template instantiation.
So far as I can think this shouldn't be the case - my only concern
would be exceptions. But it seems catch blocks are considered
reachable even when the corresponding try has no code at all, so, at
least for now, these aren't an issue. (but if we started adding in
edges to catch blocks only when they were reachable, we'd have to err
on the side of caution for any dependent expressions that might throw
- adding edges to all catch blocks in such a case, rather than adding
no edges initially & only adding the required edges at instantiation
time)
If it is, my main concern then comes down to the following:
(1) Are there are other issues that we are not considering where the
uninstantiated template doesn't faithfully provide an under approximation of
the reachable code?
I've tested the exception case above & that seems to be always
reachable anyway. I'll try my hand at looking through the template
instnantiation code to see if there's any cases where we might end up
adding edges rather than removing them.
(2) If we decide this approach is viable, which uninstantiated templates do
we analyze? For performance reasons, I don't think we should analyze all
the uninstantiated templates, as we may find ourselves repeatedly analyzing
a huge portion of the Boost and the STL, etc. My suggestion is that when we
encounter an instantiation, we go and check if we have already analyzed the
uninstantiated template for this warning, and if not, do the analysis.
By repeatedly, you mean in each translation unit, right? Because we
shouldn't have to visit them more than once per TU (we won't be
visiting them per insntantiation). But, yes, the idea of analyzing
every boost template someone includes when they're only using a subset
does sound a little costly - I might be interested to compare times,
though.
Only analyzing instantiated templates (& keeping a list of those
templates we've already seen/analyzed) seems OK to me, assuming that
didn't represent a prohibitive memory cost to maintain the set of
visited templates (can't see why it should, but it is a small
time/space tradeoff all the same).
(3) Somewhat tangential, we would want to analyze explicit template
instantiations an their own, since they represent "concrete" functions that
were provided, not instantiated ones.
I'll double check this, but I assume my prototype changes already
account for this, since they didn't regress your test cases.
(4) I don't know the if the CFGBuilder is going to work all that well on
uninstantiated templates. There may be bugs here, or other fundamental
issues (which i don't know about).
I'm planning to keep plodding along compiling LLVM & Clang with
-Weverything (with a few things disabled) so I'll see what comes of
it. With my prototype I didn't have any regressions across the Clang
test suite, for what that's worth.
(5) We wouldn't want to do this approach wholesale for all analyses. I
think it mainly makes sense for the reachable code warnings, e.g.
-Wunreachable-code or warning if a function doesn't return. Analyses like
-Wuninitialized, etc., would want to keep working on instantiated templates
since (a) they need the more precise semantics of the instantiated templates
to work correctly and (b) it is okay if we flag a warning in one
instantiation that doesn't appear in an other (since that is a genuine
issue).
The prototype patch attached to my previous email already accounted
for this (I think) - just not in a very nice way.
To formalize this requirement, though: Warnings that we only want to
emit in reachable code should use a pessimistic definition of
reachable code (err on the side of assuming code is unreachable) to
avoid false positives. Warnings about unreachable code should use a
pessimistic definition of unreachable code (err on the side of
assuming code is reachable) to avoid false positives.
So, yes, unreachable warnings can be done on templates but array
bounds checking, return paths, etc, should be done on instantiations.
Though, ideally, we could find some way to emit
instantiation-independent problems once, rather than once for every
instantiation (see the array bounds test case modifications in my
previous patch for an example that exposes this problem) - either by
analyzing the template first & trying to avoid
instantiation-independent cases when we analyze the instantiation, or
simply by filtering out the duplicates in some way whenever we visit
an instantiation.
Thanks again for your help,
- David