new -Wuninitialized implementation in Clang

Hi fellow Clangers,

During the last two weeks, I have been working on implementing -Wuninitialized in Clang. Clang currently doesn't implement this warning, which is a glaring feature deficiency when compared with GCC.

Unlike GCC's implementation, which is based on the backend optimizer, Clang's implementation is based on dataflow analysis in the frontend. This means the warning still works at -O0.

Now that there is a prototype of this feature in TOT Clang, I wanted to open up the list to general discussion of this feature, its deployment, and what expectations users should have.

For some background, because GCC's implementation of -Wuninitialized is based on the optimizer, the results of the warning can differ depending on the flags passed to the compiler. For example:

(1) The warnings can vary depending on the optimization level selected.

(2) The warnings can vary depending on the target architecture.

(3) The warnings can vary depending on which version of GCC you are using.

While I am not 100% certain, I also suspect that GCC's implementation is not completely sound, and will not flag warnings in some cases. My hypothesis is that this is done to avoid spurious false positive warnings, leaving the impression that the analysis is smarter than it actually is (while possibly missing real issues).

My design goal for -Wuninitialized was fourfold:

(a) Works under -O0. Users care about seeing these warnings the most when they are doing their debug builds.

(b) Has consistent results that are invariant of the optimization flags, target architecture, phases of the moon, etc.

(c) Provides predictable results that are (for the most part) sound and complete.

(d) Has marginal impact on compile time.

The last three goals mean that the analysis can only do limited reasoning about control-dependencies, e.g.:

  int x;
  ...
  if (flag)
    x = ...
  ...
  if (flag)
    use(x);

Analyzing this code correctly inherently requires path-sensitive analysis, which has exponential cost in the general case. There are tricks we can use to mitigate such algorithmic complexity for some common cases, but handling these control-dependencies in general is something that really is in the purview of the static analyzer. Amazingly, GCC often doesn't flag warnings in such cases, but I suspect that is because GCC is silently dropping warnings in cases where it deems it can't reason precisely about a given variable.

My proposal is that Clang's analysis errs on the side of producing more warnings instead of worrying about such control-dependencies. This means that for the above code example Clang would emit a warning, even when no use of an uninitialized variable is possible. My rationale is twofold:

(a) The cost of initializing a variable is usually miniscule.

(b) Users get predictable results, and the compiler doesn't play games when deciding when to emit a warning in the face of control-dependencies that it cannot reason about.

*** Question #1 ***

Is this a reasonable level of behavior we can set for this warning that users will accept? I have received reports from Nico and Chandler that initially this warning produced copious warnings on Chrome, but now my understanding is that the number of warnings is down to 11 (which seems quite manageable to me, given that 2 of the issues were cases that GCC didn't flag).

As a bonus, Clang's warnings also include fix-it hints showing the user how to initialize the variable to silence the warning. With proper editor support, I think responding to Clang's warnings requires minimal effort from the developer.
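As a rough sketch (the exact diagnostic wording and suggested initializer may differ from what TOT Clang prints today; 'flag', 'other_flag', 'compute', and 'use' are just placeholder names), the fix-it attaches to the declaration:

  int x;            // fix-it here suggests adding an initializer, e.g. "int x = 0;"
  if (flag)
    x = compute();
  if (other_flag)
    use(x);         // -Wuninitialized flags this use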

For examples of what Clang's -Wuninitialized warns about, we have several tests available now in the clang test suite:

  test/Sema/uninit-variables.c
  test/SemaObjC/uninit-variables.m
  test/SemaCXX/uninit-variables.cpp

*** Question #2 ***

Another point I want to raise (which was brought up by Chandler) is whether uninitialized variables checking should continue to be under the -Wuninitialized flag. If we go with the behavior that I propose, we are deviating from GCC's behavior with a more "sound" (but noisy) analysis. Since several checks in clang are bundled under the -Wuninitialized flag, there may be some argument to splitting these under separate flags.

Is there a general desire to do this? If so, why?

My personal feeling is that things should be kept simple: keep a single -Wuninitialized flag that turns on all uninitialized variables checking. Users can then either suppress the warning by initializing their variables (which I argue is cheap and overall good for code cleanliness) or use pragmas to shut up the warning entirely in regions of their code. The latter may be a gross solution, but it is there for those who want it.
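For reference, region-based suppression with the existing diagnostic pragmas would look roughly like this (a sketch only; 'flag' and 'use' are placeholder names):

  #pragma clang diagnostic push
  #pragma clang diagnostic ignored "-Wuninitialized"
  int x;
  if (flag)
    x = 1;
  use(x);            // would normally warn, but the pragma silences it here
  #pragma clang diagnostic pop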

Thoughts? Comments?

Cheers,
Ted

Since several checks in clang are bundled under the -Wuninitialized flag, there may be some argument to splitting these under separate flags.

Is there a general desire to do this? If so, why?

My primary desire to allow controlling this with more granularity is to help ease and stage the rollout of these types of checks on a very large codebase.

However, I don’t think having such granularity is actually in conflict with this principle, which I agree on:

My personal feeling is that things should be kept simple: keep a single -Wuninitialized flag that turns on all uninitialized variables checking.

I think we can have a pretty reasonable setup using the existing diagnostic group infrastructure:

  • one flag that turns all uninit checking on (or off with the no-… variant)
  • several flags that turn specific types / strategies on (or off with the no-… variant)
  • last flag (of either form) wins

This would allow patterns like “-Wuninitialized -Wno-foo-uninitialized” to get everything except a specific feature that doesn’t work for a project or codebase, and “-Wno-uninitialized -Wfoo-uninitialized” to get just one type of warning that a project cares about.

Most users should focus on the simple big switch, and never care about the details.

Users can then either suppress the warning by initializing their variables (which I argue is cheap and overall good for code cleanliness) or use pragmas to shut up the warning entirely in regions of their code. The latter may be a gross solution, but it is there for those who want it.

The problem in some cases isn’t how you silence the warning, but the necessity of code changes to do so. That’s why I’m suggesting some flag based controls.

Something else I’d like to explore (perhaps on a different thread) is whether we could devise a more friendly syntax for a user to explicitly acknowledge the lack of initialization on a variable as intentional. Pragmas are really gross, and not a precise instrument. I’m imagining:

HugePODStruct foo [[uninitialized]]; // Performance sensitive!

as something that can be used in those very rare cases where it matters, without polluting the code with pragmas, and without losing the checking of other variables that aren't so sensitive.

FWIW, I’m willing to contribute patches toward both proposals made here if we can agree on exactly what they should look like.

Ted Kremenek wrote:

My design goal for -Wuninitialized was fourfold:

(a) Works under -O0. Users care about seeing these warnings the most when
they are doing their debug builds.

(b) Has consistent results that are invariant of the optimization flags,
target architecture, phases of the moon, etc.

(c) Provides predictable results that are (for the most part) sound and
complete.

(d) Has marginal impact on compile time.

The last three goals mean that the analysis can only do limited reasoning
about control-dependencies, e.g.:

  int x;
  ...
  if (flag)
    x = ...
  ...
  if (flag)
    use(x);

Analyzing this code correctly inherently requires path-sensitive analysis,
which has exponential cost in the general case. There are
tricks where we can mitigate such algorithmic complexity for some common
cases, but handling these control-dependencies in general is something
that really is in the purview of the static analyzer. Amazingly, GCC
often doesn't flag warnings in such cases, but I suspect that it is
because GCC is silently dropping warnings in some cases where it deems it
can't accurately reason precisely about a given variable.

Hmm, I would personally like to not initialize variables that I know are not
read from, as in your above code example. I personally regard it as bad
coding style to initialize variables with values not really used by the
code.

Would it be reasonable to have a flag that enables clang to try harder to
not warn in such cases? I haven't any clue on how much this slows down
clang's performance, but I think I'm willing to take a compile-time speed
drop equivalent to a -O2.

I don't have a strong opinion either way, I figured I'd just give some
feedback on my experience.

I went through the chrome codebase and looked at all the new
uninitialized warnings now reported by -Wuninitialized. Here is a
patch that suppresses them all: Issue 6368094: -Wuninitialized fixes - Code Review

After last week's improvements to -Wuninitialized, about 20 warnings
are still left. Most of them are false positives. The ones that
aren't are either not very interesting or hard to understand. Examples:

http://codereview.chromium.org/6368094/diff/1/base/third_party/dmg_fp/dtoa.cc
clang complains that the variable mlo declared in line 3572 might be
read uninitialized in line 4185 (you can click "Expand all" to see all
the code after the declaration.) This may or may not be true, the
function is very long and full of gotos, and clang doesn't tell me
_why_ it thinks this might be true. My guess was that the goto in line
4040 was taken, and before that one of the two "goto one_digit;". If
this is correct, then this is indeed a false positive because in both
cases mhi is set to 0 and mlo is read only if mhi != 0.

There are many (~10) warnings because of -Wuninitialized not tracking
control dependencies (from what I understand), e.g.
http://codereview.chromium.org/6368094/diff/1/chrome/common/gpu_messages.cc

Yet code that looks very similar does not warn:

    double expiry;

    if (!state->GetBoolean("include_subdomains", &include_subdomains) ||
        !state->GetString("mode", &mode_string) ||
        !state->GetDouble("expiry", &expiry)) {
      continue;
    }

This code (not chrome code), which _does have_ a legitimate
uninitialized read, on the other hand does _not_ warn:

bool g(int* a) {
  return true;
}

int f() {
  int a;
  if (g(&a)) return a;
  return a;
}

My personal impression is that it can be difficult to understand when
this new mechanism warns, and most of the time when it warns, it
warns about false positives.

To be fair, there are a few real issues that it found:
http://codereview.chromium.org/6368094/diff/1/ipc/ipc_message_utils.cc
This was missing a check, but the warning doesn't go away even with
the correct check because of one of the known false positives
mentioned above :)

http://codereview.chromium.org/6368094/diff/1/webkit.patch
Here the code is basically:
int var;
if (virtualFunction()) var = 0;
if (virtualFunction()) use(var);
A subclass might return a different result for virtualFunction() on
each call, so this is a valid warning. But when I stored
virtualFunction() in a bool, the warning didn't go away either (again
due to control dependencies?)
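(Roughly, the stored-bool variant I tried looked like this; 'changed' is just the name I'm using here:)

int var;
bool changed = virtualFunction();
if (changed) var = 0;
if (changed) use(var);  // still warns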

This is valid too, I think:
http://codereview.chromium.org/6368094/diff/1/chrome/browser/dom_ui/options/browser_options_handler.cc
After some digging, I learned that chrome's LOG_FATAL can in theory
not terminate the process; but even after I made it always do that,
the warning didn't go away (maybe rightfully so – as I said, it's a
bit hard to understand _why_ these warnings happen.)

If it were possible to disable -Wmaybe-uninitialized while keeping
some of -Wuninitialized around, I'd do that for chrome. If that's not
possible, I'd probably try to add the pointless initializations in
this CL and turn on -Wuninitialized.

Have people generally found that this new mechanism reports useful
things in practice?

Nico

Yet code that looks very similar does not warn:

   double expiry;

   if (!state->GetBoolean("include_subdomains", &include_subdomains) ||
       !state->GetString("mode", &mode_string) ||
       !state->GetDouble("expiry", &expiry)) {
     continue;
   }

Because this idiom is so common and because it is localized I was able to engineer the analysis to have just the necessary path-sensitivity to handle this case.

This code (not chrome code), which _does have_ a legitimate
uninitialized read, on the other hand does _not_ warn:

bool g(int* a) {
return true;
}

int f() {
int a;
if (g(&a)) return a;
return a;
}

The analysis is not inter-procedural. It gives up when it sees the address of a variable taken.

Note that GCC will only warn about this case if it decides to inline the call to 'g'. If 'g' is in another source file, or simply isn't inlined, GCC doesn't emit a warning either.

My personal impression is that it can be difficult to understand when
this new mechanism warns, and most of the time when it warns, it
warns about false positives.

I disagree. I think the behavior is very predictable. The analysis conservatively scans control-flow branches for uses of uninitialized variables, and takes a conservative estimate at confluence points. This is the textbook example of uninitialized variables.

The cases where it doesn't warn are:

1) The address of a variable is taken. This is necessary because without pointer analysis one cannot determine if the value has escaped and has been initialized somewhere else.

2) The few cases where we can engineer the analysis to be smart enough to get just the right amount of path-sensitivity to not warn. The most common case I saw was the if(x && y && ...) idiom, and that can be easily handled because it is localized.

We can make the analysis more predictable by removing (2), but it would make the warning far more noisy.
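As a concrete (hypothetical) illustration of case (1), with 'maybe_init' and 'use' as placeholder names:

  int x;
  maybe_init(&x);   // may or may not initialize 'x' through the pointer
  use(x);           // no warning: once the address of 'x' escapes, the
                    // analysis conservatively assumes it may be initialized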

Hmm, I would personally like to not initialize variables that I know are not
read from, as in your above code example. I personally regard it as bad
coding style to initialize variables with values not really used by the
code.

That's a fair argument, although I would also argue that it is bad style to have variables that could potentially *never* be initialized.

Would it be reasonable to have a flag that enables clang to try harder to
not warn in such cases? I haven't any clue on how much this slows down
clang's performance, but I think I'm willing to take a compile-time speed
drop equivalent to a -O2.

It's not just a matter of making the analysis smarter. That inherently requires exponential time in the general case. Nobody is willing to take that kind of compile-time hit.

Some control-dependencies can possibly be handled by doing some amount of abstract interpretation on boolean values, and then composing those with the uninitialized values dataflow analysis. I'm not certain what kind of cost that would be in practice. As an optimization, that kind of analysis could *possibly* be done as a secondary pass once a use of an uninitialized value is discovered. That way the additional analysis cost is only paid when code uses this kind of coding idiom.

For example, suppose we have:

  int x;
  ...
  if (y)
   x = ...
  ...
  if (y)
    use(x)

The analysis believes that 'x' is used uninitialized because it thinks there is a false path where the first branch is 'false' and the second branch is 'true'. This is easy to recover from the CFG since the analysis computes dataflow values for each basic block. A reverse dataflow analysis from the use to the definition that accounts for the values of flags could then possibly prune out the paths that aren't possible, and thus show that the uninitialized use is infeasible. Note that this would only work for *very simple* cases, but that might be enough to get the behavior people are expecting with a marginal analysis cost.

I follow this philosophy as well (but my coworkers think I'm crazy). While blindly initializing all variables to, say, zero will decrease the likelihood of nondeterministic runtime behavior, it doesn't necessarily contribute to program correctness. What if the above example was "fixed" to suppress a spurious warning, but is then extended during maintenance:

int x = 0;
...
if (flag)
  x = ...
...
if (flag)
  use(x);
...
use(x);

and use(0) is wrong. While that may seem contrived, I saw it happen in a loop once, and use(0) turned out to be expensive.

John

I'm not certain if the second statement is truly fair. This is code that has already been vetted for uses of uninitialized values. Your point is taken, however, that the remaining false positives are noise that GCC doesn't warn about. The question I'd rather we focus on asking is what should we do about those cases?

While my suggestion above about annotations to indicate acknowledged lack of initialization was intended primarily for performance-sensitive code paths, the approach could be used in code paths where humans trivially conclude that initialization isn’t needed, and would prefer explicitly saying that rather than doing a redundant explicit initialization.

Improving the smarts of the diagnostic to simply detect the initialization is appealing in the sense that it makes the warnings go away without code changes, but less appealing as you’ve indicated because it actually makes the warnings less predictable and more expensive to provide.

Hmm, I would personally like to not initialize variables that I know are not
read from, as in your above code example. I personally regard it as bad
coding style to initialize variables with values not really used by the
code.

Me as well. Also, think about tools like valgrind: initializing here
will silence a warning there.
Btw, what about the clang static analyzer? It seems this approach will
make life much harder for the more sophisticated tools.

Ted Kremenek wrote:

<snip>

Hmm, I would personally like to not initialize variables that I know are not
read from, as in your above code example. I personally regard it as bad
coding style to initialize variables with values not really used by the
code.

I follow this philosophy as well (but my coworkers think I'm crazy). While blindly initializing all variables to, say, zero will decrease the likelihood of nondeterministic runtime behavior, it doesn't necessarily contribute to program correctness. What if the above example was "fixed" to suppress a spurious warning, but is then extended during maintenance:

int x = 0;
...
if (flag)
x = ...
...
if (flag)
use(x);
...
use(x);

and use(0) is wrong. While that may seem contrived, I saw it happen in a loop once, and use(0) turned out to be expensive.

I strongly disagree. How can use(0); be worse than use(garbage that can potentially be zero or that can potentially crash your app causing data corruption);

My personal impression is that it can be difficult to understand when
this new mechanism warns, and most of the time when it warns, it
warns about false positives.

I'm not certain if the second statement is truly fair. This is code that has already been vetted for uses of uninitialized values.

Yes. Then again, this is probably true for many existing code bases :)

If you're interested in data on how this does on new code, I can
temporarily check in my workaround changes for chrome, turn the
warning on on our clang/mac bot, and in two weeks send a report on the
things it warned about.

Your point is taken, however, that the remaining false positives are noise that GCC doesn't warn about. The question I'd rather we focus on asking is what should we do about those cases?

Suggestions:
* I found your above explanation of how the analysis works very
helpful. Maybe there could be a page with this explanation; it could
also list common false positives and recommended work-arounds.
* Maybe add special cases for more common patterns to the checker. In
chrome, |bool b = get(&a); b = b && get(&b); if(!b) return; use(a,
b);| is somewhat common, and complements the existing |if(x && y &&
...)| special case.
* If it warns, provide a way to explain the warning ("Variable could
be uninitialized on this path: declared here, else branch here, then
goto here, use here")

Unrelated: From an implementation perspective, is it reasonable to
propagate __attribute__((noreturn))? I noticed that |int a; _exit(0);
use(a);| does not warn, but |void g() { _exit(0); } int a; g();
use(a);| does.

Nico

This requires inter-procedural analysis, which compiler warnings shouldn't get in the business of doing. Instead, I believe we should warn that 'g' isn't marked with the 'noreturn' attribute when it could be, and have the user fix their code.
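Something like this sketch is what I have in mind; once 'g' carries the attribute, the call ends its basic block and the later use is unreachable to the analysis:

  #include <unistd.h>

  __attribute__((noreturn)) void g(void) { _exit(0); }

  int f(void) {
    int a;
    g();
    return a;   // not reachable once 'g' is known not to return, so no warning
  }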

Suggestions:
* I found your above explanation of how the analysis works very
helpful. Maybe there could be a page with this explanation; it could
also list common false positives and recommended work-arounds.

Yes, documenting the expected behavior makes a lot of sense.

* Maybe add special cases for more common patterns to the checker. In
chrome, |bool b = get(&a); b = b && get(&b); if(!b) return; use(a,
b);| is somewhat common, and complements the existing |if(x && y &&
...)| special case.

The 'bool b = ...' case isn't trivial to handle because it inherently requires tracking the value of 'b' across multiple statements and possibly multiple basic blocks. It basically requires its own dataflow analysis to handle it generally enough for its behavior to be consistent and predictable.
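To make that concrete (a sketch only; 'get' and 'get_other' are placeholder names), the difficulty is that the fact "'a' was initialized" lives in the runtime value of 'b', which is reassigned across statements and possibly basic blocks:

  int a, other;
  bool b = get(&a);          // whether 'a' was written is encoded in 'b'
  b = b && get(&other);      // 'b' is updated again, possibly in another block
  if (!b)
    return -1;
  return a;                  // proving this safe requires knowing 'b' is true here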

* If it warns, provide a way to explain the warning ("Variable could
be uninitialized on this path: declared here, else branch here, then
goto here, use here")

I'm mixed on this one. This could possibly be done with notes, but I somewhat feel that this should remain the purview of the static analyzer. The whole point of this warning is to be simple and to catch obvious uses of uninitialized values.

The reason that I am mixed is that if an uninitialized value warning requires a complicated explanation of what the code did to trigger the warning, I see two possibilities:

1) The code is too complicated for the simple -Wuninitialized analysis to handle, probably because of control-dependencies.

2) The code's logic to initialize a variable is inherently complicated anyway.

Concerning (1), the options I see are:

(a) make the analysis smarter (if possible), either erring on suppressing warnings or just doing a better job.

(b) have the user disable the warning (for the particular case)

Concerning (2), for me, if it takes longer than 5 seconds to reason about whether the code initializes a variable, then I think the code is structured poorly. In such cases, even if the warning is wrong (and the variable is initialized) there is value in the warning because it indicates the code is overly complex. From what I can tell, however, that's probably an outlier opinion on this thread.

True, but the example I gave for this (
http://codereview.chromium.org/6368094/diff/1/base/third_party/dmg_fp/dtoa.cc
) is 20-year-old code that works well and that we'd ideally never want
to touch. (I guess we shouldn't compile it with all warnings turned on
then, but so far we haven't had problems with that.)

Obviously having non-deterministic behavior like that is bad. However, when you declare x uninitialized, the static analyzer *can* correctly determine that the first use(x) is valid while the second one is not. It would really suck to make our smart analysis less effective because everyone is structuring their code to please the dumb analysis.

I suppose for those willing to eat the performance cost, you could just compile without -Wuninitialized and always use the static analyzer's uninitialized checker.
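(For example, something along these lines, though the exact invocation depends on the build setup:)

  clang -Wno-uninitialized -c foo.c    # regular compile without the warning
  clang --analyze foo.c                # run the static analyzer separately
  scan-build make                      # or analyze a whole build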

-Henry

Ted Kremenek wrote:

<snip>

Hmm, I would personally like to not initialize variables that I know are not
read from, as in your above code example. I personally regard it as bad
coding style to initialize variables with values not really used by the code.

I follow this philosophy as well (but my coworkers think I'm crazy). While
blindly initializing all variables to, say, zero will decrease the likelihood
of nondeterministic runtime behavior, it doesn't necessarily contribute to
program correctness. What if the above example was "fixed" to suppress a
spurious warning, but is then extended during maintenance:

int x = 0;
...
if (flag)
  x = ...
...
if (flag)
  use(x);
...
use(x);

and use(0) is wrong. While that may seem contrived, I saw it happen in a loop
once, and use(0) turned out to be expensive.

I strongly disagree. How can use(0); be worse than use(garbage that can
potentially be zero or that can potentially crash your app causing data
corruption);

It is worse because the initialization masks a logical bug.
Better to have an occasional crash than something which seems to 'work'.

- fariborz

If I could choose, I would rather get correct warnings instead of false
positives that in the end require me to initialize everything everywhere.

I am not concerned with the cost. The extra compile time is offset by
being able to trust the warning instead of digging through to verify
it.

:)

I asked David Li (one of our gcc people) about gcc's -Wuninitialized
warning, and here are his answers:

Hi fellow Clangers,

During the last two weeks, I have been working on implementing -Wuninitialized in Clang. Clang currently doesn't implement this warning, which is a glaring feature deficiency when compared with GCC.

Unlike GCC's implementation, which is based on the backend optimizer, Clang's implementation is based on dataflow analysis in the frontend. This means the warning still works at -O0.

Now that there is a prototype of this feature in TOT Clang, I wanted to open up the list to general discussion of this feature, its deployment, and what expectations users should have.

For some background, because GCC's implementation of -Wuninitialized is based on the optimizer, the results of the warning can differ depending on the flags passed to the compiler. For example:

(1) The warnings can vary depending on the optimization level selected.

(2) The warnings can vary depending on the target architecture.

(3) The warnings can vary depending on which version of GCC you are using.

While I am not 100% certain, I also suspect that GCC's implementation is not completely sound, and will not flag warnings in some cases. My hypothesis is that this is done to avoid spurious false positive warnings, leaving the impression that the analysis is smarter than it actually is (while possibly missing real issues).

It is true that gcc's warning is not sound -- i.e., there are false
negatives, but they are very rare. However, it is not because gcc uses
some heuristics to suppress possibly spurious warnings, but because some
optimizations may make the warning go away -- e.g., constant propagation.
These are in fact bugs, not features. Again, these are rare, and gcc's
uninit analysis is sound -- no false negatives. Note that the soundness is
about non-aliased scalar variables. If the variable is aliased with
indirect accesses (may alias) or its address is exposed to function calls,
no intra-procedural analysis is sound.

[He means that gcc will miss warnings when a variable may be
initialized inside a function call, but that it always errs on the
side of false positives for variables whose addresses don't escape.]

My design goal for -Wuninitialized was fourfold:

(a) Works under -O0. Users care about seeing these warnings the most when they are doing their debug builds.

(b) Has consistent results that are invariant of the optimization flags, target architecture, phases of the moon, etc.

(c) Provides predictable results that are (for the most part) sound and complete.

(d) Has marginal impact on compile time.

The last three goals mean that the analysis can only do limited reasoning about control-dependencies, e.g.:

int x;
...
if (flag)
x = ...
...
if (flag)
use(x);

Analyzing this code correctly inherently requires path-sensitive analysis, which has exponential cost in the general case. There are tricks we can use to mitigate such algorithmic complexity for some common cases, but handling these control-dependencies in general is something that really is in the purview of the static analyzer. Amazingly, GCC often doesn't flag warnings in such cases, but I suspect that is because GCC is silently dropping warnings in cases where it deems it can't reason precisely about a given variable.

False -- gcc does not silently drop warnings -- that is evil.
Instead, gcc has predicate-aware analysis. Gcc's uninit analysis by
itself is sound.

My proposal is that Clang's analysis errs on the side of producing more warnings instead of worrying about such control-dependencies. This means that for the above code example Clang would emit a warning, even when no use of an uninitialized variable is possible. My rationale is twofold:

(a) The cost of initializing a variable is usually miniscule.

True about the cost -- and it can possibly be optimized further by the
compiler (sinking, etc.).

False about people's tolerance level for false positives. When
there are too many of them, people simply ignore all of them.

However, this might not be a problem for clang's FE based
implementation because most of the false positives are actually
exposed due to inlining which gcc has to face. The downside (a clear
drawback) for clang's FE based implementation is that it may have too
many false negatives for aliased scalars.

(b) Users get predictable results, and the compiler doesn't play games when deciding when to emit a warning in the face of control-dependencies that it cannot reason about.

Gcc does not play games on this -- but we certainly talked about it
when being accused of too many false positives.

Fantastic feedback. If GCC is using real predicate analysis to suppress false paths, then that is likely what we would need to do in Clang's -Wuninitialized to meet user expectations for this warning. I'm glad I was wrong about GCC possibly being unsound in suppressing warnings; that makes it clearer what can probably be done on Clang's side to achieve comparable false positive/false negative rates.