warning request: partial array initialisation

Hi,

I'd really like to be able to get a warning when "almost all" elements
of an array are initialised:

  #define N 8
  const int a[N] = { 3, 1, 4, 1, 5, 9, 2 };
  warning: some but not all array elements initialised

This would be useful because we have a large codebase where N is
defined in some common header and there are lots of arrays of length N
scattered through the code. Occasionally someone increments N, but
forgets to update all the array initialisers (because it's hard to
find them).

I'm not sure what heuristics should be used to define "almost all".

Can clang or some static analyser already do this? "clang -Wall
-Wextra" doesn't seem to warn about it.

Thanks,
Jay.

Just as a side note: if you want to check if Clang has any diagnostics
for a piece of code, you can use "-Weverything" to turn on /all/ Clang
diagnostics.

- David

Thanks. But it still doesn't warn!

Jay.

At least in C++, this warning wouldn’t make a lot of sense because all of the array elements are initialized. [dcl.init.aggr] 8.5.1/7: “If there are fewer initializers in the list than there are members in the aggregate, then each member not explicitly initialized shall be value-initialized (8.5).”

We would have a problem with lots of code which intended to initialize the first few elements, and zero out the rest.
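
To make the quoted rule concrete, here is a minimal sketch, assuming C++11 (for constexpr and static_assert) and reusing the array from the start of the thread. It only illustrates the language rule; it is not a proposed diagnostic.

```cpp
#define N 8

// Seven explicit initializers for eight elements. This is well-formed:
// the eighth element is value-initialized, which for int means zero.
constexpr int a[N] = { 3, 1, 4, 1, 5, 9, 2 };

// Both checks are evaluated at compile time.
static_assert(a[6] == 2, "last explicitly initialized element");
static_assert(a[7] == 0, "trailing element is zero, not indeterminate");
```

Since every element has a defined value, the compiler cannot tell from this declaration alone whether the trailing zero was intended.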

Maybe someone has clever ideas about how to syntactically differentiate between these two use cases?

Chandler Carruth wrote (on Tue 19-Jun-2012 at 17:50 +0100):

....

#define N 8
const int a[N] = { 3, 1, 4, 1, 5, 9, 2 };
warning: some but not all array elements initialised

...

Maybe someone has clever ideas about how to syntactically differentiate
between these two use cases?

  const int a[N] = { ..... } __attribute__((all_explicitly_initialised))

Chris

Would a heuristic be acceptable?

If the number of initializers is in [.9 * N, N) then it looks likely to be a bug. The .9 factor could probably benefit from some tuning, maybe as low as .5 would pass.

– Matthieu.
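
A rough sketch of the range check described above, assuming C++11. The helper name looks_suspicious and its percent parameter are made up for illustration and are not part of Clang; integer arithmetic is used so that the boundary cases are exact.

```cpp
#include <cstddef>

// Hypothetical helper, not a Clang API: true when the number of explicit
// initializers lies in [percent/100 * n, n).
constexpr bool looks_suspicious(std::size_t num_inits, std::size_t n,
                                std::size_t percent) {
  return num_inits < n && num_inits * 100 >= percent * n;
}

// The example from the start of the thread: 7 initializers, N = 8.
static_assert(!looks_suspicious(7, 8, 90), "0.9 * 8 = 7.2, so 7 is not in range");
static_assert(looks_suspicious(7, 8, 50), "a factor of .5 does flag it");

// Fully initialized arrays and deliberately sparse ones are not flagged.
static_assert(!looks_suspicious(8, 8, 90), "all elements explicit");
static_assert(!looks_suspicious(2, 1000000, 50), "mostly zero on purpose");
```

Note that with a factor of .9 the range for N = 8 is [7.2, 8), which contains no integer, so the original example would only be caught with a lower factor.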

OK, so pretend the warning said "some but not all array elements
explicitly initialised".

I realise that this construct is well defined (like most other
constructs that compilers warn about). I'd still like to be warned
about it, because it's slightly suspicious. I'm not suggesting that
the warning should be on by default, or included in -Wall, or anything
like that.

Thanks,
Jay.

I really dislike this style of diagnostic, because it’s essentially impossible to tell innocent users what to do when the warning fires on their correct code.

user: Why am I getting this warning? My code is correct!
me: Well, we have a heuristic to try to detect when there is a bug, and it triggers on your code.
user: But my code doesn’t have a bug. The standard says right there that this is a supported use case.
me: I understand, but there is code that looks exactly like your code that is in fact a bug.
user: So what rule do I use when writing code so I don’t tickle this warning when my code is correct, but I get the warning when it has a bug?
me: …

I have no answer for that last question. I can’t say ‘0.9 * N’ for lots of reasons. We would change it to 0.8 at some point, users can’t count past 10, and they certainly can’t multiply or divide, and so the rule wouldn’t help. I can’t say “follow the standard” because the standard directly contradicts this warning. I can’t realistically say “always initialize every element” because often that’s either impractical, impossible, or just plain poor style. (Hence the standard allowing this partial formation…)

When we’re designing warnings that try to catch bugs, we need to focus on getting the warning into a state that is easier to explain and cope with for our users. For example:

  1. A warning that is always a bug in the code is easy: the user can’t really complain, their code is simply wrong.
  2. A warning that directs innocent users to some alternate syntax which has equivalent semantics but is more explicit / clear / less bug-prone.

#2 is really hard to get right, and we should be overly conservative in employing it. Here are a few cases where I think we do a good job here:

  • Demanding explicit '()'s to group mixed ‘&&’ and ‘||’ operations, due to poor understanding of their precedence
  • Warning about functions declared at block scope unless they are explicitly marked with ‘extern’ due to their confusion with declaring a variable and calling its constructor (most vexing parse, etc).
  • Warning when a literal is converted to an unexpected type, such as converting ‘false’ to a null pointer.

In all of these cases, there is a significantly clearer way to express the intent of the programmer, and we have a very large body of evidence that these warnings almost universally fire on buggy or incorrect code, based on the experience of turning them on across large code bases. Just having one of these isn’t enough; it’s important to have both. It’s also important to be able to argue for the clearer formation through the presence of bugs, not through a stylistic or aesthetic preference.

We’d like to avoid warnings that don’t catch enough bugs, or don’t have reasonable enough alternatives, to end up eventually under ‘-Wall’. Having an off-by-default warning slows down the compiler and tends to cause the code to rot.

Also, this seems really like a coding convention or style you would like to enforce, and so I suspect a separate tool would be better suited to it.

Chandler Carruth wrote (on Tue 19-Jun-2012 at 18:58 +0100):

When we're designing warnings that try to catch bugs, we need to focus on
getting the warning into a state that is easier to explain and cope with for
our users. For example:

1) A warning that is *always* a bug in the code is easy: the user can't
really complain, their code is simply wrong.
2) A warning that directs innocent users to some alternate syntax which has
equivalent semantics but is more explicit / clear / less bug-prone.

Fair point. I agree with the general principle that the compiler
shouldn't warn when there's no better way to express what you really
want; and if I really want

  static int a[1000000] = { 1, 2, /* all the rest 0 */ };

then there's no better way of writing it.

Also, this seems really like a coding convention or style you would like to enforce, and so I suspect a separate tool would be better suited to it.

Yup. I'd love a general purpose scriptable tool for doing this kind of
thing. Maybe it's time I looked at Coccinelle...

Thanks,
Jay.

Have you looked at the Tooling stuff Manuel and others are building up? It is specifically targeting these use cases.

http://clang.llvm.org/docs/LibTooling.html

C99 initializers? Same rules as for structs...

Joerg

const int a[] = { 3, 1, 4, 1, 5, 9, 2 };
static_assert(arraysize(a) == N, "N was increased, you need more initializers");

Sebastian
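
The snippet above relies on an arraysize helper that is not shown in the thread. A self-contained sketch, assuming C++11 and a hypothetical function-template version of that helper, could look like this:

```cpp
#include <cstddef>

#define N 8  // stands in for the definition in the shared header

// Hypothetical stand-in for the arraysize helper used above: returns the
// number of elements of a built-in array as a compile-time constant.
template <typename T, std::size_t Size>
constexpr std::size_t arraysize(const T (&)[Size]) {
  return Size;
}

// The bound is deduced from the initializer list, so the array has exactly
// as many elements as there are explicit initializers.
const int a[] = { 3, 1, 4, 1, 5, 9, 2, 6 };

// If N is later bumped to 9, this fails until an initializer is added.
static_assert(arraysize(a) == N, "N was increased, you need more initializers");
```

The trade-off is that the array's declared type no longer mentions N, so each such array needs its own static_assert next to it.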

Ooh, can I play?

// In a library...
struct end_t {} constexpr end {};
template<typename T> struct WithEnd { T a; end_t end; };

// error, no conversion from end_t to int
WithEnd<const int[N]> a = { 3, 1, 4, 1, 5, 9, 2, end };

... or ...

// In a library...
template<typename T, size_t...Ns> struct ArrImpl {
  constexpr ArrImpl(const decltype(Ns, declval<T>()) &...arg) : arr{arg...} {}
  T arr[sizeof...(Ns)];
};
template<typename T> struct ArrBuilder;
template<typename T, size_t N> struct ArrBuilder<T[N]> {
  typedef typename ArrBuilder<ArrImpl<T,N-1>>::type type;
};
template<typename T, size_t ...Ns> struct ArrBuilder<ArrImpl<T, 0, Ns...>> {
  typedef ArrImpl<T, 0, Ns...> type;
};
template<typename T, size_t N, size_t ...Ns> struct
ArrBuilder<ArrImpl<T, N, Ns...>> {
  typedef typename ArrBuilder<ArrImpl<T, N-1, N, Ns...>>::type type;
};
template<typename T> using Arr = typename ArrBuilder<T>::type;

// error, no matching constructor
Arr<const int[8]> a = { 1, 2, 3, 4, 5, 6, 7 };

I think Sebastian got the easiest alternative!

– Matthieu

Just to follow up on this (at the risk of going seriously off-topic) I
managed to get most of what I wanted with a Coccinelle/Python script: