[analyzer] RFC, garbage value => out of bounds

Hello!

I would like to change the analyzer so the a[i] value is not undefined when i is out of bounds.. to improve the Clang warnings.

Code example:

    void dostuff(int);

    void f(int nr) {
        int a[2] = {1,1};
        for (int i = 0; i < nr; i++)
            dostuff(a[i]);
    }

Output from Clang analyzer:

    /home/danielm/ossa/uninit.c:7:5: warning: Function call argument is an uninitialized value
        dostuff(a[i]);
        ^~~~~~~~~~~~~

The array a is fully initialized. So imho the message is a FP.

It is better to write "array index out of bounds". Like this:

    /home/danielm/ossa/uninit.c:7:13: warning: Access out-of-bound array element (buffer overflow)
        dostuff(a[i]);
                ^~~~

So.. if I change the analyzer the warning will not be shown unless the array-index check is enabled.

Do you have opinions?

Best regards,
Daniel Marjamäki

..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62
E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se

Have you looked e.g. at the HTML version on how the analyzer arrived at
this decision? It typically makes things like index out of bounds much
clearer. It is often not easy to say which warning is more appropiate,
so I'm not sure how much sense shuffling here really makes.

Joerg

I would like to change the analyzer so the a[i] value is not undefined when i is out of bounds.. to improve the Clang warnings.

Code example:

     void dostuff(int);

     void f(int nr) {
         int a[2] = {1,1};
         for (int i = 0; i < nr; i++)
             dostuff(a[i]);
     }

Output from Clang analyzer:

     /home/danielm/ossa/uninit.c:7:5: warning: Function call argument is an uninitialized value
         dostuff(a[i]);
         ^~~~~~~~~~~~~

The array a is fully initialized. So imho the message is a FP.

I think the message is misworded, it should be "is an undefined expression".
Actually a "potentially undefined expression", assuming the parameter is never passed a value >1.

It is better to write "array index out of bounds". Like this:

     /home/danielm/ossa/uninit.c:7:13: warning: Access out-of-bound array element (buffer overflow)
         dostuff(a[i]);
                 ^~~~

It would be relevant what happens if the expression is more complicated.

E.g. what happens for cases like
   a[a[i]]
   ptr[i-2]
   a[i > 4 ? i / 2 : i]
or if it is not directly fed to a function, as in
   a[i] + 5

Regards,
Jo

Thanks!

Well.. if you don't think my suggestion would make the warnings better I can work on other stuff.. but..

Imho the 'array index out of bounds' would be more correct.

The real code is not this simple. So at first it looks like a FP when the message claims that the fully initialized array contains garbage values.

Also I classify this as a FP. In the real code it is known that nr is not greater than 2. Clang is just guessing for no reason that it can be bigger than 2. So to silence this, as far as I know we'll have to add a redundant assertion or something.

It's more straight forward to see what assertion is needed, if the warning claims that the array index is out of bounds.

The 'array index' check is currently an alpha check. I assume it's because of some FP, maybe array index is not calculated properly always. Unfortunately right now such wrong array index affects the garbage values core check also. I would prefer to limit the FP to the alpha checking.

Best regards,
Daniel Marjamäki

If the function can be called from unknown locations, clang has no way of knowing this.
I suspect that making the code a private member function could make clang confident that it knows all calling locations.

The assert might really help. It would also document to future maintainers that the nr parameter is constrained to a range, so it would be very much non-redundant.

The problem that nr has not been proven to be greater than 2 was already covered by Joerg; it's worth it to have a look at the html diagnostic for the original case to see if it was proven that such caller context exists, otherwise we shouldn't ideally be seeing this warning. In fact, probably a heuristic based on detecting SymbolRegionValue<VarRegoin{ParmVarDecl}>-like values and staying more conservative with them than with other values may work somehow, not sure.

> the a[i] value is not undefined when i is out of bounds.

Out of bounds access causes undefined behavior. In practice, the program either crashes or produces an undefined value. I agree that the term "uninitialized" may not quite make sense here though.

The analyzer ideally should stop analysis of the branch (in this example there's only one branch, so the whole analysis interrupts) after undefined behavior occurs - which is what enabling array bounds checker should cause. Because array bounds checker is alpha (not sure why), the analysis continues, modeling the undefined value read by a pointer to perfectly defined array (which doesn't make the value, say, a[2] defined) with an undefined symbolic-value marker (UndefinedVal), and then another checker warns on usage of such undefined value.

So the problem we run into here is that the original message is much clearer in a simple case of:

   void foo() {
     int n;
     bar(n) // Function call argument is an uninitialized value
   }

In this example i'd certainly prefer "uninitialized" over "undefined". In your example, i'd probably prefer "undefined", and certainly prefer a significantly different warning, such as "Function call argument is a garbage value [obtained from an out-of-bound array access]". Unfortunately, discriminating between these two cases is right now technically difficult (the signleton UndefinedVal object does not provide any origin information). Then, again, we should not have seen this warning, because due to undefined behavior the execution shouldn't reach here anyway.

And i guess that there aren't many places when you can obtain undefined values without first causing undefined behavior. Uninitialized values are certainly one of such rare places. Though better warnings would be great anyway.

Maybe it's worth it to turn UndefinedVal into a true symbol in order to properly track its origin?

Hello!

It seems to me that you disagree with me that it would be preferable to write an array-index-out-of-bounds warning.

There is undefined behaviour A and you want to complain about undefined behaviour B that is caused by A.

I think that is strange. But I will accept that.

Best regards,
Daniel Marjamäki

..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62
E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se