Inconsistent warnings with -Wundefined-reinterpret-cast

Hi all!

I stumbled across some C++ code today that uses reinterpret_cast to reshape a 1D array into a 2D array. I wondered if this is actually legal or if it would cause undefined behavior. There is an elaborate post on Stack Overflow [1] that argues in favor of this method, quoting sections of the C++ standard but other people disagree [2].

So I went ahead and tested different variants of such conversions with Clang’s -Wundefined-reinterpret-cast flag:

        Compiler Explorer

The results can be summarized as:
1. Casting from int[4] to int(&)[2][2] or int(&)[2] causes a warning
2. Casting from int(*)[4] to int(*)[2][2] or int(*)[2] and _immediately_ dereferencing causes a warning
3. Casting from int(*)[4] to int(*)[2][2] or int(*)[2] without immediate dereferencing seems fine
4. Casting from int[4] to int(*)[2] seems fine
5. Dereferencing those pointers from 3. and 4. afterwards also seems fine

This seems to be a bit inconsistent. From the results I assume that the cast itself is fine but dereferencing the casted pointer is undefined behavior. However, if those operations are split across multiple source code lines, there is no warning. Is this a bug or a natural limitation of the error checking within Clang?

Cheers,
Michael

[1] c++ - Reshaping a 1-d array to a multidimensional array - Stack Overflow
[2] c++ - In which situations reinterpret_cast should be avoided for reshaping arrays? - Stack Overflow

OK, I think I can answer my own question. Here is the current code of the undefined-reinterpret-cast check:

    https://github.com/llvm/llvm-project/blob/b94c215592bdba915455895b2041398dfb2ac44a/clang/lib/Sema/SemaCast.cpp#L1874

In the comment it states:

    // The cases that is checked for is:
    // *reinterpret_cast<T*>(&a)
    // reinterpret_cast<T&>(a)
    // where accessing 'a' as type 'T' will result in undefined behavior.

So casting a pointer and dereferencing it at some later point in time is not covered by the check. It can also miss cases like the following:

    int matrix[2 * 2] = {0, 1, 2, 3};

    // missed because matrix is not referenced:
    int(&m1)[2][2] = *reinterpret_cast<int(*)[2][2]>(matrix);

    // missed because casted pointer is dereferenced via indexing operator:
    int(&m2)[2][2] = reinterpret_cast<int(*)[2][2]>(&matrix)[0];

On the question whether these casts (and dereferencing the resulting pointers) actually cause undefined behavior, I noticed a comment by Davis Herring on c++ - Reshaping a 1-d array to a multidimensional array - Stack Overflow. I read it as: This is undefined behavior because the array referenced by the casted pointer does not actually exist and therefore the argument about array-to-pointer conversion made by the author of that post does not hold.

Cheers,
Michael

OK, I think I can answer my own question. Here is the current code of the undefined-reinterpret-cast check:

https://github.com/llvm/llvm-project/blob/b94c215592bdba915455895b2041398dfb2ac44a/clang/lib/Sema/SemaCast.cpp#L1874

In the comment it states:

// The cases that is checked for is:
// reinterpret_cast<T>(&a)
// reinterpret_cast<T&>(a)
// where accessing ‘a’ as type ‘T’ will result in undefined behavior.

So casting a pointer and dereferencing it at some later point in time is not covered by the check.

Yes; unfortunately, Clang warnings tend to only perform a very local analysis, so they don’t tend to catch this kind of issue when it’s split across multiple expressions. (The static analyzer should be able to spot this kind of problem, but I don’t know if it has a checker corresponding to this warning.)

It can also miss cases like the following:

int matrix[2 * 2] = {0, 1, 2, 3};

// missed because matrix is not referenced:
int(&m1)[2][2] = reinterpret_cast<int()[2][2]>(matrix);

// missed because casted pointer is dereferenced via indexing operator:
int(&m2)[2][2] = reinterpret_cast<int(*)[2][2]>(&matrix)[0];

On the question whether these casts (and dereferencing the resulting pointers) actually cause undefined behavior, I noticed a comment by Davis Herring on https://stackoverflow.com/a/15284276. I read it as: This is undefined behavior because the array referenced by the casted pointer does not actually exist and therefore the argument about array-to-pointer conversion made by the author of that post does not hold.

Yes, Davis’s comment matches my understanding.