In theory something like UBSan (-fsanitize=undefined) or the Integer sanitizer (-fsanitize=integer) should help catch this at runtime, but I can’t seem to get them to fire on this code. Not sure if it’s a missing feature/bug or some necessary false negative.
I looked into the code and I think it’s a missing feature. The question is whether it is worth implementing.
Are we talking about signed or unsigned overflows? Both may be useful but second isn't UB.
> The question is whether it is worth implementing.
There are some CWEs for it:
* CWE-197: Numeric Truncation Error (CWE - CWE-197: Numeric Truncation Error (4.8))
* CWE-192: Integer Coercion Error (CWE - CWE-192: Integer Coercion Error (4.8))
-Y
I mean both types of overflow, in both the data are lost.
For testing I created a template safe_cast, attaching demo code and the same
results I would like from the code translated with non-existent
"-fsanitize=integer-overflow-assign"
/* unsigned -> signed, overflow */
safe_cast<int8_t >(UINT32_MAX);
safe_cast<int16_t>(UINT32_MAX);
safe_cast<int32_t>(UINT32_MAX);
/* unsigned -> signed, no overflow */
safe_cast<int64_t>(UINT32_MAX);
/* unsigned -> unsigned, overflow */
safe_cast<uint8_t >(UINT64_MAX);
safe_cast<uint16_t>(UINT64_MAX);
safe_cast<uint32_t>(UINT64_MAX);
/* unsigned -> unsigned, no overflow on 64bits */
safe_cast<size_t >(UINT64_MAX);
/* unsigned -> unsigned, no overflow */
safe_cast<uint64_t>(UINT64_MAX);
/* signed -> unsigned, overflow */
safe_cast<uint8_t >((-1));
safe_cast<uint16_t>((-1));
safe_cast<uint32_t>((-1));
safe_cast<uint64_t>((-1));
safe_cast<size_t >((-1));
/* signed -> signed, overflow */
safe_cast<int8_t >(INT32_MIN);
safe_cast<int16_t>(INT32_MIN);
/* signed -> signed, no overflow */
safe_cast<int32_t>(INT32_MIN);
safe_cast<int64_t>(INT32_MIN);
Are we talking about signed or unsigned overflows? Both may be useful but
second isn't UB.
The subject is not overflow, it's truncating conversion. The semantics here
are well-defined -- for a conversion to unsigned, they're defined in the
language standard, and for a conversion to signed, they're
implementation-defined. In the latter case, essentially all modern
implementations define the conversion as a 2s complement truncation.
So... neither of these belong in -fsanitize=undefined. They might make
sense in -fsanitize=integer, though.
In my experience, conversion checks[1] cause too many false positives
to be generally useful and
this is why these checks aren't presently done by Clang (in
-fsanitize=integer or otherwise).
One idea I explored to improve the situation is to distinguish
explicit from implicit conversion errors,
which might be good enough for general use. That said, I'm unsure if
this really improves
the signal-to-noise ratio or simply lowers the volume of errors
reported, I'd have to spend
some time going back through my evaluation results. That's not to say either
or both of these checks aren't potentially very useful to you in your
situation :).
I have a preliminary implementation of conversion check support
available on github
(clang[2], corresponding compiler-rt[3]) that may be of interest.
I've let these forks fall behind trunk quite a bit, but if you or
others think this would be
a useful set of checks I'd be willing to polish them and port to mainline
(potentially for inclusion). Let me know, I've actually already done
most of this work privately so really am just looking for a compelling argument
to wrap it up :).
In short:
* I've implemented the sort of checks you describe
* My own research suggests they're too noisy for general use
* If you think you'd find them useful regardless, we should chat
~Will
[1] Here I mean any conversion where the result interpreted by the
destination type is not the same mathematical integer as the source
interepeted by the source type. This can happen either due to
truncation or sign errors. Note that truncation of bits might not be
an error, consider -1 cast from int32_t to int8_t.
[2] https://github.com/dtzWill/ioc-clang
[3] https://github.com/dtzWill/ioc-compiler-rt
One idea I explored to improve the situation
is to distinguish
explicit from implicit conversion errors,
which might be good enough for general use.
AFAIK that's more or less what MSVC++ does with /RTCc (/RTC (Run-time error checks) | Microsoft Learn).
In my experience this works reasonably well - I haven't seen false positives in fairly large codebase.
That said, I'm unsure if this really improves
the signal-to-noise ratio or simply lowers the volume of errors
reported, I'd have to spend
some time going back through my evaluation results.
Did you try building some OSS projects with ioc-clang? It'd be really interesting to hear about your findings.
Let me know, I've actually already done
most of this work privately so really am just looking for a compelling argument
to wrap it up :).
Would availability of similar checks in MSVC be a good argument?
-Y
I do not think this is a false positive. In my view it is the loss of data
or poor application design when it is necessary to convert data and/or lose
accuracy.
A very good idea is to distinguish implicit and explicit conversion,
explicit conversion is clearly expressed intent of the programmer that he
wants to change the integer type. Therefore, I see hidden implicit
conversion as more dangerous
Data truncation by the implicit conversion behavior can be expected if you
know and follow exactly the conversion rules. Unfortunately, in most cases
it is commonly programmers ignorance/error and this leads to unwanted
side-effect.
If I had voting rights, I voted for this feature
MF