Richard, This is very helpful. Thank you for sharing.
As John pointed out, looking at multiple modification of an lvalue over a
sequence
point seems interesting. And it doesn't look like something that UBSan is
currently
getting. Although it might be difficult to get all violations of that
type, catching
some simpler ones seems like a good goal.
There are two different things you might be interested in here. One is
easy, the other is useful ![:wink: :wink:](https://emoji.discourse-cdn.com/google/wink.png?v=12)
The easy (but not very useful) one is catching undefined behavior due to
unsequenced reads/writes to the same location (this is undefined behavior,
and is Object Model item 4 in the document). This is "easy" because (1) we
already have implementation experience (in the static check for this bug)
of finding unsequenced operations, and we would "just" need to generate the
code to check whether any of the unsequenced and potentially-conflicting
operations are actually operating on the same location. Unfortunately, I
don't expect this to catch very many bugs which the static check misses;
this will probably only catch cases where the modified objects are the same
but not "obviously" the same, such as:
int n;
int &f() { return n; }
int g(int, int);
int k = g(f()++, n);
However, providing the sequencing information to the IR can allow further
optimizations (for instance, a sequencing-based alias analysis can discover
that "f()"s result does not alias "n" in the above code), so there may be
value in this beyond detecting UB bugs.
The useful (but not easy) one is catching cases where the operations are
indeterminately sequenced, rather than unsequenced (this happens when the
accesses are not directly part of the same full-expression, for instance if
one of them is within a function call, and does not provoke undefined
behavior). It's hard because:
1) runtime instrumentation to catch this is hard (this is similar to what
TSan checks for), and
2) removing false positives is hard (most cases of indeterminately
sequenced accesses are benign, because the end result of the program is the
same either way).
Example false positive, to give you a taste of why this is hard:
std::map<int, string*> stuff;
string build(int);
string &get(int k) {
string *&p = stuff[k];
if (!p) p = new string(build(k));
return *p;
}
void f(string &a, string &b);
void g() {
f(get(1), get(2)); // indeterminately-sequenced modifications to 'stuff'
...
// ... but this is almost certainly fine, if the results of the 'build'
calls
// don't depend on the order in which they're called
}
One reasonable heuristic for detecting issues here would be to provide a
mechanism to randomize the evaluation order (subject to the sequencing
rules and with a seed for debugability), and then either (a) just run the
tests for the program, or (b) run the program in parallel with multiple
different orders and check that the "observable behavior" (library IO
calls, etc) is the same.
[As an aside, thinking about "sequence points" is likely to prove unhelpful
to you. The "sequence points" rules are known to be broken, and the
"sequenced before" rules in C11 and C++11 are intended to describe how the
"sequenced before" rules were really meant to work.]