Writing static analyzers to detect non-deterministic behavior?

In the past, I had added the ability in LLVM to uncover 2 types of non-deterministic behaviors: iteration of unordered containers with pointer-like keys and sorting of elements with the same keys.

Now, I wanted to add checkers for these (and other types of non-deterministic behaviors) so that they could be applied more widely. I also realize that not all of these may be doable at compile-time.

With that in mind, I have pushed a patch which I adds a new category of checks for non-determinism: https://reviews.llvm.org/D50488.

The first checker which I have pushed here will simply check if raw pointers are being sorted. In subsequent patches, I will fine tune this by handling more complex cases of sorting of pointer-like keys.

I have another patch which checks for iteration of unordered containers but that may be very noisy in its current form.

I would like comments/suggestions from the community on:

1. Are static analysis checks to detect non-determinism something worth doing/doable?

2. How about writing sanitizers to detect non-deterministic behavior? Would that be too costly in terms of run time and instrumentation?

--Mandeep

  1. I’m not sure it’s do-able. I don’t know of any nice way to track whether an ordered walk of an unordered container leaks out into the final output of the program. Only iterating over an unordered container is probably not a sufficient hint (it’d have a high false positive rate to warn on every instance of that) - and I don’t have any great ideas about whether that false positive rate could be sufficiently reduced with maybe some heuristics about connecting one iteration to another container population, and to the final program output…

  2. Maybe - but I would think that’d still end up using heuristics to guess at whether one iteration order impacts the order of another container, etc.

Thanks for your response David.

  1. I’m not sure it’s do-able. I don’t know of any nice way to track whether an ordered walk of an unordered container leaks out into the final output of the program. Only iterating over an unordered container is probably not a sufficient hint (it’d have a high false positive rate to warn on every instance of that) - and I don’t have any great ideas about whether that false positive rate could be sufficiently reduced with maybe some heuristics about connecting one iteration to another container population, and to the final program output

The idea I had in mind was to check the commutativity of each operation inside the iteration of the unordered container. I realize we may not be able to do this for every operation and even if we could that would still be a heuristic and prone to false positives. Maybe we can start with a simpler problem to tackle instead of the iteration order. That’s why in the patch I pushed I wrote a very simple checker for std::sort with raw pointers.

  1. Maybe - but I would think that’d still end up using heuristics to guess at whether one iteration order impacts the order of another container, etc.

Yes, maybe iteration order is more difficult to problem but we should be able to detect non-deterministic sorting order of keys with same values. The idea I had in mind is similar to what we do in llvm::sort (which is random shuffle of the container). We instrument the user code to sort a container twice - first is the user’s normal std::sort, the second time we random shuffle the container and then sort again, and then check if there is a difference in the two outputs. The run time cost would be the cost of the shuffle + the extra sort + the cost of checking the two containers [O(n lg n) + 2 O(n)].

–Mandeep

Thanks for your response David.

  1. I’m not sure it’s do-able. I don’t know of any nice way to track whether an ordered walk of an unordered container leaks out into the final output of the program. Only iterating over an unordered container is probably not a sufficient hint (it’d have a high false positive rate to warn on every instance of that) - and I don’t have any great ideas about whether that false positive rate could be sufficiently reduced with maybe some heuristics about connecting one iteration to another container population, and to the final program output

The idea I had in mind was to check the commutativity of each operation inside the iteration of the unordered container. I realize we may not be able to do this for every operation and even if we could that would still be a heuristic and prone to false positives. Maybe we can start with a simpler problem to tackle instead of the iteration order. That’s why in the patch I pushed I wrote a very simple checker for std::sort with raw pointers.

You’re referring to the llvm::sort work, that randomizes the elements due to the instability of std::sort? (or some other work on std::sort and raw pointers?)

Changes like llvm::sort I don’t see as “static/dynamic analysis” - but changing behavior & relying on the products own tests to validate the behavior is still correct. (unlike static or dynamic analysis which produces warnings/errors on the relevant codepaths).

  1. Maybe - but I would think that’d still end up using heuristics to guess at whether one iteration order impacts the order of another container, etc.

Yes, maybe iteration order is more difficult to problem but we should be able to detect non-deterministic sorting order of keys with same values. The idea I had in mind is similar to what we do in llvm::sort (which is random shuffle of the container). We instrument the user code to sort a container twice - first is the user’s normal std::sort, the second time we random shuffle the container and then sort again, and then check if there is a difference in the two outputs. The run time cost would be the cost of the shuffle + the extra sort + the cost of checking the two containers [O(n lg n) + 2 O(n)].

Though different results there doesn’t necessarily represent a bug in the program - it’s a question of whether that different order impacts the program behavior. Could experiment - but I’d guess that the false positive rate would be a bit high…

(that said, I’m a bit of a pessimist in general - so take my perspective with a grain of salt :slight_smile: )

  • Dave

Iterating over unordered_map and unordered_set (or user-defined? llvm::DenseSet/DenseMap?) would seem like usual suspects for such checks.

I don’t know of any nice way to track whether an ordered walk of an unordered container leaks out into the final output of the program

Precisely, so if it’s not doable for a checker, it’s probably not reliably doable for an engineer as well.

In my experience, most iterations over unordered containers tend to affect the final output,
and if project owners really care about avoiding non-determinism it would IMO make sense to ban those.

Iterating over unordered_map and unordered_set (or user-defined? llvm::DenseSet/DenseMap?) would seem like usual suspects for such checks.

I don’t know of any nice way to track whether an ordered walk of an unordered container leaks out into the final output of the program

Precisely, so if it’s not doable for a checker, it’s probably not reliably doable for an engineer as well.

I wouldn’t think that was the case - you might iterate one hashed container and build another deep in other function calls - I’ve certainly debugged through many layers of that sort of thing when poking around in LLVM.

In my experience, most iterations over unordered containers tend to affect the final output,

I’d be surprised if that was the case - though a bit hard to measure. I would guess LLVM has maybe O(hundreds) of iterations over hashed containers that don’t effect the final output - but that’s just a guess.

and if project owners really care about avoiding non-determinism it would IMO make sense to ban those.

Seems a bit severe to me. But I could be wrong.

  • Dave