[RFC] Strict weak ordering checks in the debug libc++

Summary

This proposal aims to catch strict weak ordering violations in comparison-based algorithms, to prevent security issues and uncover bugs in user code. It adds debug checks for strict weak ordering under _LIBCPP_ENABLE_DEBUG_MODE.

Background

Not so long ago we proposed and landed a new std::sort implementation. However, it was rolled back in 16.0.1 because of failures caused by broken user comparators. Broken comparators caused failures with the previous implementation as well; the new one simply exposed the problems more often.

At Google we constantly find failures and out-of-bounds memory accesses, every single one of which is a security issue. Say, if we sort 5 elements with a broken comparator, it might work fine, but if an attacker manages to supply more than 30 elements, the current implementation can read out of bounds.

We want to add debug checks for strict weak ordering. This raises complexity concerns: the definition of a strict weak ordering is stated in terms of triples of elements, so the obvious check is O(N^3), and an O(N^3) algorithm, even as a debug check, is not something a libc++ user should experience.
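To illustrate what "takes triples" means, the check implied by the definition looks roughly like this (a sketch with illustrative names, not proposed libc++ code):

    // Naive check straight from the definition of a strict weak ordering:
    // irreflexivity, asymmetry, and transitivity of both the comparison
    // and the induced equivalence. Clearly O(N^3) in the range size.
    template <class Iter, class Comp>
    bool naive_is_strict_weak_ordering(Iter first, Iter last, Comp comp) {
      for (Iter a = first; a != last; ++a)
        if (comp(*a, *a))
          return false;  // irreflexivity: !comp(a, a)
      for (Iter a = first; a != last; ++a)
        for (Iter b = first; b != last; ++b) {
          if (comp(*a, *b) && comp(*b, *a))
            return false;  // asymmetry
          for (Iter c = first; c != last; ++c) {
            if (comp(*a, *b) && comp(*b, *c) && !comp(*a, *c))
              return false;  // transitivity of comp
            bool ab = !comp(*a, *b) && !comp(*b, *a);  // a ~ b
            bool bc = !comp(*b, *c) && !comp(*c, *b);  // b ~ c
            bool ac = !comp(*a, *c) && !comp(*c, *a);  // a ~ c
            if (ab && bc && !ac)
              return false;  // transitivity of equivalence
          }
        }
      return true;
    }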

We have already added debug capabilities such as range randomization, which served us well to find brittle golden tests and, more importantly, bugs. The libc++ debug mode has irreflexivity checks, and randomization can help find some problems. We can still do better.

Proposal

Within the _LIBCPP_ENABLE_DEBUG_MODE, we are going to introduce iterator out-of-bounds checks where appropriate. One such change is considered in an ongoing review.

Within the _LIBCPP_ENABLE_DEBUG_MODE, we are going to introduce the strict weak ordering checks for the following set of algorithms:

The most important ones, which can result in out-of-bounds accesses:

  • std::sort, std::stable_sort, std::nth_element, std::partial_sort
  • std::make_heap, std::sort_heap

Less important ones (for now, they can only return an incorrect result):

  • std::{min,max,minmax}_element
  • std::{lower,upper}_bound
  • std::lexicographical_compare

The obvious algorithm for checking strict weak ordering, the only one usable with forward iterators, has O(N^3) complexity, and O(N^3) is going to hurt performance. For such iterators we are going to check only min(20, size) random or leading elements, and we will exclude primitive types compared with the default std::less<> comparator.

Better O(N^2) algorithms for checking strict weak ordering are known. This allows us to check more elements for functions that require random access iterators (std::sort, std::nth_element, std::stable_sort, std::partial_sort, std::{make,sort}_heap); say, we will check min(100, size) elements per call. On the downside, such checks do not show a triple that violates strict weak ordering; they only tell whether a bug exists. The algorithm works as follows: sort the range with an algorithm that does not crash; if std::is_sorted then returns false, there is a bug in the comparator. Otherwise, we go through the runs of equivalent elements and check that the elements of each run are equivalent to each other and compare less than everything that comes after. The proof can be read in the repository.
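For concreteness, here is a minimal sketch of that check, assuming a bounds-safe insertion sort stands in for "the algorithm that does not crash" (names are illustrative; the actual proof lives in the repository):

    #include <algorithm>

    // Bounds-safe insertion sort: never reads outside [first, last) even
    // if comp is inconsistent, unlike an optimized production sort.
    template <class RandomIt, class Comp>
    void crash_free_sort(RandomIt first, RandomIt last, Comp comp) {
      for (RandomIt i = first; i != last; ++i)
        for (RandomIt j = i; j != first && comp(*j, *(j - 1)); --j)
          std::iter_swap(j, j - 1);
    }

    // After sorting, a valid strict weak ordering implies: the range is
    // sorted, elements within each run of equivalent elements are
    // mutually incomparable, and every element of a run is less than
    // everything after the run. Checking this is O(N^2) overall.
    template <class RandomIt, class Comp>
    bool check_strict_weak_ordering_sorted(RandomIt first, RandomIt last, Comp comp) {
      if (!std::is_sorted(first, last, comp))
        return false;  // the comparator is definitely broken
      for (RandomIt begin = first; begin != last;) {
        RandomIt end = begin + 1;  // find the run equivalent to *begin
        while (end != last && !comp(*begin, *end))
          ++end;
        for (RandomIt a = begin; a != end; ++a) {
          for (RandomIt b = begin; b != end; ++b)
            if (comp(*a, *b))
              return false;  // run members must be mutually equivalent
          for (RandomIt c = end; c != last; ++c)
            if (!comp(*a, *c))
              return false;  // run members must precede all later elements
        }
        begin = end;
      }
      return true;
    }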

At Google we found that checks over 100 elements uncover 20% more bugs than checks over 20 elements, while checks over 500 elements uncovered only one more bug. Most sorts are quite small: 95%+ of them fit within a hundred elements. We don’t want to hurt performance a lot, and that’s why the final proposal looks like this:

  • For random access iterators we are going to use an O(N^2) algorithm for min(100, last - first) elements. This includes std::make_heap, std::sort_heap, std::sort, std::stable_sort, std::partial_sort, std::nth_element.
  • For forward iterators we are going to use an O(N^3) algorithm for min(20, last - first) elements. This includes all other algorithms.
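Putting the two policies together, the dispatch could look roughly like the sketch below, reusing the illustrative helpers from the sketches above (none of this is actual libc++ code):

    #include <algorithm>
    #include <iterator>
    #include <type_traits>

    // Illustrative dispatch between the two checks; the thresholds are
    // the ones proposed above.
    template <class Iter, class Comp>
    bool debug_check_swo(Iter first, Iter last, Comp comp) {
      using Cat = typename std::iterator_traits<Iter>::iterator_category;
      using Diff = typename std::iterator_traits<Iter>::difference_type;
      Diff size = std::distance(first, last);
      if constexpr (std::is_base_of_v<std::random_access_iterator_tag, Cat>) {
        Iter end = first + std::min<Diff>(100, size);
        // Note: mutates the prefix; a real implementation would restore
        // the algorithm's postconditions afterwards.
        crash_free_sort(first, end, comp);
        return check_strict_weak_ordering_sorted(first, end, comp);
      } else {
        Iter end = first;
        std::advance(end, std::min<Diff>(20, size));
        return naive_is_strict_weak_ordering(first, end, comp);
      }
    }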

Alternatives considered

We can compromise in some cases.

One option would be to drop the O(N^2) algorithm, since it takes some work to prove that it checks strict weak ordering correctly; however, as said before, checking 20 elements uncovers fewer bugs.

Another option is to drop the O(N^3) checks for the std::{min,max}_element algorithms, as they cannot result in an out-of-bounds access for most incorrect comparators.

The last option is to not follow this approach at all. Given the number of questions asked on Stack Overflow and the bugs these checks uncover, we strongly believe this is the right call to make. It has the potential to save many hours and sleepless nights: there were cases at Google where we could not understand for weeks why a failure was happening, and the bug had existed for years.

Looking forward to your feedback.

-Daniel @danlark1


Thanks for working on this! This proposal looks quite interesting, and would indeed be very helpful to have. I think it would make sense to split this up into two patches, since the random access and forward algorithms seem to be orthogonal to each other.
One thing I don’t really understand is why the random access algorithm can’t show a triple which doesn’t satisfy strict weak ordering. If you check is_sorted, you can also use is_sorted_until instead and print the triple in case is_sorted_until(first, last, pred) != last. What am I missing there?


Thanks for the feedback.

The O(N^2) algorithm cannot show a triple that violates the strict weak ordering relation, because the space of triples is O(N^3), bigger than what an O(N^2) algorithm can examine. After sorting, if is_sorted_until(first, last, pred) != last, the returned iterator does not actually have to be part of a violating triple: the adjacent inversion it points at can be a downstream effect of a transitivity violation between other elements.

FYI: @ldionne, I want your approval before I submit anything of that sort to libcxx

Ah OK, that makes sense. But we could run the forward iterator algorithm over it to find out, right? Or is the forward iterator variant not strictly more powerful than the random access variant? Since we crash anyways, we might as well spend some time giving a more useful crash message.

Yes, we can run an O(n^3) algorithm to find a triple after we discover that the bug exists.

Thanks for the proposal! This is indeed really useful and I think we should have a way to find these sorts of issues. However, I wonder how your approach relates to the existing __debug_less machinery we have (currently in libcxx/include/__algorithm/comp_ref_type.h).

The current machinery basically allows wrapping the comparator such that it checks that !comp(b, a) holds whenever comp(a, b) is true (i.e. asymmetry). We could augment it to also check !comp(a, a) and !comp(b, b), which would get us irreflexivity. The only thing we can’t do with this approach is check transitivity, but the upside is that it doesn’t change the complexity of any algorithm and it’s pretty simple to implement.
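Roughly, the wrapper works like this (a simplified sketch, not the exact libc++ code):

    #include <cassert>

    // Simplified sketch of a __debug_less-style comparator wrapper (the
    // real one lives in libcxx/include/__algorithm/comp_ref_type.h and
    // uses the library's internal assertion machinery).
    template <class Comp>
    struct debug_less {
      Comp& comp_;
      explicit debug_less(Comp& comp) : comp_(comp) {}

      template <class T, class U>
      bool operator()(const T& a, const U& b) {
        bool result = comp_(a, b);
        if (result)
          assert(!comp_(b, a) &&
                 "Comparator does not induce a strict weak ordering");
        // The augmentation suggested above would also check irreflexivity:
        // assert(!comp_(a, a) && !comp_(b, b));
        return result;
      }
    };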

Are you aware of our current machinery, and is your proposal motivated by the fact that not checking transitivity is too big of a blind spot?

Another important case is violation of the comparator axioms in STL containers (std::map and std::set). I would suggest adding similar checks in destructors and maybe in calls to the clear method (SortChecker, an existing dynamic checker of strict weak ordering violations, currently does the latter).


Thanks for the feedback!

We are aware of the irreflexivity checks inside the debug mode. We think they are good but not enough to catch the majority of issues in the business logic of user-provided code. For example, those checks run only on elements that actually get compared: there are N^2/2 pairs to check, but the algorithm performs only O(N log N) comparisons in the worst case, so quite a few pairs are missed and the resulting order can be, well, strange.

Our suggestion has its own limitations but is more general at catching issues, for example with NaNs, or when the user compares floats with an epsilon tolerance.
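As a concrete illustration, consider this hypothetical user comparator: it is irreflexive and asymmetric, so pairwise checks pass, yet it is not a strict weak ordering because equivalence is not transitive:

    #include <algorithm>
    #include <vector>

    // Hypothetical user code: "less, with an epsilon tolerance". It is
    // irreflexive and asymmetric, but with eps = 0.1 we get 1.00 ~ 1.08
    // and 1.08 ~ 1.15 ("equivalent"), while approx_less(1.00, 1.15) is
    // true, so equivalence is not transitive and this is not a strict
    // weak ordering. NaNs compared with operator< break the axioms in a
    // similar way (a NaN is "equivalent" to everything).
    bool approx_less(double a, double b) { return a < b - 0.1; }

    int main() {
      std::vector<double> v = {1.00, 1.15, 1.08};
      // The result is unspecified; with enough elements, an optimized
      // sort can even read out of bounds.
      std::sort(v.begin(), v.end(), approx_less);
    }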

Hmm, you do have a point that we don’t compare nearly all pairs of elements. In that case, I think it would be worth having such checks but they’d obviously be enabled only in Debug mode (for now, until we have more granular knobs).

I looked at GitHub - danlark1/quadratic_strict_weak_ordering. IIUC, we can’t use the exact same sorting algorithm that we normally use when we want to enable those checks – is that correct? If so, that’s a downside of this approach, since enabling the debug mode ideally shouldn’t change (too much) the algorithms that the library uses. Otherwise, it becomes impossible to debug an issue in e.g. the usual std::sort algorithm by enabling the debug mode, since it suddenly uses a different algorithm (which may not exhibit the same issues).

But all in all, this seems like something useful and I think a review would be welcome – we can iterate on the details there.

Thanks!

IIUC, we can’t use the exact same sorting algorithm that we normally use when we want to enable those checks

If we first land the checks for OOB reads in the ongoing review, from the user's perspective it is going to look like this:

  1. If an assert is triggered inside std::sort, then it is definitely a violation of strict weak ordering
  2. If no asserts are triggered, we don’t know, and we execute our checks for the first 100 elements

If both checks pass, there is still a chance that a sort of more than 100 elements has a violation, but we cannot run the O(N^2) check over the whole range without hurting debug performance too much, so the tradeoff is inevitable.

In short, we can keep using the exact same sort if we put the OOB checks in the right places.

For std::nth_element and std::partial_sort it is going to look like this:

  1. We add OOB assert checks
  2. Call the algorithm, then call the crash-free sort, then call the checks
  3. Then call the randomization of the parts that do not have guarantees – we already have that in the debug mode (sketched below)
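A hypothetical sketch of that flow, reusing the illustrative helpers from the earlier sketches (not actual libc++ internals):

    #include <algorithm>
    #include <cassert>
    #include <random>

    // Hypothetical debug-mode flow for std::nth_element. crash_free_sort
    // and check_strict_weak_ordering_sorted are the sketches from the
    // proposal above, not real libc++ functions.
    template <class RandomIt, class Comp>
    void debug_nth_element(RandomIt first, RandomIt nth, RandomIt last, Comp comp) {
      std::nth_element(first, nth, last, comp);  // 1. runs with OOB asserts
      crash_free_sort(first, last, comp);        // 2. crash-free sort...
      auto n = std::min<decltype(last - first)>(100, last - first);
      assert(check_strict_weak_ordering_sorted(first, first + n, comp) &&
             "Comparator is not a strict weak ordering");
      // 3. Re-randomize the subranges whose order nth_element does not
      //    guarantee, so user code cannot come to depend on it. Note the
      //    full sort keeps *nth correct, so the postcondition still holds.
      std::mt19937 gen(std::random_device{}());
      std::shuffle(first, nth, gen);
      if (nth != last)
        std::shuffle(nth + 1, last, gen);
    }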

But all in all, this seems like something useful and I think a review would be welcome – we can iterate on the details there.

Sounds good, I’ll prepare a review shortly. Thanks for the feedback