In C++20, we now have a bunch of stuff inside std::ranges, and now CallDescriptions that expect to match on e.g. std::next, now will also match on std::ranges::next, due to the fuzzy matching logic. This can introduce some problems.
Checkers will be triggered more often, without much benefit.
If the checker validates the types and values before use, that’s great, but it still wastes time.
If the checker does not check the types, etc., then it’s more likely to crash because of violating some assumptions. At worst, it will just silently continue its journey and completely mismodel some functions.
We should think about a solution to fix this.
I would mandate strict qualified name matching, but I know that e.g. the LLVMConventions checker relies on this fuzzy matching, thus it would break. What we could do is split the concept of CallDescriptions into two, have a FuzzyCallDescription that we have ATM, and have a strict version replacing the existing CallDescription.
This way we could at least isolate the affected checkers.
One additional benefit could be that by making matching more strict, we could get closer to caching the lookups inside CallDescriptionMaps, but let’s focus on this first.
This conceptual split of FuzzyCallDescription and CallDescription seems to be a good idea. How would you want to implement this: would you introduce two separate classes, or is it enough to add something like a ‘fuzzyMatch’ boolean flag or a std::optional namespace field where nullopt acts as a wildcard?
Is it useful to have a static type difference between FuzzyCallDescription and CallDescription? Are there use cases where we’d want to represent the two concepts with the same type (perhaps a checker where an option toggles fuzzy and strict matching)?
IDK. I’d probably prefer a separate class. I think that should give us more flexibility than just a flag.
I haven’t thought of implementation details, because I was wondering what should be the semantics of such a class.
If I recall, an explicit wildcard might be strong enough to express what the llvm conventions checker needs, but I’m not 100% certain. I had some PoC for it actually. I haven’t finished the patch, so it could be possibly implemented like that.
I think we can have both by using slicing. I find this approach more readable compared to flags, and usually more ergonomic. Here is what I’m thinking of.
But I’m not sure if we should focus too much on the implementation just yet.
I’m rather just raising awareness that we have a problem here, and we might want to do something about it. I’m not volunteering either
About wildcards:
ATM a CallDescriptionMap linearly tries each CallDescription to see if it matches the Call. This makes code vulnerable to relying on the order the entries are tried, and in case we have some unintentional eager match, it won’t get to the intended CallDescription, thus unintentionally invoking the wrong handler.
I’ve seen this happening once, and it was a painful bug to debug as I didn’t expect this.
Wildcards could lead to the same issue unless we impose rules like: if in the CallDescriptioMap we have a wildcard rule and a regular rule match, we prefer the regular one. However, what should happen if two wildcard rules match in the map?
If only we could “cache” the function decl being matched once we learned the rule would match.
The goal would be to optimize for the likely case, where the call doesn’t match the CallDescription.
To make this test fast, the idea would be that once we find out that the given call matches the CallDescription, we cache its decl and later only compare the decls in constant time.
The unfortunate thing is that C++ has overloads that only differ in types and CV qualifiers, thus as of now a CallDescription will match per overload-set; defeating this thought experiment with caching - unless we would cache all the decls of the overload-set.
A wildcard rule would behave similarly, as we cannot tell in advance what decls it would match.
Maybe we could calculate these ahead-of-time. This is sort of what the StdLibraryFunctionsChecker does.