RFC: Python callback for data formatters type matching

jingham · August 31, 2022, 1:27am

| slackito
August 31 |

| - |

Trying to move this forwards once again, given the performance problems I’ve encountered with quadratic lookups in my attempt of adding a flag to make a formatter match derived classes, I’m more inclined to go with the Python callback: the pushback wasn’t very strong, the performance impact seems acceptable (and it should be minimal after the first lookup for each type given the result will be cached), and I think it’s worth the added flexibility.

For what it’s worth, I think the quadratic lookups with the flag solution should be fixable by adding caching to some intermediate steps, but that feels like a much more intrusive change: different parts of the matching process happen in different classes, and it’s no longer enough to cache just the type name and the end result of the lookup, we’d have to include in the cache whether any given match candidate skips typedefs/pointers/references/base classes and if we found a formatter for it in those conditions.

Assuming we go with the callback option, I’m not sure how to expose it from the API. Right now all the calls that add formatters to categories take a SBTypeNameSpecifier to specify what to match. So the easiest way (and what I did in my original patch) is to add the matching callback to SBTypeNameSpecifier as an optional member. Thus having a catch-all ".*" regex and a callback.

However, having a pointless regex is not ideal:

jingham:

BTW, if after some thought it seems like this is really going to be a feature for .* formatters then you could just make it unconditional so you aren’t bothering with a pointless regex compare. Just keep these in a separate group and run them after the cached matches and exact & regex formatters. That probably won’t make the implementation much simpler, but it would remove unnecessary complexity from the UI.

and I’m not sure what’s the best way to add this to the API.

If we skip the regex, it doesn’t make a lot of sense to put a lone callback in an SBTypeNameSpecifier, as we’re no longer specifying a name. So I’m thinking about introducing a new SBFormatterMatchingCallback (or similar) type, so we can introduce new overloads in SBTypeCategory:
category.AddTypeFormat(SBFormatterMatchingCallback cb, SBTypeFormat type_format)
category.AddTypeSummary(SBFormatterMatchingCallback cb, SBTypeSummary type_summary) 
category.AddTypeFilter(SBFormatterMatchingCallback cb, SBTypeFilter type_filter)
category.AddTypeSynthetic(SBFormatterMatchingCallback cb, SBTypeSynthetic type_synthetic)
# (and matching Delete* functions)
and then, for each category, have another separate map for callback-based formatters in FormatterContainerPair in addition to the existing ones for exact matchers and regex matchers (and rename it to FormatterContainerGroup`?)

Does this make sense? Any advice would be much appreciated, as I’m still not very familiar with the codebase and I would like to do it in the most idiomatic way possible, without “fighting” the existing design.

To me, it depends a bit on whether you want to support C++ callback matchers at the SB layer or not. The rest of the SB API’s (both Python & C++) that deal with Type summaries/synthetic children. only let you supply python function data or a python function or class name for the actual summary providing code. They don’t take C++ function pointers. So it would be consistent to also only allow matchers in the current scripting language. In that case, at the SB API level, you really are specifying a name, but it has the meaning of “python function name” or “python class name” depending on how you want to implement the callback.

Note, it’s correct to grouse a bit that “NameSpecifier” isn’t really accurate, since these don’t specify anything, they are matchers. OTOH, that’s equally incorrect for regex matchers. So you wouldn’t be abusing the name any more than we already are.

So at the SB layer, using the SBTypeNameSpecifier makes sense, perhaps even more that with regex - there’s no sense in which a regex is a name… And it would avoid introducing another class and a bunch of similar overloads which would keep the API set more compact.

The only problem is that you have to invent an API to make these that doesn’t conflict with the extant ones. Maybe (though this is over specified) add:

SBTypeNameSpecifier::SBTypeNameSpecifier(const char *name, bool is_regex, bool is_function);

Or make an enum with {by_name, by_regex, by_function} and then use that in place of the booleans? Anyway, it seems like “type name”, “matcher regex” and “matcher function” are all just a single string, so using this class for them doesn’t seem particularly awkward.

For sure, at the lldb_private layer, you will want to separate these out and put them in different maps. But again, I’m not sure you need a separate class for that. But if you do, that’s less of an issue since at that point you are in lldb_private, and can be a little uglier for the sake of programming convenience.

Jim

Topic		Replies	Views
UnicodeDecodeError for serialize SBValue description LLDB	24	214	May 2, 2016
Regular Expressions LLVM Dev List Archives	19	146	June 18, 2009
lldb tests LLDB	10	115	March 30, 2012
Using data formatters to display QString LLDB	10	218	April 29, 2014
Help with C++ API LLDB	18	237	August 29, 2013

RFC: Python callback for data formatters type matching

Related topics