Introduction
lldb currently only supports matching data formatters with types based on the name of the type (either by plain string matching or regex). This is enough most of the time: when you’re writing a prettyprinter, you usually know the name of the type you’re dealing with. However, in some cases it’s useful to be able to inspect the type as a part of the matching logic.
The motivating example for this RFC is prettyprinting protobuf (Protocol Buffers) messages. The protobuf compiler takes a message definition like this:
message SomeData {
  optional uint32 field1 = 1;
  repeated string field2 = 2;
  ...
}
and generates a C++ class with a member for each message field:
class SomeData : public protobuf::Message {
  // - fields from the original message
  // - presence bits to indicate if optional fields have been set
  // - API member functions: accessors, serialization...
  // - etc.
};
With gdb, a single Python script can prettyprint any protobuf message. This works because gdb lets a prettyprinter inspect the value being printed before it is selected: messages can have arbitrary names, so the way to detect whether a value is a protobuf message is to walk up the inheritance hierarchy looking for the protobuf::Message base class. This is currently not feasible with lldb, because lldb chooses the data formatter based on the type name alone.
Proposal
We’d like to propose adding an optional Python callback that is called during formatter matching. Once we have a string/regex match with a candidate type, if no matching function was specified we stop looking because we have a match (this is the current behavior). However, if a callback function was specified, we run it and pass the type to it. The callback can then inspect the type and decide whether or not to keep the match. If it returns True, we keep the match and run that formatter; if it returns False, we ignore that match and continue looking in the usual order.
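The lookup order described above can be modeled in a few lines of plain Python. This is only a sketch of the proposed behavior, not the prototype's code: select_formatter, the candidates list, and the use of sets of member names as stand-in types are all hypothetical, and name matching is reduced to re.search for simplicity.

```python
import re

def select_formatter(type_name, type_obj, candidates):
    """Return the first formatter accepted by both name pattern and callback.

    candidates is a list of (pattern, callback_or_None, formatter) tuples
    in lookup order. All names here are illustrative, not lldb API.
    """
    for pattern, callback, formatter in candidates:
        if re.search(pattern, type_name) is None:
            continue  # name does not match: try the next candidate
        if callback is None:
            return formatter  # current behavior: a name match is final
        if callback(type_obj, {}):
            return formatter  # callback accepted the type
        # callback rejected the type: keep looking in the usual order
    return None

# Stand-in types: each "type" is just the set of its member names.
candidates = [
    (".*", lambda t, internal_dict: "x" in t, "has-x summary"),
    ("^Point$", None, "Point summary"),
]
print(select_formatter("Point", {"y"}, candidates))  # Point summary
print(select_formatter("Vec", {"x"}, candidates))    # has-x summary
print(select_formatter("Other", {"y"}, candidates))  # None
```

The first call shows the important case: the catch-all entry matches by name, its callback returns False, and the search falls through to the next candidate instead of stopping.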
We have a working prototype consisting of the following modifications:
- Add a matching_function_name member to SBTypeNameSpecifier to allow API users to set callbacks on formatter registration.
- Add to FormattersMatchCandidate objects the TypeImpl for the type being printed, and a pointer to the current ScriptInterpreter. We need this in order to run the callback on matches and pass the type to it.
- Add a new method to ScriptInterpreter and some SWIG glue to actually run the callback.
- Finally, some modifications across the formatter matching code to plumb the new data down to the point where matching is performed.
Here’s some example code that shows what a type summary using a Python matching function would look like in that prototype:
import lldb

def has_member_x(t, internal_dict):
    # Matching function: keep the match only if the type has a member named "x".
    return any(m.GetName() == "x" for m in t.get_members_array())

def print_member_x(valobj, internal_dict):
    # Summary function for the types accepted by has_member_x.
    return "something with a member x=%s" % valobj.GetChildMemberWithName("x")

cat = lldb.debugger.CreateCategory("test")
# The third argument (matching function name) exists only in the prototype.
t = lldb.SBTypeNameSpecifier(".*", True, "has_member_x")
cat.AddTypeSummary(t, lldb.SBTypeSummary.CreateWithFunctionName("print_member_x"))
cat.SetEnabled(True)
You can take a look at the (WIP, not ready for review yet) patch here: ⚙ Diff View
With this new feature, we could solve the example problem above by registering a formatter with a catch-all ".*" regex and an appropriate callback that checks whether the type derives from protobuf::Message.
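As a rough illustration, such a callback might look like the sketch below. It assumes the callback receives an lldb.SBType (as in the prototype example) and uses only the existing SBType methods GetName, GetNumberOfDirectBaseClasses and GetDirectBaseClassAtIndex. The function name and the exact base class name string are assumptions; the real qualified name of the base class in a given build may differ.

```python
# Assumed name of the base class, as spelled in this RFC; adjust as needed.
PROTOBUF_BASE = "protobuf::Message"

def derives_from_protobuf_message(sbtype, internal_dict):
    """Return True if PROTOBUF_BASE appears anywhere among the base classes.

    Walks the hierarchy iteratively through the direct base classes that
    lldb reports for each type. The type itself is not considered a match.
    """
    pending = [sbtype]
    seen = set()
    while pending:
        current = pending.pop()
        name = current.GetName()
        if name in seen:
            continue  # already visited (e.g. diamond inheritance)
        seen.add(name)
        for i in range(current.GetNumberOfDirectBaseClasses()):
            base = current.GetDirectBaseClassAtIndex(i).GetType()
            if base.GetName() == PROTOBUF_BASE:
                return True
            pending.append(base)
    return False
```

With the prototype's three-argument SBTypeNameSpecifier, this would presumably be registered the same way as has_member_x above, passing "derives_from_protobuf_message" as the matching function name.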
Performance impact
This feature should have negligible performance impact when not used. However, adding a formatter that uses a catch-all regex with a matching function will cause the callback function to be run once for each new type that gets printed (the formatter will be cached after first use for each particular type).
In order to quantify the performance impact in this case, we’ve prepared a test program that defines a struct with 1000 indirect members, each of a different type (source). We then added a summary function like the example above to lldb, using a matching function. Finally, we timed the execution of lldb.debugger.HandleCommand("fr var big") twice, to see any differences between the first and successive runs.
For the baseline we’ve added a summary function with a simple regex and timed that, twice.
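For anyone who wants to repeat this kind of measurement, a minimal timing helper could look like the following. This is not the harness used for the numbers in this post; time_twice is a hypothetical name, and it simply measures wall-clock time around two consecutive calls.

```python
import time

def time_twice(run):
    """Call run() twice and return (first, second) durations in seconds."""
    durations = []
    for _ in range(2):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return tuple(durations)

# Inside lldb's script interpreter this could be used as, for example:
#   first, second = time_twice(
#       lambda: lldb.debugger.HandleCommand("fr var big"))
```

Keeping the two durations separate makes it possible to compare the first run, where formatters are looked up, with the second, where cached results can be reused.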
We’ve run two separate experiments:
- one where the formatters don’t match any of the struct members, to measure the overhead from calling a simple matching function repeatedly, not the formatter itself.
- one where the formatter always matches, and returns a fixed string. This allows us to check the execution time when the formatters are cached.
The following tables show the average running time over 10 runs:
Non-matching formatters | First time | Second time |
---|---|---|
catch-all regex + matching function | 0.193 s | 0.154 s |
non-matching regex | 0.152 s | 0.113 s |

Matching formatters | First time | Second time |
---|---|---|
catch-all regex + return True matching function | 0.187 s | 0.139 s |
catch-all regex only | 0.181 s | 0.135 s |
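To make the comparison easier to read, the overhead relative to each baseline can be derived directly from the numbers in the tables. The snippet below only does that arithmetic; the results dictionary and the overhead helper are illustrative names, and the values are copied from the tables above.

```python
# (time with matching function, baseline time) in seconds, from the tables.
results = {
    ("non-matching", "first"): (0.193, 0.152),
    ("non-matching", "second"): (0.154, 0.113),
    ("matching", "first"): (0.187, 0.181),
    ("matching", "second"): (0.139, 0.135),
}

def overhead(with_callback, baseline):
    """Return (absolute overhead in seconds, relative overhead in percent)."""
    delta = with_callback - baseline
    return delta, 100.0 * delta / baseline

for (experiment, run), (with_callback, baseline) in results.items():
    delta, percent = overhead(with_callback, baseline)
    print(f"{experiment:12s} {run:6s} +{delta * 1000:.0f} ms ({percent:.1f}%)")
```

In the non-matching experiment the difference is about 41 ms in both runs, while in the matching experiment it is about 6 ms on the first run and 4 ms on the second.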