A few months ago I was thinking about how to improve lldb’s watchpoint capabilities and take better advantage of what AArch64 hardware offers. I eventually came up with the idea of “watchpoint locations”, then set it aside until I could work it out more thoroughly (see AArch64 watchpoints - reported address outside watched range; adopting MASK style watchpoints - #6 by jasonmolenda).
Hardware watchpoints have strict size and alignment restrictions because they are closely tied to the hardware-level implementation in order to be performant. Commonly they are restricted to watching 8 bytes, at an 8-byte alignment, on 64-bit targets. lldb today cannot watch regions larger than 8 bytes.
My intention is to break up the user requested watch region – 24 bytes, say – into multiple physical hardware register requests. To do this, watchpoint locations will be added. A watchpoint will have the user requested size and address. It will have one or more watchpoint locations which have correct alignment and size for hardware watchpoints. A 24-byte C++ object can be watched by using three 8-byte hardware watchpoint registers, for instance.
Watchpoint locations can also solve unaligned watch requests. If a user watches 8 bytes spanning two 8-byte aligned regions, this can be satisfied by using two hardware watchpoints, each watching the part of the requested region that falls within its doubleword.
AArch64 hardware also has a type of watchpoint that can watch power-of-2 sized regions, aligned to that same power-of-2 boundary. So a 24-byte C++ object can be watched with a single 32-byte, 32-byte aligned, “mask” watchpoint. Using this kind of watchpoint requires that lldb ignore changes to the 8 bytes the user did not ask to watch. To achieve this, a new type of watchpoint will be added in addition to “read” and “write”: “modify”. A modify watchpoint will only stop for the user when the memory bytes being watched by the user have changed. A write which puts the exact same value in the memory will not stop. I think this will be the most commonly desired behavior (“who is trashing this object”) versus the less common “show me all code that touches this object”.
WatchpointLocations
A user created Watchpoint will have one or more WatchpointLocations.
Unlike a Breakpoint and its BreakpointLocations, WatchpointLocations are largely a read-only display, for the user, of how a watchpoint was actually implemented. You cannot disable/enable individual WatchpointLocations. You cannot put conditions or commands on a WatchpointLocation. When we stop on a watchpoint, we will not report which watchpoint location was hit. The SB API will be able to list the WatchpointLocations on a Watchpoint, but nothing more.
WatchpointLocations have a similar name to BreakpointLocations, but they are much simpler objects.
New ‘modify’ watchpoint type
Watchpoints have read or write types now, specified by the gdb remote serial protocol packets Z2,addr,kind (set watchpoint) and z2,addr,kind (remove watchpoint), where the digit after Z/z is 2 (write), 3 (read), or 4 (access, aka read+write) and kind is the number of bytes to watch.
With mask watchpoints, we can only watch power-of-2 regions of memory so when the user asks to watch 24 bytes, we will need to watch at least 32 bytes. We don’t want to show the user when the unwatched 8 bytes are accessed, so we need to detect those accesses and keep them as Private stops, where we continue silently.
lldb will add a new watchpoint kind: modify, where lldb only stops when the memory region being watched changes. lldb will read and compare the memory region, and if it has not changed, it will continue silently.
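The comparison step for a modify watchpoint can be sketched as below. This is only an illustration of the idea, not lldb code: the function name and the use of byte-string snapshots of the hardware-watched region are assumptions made for the example.

```python
def should_stop_for_modify(old_region: bytes, new_region: bytes,
                           region_addr: int, user_addr: int,
                           user_size: int) -> bool:
    """Return True if any byte the user asked to watch has changed.

    old_region / new_region are snapshots (hypothetical inputs) of the whole
    hardware-watched region, which starts at region_addr. Bytes outside
    [user_addr, user_addr + user_size) are ignored.
    """
    start = user_addr - region_addr
    end = start + user_size
    if len(old_region) != len(new_region):
        raise ValueError("snapshots must cover the same region")
    if start < 0 or end > len(old_region):
        raise ValueError("user range must lie inside the watched region")
    # Only the user-requested slice decides whether this is a public stop.
    return old_region[start:end] != new_region[start:end]
```

With a 32-byte mask watchpoint at 0x1000 and a 24-byte user request, a write that only changes byte 28 returns False (continue silently), while a write that changes byte 4 returns True.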
For read watchpoints, we cannot know (on AArch64 at least) which bytes were accessed, so we will have false positive stops if a larger-than-requested region is being watched.
The default type in command line lldb will be modify.
There are some other cases where we need to watch larger regions than requested on other targets. On Intel x86-64, you can watch 1, 2, 4, or 8 bytes within a doubleword, but the bytes you watch must be aligned to that size within the doubleword. You can watch bytes 2 and 3, but you cannot watch bytes 1 and 2 without using a 4-byte watchpoint covering the range [0,3].
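As a rough sketch of that alignment rule, the function below finds the smallest naturally aligned 1, 2, 4, or 8 byte range that covers a request. The function name is made up for this example, and it assumes the request does not cross an 8-byte boundary.

```python
from typing import Tuple


def x86_watch_range(addr: int, size: int) -> Tuple[int, int]:
    """Smallest naturally aligned 1/2/4/8-byte range covering [addr, addr+size).

    Returns (start, length). Raises ValueError if no single range fits,
    i.e. the request crosses an 8-byte boundary.
    """
    for length in (1, 2, 4, 8):
        start = addr & ~(length - 1)  # round down to a multiple of length
        if start + length >= addr + size:
            return start, length
    raise ValueError("request needs more than one watchpoint register")
```

Watching bytes 2 and 3 gives `(2, 2)`, an exact fit. Watching bytes 1 and 2 gives `(0, 4)`, so bytes 0 and 3 are watched even though the user did not ask for them.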
Remote stub tells lldb what watchpoints it supports
A new gdb-remote packet, qWatchpointSupportInfo, will let the remote stub detail what its watchpoint capabilities are.
Should the capabilities be expressed in terms of pre-canned behaviors, such as
- “intel watchpoints” (doubleword aligned; 1, 2, 4, or 8 bytes watched; must be aligned to that amount within the dword)
- “aarch64 bas watchpoint” (doubleword aligned; any contiguous bytes within the dword)
- “aarch64 mask watchpoint” (any power-of-2 size from 8 bytes to 2GB, aligned to that same power-of-2 boundary)
Or should we construct a generic description of likely watchpoint capabilities and have the stub describe its capabilities in those terms? In my earlier Discourse, @labath mused that this might look like an array of (1) region size, (2) region alignment, (3) number of consecutive bytes that can be watched in the region. This first sketch doesn’t cover the Intel watchpoint behavior of what byte ranges within a doubleword can be watched, but that’s not a criticism of the idea of a generic description itself. Watchpoint capabilities are a low-level, chip-specific feature, but I think it would be possible to come up with a description that applies to nearly all targets in common use.
The “pre-canned behaviors” system means that lldb has to include a behavior for every supported target whose watchpoints differ, but it does simplify the engine which calculates the necessary WatchpointLocations from a user’s specified Watchpoint.
We can fall back to a base assumption that a remote target can watch sizeof(void*) size regions aligned to sizeof(void*) boundaries, which puts us at a behavior similar to what we have today.
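To make the generic description more concrete, here is one possible shape for it, following the three fields @labath suggested. The class, field, and function names are all invented for this sketch, and as noted above this form cannot express the Intel within-doubleword alignment rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchpointCapability:
    region_size: int       # bytes covered by one hardware register
    region_alignment: int  # required alignment of the region start
    max_consecutive: int   # consecutive bytes selectable within the region


def default_capability(pointer_size: int) -> WatchpointCapability:
    """Fallback when the stub reports nothing: pointer-sized, pointer-aligned."""
    return WatchpointCapability(pointer_size, pointer_size, pointer_size)


def can_watch(cap: WatchpointCapability, addr: int, size: int) -> bool:
    """True if one register with this capability can cover [addr, addr+size)."""
    region_start = addr - (addr % cap.region_alignment)
    return (size <= cap.max_consecutive
            and addr + size <= region_start + cap.region_size)
```

With `default_capability(8)`, 8 bytes at 0x1000 fit in one register, 4 bytes at 0x1004 fit, and 8 bytes at 0x1004 do not.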
WatchpointLocationCreator
Take a user requested watchpoint (addr, size) and knowledge of what watchpoints this stub can support, and create a list of physical watchpoints that we will set to implement that request.
A simple doubleword-sized, doubleword-aligned watchpoint will be a single watchpoint location. An unaligned request needs more than one: watching 8 bytes at 0x1004 (0x1004-0x100b) will need two doubleword-aligned watchpoints, one at 0x1000 watching 0x1004-0x1007 and one at 0x1008 watching 0x1008-0x100b.
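The doubleword split described here is simple to write down. The following is a minimal sketch with a hypothetical function name; it only handles the plain doubleword case and ignores MASK watchpoints.

```python
from typing import List, Tuple


def split_into_doublewords(addr: int, size: int,
                           dword: int = 8) -> List[Tuple[int, int, int]]:
    """Split a request into doubleword-aligned watchpoint locations.

    Returns one (aligned_base, first_watched, last_watched) tuple per
    location, with inclusive watched addresses.
    """
    locations = []
    end = addr + size                 # one past the last requested byte
    base = addr - (addr % dword)      # doubleword containing the first byte
    while base < end:
        first = max(addr, base)
        last = min(end, base + dword) - 1
        locations.append((base, first, last))
        base += dword
    return locations
```

For 8 bytes at 0x1004 this yields `(0x1000, 0x1004, 0x1007)` and `(0x1008, 0x1008, 0x100b)`, matching the example in the text.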
If the stub supports power-of-2 mask watchpoints, and the user watches 24 bytes at 0x1000, we can implement that with
- three 8-byte watchpoint BAS registers (no false positive hits).
- one 32-byte watchpoint MASK register, ignoring changes to the last 8 bytes.
- one 16-byte MASK watchpoint at 0x1000, one 8-byte BAS watchpoint at 0x1010 (no false positive hits).
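The last option in this list (no false positives, fewer registers than one per doubleword) can be found with a greedy pass that takes the largest aligned power-of-2 block at each step. This is a sketch under the assumption that the request is already doubleword aligned and a multiple of 8 bytes; the function name is hypothetical.

```python
from typing import List, Tuple


def exact_power_of_two_cover(addr: int, size: int,
                             min_block: int = 8) -> List[Tuple[int, int]]:
    """Cover [addr, addr+size) exactly with aligned power-of-2 blocks.

    Returns (block_addr, block_size) pairs. An 8-byte block would be a BAS
    watchpoint, anything larger a MASK watchpoint.
    """
    if addr % min_block or size % min_block:
        raise ValueError("request must be aligned to min_block")
    blocks = []
    while size > 0:
        block = min_block
        # Double the block while it still fits and the address stays aligned.
        while block * 2 <= size and addr % (block * 2) == 0:
            block *= 2
        blocks.append((addr, block))
        addr += block
        size -= block
    return blocks
```

For 24 bytes at 0x1000 this produces a 16-byte block at 0x1000 and an 8-byte block at 0x1010.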
Should WatchpointLocationCreator change its priority from “fewest watchpoint registers” to “more accurately watching” depending on how many other watchpoints are currently set?
What if the user sets one 24 byte watchpoint to watch one object, we use 3 watchpoint registers, and then they set a second 24-byte watchpoint?
Optimizing for “fewest watchpoint registers” can result in very poor choices for pathologically aligned watchpoint requests, if we try to cover them with one watchpoint register, on a target supporting AArch64 BAS and MASK watchpoints.
For example, consider watching 16 bytes at 0x01fffff8. A MASK watchpoint of size 0x4000000 watching addresses 0x0-0x3ffffff can watch these 16 bytes, but it watches 64MB of memory when only 16 bytes were user-requested. The next smallest mask watchpoint would be 0x2000000, which can only watch 0x0-0x1ffffff, covering just the first 8 bytes of the user request.
Using two watchpoint registers, this can be watched with an 8 byte BAS watchpoint at 0x1fffff8 and an 8 byte BAS watchpoint at 0x2000000; breaking the region into two makes it much easier to find a nearby alignment that will cover the memory block.
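To see how quickly a single-register cover degrades, the sketch below computes the smallest power-of-2 aligned region that contains a request. The function name is an assumption for this example, and the 8 byte to 2GB limits are the MASK watchpoint range mentioned earlier.

```python
from typing import Tuple


def smallest_mask_watchpoint(addr: int, size: int, min_size: int = 8,
                             max_size: int = 1 << 31) -> Tuple[int, int]:
    """Smallest aligned power-of-2 region covering [addr, addr+size).

    Returns (base, region_size). Raises ValueError if nothing up to
    max_size can cover the request.
    """
    block = min_size
    while block <= max_size:
        base = addr & ~(block - 1)  # round down to a multiple of block
        if base + block >= addr + size:
            return base, block
        block *= 2
    raise ValueError("no single mask watchpoint covers this request")
```

For 24 bytes at 0x1000 the result is `(0x1000, 32)`, only 8 bytes more than requested. For 16 bytes at 0x01fffff8 the result is `(0x0, 0x4000000)`, the 64MB region described above.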
debugserver today has an algorithm for watchpoints which handles all of this for AArch64 BAS+MASK watchpoints, including using two watchpoint registers to handle a mis-aligned watchpoint request. This is hidden from lldb.
WatchpointSite Creation
When multiple WatchpointLocations exist, it would be ideal to handle them holistically. There are three cases to think about:
1. Two watchpoints within a single doubleword (or in the same watched region, more generally)
2. Two watchpoints which are contiguous in memory and can be covered by two hardware watchpoint registers, or one larger hardware watchpoint register request
3. A user creates multiple watchpoints for the same address/size.
(1) is the most important case, where you have two uint32_t’s in a doubleword and are watching both of them. Today, these are sent down as two watchpoint requests, for which a stub will (most likely) use two hardware watchpoint registers on the same doubleword, each watching half of it.
For example, W1 watches bytes 0…3 and W2 watches bytes 4…7. Today this is sent to the stub as two watchpoints on the same address, watching different bytes. When a write to that doubleword happens, the CPU will pick one of the registers as a match, and use its byte-mask to decide whether to stop or not. lldb will miss accesses to the watchpoint register which is not checked first by the hardware. We can see (1) with mask watchpoints as well, e.g. watching a 24 byte object and an 8 byte object directly after that. The first watch might be done with a 32-byte mask watchpoint and the second watchpoint overlaps with that - the hardware will only detect hits for one of these watchpoint registers.
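One way to avoid the two-registers-on-one-doubleword problem is to merge locations that share a doubleword into a single site with a combined byte mask. The sketch below is only an illustration: the function name, the (watchpoint id, address, size) tuples, and the returned dictionary layout are assumptions, not an existing lldb structure.

```python
from typing import Dict, List, Tuple


def build_sites(locations: List[Tuple[int, int, int]],
                dword: int = 8) -> Dict[int, Tuple[int, List[int]]]:
    """Merge per-watchpoint locations that fall in the same doubleword.

    locations holds (watchpoint_id, addr, size) entries, each contained in
    one doubleword. Returns {dword_base: (byte_mask, owner_ids)}, where bit
    N of byte_mask means byte N of the doubleword is watched.
    """
    sites: Dict[int, Tuple[int, List[int]]] = {}
    for wp_id, addr, size in locations:
        base = addr - (addr % dword)
        offset = addr - base
        if offset + size > dword:
            raise ValueError("location must fit within one doubleword")
        mask = ((1 << size) - 1) << offset
        old_mask, owners = sites.get(base, (0, []))
        sites[base] = (old_mask | mask, owners + [wp_id])
    return sites
```

W1 watching bytes 0-3 and W2 watching bytes 4-7 of the doubleword at 0x1000 become one site with byte mask 0xff owned by both watchpoints, so a single hardware register sees every access.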
In the case of (2), this is an optimization possibility. If the user is watching two 8-byte properly aligned doublewords, and the stub can watch 16 bytes with a single hardware watchpoint register, then only one physical watchpoint register needs to be used.
In the case of (3), this may be an error; we need to see how Breakpoints handle this. The important aspects would be how to handle ignore counts/commands/conditions that might be attached to either of the two identical watchpoints.
We would need to re-evaluate our WatchpointSites every time a watchpoint is added/deleted or enabled/disabled.
Watchpoint/Breakpoint Handling Unification
Jim Ingham has noted that when we hit a breakpoint or watchpoint, we have many similar things that need to be evaluated/done. Decrement an ignore count, increment a hit count, evaluate a condition, run commands on the Breakpoint/Watchpoint.
Breakpoints have these attributes on BreakpointLocations or on their containing Breakpoint, whereas we will always have these on Watchpoints, not the individual Locations.
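The shared work on a stop can be pictured as one small options object used by both kinds of stoppoint. This is a hypothetical sketch: the class name, its fields, and the order of the steps (condition first, then hit count, then ignore count, then commands) are assumptions of the example, not a description of what lldb does today.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StoppointOptions:
    ignore_count: int = 0
    hit_count: int = 0
    condition: Optional[Callable[[], bool]] = None
    commands: List[Callable[[], None]] = field(default_factory=list)

    def should_stop(self) -> bool:
        """Run the common hit handling; return True for a public stop."""
        if self.condition is not None and not self.condition():
            return False
        self.hit_count += 1
        if self.ignore_count > 0:
            self.ignore_count -= 1
            return False
        for command in self.commands:
            command()
        return True
```

A Watchpoint would own one such object directly, while for breakpoints it could live on either the Breakpoint or a BreakpointLocation.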
We have separate WatchpointOption and BreakpointOption classes. Should both be covered by Breakpoint/Stoppoint*, with the difference being whether we call a WatchpointCallback taking a Watchpoint or a BreakpointCallback taking a BreakpointLocation?
SBWatchpoint API changes
Add methods GetNumLocations(), GetLocationAtIndex().
I don’t think we need WatchpointLocation IDs like Breakpoint IDs, which include the breakpoint and breakpoint location number. None of those APIs are needed. Users will not manipulate individual WatchpointLocations. We should expose the locations through the SB API via GetNumLocations() and GetLocationAtIndex().
SBWatchpoint::GetWatchAddress() and SBWatchpoint::GetWatchSize() will return the user’s specified watchpoint start and size. The watchpoint registers with aligned addresses and sizes can be queried via the SBWatchpointLocation object.
All WatchpointLocations will enable/disable at the same time as the Watchpoint.
The Watchpoint will maintain the number of hits it has had, not the WatchpointLocations. SBWatchpoint::GetHitCount et al do not change.
Users will not attach conditions/commands to individual watchpoint locations, nor disable or enable them.