RFC: Large watchpoint support in lldb

jasonmolenda · July 18, 2023, 12:09am

A few months ago I was trying to think about how to improve lldb’s watchpoint capabilities, and how to take better advantage of the AArch64 capabilities, eventually coming up with an idea of “watchpoint locations” and set it aside until I could work this out more thoroughly ( AArch64 watchpoints - reported address outside watched range; adopting MASK style watchpoints - #6 by jasonmolenda ).

Hardware watchpoints have strict size and alignment restrictions because they are closely tied to the hardware-level implementation, which is what keeps them performant. Commonly they are restricted to watching 8 bytes, at an 8 byte alignment, on 64-bit targets. lldb today cannot watch regions larger than 8 bytes.

My intention is to break up the user requested watch region – 24 bytes, say – into multiple physical hardware register requests. To do this, watchpoint locations will be added. A watchpoint will have the user requested size and address. It will have one or more watchpoint locations which have correct alignment and size for hardware watchpoints. A 24-byte C++ object can be watched by using three 8-byte hardware watchpoint registers, for instance.

Watchpoint locations can also solve unaligned watch requests. If a user watches 8 bytes spanning two 8-byte aligned regions, this can be satisfied by using two hardware watchpoints, each watching the part of the requested region that falls in its doubleword.

AArch64 hardware also has a type of watchpoint that can watch power-of-2 sized regions, aligned to that same power of 2. So a 24-byte C++ object can be watched with a single 32-byte, 32-byte aligned, “mask” watchpoint. Using this kind of watchpoint requires that lldb ignore changes to the 8 bytes the user did not ask to watch. To achieve this, a new type of watchpoint will be added in addition to “read” and “write”: “modify”. A modify watchpoint will only stop for the user when the memory bytes being watched by the user have changed. A write which puts the exact same value in the memory will not stop. I think this will be the most commonly desired behavior (“who is trashing this object”) versus the less common “show me all code that touches this object”.

WatchpointLocations

A user created Watchpoint will have one or more WatchpointLocations.

Unlike a Breakpoint and BreakpointLocations, WatchpointLocations are largely a read-only display, for the user, of how a watchpoint was actually implemented. You cannot disable/enable individual WatchpointLocations. You cannot put conditions or commands on a WatchpointLocation. When we stop on a watchpoint, we will not display which watchpoint location was hit. The SB API will have a way to list the WatchpointLocations on a Watchpoint, but nothing more.

WatchpointLocations have a similar name to BreakpointLocations, but they are much simpler objects.

New ‘modify’ watchpoint type

Watchpoints have read or write types now, specified by the gdb remote serial protocol packets Ztype,addr,kind (set watchpoint) and ztype,addr,kind (remove watchpoint), where type is 2 (write), 3 (read), or 4 (access, aka read+write) and kind is the number of bytes to watch.
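For reference, here is a minimal sketch of how these packets are framed on the wire. The helper names are made up for illustration and are not lldb functions; the $payload#checksum framing (checksum is the sum of the payload bytes modulo 256) and the hex-encoded fields come from the gdb remote serial protocol, where the digit after Z/z selects write (2), read (3), or access (4).

```python
# Sketch only: frame gdb-remote watchpoint packets. Helper names are hypothetical.
WATCH_TYPES = {"write": 2, "read": 3, "access": 4}

def frame(payload: str) -> str:
    """Wrap a payload as $payload#xx, where xx is the payload byte sum mod 256."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"

def set_watchpoint_packet(wp_type: str, addr: int, size: int) -> str:
    # Address and size are sent as hex digits without a 0x prefix.
    return frame(f"Z{WATCH_TYPES[wp_type]},{addr:x},{size:x}")

def remove_watchpoint_packet(wp_type: str, addr: int, size: int) -> str:
    return frame(f"z{WATCH_TYPES[wp_type]},{addr:x},{size:x}")

print(set_watchpoint_packet("write", 0x1000, 8))
print(remove_watchpoint_packet("write", 0x1000, 8))
```

Note that none of the three existing types expresses “modify”; in this proposal that filtering happens in lldb after the stub reports a write.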

With mask watchpoints, we can only watch power-of-2 regions of memory so when the user asks to watch 24 bytes, we will need to watch at least 32 bytes. We don’t want to show the user when the unwatched 8 bytes are accessed, so we need to detect those accesses and keep them as Private stops, where we continue silently.

lldb will add a new watchpoint kind: modify, where lldb only stops when the memory region being watched changes. lldb will read & compare the memory region and if it has not changed, it will continue silently.
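A minimal sketch of the read-and-compare step, assuming lldb keeps a snapshot of only the user-requested bytes. The class and the read_memory callback are hypothetical stand-ins, not lldb code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModifyWatchpoint:
    addr: int        # user-requested start address
    size: int        # user-requested size in bytes
    snapshot: bytes  # contents of the user-requested range at the last check

    def should_stop(self, read_memory: Callable[[int, int], bytes]) -> bool:
        """Compare the watched bytes against the snapshot; stop only on a change."""
        current = read_memory(self.addr, self.size)
        changed = current != self.snapshot
        self.snapshot = current  # remember the new value for the next comparison
        return changed

# Toy memory: 32 bytes at 0x1000; the user watches the first 24,
# while the hardware (a 32-byte mask watchpoint) traps on all 32.
memory = bytearray(32)

def read(addr: int, size: int) -> bytes:
    return bytes(memory[addr - 0x1000: addr - 0x1000 + size])

wp = ModifyWatchpoint(0x1000, 24, read(0x1000, 24))

memory[28] = 0xFF                   # write to the unwatched tail
tail_write = wp.should_stop(read)   # False: continue silently
memory[4] = 0                       # rewrite the same value in the watched range
same_value = wp.should_stop(read)   # False: continue silently
memory[4] = 7                       # a real modification
real_change = wp.should_stop(read)  # True: stop for the user
print(tail_write, same_value, real_change)
```

The same comparison handles both cases from the text: a write to the extra bytes of an over-sized hardware region, and a write that stores the value already in memory.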

For read watchpoints, we cannot know (on AArch64 at least) which bytes were accessed, so we will have false positive stops if a larger-than-requested region is being watched.

The default type in command line lldb will be modify.

There are some other cases where we need to watch larger regions than requested on other targets. On Intel x86-64, you can watch 1, 2, 4, 8 bytes within a doubleword. But within that doubleword, the bytes you watch must be properly aligned. You can watch bytes 2 and 3, but you cannot watch bytes 1 and 2, without using a 4-byte watchpoint range [0,3].
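The alignment rule above can be sketched as follows. This models only the rule as described here (naturally aligned 1, 2, 4, or 8 byte watches inside an 8-byte doubleword); the function name is hypothetical.

```python
from typing import Optional, Tuple

def x86_watch_for(addr: int, size: int) -> Optional[Tuple[int, int]]:
    """Smallest naturally aligned 1/2/4/8-byte watch covering [addr, addr+size).

    Returns (start, length), or None if the request does not fit in one
    aligned 8-byte region and needs more than one register.
    """
    end = addr + size
    for length in (1, 2, 4, 8):
        start = addr & ~(length - 1)  # round down to this length's alignment
        if start + length >= end:
            return start, length
    return None

print(x86_watch_for(0x1002, 2))  # bytes 2 and 3: an exact 2-byte watch
print(x86_watch_for(0x1001, 2))  # bytes 1 and 2: widened to the 4-byte range [0,3]
print(x86_watch_for(0x1006, 4))  # crosses the doubleword boundary
```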

Remote stub tells lldb what watchpoints it supports

New gdb-remote packet qWatchpointSupportInfo where the remote stub can detail what its watchpoint capabilities are.

Should the capabilities be expressed in terms of

“intel watchpoints” (doubleword aligned 1, 2, 4, 8 bytes watched, must be aligned to that amount within the dword)
“aarch64 bas watchpoint” (doubleword aligned, any contiguous bytes within the dword).
“aarch64 mask watchpoint” (any power-of-2 size 8 bytes to 2GB, aligned to that same power-of-2 boundary).

Or should we construct a generic description of likely watchpoint capabilities and have the stub describe its capabilities in those terms? In my earlier Discourse, @labath mused that this might look like an array of (1) region size, (2) region alignment, (3) number of consecutive bytes that can be watched in the region. This first sketch doesn’t cover the Intel watchpoint behavior of what byte ranges within a doubleword can be watched, but that’s not a criticism of the idea of a generic description itself. Watchpoint capabilities are such a low level specific feature of chips, I think it would be possible to come up with a description that applies to nearly all targets in common use.
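To make the generic description concrete, here is one possible shape for the (region size, alignment, consecutive bytes) entries. This is purely illustrative: the type, its field names, and the example entries are assumptions, and no packet format for qWatchpointSupportInfo is implied. Like the original idea, it does not model the Intel in-doubleword alignment rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class RegionCapability:
    region_size: int      # (1) size of the region one register covers
    alignment: int        # (2) required alignment of that region
    max_consecutive: int  # (3) consecutive bytes watchable inside the region

    def fits(self, addr: int, size: int) -> bool:
        """True if [addr, addr+size) can be watched by a single register."""
        base = addr - (addr % self.alignment)
        return size <= self.max_consecutive and addr + size <= base + self.region_size

def first_fit(caps: Iterable[RegionCapability], addr: int, size: int) -> Optional[RegionCapability]:
    """First listed capability that covers the request with one register."""
    return next((c for c in caps if c.fits(addr, size)), None)

# Hypothetical entries: an AArch64-BAS-like doubleword and a 32-byte mask region.
BAS = RegionCapability(region_size=8, alignment=8, max_consecutive=8)
MASK_32 = RegionCapability(region_size=32, alignment=32, max_consecutive=32)
stub_caps = [BAS, MASK_32]

print(first_fit(stub_caps, 0x1000, 8))   # fits the doubleword entry
print(first_fit(stub_caps, 0x1000, 24))  # needs the 32-byte entry
print(first_fit([BAS], 0x1004, 8))       # None: must be split across registers
```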

The “pre-canned behaviors” system means that lldb has to include that for all supported targets (that differ), but it does simplify the engine which calculates the necessary WatchpointLocations from a user’s specified Watchpoint.

We can fall back to a base assumption that a remote target can watch sizeof(void*) size regions aligned to sizeof(void*) boundaries, which gives us behavior similar to what we have today.

WatchpointLocationCreator

Take a user requested watchpoint (addr, size) and knowledge of what watchpoints this stub can support, and create a list of physical watchpoints that we will set to implement that request.

A simple doubleword-sized, doubleword-aligned watchpoint will be a single watchpoint location. An unaligned request, e.g. watching 8 bytes at 0x1004 (0x1004-0x100b), will need to use two doubleword-aligned watchpoints: one watchpoint at 0x1000 watching 0x1004-0x1007 and one watchpoint at 0x1008 watching 0x1008-0x100b.
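A minimal sketch of that split for a stub that can only watch within aligned doublewords. The function name and the tuple layout are assumptions made for illustration.

```python
def split_into_doublewords(addr: int, size: int, granule: int = 8):
    """Split [addr, addr+size) into pieces that each stay inside one aligned granule.

    Returns a list of (register_base, watch_start, watch_len) tuples.
    """
    pieces = []
    end = addr + size
    cur = addr
    while cur < end:
        base = cur & ~(granule - 1)           # aligned address for the register
        piece_end = min(end, base + granule)  # stop at the granule boundary
        pieces.append((base, cur, piece_end - cur))
        cur = piece_end
    return pieces

for base, start, length in split_into_doublewords(0x1004, 8):
    print(f"register at {base:#x} watching {start:#x}-{start + length - 1:#x}")
```

For the request above this yields a register at 0x1000 watching 0x1004-0x1007 and a register at 0x1008 watching 0x1008-0x100b.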

If the stub supports power-of-2 mask watchpoints, and the user watches 24 bytes at 0x1000, we can implement that with

three 8-byte watchpoint BAS registers (no false positive hits).
one 32-byte watchpoint MASK register, ignoring changes to the last 8 bytes.
one 16-byte MASK watchpoint at 0x1000, one 8-byte BAS watchpoint at 0x1010 (no false positive hits).
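The three options can be compared with a small sketch. It assumes the request is 8-byte aligned with a size that is a multiple of 8, and that mask regions start at 8 bytes; the function names are hypothetical and this is not the debugserver algorithm.

```python
def doubleword_plan(addr: int, size: int):
    """One 8-byte register per doubleword (no false positives)."""
    return [(addr + off, 8) for off in range(0, size, 8)]

def single_mask_plan(addr: int, size: int):
    """Smallest aligned power-of-2 region that covers the whole request."""
    length = 8
    while (addr & ~(length - 1)) + length < addr + size:
        length *= 2
    return [(addr & ~(length - 1), length)]

def exact_pow2_plan(addr: int, size: int):
    """Greedy cover with aligned power-of-2 pieces (no false positives)."""
    assert addr % 8 == 0 and size % 8 == 0, "sketch assumes doubleword-aligned requests"
    plan = []
    while size:
        length = 8
        # Grow the piece while it stays aligned and inside the request.
        while length * 2 <= size and addr % (length * 2) == 0:
            length *= 2
        plan.append((addr, length))
        addr += length
        size -= length
    return plan

for name, plan in [("doublewords", doubleword_plan(0x1000, 24)),
                   ("single mask", single_mask_plan(0x1000, 24)),
                   ("mask + BAS", exact_pow2_plan(0x1000, 24))]:
    extra = sum(length for _, length in plan) - 24
    print(f"{name}: {len(plan)} register(s), {extra} extra byte(s) watched")
```

The trade-off is visible in the two numbers printed per plan: registers used versus bytes watched beyond the request.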

Should WatchpointLocationCreator change its priority from “fewest watchpoint registers” to “more accurately watching” depending on how many other watchpoints are currently set?

What if the user sets one 24 byte watchpoint to watch one object, we use 3 watchpoint registers, and then they set a second 24-byte watchpoint?

Optimizing for “fewest watchpoint registers” can result in very poor choices for pathologically aligned watchpoint requests, if we try to cover them with one watchpoint register, on a target supporting AArch64 BAS and MASK watchpoints.

For example, watching 16 bytes at 0x01fffff8. A MASK watchpoint size 0x4000000 watching addresses 0x0-0x4000000 can watch these 16 bytes – watching 64MB of memory when only 16 bytes were user-requested. The next smallest mask watchpoint would be 0x2000000, which can only watch 0x0-0x2000000 – the first 8 bytes of the user request.

Using two watchpoint registers, this can be watched with an 8 byte BAS watchpoint at 0x1fffff8 and an 8 byte BAS watchpoint at 0x2000000; breaking the region into two makes it much easier to find a nearby alignment that will cover the memory block.
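The arithmetic for this pathological case can be checked with a short sketch (the helper name is made up; it just finds the smallest aligned power-of-2 region containing a range):

```python
def smallest_mask_region(addr: int, size: int, min_len: int = 8):
    """Smallest aligned power-of-2 region (base, length) covering [addr, addr+size)."""
    length = min_len
    while (addr & ~(length - 1)) + length < addr + size:
        length *= 2
    return addr & ~(length - 1), length

# One register: 16 bytes at 0x01fffff8 straddle the 0x2000000 boundary.
base, length = smallest_mask_region(0x01FFFFF8, 16)
print(f"one register: base {base:#x}, size {length:#x}, {length - 16} extra bytes watched")

# Two registers: each 8-byte half sits in its own aligned doubleword.
halves = [smallest_mask_region(0x01FFFFF8, 8), smallest_mask_region(0x02000000, 8)]
print("two registers:", [(hex(b), n) for b, n in halves])
```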

debugserver today has an algorithm for watchpoints which handles all of this for AArch64 BAS+MASK watchpoints, including using two watchpoint registers to handle a mis-aligned watchpoint request. This is hidden from lldb.

WatchpointSite Creation

When multiple WatchpointLocations exist, it would be ideal to handle them holistically. There are three cases to think about:

(1) Two watchpoints within a single doubleword (or in the same watched region, more generally).
(2) Two watchpoints which are contiguous in memory and can be covered by two hardware watchpoint registers, or one larger hardware watchpoint register request.
(3) A user creates multiple watchpoints for the same address/size.

(1) is the most important case, where you have two uint32_t’s in a doubleword and are watching both of them. Today, these are sent down as two watchpoint requests, for which a stub will (most likely) use two hardware watchpoint registers on the same doubleword, each watching half of it.

For example, W1 watches bytes 0…3 and W2 watches bytes 4…7. Today this is sent to the stub as two watchpoints on the same address, watching different bytes. When a write to that doubleword happens, the CPU will pick one of the registers as a match, and use its byte-mask to decide whether to stop or not. lldb will miss accesses to the bytes watched by whichever register the hardware does not check first. We can see (1) with mask watchpoints as well, e.g. watching a 24 byte object and an 8 byte object directly after that. The first watch might be done with a 32-byte mask watchpoint and the second watchpoint overlaps with that, so the hardware will only detect hits for one of these watchpoint registers.

In the case of (2), this is an optimization possibility. If the user is watching two 8-byte properly aligned doublewords, and the stub can watch 16 bytes with a single hardware watchpoint register, then only one physical watchpoint register needs to be used.

In the case of (3), this may be an error; I need to see how Breakpoints handle this. The important aspects would be how to handle ignore count/commands/conditions that might be attached to either of the two identical watchpoints.

We would need to re-evaluate our WatchpointSites every time a watchpoint is added/deleted or enabled/disabled.

Watchpoint/Breakpoint Handling Unification

Jim Ingham has noted that when we hit a breakpoint or watchpoint, we have many similar things that need to be evaluated/done. Decrement an ignore count, increment a hit count, evaluate a condition, run commands on the Breakpoint/Watchpoint.
Breakpoints have these attributes on BreakpointLocations or on their containing Breakpoint, whereas we will always have these on Watchpoints, not the individual Locations.

We have separate WatchpointOption and BreakpointOption classes; should both be covered by Breakpoint/Stoppoint*, with the difference being when we call a WatchpointCallback taking a Watchpoint versus a BreakpointCallback taking a BreakpointLocation?

SBWatchpoint API changes

Add methods GetNumLocations(), GetLocationAtIndex().

I don’t think we need WatchpointLocation IDs like Breakpoint IDs which include the breakpoint and breakpoint location number. None of those APIs are needed. Users will not manipulate individual WatchpointLocations. We should expose the locations through the SB API through GetNumLocations(), GetLocationAtIndex().

SBWatchpoint::GetWatchAddress() and SBWatchpoint::GetWatchSize() will return the user’s specified watchpoint start & size. The watchpoint registers with aligned address & sizes can be queried via the SBWatchpointLocation object.

All WatchpointLocations will enable/disable at the same time as the Watchpoint.

The Watchpoint will maintain the number of hits it has had, not the WatchpointLocations. SBWatchpoint::GetHitCount et al do not change.

Users will not attach conditions/commands to individual watchpoint locations, or disable or enable them.

jingham · July 18, 2023, 12:16am

Breakpoints handle multiple breakpoints with the same specification (shared at the level of the BreakpointSite). That can be handy, for instance it’s an easy way to do “if condition A is true, do some operation, if condition B, do some other operation”, or even just to count the number of hits when conditions A or B are true…

IMO it makes sense for watchpoints to behave the same way.

DavidSpickett · July 19, 2023, 3:58pm

I like the concept overall and I agree that users should interact with the single “top level” watchpoint and not have to care about how it works behind the scenes.

What is the current default, and are people who are used to watchpoint set myvar going to be surprised by the change?

If we assume that the majority of people doing that are in fact intending to check for modifications anyway, the few who aren’t could print the watchpoint and see right away “modify” which is pretty clear in its intent.

Does this account for cores where not all watchpoints have the same features? I’m not sure what the currently supported architectures all do.

I know MIPS isn’t the most modern example but it’s the one I used to know, a lot of those cores would have certain breakpoints able to use certain features.

I’ll look at the Linux kernel interfaces and the ARMARM to see if there’s any concept of that in there. Maybe AArch64 is simpler in this respect.

What is the justification to provide even the WatchpointLocations? Beyond us being able to test it more nicely than string parsing (which is almost enough tbh).

From an IDE potentially you could take the currently used locations and match it up to the known hardware resources, but I don’t know that you’d be able to know that say of the 4 watchpoints these can do power of 2 these can do this feature etc.

The MIPS tools I used to work on, one took the simple approach of having the user choose the resources used. The other would shuffle things each time a new point was added, and for the common use cases it worked well. Only once we got into chained (literally called “complex”) breakpoints did it become truly difficult.

Showing the locations in an IDE could also be a final fallback for any strange behaviours we might see, especially on AArch64 when instructions read/write > 8 bytes. Gives a human a chance to read it and verify.

If you weren’t already, make sure this part can be tested in isolation because hardware in the wild varies a lot when it comes to watchpoints. Sounds like it’s already on that path though, since you have the capability description. Should be possible to generically test by passing in the currently used resources, the capabilities and the desired watchpoint config.

Would be a lot of effort/impossible to make it exhaustive so starting with the examples you’ve given here is a good idea. We can add corner cases as they come up.

Is this another place to use that llvm term I see a lot, “plan”? WatchpointPlan

jasonmolenda · July 19, 2023, 8:04pm

The current default is “write”.

jasonmolenda · July 19, 2023, 8:08pm

The gdb remote serial protocol doesn’t provide any mechanism for creating a hardware breakpoint or watchpoint on specific threads/cores, a watchpoint is assumed to be active on all threads/cores when it is set, and that the capabilities of the threads/cores are homogeneous. It makes the same assumption about memory - there isn’t a memory read/write packet which specifies which core to read/write it from.

If lldb is communicating with a JTAG stub (the most likely case for this to come up) controlling a multi-core processor complex, I would expect the stub to report the watchpoint capabilities of the least sophisticated core for this packet. e.g. if two cores have BAS and MASK watchpoints (in AArch64 terms) and four smaller cores only have BAS watchpoints, I would expect the stub to tell lldb it can only do BAS watchpoints.

jasonmolenda · July 19, 2023, 8:11pm

My reason for adding the API to SBWatchpoint is only so that the front end has a way of displaying watchpoint information like “You have one 32-byte watched region at 0x1000 which is implemented by dword watchpoints at 0x1000, 0x1008, 0x1010, 0x1018”, like watchpoint list will do in commandline lldb.

jingham · July 19, 2023, 8:59pm

I think being able to show how watchpoints resources are getting used is valuable. For instance, if I need to watch 3 variables, but watching one of them takes up 3 watchpoint resources, I might try to figure out a way to do what I want without that variable watchpoint. We try never to have pieces of information that you can ONLY get from the command-line as that renders GUI’s second class citizens. So if this information is useful to print, there should be an API for it…

Jim

jasonmolenda · July 19, 2023, 9:30pm

A fun part of our MIPS watchpoint support is that apparently the hardware does not distinguish the low 3 bits of the watched address. If you watch 4 bytes at 0x104 and someone accesses 4 bytes at 0x100, it will match the watchpoint. The MIPS support in lldb decodes the LD/ST instruction to calculate what address it was accessing.
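A toy model of that false positive, based only on the description above (low 3 address bits ignored by the hardware match); it is not a model of the real MIPS watch registers, and both function names are invented.

```python
def hw_match_ignoring_low_bits(watch_addr: int, access_addr: int, ignored_bits: int = 3) -> bool:
    """Hardware-style match: compare addresses with the low bits dropped."""
    return (watch_addr >> ignored_bits) == (access_addr >> ignored_bits)

def ranges_overlap(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if [a_addr, a_addr+a_size) and [b_addr, b_addr+b_size) share a byte."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size

watch = (0x104, 4)   # user watches 4 bytes at 0x104
access = (0x100, 4)  # decoded from the LD/ST instruction

trapped = hw_match_ignoring_low_bits(watch[0], access[0])
real_hit = ranges_overlap(*watch, *access)
print(f"hardware trapped: {trapped}, watched bytes touched: {real_hit}")
```

Here the hardware traps but the decoded access does not touch the watched bytes, which is why the access address has to be recovered from the instruction.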

From my reading of SVE Streaming Mode, we will have a similar issue on AArch64 when in SSVE; an access near a watched region (at a hardware-dependent granule) may trigger the watchpoint even if the watched region was not actually accessed. There are flag bits in the ESR that indicate when the processor may be reporting a false watchpoint hit. This is a slightly different problem than the traditional issue with AArch64 where a memory access that includes a watched region, but extends before/after it, may report a trigger address (FAR address) outside the watched region. The Armv9A doc also talks about a mode where the watchpoint number that was triggered is reported instead of a FAR address.

DavidSpickett · July 20, 2023, 8:51am

My friendly neighborhood GDB maintainer told me that GDB doesn’t report writes of the same value, which supports changing our default to modify.

LLDB reports a write of 0 to a variable that was already 0:

(lldb) c
Process 3460123 resuming

Watchpoint 1 hit:
old value: 0
new value: 0
Process 3460123 stopped
* thread #1, name = 'test.o', stop reason = watchpoint 1
    frame #0: 0x0000aaaaaaaaa728 test.o`main at test.c:5:10
   2
   3    int main() {
   4      a = 0;
-> 5      return a;
   6    }

Which I would expect, if one takes write literally. GDB does not:

(gdb) watch a
Hardware watchpoint 2: a
(gdb) c
Continuing.
[Inferior 1 (process 3460239) exited normally]

You and GDB are right that 99% of the time modifications are what we’re looking for.

Great, that simplifies this a lot.

Sure, I agree with that. My concern was more the API allowing programs other than lldb to mess with the layout, which would add complication.

The MIPS tools I referenced would do that. Rather than solve a very difficult breakpoint packing problem, bail out and let the user figure it out for their specific case.

tedwoodward · July 20, 2023, 2:43pm

I’ve got a couple thoughts:

Breaking up the watchpoint into watchpoint locations should be handled on the stub. I don’t think we want lldb handling hardware resource allocation. lldb should simply tell the stub “give me a watchpoint of type x, address y size z”. We should be able to query the stub to see the watchpoint locations in a watchpoint.
PowerPC Book E has powerful DAC (Data Address Compare) registers. You can pair them to make an address range, or an inverse address range, so you can say “watch accesses between 0x100 and 0xfffffff0” or “watch accesses that aren’t between 0x100 and 0xfffffff0”. We should be able to support this type of hardware watchpoint feature.

jasonmolenda · July 20, 2023, 11:32pm

Unfortunately most stubs are unsophisticated in how they manage the watchpoint resources. debugserver has always handled unaligned watchpoint requests by splitting them into two watchpoint registers, and I recently added AArch64 MASK watchpoint support to debugserver and have it split watchpoint requests into BAS/MASK watchpoints to cover the request (I did this as a proof of concept for the algorithm that I would eventually add in lldb).

When we look at JTAG stubs in particular, where the stub is a small agent, expecting all of them to do this type of sophisticated watchpoint management is not ideal IMO. I think this logic should exist in lldb, and it gives us a way to show the user what limited hardware resources are actually being used to implement their request. Whereas it would be hidden from lldb & the user if it was all behind the gdb remote serial protocol stub. (lldb today on Darwin doesn’t really have any idea how many watchpoint registers were used by debugserver to watch a memory region for this very reason)

I quickly read through the arch docs on this. Unlike AArch64/Intel where you can imagine the implementation in terms of bitmask comparisons, this arch describes mechanisms for bitmask address matching or for address value comparisons. You can say “trap on address greater/equal to A, and less than B” or “trap on address less-than A or greater than B”. I’m not clear if power of 2 restrictions on the two addresses apply, or if it’s implementation-defined. If it was power of 2, that makes it a lot closer to AArch64 MASK watchpoints. It seems like the hardware only has one of these watchpoints? Again, I read only quickly.

tedwoodward · July 21, 2023, 3:47am

jasonmolenda:

tedwoodward:

Breaking up the watchpoint into watchpoint locations should be handled on the stub. I don’t think we want lldb handling hardware resource allocation. lldb should simply tell the stub “give me a watchpoint of type x, address y size z”. We should be able to query the stub to see the watchpoint locations in a watchpoint.

Unfortunately most stubs are unsophisticated in how they manage the watchpoint resources. debugserver has always handled unaligned watchpoint requests by splitting them into two watchpoint registers, and I recently added AArch64 MASK watchpoint support to debugserver and have it split watchpoint requests into BAS/MASK watchpoints to cover the request (I did this as a proof of concept for the algorithm that I would eventually add in lldb).

When we look at JTAG stubs in particular, where the stub is a small agent, expecting all of them to do this type of sophisticated watchpoint management is not ideal IMO. I think this logic should exist in lldb, and it gives us a way to show the user what limited hardware resources are actually being used to implement their request. Whereas it would be hidden from lldb & the user if it was all behind the gdb remote serial protocol stub. (lldb today on Darwin doesn’t really have any idea how many watchpoint registers were used by debugserver to watch a memory region for this very reason)

Back in my days of doing JTAG debuggers, we had smart probes, so we could (and did) do this kind of computations in the stub. But if most are dumb, then it’s reasonable to do it in the debugger.

jasonmolenda:

tedwoodward:

PowerPC Book E has powerful DAC (Data Address Compare) registers. You can pair them to make an address range, or an inverse address range, so you can say “watch accesses between 0x100 and 0xfffffff0” or “watch accesses that aren’t between 0x100 and 0xfffffff0”. We should be able to support this type of hardware watchpoint feature.

I quickly read through the arch docs on this. Unlike AArch64/Intel where you can imagine the implementation in terms of bitmask comparisons, this arch describes mechanisms for bitmask address matching or for address value comparisons. You can say “trap on address greater/equal to A, and less than B” or “trap on address less-than A or greater than B”. I’m not clear if power of 2 restrictions on the two addresses apply, or if it’s implementation-defined. If it was power of 2, that makes it a lot closer to AArch64 MASK watchpoints. It seems like the hardware only has one of these watchpoints? Again, I read only quickly.

There are 2 DACs. Each one can watch 1 address. They can individually trigger on 1 address, be set to trigger on an address mask (power of 2, aligned), or be joined together to trigger on a range or inverse range.

I believe https://www.nxp.com/docs/en/reference-manual/E6500RM.pdf was the latest doc, before Freescale switched its networking processors to use ARM.

jasonmolenda · September 29, 2023, 5:17am

A small update on this topic. Via https://github.com/llvm/llvm-project/pull/66308 last week I landed support for modify style watchpoints, one of the prerequisites for large watchpoints where we may watch larger regions of memory than the user requested, and got the testsuites working again after landing that.

Now I’m starting on the next part of this project.

Today the Target has a WatchpointList, and watchpoints are added to it via Target::CreateWatchpoint. You cannot call Target::CreateWatchpoint when there is no current Process (it probably should have been under Process, but Breakpoints are under Target so I can see why it’s there). When we add a Watchpoint to the WatchpointList, we then call Process::EnableWatchpoint to send the gdb-remote packet etc to enable it. When we create a Watchpoint at the same address of an existing Watchpoint, if their size and type are the same, one of them is disabled (surprise if you have conditions on one of them, etc). If the two Watchpoints at the same address have different size/type, the old one is disabled. If the addresses of the watchpoints are within the same minimum watchable granule (e.g. bytes 2 & 3 in a word, watched by two separate watchpoints), that is not recognized or handled, they’re both sent to the gdb stub.

I will keep Target::CreateWatchpoint as the interface for adding a watchpoint, and its restriction to having a Process.

We will give the Watchpoint to Process. Process will break the user’s request (memory address & size) into parts that can be covered by watchpoint register(s) and that the stub can set. I’ll probably default to “a stub can watch some unknown number of pointer-sized memory regions,” e.g. 8-byte regions on a 64-bit target. Something like debugserver, which can use either up-to-8-byte Byte Address Select watchpoints or power-of-2 MASK watchpoints on AArch64, will have further capabilities.

Process will keep a list of WatchpointResources, one per hardware watchpoint register that the stub/target can support, that are currently active. A WatchpointResource will refer back to which Watchpoints are using it.

A Watchpoint will have one or more WatchpointLocations, each tracking the portion of the watchpoint covered by one of the WatchpointResources; the number of WatchpointLocations in a Watchpoint will match the number of WatchpointResources it uses.

Three example scenarios:

Watch 16 bytes on a gdb stub that can only watch doublewords (8-bytes). We have one Watchpoint, with two WatchpointLocations, and the process has two WatchpointResources. The WatchpointLocations and the WatchpointResources are all tracking 8 bytes.

Watch 48 bytes on a gdb stub that can watch power-of-2 (mask) regions. We have one Watchpoint with one WatchpointLocation, and Process has one WatchpointResource. The WatchpointLocation specifies that we are watching 48 bytes at this address. The WatchpointResource shows that we are actually watching 64 bytes with the hardware watchpoint register.

Watchpoint 1: watch 1 byte at addr 0x1002. Watchpoint 2: watch 1 byte at addr 0x1003. Process has one WatchpointResource watching 2 bytes starting at 0x1002. Both Watchpoints are referring to the one shared WatchpointResource. The WatchpointResource has pointers back to the two Watchpoints which are using it.

The WatchpointResource idea is similar to a BreakpointSite. When we stop at a break instruction, we are at a BreakpointSite and we need to find all BreakpointLocations that use this BreakpointSite address. We need to evaluate all of the BreakpointLocations/Breakpoints to decide if we have an ignore count, conditions, commands to execute, continue silently, etc.

In the same way, two user Watchpoints may be handled with one hardware watchpoint register, so we will have a WatchpointResource which is matched against the watchpoint trap. We will find all Watchpoints that are using this WatchpointResource to watch some/all of their user request, and evaluate what actions to take based on this.

To get past the watchpoint trap, we disable our WatchpointResource, instruction step, re-enable it (and collect new values for write/modify type watchpoints).

An example of two user specified watchpoints, a 96 byte one and a 32-byte one next to it, both handled by a single 128-byte watchpoint register (AArch64 mask style watchpoint that debugserver supports):

	┌─────────────────────────────────────────────────┐    ┌──────────────────────────────┐
	│ Target                                          │    │ Process                      │
	├─────────────────────────────────────────────────┤    ├──────────────────────────────┤
	│     WatchpointList                              │    │                              │
	│                                                 │    │                              │
	│        Watchpoint 1                             │ ┌──┼─────▶ WatchpointResource 1   │
	│        user request: addr 0x10200 size 96◀──────┼─┼──┼─────▶ addr 0x10200 size 128  │
	│           WatchpointLocation 1                  │ │  │                              │
	│           addr 0x10200 size 96                  │ │  │                              │
	│                                                 │ │  │                              │
	│        Watchpoint 2                             │ │  │                              │
	│        user request: addr 0x10260 size 32       │ │  │                              │
	│           WatchpointLocation 1            ◀─────┼─┘  └──────────────────────────────┘
	│           addr 0x10260 size 32                  │                                    
	└─────────────────────────────────────────────────┘

and watchpoint list might look like:

Watchpoint 1: addr = 0x10200 size = 96 state = enabled type = m
      Watchpoint Resource 1: addr = 0x10200 size = 128

Watchpoint 2: addr = 0x10260 size = 32 state = enabled type = m
      Watchpoint Resource 1: addr = 0x10200 size = 128

An example of one watchpoint watching 2 bytes starting at addr 0x1000, and a second watchpoint watching 18 bytes starting at addr 0x1002 with a remote stub that can only do doubleword (8 byte) watchpoints:

	┌────────────────────────────────────────────────┐             ┌──────────────────────────────┐
	│ Target                                         │             │   Process                    │
	├────────────────────────────────────────────────┤             ├──────────────────────────────┤
	│     WatchpointList                             │             │                              │
	│                                                │┌────────────┼──▶ WatchpointResource 1      │
	│        Watchpoint 1                            ││┌───────────┼──▶ addr 0x1000 size 8        │
	│        user request: addr 0x1000 size 2        │││           │                              │
	│           WatchpointLocation 1   ◀─────────────┼┘│           │    WatchpointResource 2      │
	│           addr 0x1000 size 2                   │ │┌──────────┼─▶  addr 0x1008 size 8        │
	│                                                │ ││          │                              │
	│        Watchpoint 2                            │ ││          │    WatchpointResource 3      │
	│        user request: addr 0x1002 size 18       │ ││┌─────────┼──▶ addr 0x1010 size 4        │
	│           WatchpointLocation 1     ◀───────────┼─┘││         │                              │
	│           addr 0x1002 size 6                   │  ││         └──────────────────────────────┘
	│           WatchpointLocation 2        ◀────────┼──┘│                                         
	│           addr 0x1008 size 8                   │   │                                         
	│           WatchpointLocation 3        ◀────────┼───┘                                         
	│           addr 0x1010 size 4                   │                                             
	└────────────────────────────────────────────────┘

and watchpoint list might look like:

Watchpoint 1: addr = 0x1000 size = 2 state = enabled type = m
      Watchpoint Resource 1: addr = 0x1000 size = 8

Watchpoint 2: addr = 0x1002 size = 18 state = enabled type = m
      Watchpoint Resource 1: addr = 0x1000 size = 8
      Watchpoint Resource 2: addr = 0x1008 size = 8
      Watchpoint Resource 3: addr = 0x1010 size = 4
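
The split shown above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not lldb's implementation: the function name `split_into_aligned_pieces` and the default 8-byte granule are assumptions. Each piece is clipped to the user's request, so the output corresponds to the WatchpointLocations in the diagram.

```python
def split_into_aligned_pieces(addr, size, granule=8):
    """Split [addr, addr + size) into pieces that each stay inside a single
    granule-aligned block. Returns a list of (piece_addr, piece_size)."""
    pieces = []
    end = addr + size
    cur = addr
    while cur < end:
        block_end = (cur // granule + 1) * granule   # end of cur's aligned block
        piece_end = min(block_end, end)              # clip to the user's request
        pieces.append((cur, piece_end - cur))
        cur = piece_end
    return pieces


for piece_addr, piece_size in split_into_aligned_pieces(0x1002, 18):
    print(hex(piece_addr), piece_size)
```

For the 18-byte request at 0x1002 this produces pieces at 0x1002 (6 bytes), 0x1008 (8 bytes), and 0x1010 (4 bytes). The first piece lives in the same doubleword as Watchpoint 1's two bytes, which is why the two watchpoints share WatchpointResource 1.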

I’m still not completely convinced of the usefulness of the WatchpointLocations on the Watchpoints. Maybe they could be used to explain the implementation to the user in the UI (“you asked to watch 100 bytes, it’s implemented with a 64-byte WP Resource watching 64 bytes, and a 64-byte WP Resource watching 36 bytes”), but we can also do that math from the user’s original request (watch 100 bytes at 0x1000) and the addresses of the WatchpointResources.

I was throwing around ideas with @jingham and Jim thinks that a Watchpoint could register a MatchDecider callback for each WatchpointResource it is assigned to, and the MatchDecider would take/get the WatchpointLocation for this WatchpointResource and decide if the watched region was actually accessed. The idea is that if I watch a 24-byte object using a 32-byte watchpoint, and we get a watchpoint trap for that 32-byte watchpoint, something may be able to evaluate if it was the 24 bytes the user wanted to watch, or the 8 bytes not being watched - a false positive. One way to do that is instruction emulation, as is done for MIPS LD/ST instructions, where it is straightforward.

However, if a WatchpointLocation is watching the same range of bytes as the WatchpointResource – an 8 byte watchpoint on a 64-bit target – then there are no further checks needed, and a no-op MatchDecider would be registered.
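
To make the MatchDecider idea concrete, here is a hedged Python sketch. Everything in it is hypothetical: the name `make_match_decider`, the tuple layout, and above all the assumption that the accessed address and size are known at the trap (for example from instruction emulation), which is not available on most targets today.

```python
def make_match_decider(location, resource):
    """location and resource are (addr, size) tuples.
    Returns a callable(access_addr, access_size) -> bool that reports whether
    an access trapped by the resource touched the user's watched bytes."""
    if location == resource:
        # The location covers the whole resource: every trap is a real hit.
        return lambda access_addr, access_size: True

    loc_addr, loc_size = location

    def decider(access_addr, access_size):
        # Standard half-open interval overlap test.
        return (access_addr < loc_addr + loc_size
                and loc_addr < access_addr + access_size)

    return decider


# A 24-byte object watched with a 32-byte mask watchpoint.
decide = make_match_decider((0x1000, 24), (0x1000, 32))
print(decide(0x1010, 8))   # True: touches the last 8 watched bytes
print(decide(0x1018, 8))   # False: only the 8 unwatched bytes, a false positive
```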

“modify” watchpoints might be considered a type of decider as well, where we are filtering out writes to the object that are writing the same value it already had.
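
Seen as a decider, a “modify” watchpoint amounts to comparing the watched bytes before and after the trap. The sketch below is an assumption-laden illustration, not lldb code: `ModifyFilter` and the `read_memory(addr, size)` callable are hypothetical stand-ins for however the debugger reads inferior memory.

```python
class ModifyFilter:
    """Remembers the last seen contents of the watched bytes and reports a
    stop only when those bytes have actually changed."""

    def __init__(self, read_memory, addr, size):
        self._read = read_memory        # hypothetical callable(addr, size) -> bytes
        self._addr = addr
        self._size = size
        self._last = read_memory(addr, size)

    def should_stop(self):
        current = self._read(self._addr, self._size)
        changed = current != self._last
        self._last = current            # next comparison is against the new value
        return changed


# Usage with a bytearray standing in for inferior memory.
memory = bytearray(16)
watch = ModifyFilter(lambda a, n: bytes(memory[a:a + n]), addr=4, size=8)
memory[4] = 0                  # write of the same value
print(watch.should_stop())     # False
memory[5] = 7                  # write of a different value
print(watch.should_stop())     # True
```

Because only the user's requested bytes are compared, the same check also filters out writes to the extra bytes covered by a larger mask watchpoint.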

For the targets we support today, and especially with no plans any time soon to extend instruction emulation to handle all loads/stores for our architectures, I don’t think any of this flexibility is adding anything. But on the other hand, this is the kind of callback that has made it pretty easy to extend ThreadPlans over the years, without needing to add lots of conditionalized logic around the match-decider engine for all the different ways we might disambiguate a watchpoint trap.

In short, I don’t really know what I want to do here yet regarding WatchpointLocation and this MatchDecider callback scheme, but it’s something I’m thinking about right now as I start looking at implementing these ideas. I can also see not adding a MatchDecider callback and not having WatchpointLocations, and the Watchpoint will refer to one or more WatchpointResources in the target when it is enabled.

jasonmolenda · October 25, 2023, 11:05pm

As I’ve been writing the first PR where I introduce the WatchpointResource class and set watchpoints in the inferior in terms of them, [lldb] [mostly NFC] Large WP foundation: WatchpointResources by jasonmolenda · Pull Request #68845 · llvm/llvm-project · GitHub , I noticed one SB API issue that I will run into later in this work. SBThread::GetStopReasonDataCount() and SBThread::GetStopReasonDataAtIndex() return values based on the StopReason, as documented in the header:

  /// Stop Reason              Count Data Type
  /// ======================== ===== =========================================
  /// eStopReasonNone          0
  /// eStopReasonTrace         0
  /// eStopReasonBreakpoint    N     duple: {breakpoint id, location id}
  /// eStopReasonWatchpoint    1     watchpoint id
  /// eStopReasonSignal        1     unix signal number
  /// eStopReasonException     N     exception data
  /// eStopReasonExec          0
  /// eStopReasonFork          1     pid of the child process
  /// eStopReasonVFork         1     pid of the child process
  /// eStopReasonVForkDone     0
  /// eStopReasonPlanComplete  0

Right now an eStopReasonWatchpoint has 1 StopReasonData entry, which is the watchpoint id. But we need to move to a model closer to eStopReasonBreakpoint where eStopReasonWatchpoint will return N watchpoint ids, for all watchpoints that are sharing the WatchpointResource which was accessed in this stop.

There won’t be any way to get existing SB API programs to understand that multiple watchpoints were hit by the same access, short of changing this to an eStopReasonWatchpointResource (or whatever) which is defined to return N data items, and all SB API programs would need to handle this new StopReason.

I think it’ll probably be better to keep using eStopReasonWatchpoint, and API users will need to check the StopReasonDataCount instead of assuming there is 1 data item with a watchpoint id in it. That means that existing programs, when lldb hits a WatchpointResource that is owned by two Watchpoints at the same memory region, will only know about the first watchpoint id (until they’re updated to check the number of EventData’s, etc.).

@clayborg @jingham @JDevlieghere

jingham · October 25, 2023, 11:53pm

Yup, the restriction that you’d only ever be able to set one watchpoint on any given address was not a good choice, and this is fallout from that choice.

I agree that we should still send the same stop reason. Having multiple watchpoints on one address will still be uncommon after your changes make it possible, so this will keep older clients working almost all the time. They will still report one of the multiple watchpoints hit if they exist, and then if you did f in the console you’d see the full list of stops. This seems like the least disruptive compromise.

And it will be easy for clients to change to support the new behavior, since they just need to add a loop on StopReasonDataCount, and remain backwards compatible.
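
A client-side loop of that kind might look like the following Python sketch. `GetStopReason()`, `GetStopReasonDataCount()` and `GetStopReasonDataAtIndex()` are existing SBThread methods; the helper name `watchpoint_ids_for_stop` is made up, and the assumption that each data entry is a watchpoint id when N > 1 reflects the proposal in this thread rather than confirmed behavior. The stop reason constant is passed in so the helper does not need to import `lldb` itself.

```python
def watchpoint_ids_for_stop(thread, watchpoint_reason):
    """Return every watchpoint id reported for this stop.

    thread            -- an lldb.SBThread (or anything with the same methods)
    watchpoint_reason -- pass lldb.eStopReasonWatchpoint in real use
    """
    if thread.GetStopReason() != watchpoint_reason:
        return []
    count = thread.GetStopReasonDataCount()
    # Loop over all entries instead of assuming there is exactly one.
    return [thread.GetStopReasonDataAtIndex(i) for i in range(count)]
```

In an lldb script this would be called as `watchpoint_ids_for_stop(thread, lldb.eStopReasonWatchpoint)`. With a count of 1 it returns a single-element list, so the same code works whether one or several watchpoint ids are reported.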

jasonmolenda · January 30, 2024, 8:23am

FTR the first large patch of refactoring and adding WatchpointResources landed in December, and I’ve just put up a PR which now breaks the user’s watchpoint request into multiple WatchpointResources, so we can watch up to 32 bytes, doubleword-aligned, on most 64-bit targets that have four hardware watchpoint registers. There remains work to be done, but this gets us the user-visible feature of the work. The PR is at [lldb] Add support for large watchpoints in lldb by jasonmolenda · Pull Request #79962 · llvm/llvm-project · GitHub
