RFC: Break/Watchpoint refactor

I have been giving study to the private and public classes for break/watchpoints and I have some ideas for how to improve them. I am looking for comments from the community on this now to avoid wasted work and invite suggestion for improvements and any objections you may have.

Some problems with things as they stand:

1. The Watchpoint and SBWatchpoint classes share very little implementation with the Breakpoint and SBBreakpoint classes. Both Breakpoint and Watchpoint inherit from Stoppoint, but that class provides very little functionality and not much of an interface. This is both a waste and a potential hazard (in the sense that we run the risk of allowing breakpoints and watchpoints to evolve in parallel thus complicating the API and making things more difficult to merge later).
2. The SBWatchpoint class lacks feature parity with SBBreakpoint (e.g. no callback implementation in SBWatchpoint).
3. The Breakpoint class has grown an ecosystem of support classes which are unnecessarily specific to it. For example, I don't see a huge reason to maintain a totally distinct implementation for BreakpointList and WatchpointList.
4. All of these classes are "old school" (not necessarily bad, but potentially a problem). For example:
   a. lots of const char* running around.
   b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and such private rather than using ctor() = delete (which provides better compiler errors)
   c. use of Error& args in function signatures as opposed to Expected<ReturnType>.
   d. callback implementation uses function pointers (an ever confusing topic, especially for new programmers) where I think we could use templated methods (or just a parent Callback class) to significant effect.
5. Confusing or simply wrong documentation (e.g. Breakpoint::ResolveBreakpointInModules has arguments marked as @param[in] when the description of those arguments clearly indicates that they will be the output of the function).
6. We simply don't have a Finishpoint class (triggers callback at the conclusion of the function or scope)

My plan:

I would like to refactor this in a few phases.

Phase 1: Low hanging fruit

  1. All of these classes are “old school” (not necessarily bad, but
    potentially a problem). For example:
    a. lots of const char* running around.

We should use llvm::StringRef here.

b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and
such private rather than using ctor() = delete (which provides better
compiler errors)
c. use of Error& args in function signatures as opposed to
Expected.

LLDB already has its own class called Error, and it makes it confusing if we’re going to be using llvm::Error as well. In an earlier thread I proposed changing lldb::Error to LLDBError for exactly this reason. I would suggest doing this first.

d. callback implementation uses function pointers (an ever confusing
topic, especially for new programmers) where I think we could use
templated methods (or just a parent Callback class) to significant effect.

We should consider using llvm::function_ref<> here.

More details on each of these below.

Phase 2: Restructure / Modernize the Private API / Implementation

  • Change Error& out parameters to Expected.

As mentioned, we should rename lldb::Error first so as to avoid naming conflicts with llvm. Granted, they are each in their own namespace, but still, it will lead to confusion, and prevents the use of “using namespace llvm;” as it currently stands.

  • Get rid of all the const char* vars / args in favor of a better string
    type (which?)

llvm::StringRef. Anyone returning a const char* is saying they don’t own the backing memory. That’s exactly what a StringRef is for, but it has many other benefits such as very efficient searching, substring, and parsing methods.

  • Prefer explicitly deleted copy ctor / assignments over multiline macro
    DISALLOW_COPY_AND_ASSIGN
  • Try to reduce the (perhaps excessive) use of friendship between the
    support classes. For instance, is it necessary to give
    BreakpointLocation access to private data members from
    BreakpointLocationList, Process, BreakpointSite, and
    StopInfoBreakpoint? Wouldn’t it be better to expand the functionality
    of those other classes?

A more significant change could be a rewrite of the callback functionality.

There are some problems with the way callbacks are implemented in terms
of maintainability and extensibility. I think that using templates and
simple inheritance we could shift to using callback objects instead of
function pointers. I have a crude code sketch of this plan in the works
now, and I will submit that if there is interest in this idea.
Essentially, the idea is to let the user define their own Breakpoint or
Watchpoint (or whatever) child class which overrides a pure virtual run
method from a parent StoppointCallback templated class. The template
parameter of StoppointCallback would allow us to define a
std::unique_ptr member where the user can hang any data they
want for the callback. That saves us from void pointers (which I find
very awkward).

I’m not a huge fan of this. Inheriting from Breakpoint or Watchpoint sounds old school to me. Implementation inheritance in general should be avoided wherever possible IMO. Also, you’ve mentioned that StoppointCallback would only have a template argument, but it’s possible that different types of callback take completely incompatible types and numbers of arguments. This is something that can’t be described by a single function with a template parameter.

I think the canonical pattern for what you want here is “double dispatch”. You’ve got something like this:

struct CallbackArgs {
virtual void dispatch(StoppointBase &point) = 0;
};
struct WatchpointArgs : public CallbackArgs {
private:
int Old;
int New;
public:
void dispatch(StoppointBase &point) override {
static_cast<Watchpoint&>(point).Callback(Old, New);
};
};

struct BreakpointArgs : public CallbackArgs {
private:
int id;
public:
void dispatch(StoppointBase &point) override {
static_cast<Breakpoint&>(point).Callback(id);
}

};

It’s a little bit awkward though, so you can perhaps simplify this by making use of llvm’s casting infrastructure. Check out llvm/include/Casting.h. But basically you could move all this dispatch logic into a single function that looks like this:

if (auto wp = llvm::dyn_cast<Watchpoint*>(stop_point)) {
wp->Callback( … );
} else if (auto bp = llvm::dyn_cast<Breakpoint*>(stop_point)) {
bp->Callback( … );
}

And these callback functions could have completely different signatures.

template
class StoppointCallback {
private:
std::unique_ptr m_data;
// …
};

Yes, this kind of pattern is good. Then just pass StoppointUserData& into the callback and let the caller cast it to the appropriate derived type.

  • GDB style tracepoints. This may be very difficult but it seems very
    desirable.

I don’t see why this would be hard. but I also don’t think it’s sufficiently different from a breakpoint that it needs to be that special. A tracepoint is just a breakpoint that does some stuff and then automatically continues. We can already stop at breakpoints, run commands, and resume commands. Why can’t we just do all of those in a single scripted operation? I can envision a command like:

break set -n foo --exec “print p” --exec “break set -n bar” --continue

which sets a breakpoint at foo, and when it gets hit prints the value of p and sets another breakpoint in bar, then resumes.

I have been giving study to the private and public classes for break/watchpoints and I have some ideas for how to improve them. I am looking for comments from the community on this now to avoid wasted work and invite suggestion for improvements and any objections you may have.

Some problems with things as they stand:

1. The Watchpoint and SBWatchpoint classes share very little implementation with the Breakpoint and SBBreakpoint classes. Both Breakpoint and Watchpoint inherit from Stoppoint, but that class provides very little functionality and not much of an interface. This is both a waste and a potential hazard (in the sense that we run the risk of allowing breakpoints and watchpoints to evolve in parallel thus complicating the API and making things more difficult to merge later).
2. The SBWatchpoint class lacks feature parity with SBBreakpoint (e.g. no callback implementation in SBWatchpoint).

The main problem here is that Watchpoints & Breakpoints should share the Options class, and most of the StopInfo DoOnRemoval. I don’t think you’ll need to write a lot of new code to do this, it’s mostly ripping out the WatchpointOptions, using BreakpointOptions instead adapting as necessary. Ditto for the StopInfo{Watchpoint,Breakpoint}.

3. The Breakpoint class has grown an ecosystem of support classes which are unnecessarily specific to it. For example, I don't see a huge reason to maintain a totally distinct implementation for BreakpointList and WatchpointList.
4. All of these classes are "old school" (not necessarily bad, but potentially a problem). For example:
a. lots of const char* running around.
b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and such private rather than using ctor() = delete (which provides better compiler errors)
c. use of Error& args in function signatures as opposed to Expected<ReturnType>.

This should not be done piecemeal. If we decide to change how we do error reporting in lldb, that should be done consistently throughout.

d. callback implementation uses function pointers (an ever confusing topic, especially for new programmers) where I think we could use templated methods (or just a parent Callback class) to significant effect.

Folks on the outside of the SB API’s need to be able to pass callbacks in. We don’t currently have any templates you need to instantiate or classes you need to override to go from outside the SB API to inside it. So whatever happens on the lldb_private side, for the SB API’s we’ll probably have to stick with void * & function pointers.

5. Confusing or simply wrong documentation (e.g. Breakpoint::ResolveBreakpointInModules has arguments marked as @param[in] when the description of those arguments clearly indicates that they will be the output of the function).
6. We simply don't have a Finishpoint class (triggers callback at the conclusion of the function or scope)

My plan:

I would like to refactor this in a few phases.

Phase 1: Low hanging fruit
--------------------------

* Kick as much functionality as is reasonable from Breakpoint up to Stoppoint. At a minimum I would expand the Stoppoint API to include features which should logically be available to Breakpoint, Watchpoint, a hypothetical Finishpoint, and any future *point class. I would also try to refactor the Breakpoint support classes (e.g. BreakpointID) to derive from a new class (i.e. StoppointID) and kick functionality up the chain there as well. This would include methods like GetDescription, SetOptions, GetOptions, AddName, and so on.

Again, most of the overlap isn’t actually in the breakpoint and watchpoint classes, but in the Options classes. After all, the setting of watchpoints and breakpoints are very different, but that they have conditions and callbacks are very similar. Keeping the setting part separate and sharing the reaction to hitting the stop points seems to more closely model the objects better.

* Review and clean the source docs

* Expand test coverage where that seems easy (perhaps there are tests hiding somewhere I didn't see?)

test/testcases/functionalities/breakpoint

There are lots of tests there, but feel free to add tests as you go.

I would try to break private APIs minimally if at all. SB api will not be broken.

This should go a long ways toward resolving problems 1, 2, 3, and 5.

Phase 2: Restructure / Modernize the Private API / Implementation
-----------------------------------------------------------------

* Change Error& out parameters to Expected<ReturnType>.

Again, this is a larger policy change. We shouldn’t make different sections of lldb handle errors differently.

* Get rid of all the const char* vars / args in favor of a better string type (which?)

StringRef’s seem to be the vogue these days.

* Prefer explicitly deleted copy ctor / assignments over multiline macro DISALLOW_COPY_AND_ASSIGN
* Try to reduce the (perhaps excessive) use of friendship between the support classes. For instance, is it necessary to give BreakpointLocation access to private data members from BreakpointLocationList, Process, BreakpointSite, and StopInfoBreakpoint? Wouldn't it be better to expand the functionality of those other classes?

A more significant change could be a rewrite of the callback functionality.

There are some problems with the way callbacks are implemented in terms of maintainability and extensibility. I think that using templates and simple inheritance we could shift to using callback objects instead of function pointers. I have a crude code sketch of this plan in the works now, and I will submit that if there is interest in this idea. Essentially, the idea is to let the user define their own Breakpoint or Watchpoint (or whatever) child class which overrides a pure virtual run method from a parent StoppointCallback templated class. The template parameter of StoppointCallback would allow us to define a std::unique_ptr<UserData> member where the user can hang any data they want for the callback. That saves us from void pointers (which I find very awkward).

template <class UserData>
class StoppointCallback {
private:
  std::unique_ptr<UserData> m_data;
// ...
};

I also think that such a decision requires significant thought before we pull that trigger. It does constitute a significant addition to the SB api, tho I don't think it would break the SB api per se since we can always use overloading and template specialization to keep the old functionality. On the other hand, ABI compatibility may be a problem. I don't know that much about SWIG or the needs of LLDB from a ABI stability perspective, so please speak up if I'm suggesting something crazy.

We’ve been pretty strict about not requiring subclassing or specialization of SB API’s classes to use them. We’re serious about maintaining binary compatibility across lldb versions.

That said, it is somewhat difficult to keep the API consistent between Breakpoint and Watchpoint using function pointers; the signature associated with each will differ (e.g. Watchpoint should provide old and new values of the subject variable, while that does not apply to Breakpoint at all). I welcome any suggestions which allow us to avoid logic duplication. My goto just happens to be templates here.

I think you would call back into the watchpoint to get the old and new values. I don’t think you would need more than the frame passed into the watchpoint callback, and the location.

Doing even a subset of phase 2 would go a long ways toward fixing problem 4.

Phase 3: Expanded Feature Set
-----------------------------

I would love to see these features:

* Watch variables from a function in every scope of that function OR in only select invocations. Perhaps this already exists, but I can only get it to watch from a particular scope.
* Better (read functional) remote GDB server break / watch. As has been disused in other threads, this is not really easy but is very desirable. For instance, I would very much like to see lldb play nice with valgrind's vgdb server as it offers some amazing break / watch functionality. As of today I need to do a great many hacks to make that work properly.

Not sure what you mean by this.

* Finishpoints. This just seems obvious. (problem 6)

This is an overlap of functionality with stepping, finish and step-until. We’d have to sketch out some use cases to see which functionality seems more natural, and if both, how to keep them from conflicting.

* GDB style tracepoints. This may be very difficult but it seems very desirable.

That is a pretty big problem, since we would have to implement both the server & client sides.

Jim

Why not just move DISALLOW_COPY_AND_ASSIGN over to using =delete ? That seems like a trivial change…

Thanks,
- Enrico
:envelope_with_arrow: egranata@.com :phone: 27683

  1. All of these classes are “old school” (not necessarily bad, but
    potentially a problem). For example:
    a. lots of const char* running around.

We should use llvm::StringRef here.

Sounds good to me. Just wanted to make sure I’m on the same page as everyone else.

b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and
such private rather than using ctor() = delete (which provides better
compiler errors)
c. use of Error& args in function signatures as opposed to
Expected.

LLDB already has its own class called Error, and it makes it confusing if we’re going to be using llvm::Error as well. In an earlier thread I proposed changing lldb::Error to LLDBError for exactly this reason. I would suggest doing this first.

d. callback implementation uses function pointers (an ever confusing
topic, especially for new programmers) where I think we could use
templated methods (or just a parent Callback class) to significant effect.

We should consider using llvm::function_ref<> here.

More details on each of these below.

Phase 2: Restructure / Modernize the Private API / Implementation

  • Change Error& out parameters to Expected.

As mentioned, we should rename lldb::Error first so as to avoid naming conflicts with llvm. Granted, they are each in their own namespace, but still, it will lead to confusion, and prevents the use of “using namespace llvm;” as it currently stands.

  • Get rid of all the const char* vars / args in favor of a better string
    type (which?)

llvm::StringRef. Anyone returning a const char* is saying they don’t own the backing memory. That’s exactly what a StringRef is for, but it has many other benefits such as very efficient searching, substring, and parsing methods.

  • Prefer explicitly deleted copy ctor / assignments over multiline macro
    DISALLOW_COPY_AND_ASSIGN
  • Try to reduce the (perhaps excessive) use of friendship between the
    support classes. For instance, is it necessary to give
    BreakpointLocation access to private data members from
    BreakpointLocationList, Process, BreakpointSite, and
    StopInfoBreakpoint? Wouldn’t it be better to expand the functionality
    of those other classes?

A more significant change could be a rewrite of the callback functionality.

There are some problems with the way callbacks are implemented in terms
of maintainability and extensibility. I think that using templates and
simple inheritance we could shift to using callback objects instead of
function pointers. I have a crude code sketch of this plan in the works
now, and I will submit that if there is interest in this idea.
Essentially, the idea is to let the user define their own Breakpoint or
Watchpoint (or whatever) child class which overrides a pure virtual run
method from a parent StoppointCallback templated class. The template
parameter of StoppointCallback would allow us to define a
std::unique_ptr member where the user can hang any data they
want for the callback. That saves us from void pointers (which I find
very awkward).

I’m not a huge fan of this. Inheriting from Breakpoint or Watchpoint sounds old school to me. Implementation inheritance in general should be avoided wherever possible IMO. Also, you’ve mentioned that StoppointCallback would only have a template argument, but it’s possible that different types of callback take completely incompatible types and numbers of arguments. This is something that can’t be described by a single function with a template parameter.

I think the canonical pattern for what you want here is “double dispatch”. You’ve got something like this:

struct CallbackArgs {
virtual void dispatch(StoppointBase &point) = 0;
};
struct WatchpointArgs : public CallbackArgs {
private:
int Old;
int New;
public:
void dispatch(StoppointBase &point) override {
static_cast<Watchpoint&>(point).Callback(Old, New);
};
};

struct BreakpointArgs : public CallbackArgs {
private:
int id;
public:
void dispatch(StoppointBase &point) override {
static_cast<Breakpoint&>(point).Callback(id);
}

};

It’s a little bit awkward though, so you can perhaps simplify this by making use of llvm’s casting infrastructure. Check out llvm/include/Casting.h. But basically you could move all this dispatch logic into a single function that looks like this:

if (auto wp = llvm::dyn_cast<Watchpoint*>(stop_point)) {
wp->Callback( … );
} else if (auto bp = llvm::dyn_cast<Breakpoint*>(stop_point)) {
bp->Callback( … );
}

And these callback functions could have completely different signatures.

template
class StoppointCallback {
private:
std::unique_ptr m_data;
// …
};

Yes, this kind of pattern is good. Then just pass StoppointUserData& into the callback and let the caller cast it to the appropriate derived type.

I will look into the llvm::function_ref<> as that may be the real answer here.
I also tend to agree that inheritance is mostly undesirable. My argument for it in this case is that it would be nice to enforce a consistent interface, but that may be better done manually.
I always try to avoid casting, but that may be unrealistic here.

  • GDB style tracepoints. This may be very difficult but it seems very
    desirable.

I don’t see why this would be hard. but I also don’t think it’s sufficiently different from a breakpoint that it needs to be that special. A tracepoint is just a breakpoint that does some stuff and then automatically continues. We can already stop at breakpoints, run commands, and resume commands. Why can’t we just do all of those in a single scripted operation? I can envision a command like:

break set -n foo --exec “print p” --exec “break set -n bar” --continue

which sets a breakpoint at foo, and when it gets hit prints the value of p and sets another breakpoint in bar, then resumes.

That is correct for an infinite speed computer. The issue is that GDB Tracepoints run their instructions without talking to the debugger (until the next time the program stops). This allows you to watch variables in applications that are timing sensitive (e.g., you introduce a bug in a VOIP application just from the lag associated with watching the data).

  1. All of these classes are “old school” (not necessarily bad, but
    potentially a problem). For example:
    a. lots of const char* running around.

We should use llvm::StringRef here.

b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and
such private rather than using ctor() = delete (which provides better
compiler errors)
c. use of Error& args in function signatures as opposed to
Expected.

LLDB already has its own class called Error, and it makes it confusing if we’re going to be using llvm::Error as well. In an earlier thread I proposed changing lldb::Error to LLDBError for exactly this reason. I would suggest doing this first.

I don’t think we want different parts of lldb handling errors differently. That would be even more confusing.

d. callback implementation uses function pointers (an ever confusing
topic, especially for new programmers) where I think we could use
templated methods (or just a parent Callback class) to significant effect.

We should consider using llvm::function_ref<> here.

More details on each of these below.

Phase 2: Restructure / Modernize the Private API / Implementation

  • Change Error& out parameters to Expected.

As mentioned, we should rename lldb::Error first so as to avoid naming conflicts with llvm. Granted, they are each in their own namespace, but still, it will lead to confusion, and prevents the use of “using namespace llvm;” as it currently stands.

  • Get rid of all the const char* vars / args in favor of a better string
    type (which?)

llvm::StringRef. Anyone returning a const char* is saying they don’t own the backing memory. That’s exactly what a StringRef is for, but it has many other benefits such as very efficient searching, substring, and parsing methods.

  • Prefer explicitly deleted copy ctor / assignments over multiline macro
    DISALLOW_COPY_AND_ASSIGN
  • Try to reduce the (perhaps excessive) use of friendship between the
    support classes. For instance, is it necessary to give
    BreakpointLocation access to private data members from
    BreakpointLocationList, Process, BreakpointSite, and
    StopInfoBreakpoint? Wouldn’t it be better to expand the functionality
    of those other classes?

A more significant change could be a rewrite of the callback functionality.

There are some problems with the way callbacks are implemented in terms
of maintainability and extensibility. I think that using templates and
simple inheritance we could shift to using callback objects instead of
function pointers. I have a crude code sketch of this plan in the works
now, and I will submit that if there is interest in this idea.
Essentially, the idea is to let the user define their own Breakpoint or
Watchpoint (or whatever) child class which overrides a pure virtual run
method from a parent StoppointCallback templated class. The template
parameter of StoppointCallback would allow us to define a
std::unique_ptr member where the user can hang any data they
want for the callback. That saves us from void pointers (which I find
very awkward).

I’m not a huge fan of this. Inheriting from Breakpoint or Watchpoint sounds old school to me. Implementation inheritance in general should be avoided wherever possible IMO. Also, you’ve mentioned that StoppointCallback would only have a template argument, but it’s possible that different types of callback take completely incompatible types and numbers of arguments. This is something that can’t be described by a single function with a template parameter.

I think the canonical pattern for what you want here is “double dispatch”. You’ve got something like this:

struct CallbackArgs {
virtual void dispatch(StoppointBase &point) = 0;
};
struct WatchpointArgs : public CallbackArgs {
private:
int Old;
int New;
public:
void dispatch(StoppointBase &point) override {
static_cast<Watchpoint&>(point).Callback(Old, New);
};
};

struct BreakpointArgs : public CallbackArgs {
private:
int id;
public:
void dispatch(StoppointBase &point) override {
static_cast<Breakpoint&>(point).Callback(id);
}

};

It’s a little bit awkward though, so you can perhaps simplify this by making use of llvm’s casting infrastructure. Check out llvm/include/Casting.h. But basically you could move all this dispatch logic into a single function that looks like this:

if (auto wp = llvm::dyn_cast<Watchpoint*>(stop_point)) {
wp->Callback( … );
} else if (auto bp = llvm::dyn_cast<Breakpoint*>(stop_point)) {
bp->Callback( … );
}

And these callback functions could have completely different signatures.

I’m not sure this is solving a problem that we have. The callbacks get arguments which describe the environment in which the stop occurred (the execution context) and something identifying the stop point (ID & loc ID). The data specific to the breakpoint/watchpoint should come from the breakpoint/watchpoint. I don’t see any convincing reason these need to have this flexibility.

template
class StoppointCallback {
private:
std::unique_ptr m_data;
// …
};

Yes, this kind of pattern is good. Then just pass StoppointUserData& into the callback and let the caller cast it to the appropriate derived type.

Again, on the lldb_private side is fine. But we still need to pass callbacks and data across the SB API’s in a non-fragile way.

  • GDB style tracepoints. This may be very difficult but it seems very
    desirable.

I don’t see why this would be hard. but I also don’t think it’s sufficiently different from a breakpoint that it needs to be that special. A tracepoint is just a breakpoint that does some stuff and then automatically continues. We can already stop at breakpoints, run commands, and resume commands. Why can’t we just do all of those in a single scripted operation? I can envision a command like:

break set -n foo --exec “print p” --exec “break set -n bar” --continue

which sets a breakpoint at foo, and when it gets hit prints the value of p and sets another breakpoint in bar, then resumes.

That’s not what the tracepoints Daniel is talking about do. A tracepoint is an instruction to the server to stop, run an experiment (expressed in a little byte-coded language) without returning to the debugger, and store the result away. Then when the target stops at a “real" breakpoint, the debugger downloads the results of all the tracepoint experiments and presents this past history to the user in one way or the other.

It isn’t rocket science, but it does require server side support to be useful.

Jim

I have been giving study to the private and public classes for break/watchpoints and I have some ideas for how to improve them. I am looking for comments from the community on this now to avoid wasted work and invite suggestion for improvements and any objections you may have.

Some problems with things as they stand:

1. The Watchpoint and SBWatchpoint classes share very little implementation with the Breakpoint and SBBreakpoint classes. Both Breakpoint and Watchpoint inherit from Stoppoint, but that class provides very little functionality and not much of an interface. This is both a waste and a potential hazard (in the sense that we run the risk of allowing breakpoints and watchpoints to evolve in parallel thus complicating the API and making things more difficult to merge later).
2. The SBWatchpoint class lacks feature parity with SBBreakpoint (e.g. no callback implementation in SBWatchpoint).

The main problem here is that Watchpoints & Breakpoints should share the Options class, and most of the StopInfo DoOnRemoval. I don’t think you’ll need to write a lot of new code to do this, it’s mostly ripping out the WatchpointOptions, using BreakpointOptions instead adapting as necessary. Ditto for the StopInfo{Watchpoint,Breakpoint}.

So you would avoid inheritance and just make Watchpoint and Breakpoint take the same StoppointOptions class?

3. The Breakpoint class has grown an ecosystem of support classes which are unnecessarily specific to it. For example, I don't see a huge reason to maintain a totally distinct implementation for BreakpointList and WatchpointList.
4. All of these classes are "old school" (not necessarily bad, but potentially a problem). For example:
  a. lots of const char* running around.
  b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and such private rather than using ctor() = delete (which provides better compiler errors)
  c. use of Error& args in function signatures as opposed to Expected<ReturnType>.

This should not be done piecemeal. If we decide to change how we do error reporting in lldb, that should be done consistently throughout.

Fair enough.

  d. callback implementation uses function pointers (an ever confusing topic, especially for new programmers) where I think we could use templated methods (or just a parent Callback class) to significant effect.

Folks on the outside of the SB API’s need to be able to pass callbacks in. We don’t currently have any templates you need to instantiate or classes you need to override to go from outside the SB API to inside it. So whatever happens on the lldb_private side, for the SB API’s we’ll probably have to stick with void * & function pointers.

That is fair. I may be able to avoid the entire problem with llvm::function_ref<>. Will look into it.

5. Confusing or simply wrong documentation (e.g. Breakpoint::ResolveBreakpointInModules has arguments marked as @param[in] when the description of those arguments clearly indicates that they will be the output of the function).
6. We simply don't have a Finishpoint class (triggers callback at the conclusion of the function or scope)

My plan:

I would like to refactor this in a few phases.

Phase 1: Low hanging fruit
--------------------------

* Kick as much functionality as is reasonable from Breakpoint up to Stoppoint. At a minimum I would expand the Stoppoint API to include features which should logically be available to Breakpoint, Watchpoint, a hypothetical Finishpoint, and any future *point class. I would also try to refactor the Breakpoint support classes (e.g. BreakpointID) to derive from a new class (i.e. StoppointID) and kick functionality up the chain there as well. This would include methods like GetDescription, SetOptions, GetOptions, AddName, and so on.

Again, most of the overlap isn’t actually in the breakpoint and watchpoint classes, but in the Options classes. After all, the setting of watchpoints and breakpoints are very different, but that they have conditions and callbacks are very similar. Keeping the setting part separate and sharing the reaction to hitting the stop points seems to more closely model the objects better.

* Review and clean the source docs

* Expand test coverage where that seems easy (perhaps there are tests hiding somewhere I didn't see?)

test/testcases/functionalities/breakpoint

There are lots of tests there, but feel free to add tests as you go.

I would try to break private APIs minimally if at all. SB api will not be broken.

This should go a long ways toward resolving problems 1, 2, 3, and 5.

Phase 2: Restructure / Modernize the Private API / Implementation
-----------------------------------------------------------------

* Change Error& out parameters to Expected<ReturnType>.

Again, this is a larger policy change. We shouldn’t make different sections of lldb handle errors differently.

* Get rid of all the const char* vars / args in favor of a better string type (which?)

StringRef’s seem to be the vogue these days.

* Prefer explicitly deleted copy ctor / assignments over multiline macro DISALLOW_COPY_AND_ASSIGN
* Try to reduce the (perhaps excessive) use of friendship between the support classes. For instance, is it necessary to give BreakpointLocation access to private data members from BreakpointLocationList, Process, BreakpointSite, and StopInfoBreakpoint? Wouldn't it be better to expand the functionality of those other classes?

A more significant change could be a rewrite of the callback functionality.

There are some problems with the way callbacks are implemented in terms of maintainability and extensibility. I think that using templates and simple inheritance we could shift to using callback objects instead of function pointers. I have a crude code sketch of this plan in the works now, and I will submit that if there is interest in this idea. Essentially, the idea is to let the user define their own Breakpoint or Watchpoint (or whatever) child class which overrides a pure virtual run method from a parent StoppointCallback templated class. The template parameter of StoppointCallback would allow us to define a std::unique_ptr<UserData> member where the user can hang any data they want for the callback. That saves us from void pointers (which I find very awkward).

template <class UserData>
class StoppointCallback {
private:
   std::unique_ptr<UserData> m_data;
// ...
};

I also think that such a decision requires significant thought before we pull that trigger. It does constitute a significant addition to the SB api, tho I don't think it would break the SB api per se since we can always use overloading and template specialization to keep the old functionality. On the other hand, ABI compatibility may be a problem. I don't know that much about SWIG or the needs of LLDB from a ABI stability perspective, so please speak up if I'm suggesting something crazy.

We’ve been pretty strict about not requiring subclassing or specialization of SB API’s classes to use them. We’re serious about maintaining binary compatibility across lldb versions.

That said, it is somewhat difficult to keep the API consistent between Breakpoint and Watchpoint using function pointers; the signature associated with each will differ (e.g. Watchpoint should provide old and new values of the subject variable, while that does not apply to Breakpoint at all). I welcome any suggestions which allow us to avoid logic duplication. My goto just happens to be templates here.

I think you would call back into the watchpoint to get the old and new values. I don’t think you would need more than the frame passed into the watchpoint callback, and the location.

The last time I looked I didn't see a good way to do that. I may be wrong tho, so I will check again.

Doing even a subset of phase 2 would go a long ways toward fixing problem 4.

Phase 3: Expanded Feature Set
-----------------------------

I would love to see these features:

* Watch variables from a function in every scope of that function OR in only select invocations. Perhaps this already exists, but I can only get it to watch from a particular scope.
* Better (read functional) remote GDB server break / watch. As has been disused in other threads, this is not really easy but is very desirable. For instance, I would very much like to see lldb play nice with valgrind's vgdb server as it offers some amazing break / watch functionality. As of today I need to do a great many hacks to make that work properly.

Not sure what you mean by this.

Problems discussed here (http://lists.llvm.org/pipermail/lldb-dev/2016-September/011203.html). I have seen quite a few crashes when talking to vgdb but I have not been able to take them apart well enough to even file a bug report.

* Finishpoints. This just seems obvious. (problem 6)

This is an overlap of functionality with stepping, finish and step-until. We’d have to sketch out some use cases to see which functionality seems more natural, and if both, how to keep them from conflicting.

That is a good point.

* GDB style tracepoints. This may be very difficult but it seems very desirable.

That is a pretty big problem, since we would have to implement both the server & client sides.

That is why I put it as the last step. I think it would be great, but may be a more long term issue.

That was my first thought as well. Still, I personally try to avoid macros. On the other hand that one is simple enough that it may be justified.

That is the way I would approach it. The other half of this is StopInfoBreakpoint vrs. StopInfoWatchpoint classes. These aren’t going to be exactly equivalent so you will probably have to find some way express what needs to be different, but those differences should be pretty small. The Breakpoint side of this works pretty well, so I’d start from there.

Jim

FWIW LLVM removed all it’s disallow copy / assign macros in favor of explicitly writing it. I agree it’s easier to just change the macro, but I would just do what LLVM does as long as you don’t mind the extra work.

Doing it everywhere would be a public service IMO. I don’t like macros either.

Sean

Why? The macro states the intent explicitly, rather than having to deduce it from details scattered through the class definition.

Jim

The issue I have with the DISALLOW_ macro is that when you’re looking to see what sort of constructors etc. are possible, you now have to look through a macro. Personally, I like to see what constructors are available on an object in one list, and not have to guess about whether e.g. a move constructor is present or disallowed.

Sean

Folks on the outside of the SB API’s need to be able to pass callbacks in. We don’t currently have any templates you need to instantiate or classes you need to override to go from outside the SB API to inside it. So whatever happens on the lldb_private side, for the SB API’s we’ll probably have to stick with void * & function pointers.

That is fair. I may be able to avoid the entire problem with llvm::function_ref<>. Will look into it.

I do not think we want the SB API’s to require llvm header files.

Jim

One solution I’ve seen (I think boost does this as well) is to make a utility class called noncopyable that you can privately inherit from:

class noncopyable {
public:
noncopyable(const noncopyable &) = delete;
noncopyable &operator=(const noncopyable &other) = delete;
};

class Foo : private noncopyable {
};

That solves the problem of excising the macros, but it replaces it with now having to look at a superclass. Honestly that sounds like six one way, a half dozen the other.
I wouldn’t want this great overall list of improvements to get sidetracked over bike shedding this, however. I’m fine with whatever approach is used, even if it’s the status quo.

Sean

I have been giving study to the private and public classes for break/watchpoints and I have some ideas for how to improve them. I am looking for comments from the community on this now to avoid wasted work and invite suggestion for improvements and any objections you may have.

Some problems with things as they stand:

1. The Watchpoint and SBWatchpoint classes share very little implementation with the Breakpoint and SBBreakpoint classes. Both Breakpoint and Watchpoint inherit from Stoppoint, but that class provides very little functionality and not much of an interface. This is both a waste and a potential hazard (in the sense that we run the risk of allowing breakpoints and watchpoints to evolve in parallel thus complicating the API and making things more difficult to merge later).
2. The SBWatchpoint class lacks feature parity with SBBreakpoint (e.g. no callback implementation in SBWatchpoint).

It would be great to add any needed features internally (lldb_private::*) and in the public API (lldb::*).

3. The Breakpoint class has grown an ecosystem of support classes which are unnecessarily specific to it. For example, I don't see a huge reason to maintain a totally distinct implementation for BreakpointList and WatchpointList.

Internally we can use virtual functions and have the list be a list of some shared base class, that is fine. I don't believe we export the lists across the public API so that shouldn't be a problem.

4. All of these classes are "old school" (not necessarily bad, but potentially a problem). For example:
a. lots of const char* running around.

Feel free to use llvm::StringRef internally (lldb_private namespace), but not in the public API. Anything in the "lldb" namespace can't change as we maintain backward compatibility. The main gist in the public API:
- no inheritance
- fixed ivars that never change that are either one of: std::shared_ptr, std::unique_ptr, or raw pointer.
- no virtual functions
- can't remove functions, you can add, but not remove or change. If you need to change, just add an overloaded version with different args.

b. DISALLOW_COPY_AND_ASSIGN(BreakpointEventData) to make ctors and such private rather than using ctor() = delete (which provides better compiler errors)

Change the DISALLOW_COPY_AND_ASSIGN macro. Done.

c. use of Error& args in function signatures as opposed to Expected<ReturnType>.

Not sure I would tackle this. It would need to happen everywhere and would need to not lose any lldb_private::Error functionality.

d. callback implementation uses function pointers (an ever confusing topic, especially for new programmers) where I think we could use templated methods (or just a parent Callback class) to significant effect.

Internally fine, but the public API needs a simple function pointer. I don't see the need to require templates.

5. Confusing or simply wrong documentation (e.g. Breakpoint::ResolveBreakpointInModules has arguments marked as @param[in] when the description of those arguments clearly indicates that they will be the output of the function).

Feel free to fix.

6. We simply don't have a Finishpoint class (triggers callback at the conclusion of the function or scope)

My plan:

I would like to refactor this in a few phases.

Phase 1: Low hanging fruit
--------------------------

* Kick as much functionality as is reasonable from Breakpoint up to Stoppoint. At a minimum I would expand the Stoppoint API to include features which should logically be available to Breakpoint, Watchpoint, a hypothetical Finishpoint, and any future *point class. I would also try to refactor the Breakpoint support classes (e.g. BreakpointID) to derive from a new class (i.e. StoppointID) and kick functionality up the chain there as well. This would include methods like GetDescription, SetOptions, GetOptions, AddName, and so on.

fine

* Review and clean the source docs

fine

* Expand test coverage where that seems easy (perhaps there are tests hiding somewhere I didn't see?)

fine

I would try to break private APIs minimally if at all. SB api will not be broken.

No worries, improve the internals as much as you want as long as the SB APIs do not change.

This should go a long ways toward resolving problems 1, 2, 3, and 5.

Phase 2: Restructure / Modernize the Private API / Implementation
-----------------------------------------------------------------

* Change Error& out parameters to Expected<ReturnType>.

I am fine trying this out on a few classes like Watchpoint internally to see how we like it.

* Get rid of all the const char* vars / args in favor of a better string type (which?)

StringRef is fine as long as the ownership is never assumed to transfer, so these are perfect for arguments, but not for storing a reference to a string that is passed in as an argument inside an instance variable of a class. So "const char *" in the public API, and llvm::StringRef internally is fine.

* Prefer explicitly deleted copy ctor / assignments over multiline macro DISALLOW_COPY_AND_ASSIGN

Just change the macro. Do that quickly as a separate change list if you want as we will OK that right away.

* Try to reduce the (perhaps excessive) use of friendship between the support classes. For instance, is it necessary to give BreakpointLocation access to private data members from BreakpointLocationList, Process, BreakpointSite, and StopInfoBreakpoint? Wouldn't it be better to expand the functionality of those other classes?

Feel free to give us patches, we can review these on a case by case basis.

A more significant change could be a rewrite of the callback functionality.

There are some problems with the way callbacks are implemented in terms of maintainability and extensibility. I think that using templates and simple inheritance we could shift to using callback objects instead of function pointers. I have a crude code sketch of this plan in the works now, and I will submit that if there is interest in this idea. Essentially, the idea is to let the user define their own Breakpoint or Watchpoint (or whatever) child class which overrides a pure virtual run method from a parent StoppointCallback templated class. The template parameter of StoppointCallback would allow us to define a std::unique_ptr<UserData> member where the user can hang any data they want for the callback. That saves us from void pointers (which I find very awkward).

template <class UserData>
class StoppointCallback {
private:
  std::unique_ptr<UserData> m_data;
// ...
};

I would switch to use std::function internally so we can use lambdas. I don't see the need to templatize anything since you can do almost anything with function objects and lambdas, both which work in place of a std::function. No need to invent anything. We didn't have C++11 and C++14 back when this was first coded, so we didn't use it back then. Public APIs will need to use standard function callbacks. So "std::*" or "llvm::*" in the public API.

I also think that such a decision requires significant thought before we pull that trigger. It does constitute a significant addition to the SB api, tho I don't think it would break the SB api per se since we can always use overloading and template specialization to keep the old functionality. On the other hand, ABI compatibility may be a problem. I don't know that much about SWIG or the needs of LLDB from a ABI stability perspective, so please speak up if I'm suggesting something crazy.

No templates in the public API please.

That said, it is somewhat difficult to keep the API consistent between Breakpoint and Watchpoint using function pointers; the signature associated with each will differ (e.g. Watchpoint should provide old and new values of the subject variable, while that does not apply to Breakpoint at all). I welcome any suggestions which allow us to avoid logic duplication. My goto just happens to be templates here.

Not sure why they are better. Again, use std::function internally will buy us labmdas, function pointers, and function object support, and in the public API we must keep it simple and use function pointers.

Doing even a subset of phase 2 would go a long ways toward fixing problem 4.

Phase 3: Expanded Feature Set
-----------------------------

I would love to see these features:

* Watch variables from a function in every scope of that function OR in only select invocations. Perhaps this already exists, but I can only get it to watch from a particular scope.

You would run out of watchpoints so quickly these would probably not be very useful. Most things we use have 2 - 4 watchpoints at most. If you watched every instance it would quickly run out. What should happen when it runs out of resources? Just stop?

* Better (read functional) remote GDB server break / watch. As has been disused in other threads, this is not really easy but is very desirable. For instance, I would very much like to see lldb play nice with valgrind's vgdb server as it offers some amazing break / watch functionality. As of today I need to do a great many hacks to make that work properly.

If something does support infinite watchpoints, we should be able to take advantage of that and your case above could just work.

* Finishpoints. This just seems obvious. (problem 6)

no problems with adding this once we have other support.

* GDB style tracepoints. This may be very difficult but it seems very desirable.

I don't know these, but I know Jim does.

Watchpoints are in need of some serious work, so I am happy to see someone willing to dive in. I think you have a pretty good idea of where we stand now, let me know what you think of my comments.

Greg Clayton

That is fine for internal classes.

llvm::function_ref is preferable to std::function. It’s smaller and faster while still allowing lambdas, function pointers, and function objects