Object identities in the LLDB's C++ API

LLDB’s C++ API deals with SBxxx objects, most of which are PIMPL-style wrappers around an opaque pointer to the internal implementation. These SBxxx objects act as handles and are passed/returned by value, which is generally convenient, except for the situations where one would need to keep track of object identities, ex. using them as keys in associative containers.

As far as I can tell, there’s a bit of inconsistency in the current state:

  1. Some types, ex. SBThread, SBTarget, SBSection, … offer ==, != that map directly to the corresponding operator on the opaque pointer (good!), but:
    … there are no ordering operators, nor obvious ways to hash the objects

  2. SBModule offer the == , != operators, but:
    … the implementations for == and != are not exactly forwarded to the corresponding operator on the opaque pointer (1)

  3. Things like SBFrame offer IsEqual() in addition to ==, !=, creating a bit of confusion

  4. Other types (ex. SBProcess, SBSymbol, SBBlock) don’t offer any kind of comparison operations.

IMO it would be nice to have a consistent “handle type” semantics regarding identity, ordering and hashing. I can see the following options:

  1. Expose the opaque ptr as an opaque handle()
  • this is an easy, quick and convenient solution for many SBxxx types but it may not work for all
  1. Design and implement a consistent, first class identity/ordering/hashing for all the SBxxx types
  • perhaps the most elegant and flexible approach, but also the most work

Any thoughts on this? Did I miss anything fundamental here?

Thanks,
Lemo.

(1) example of operator== from SBModule:

LLDB’s C++ API deals with SBxxx objects, most of which are PIMPL-style wrappers around an opaque pointer to the internal implementation. These SBxxx objects act as handles and are passed/returned by value, which is generally convenient, except for the situations where one would need to keep track of object identities, ex. using them as keys in associative containers.

As far as I can tell, there’s a bit of inconsistency in the current state:

  1. Some types, ex. SBThread, SBTarget, SBSection, … offer ==, != that map directly to the corresponding operator on the opaque pointer (good!), but:
    … there are no ordering operators, nor obvious ways to hash the objects

  2. SBModule offer the == , != operators, but:
    … the implementations for == and != are not exactly forwarded to the corresponding operator on the opaque pointer (1)

  3. Things like SBFrame offer IsEqual() in addition to ==, !=, creating a bit of confusion

  4. Other types (ex. SBProcess, SBSymbol, SBBlock) don’t offer any kind of comparison operations.

IMO it would be nice to have a consistent “handle type” semantics regarding identity, ordering and hashing. I can see the following options:

  1. Expose the opaque ptr as an opaque handle()
  • this is an easy, quick and convenient solution for many SBxxx types but it may not work for all

That would be nice, but that won’t always work with how LLDB is currently coded for SBFrame and possibly SBThread. These objects will be problems as they can come and go and the underlying object isn’t always the same even through they lock onto the same logical object. SBThread and SBFrame have “lldb::ExecutionContextRefSP m_opaque_sp” members. The execution context reference is a class that contains weak pointers to the lldb_private::Thread and lldb_private::StackFrame objects, but it also contains the thread ID and frame ID so it can reconstitute the value lldb_private::Thread and lldb_private::StackFrame even if the weak pointer isn’t valid. So the opaque handle will work for many objects but not all.

  1. Design and implement a consistent, first class identity/ordering/hashing for all the SBxxx types
  • perhaps the most elegant and flexible approach, but also the most work

I would be fine with adding new members to classes we know we want to hash and order, like by adding:

uint32_t SB*::GetHash();
bool SB*::operator==(const SB*& ohs);

bool SB*::operator<(const SB*& ohs);

Would those be enough?

Any thoughts on this? Did I miss anything fundamental here?

Thanks,
Lemo.

(1) example of operator== from SBModule:

bool SBModule::operator==(const SBModule &rhs) const {

if (m_opaque_sp)

return m_opaque_sp.get() == rhs.m_opaque_sp.get();

return false;

}

So I would leave this up to the SB classes so that we can change the implementation if needed so I would prefer to add methods to each SB object for hashing and comparison.

Greg Clayton

Thanks Greg,

  1. Expose the opaque ptr as an opaque handle()
  • this is an easy, quick and convenient solution for many SBxxx types but it may not work for all

That would be nice, but that won’t always work with how LLDB is currently coded for SBFrame and possibly SBThread. These objects will be problems as they can come and go and the underlying object isn’t always the same even through they lock onto the same logical object. SBThread and SBFrame have “lldb::ExecutionContextRefSP m_opaque_sp” members. The execution context reference is a class that contains weak pointers to the lldb_private::Thread and lldb_private::StackFrame objects, but it also contains the thread ID and frame ID so it can reconstitute the value lldb_private::Thread and lldb_private::StackFrame even if the weak pointer isn’t valid. So the opaque handle will work for many objects but not all.

Indeed. One, relatively small but interesting benefit of the opaque handle type is that it opens the possibility of generic “handle maps” (I’ll elaborate below)

  1. Design and implement a consistent, first class identity/ordering/hashing for all the SBxxx types
  • perhaps the most elegant and flexible approach, but also the most work

I would be fine with adding new members to classes we know we want to hash and order, like by adding:
uint32_t SB*::GetHash();
bool SB*::operator==(const SB*& ohs);
bool SB*::operator<(const SB*& ohs);
Would those be enough?

I think so. If we use the standard containers as reference, technically we only need operator< to satisfy the Compare concept. (also, a small nit - size_t would be a better type for the hash value). Also, both the hashing and the compare can be implemented as non-member functions (or even specializing std::hash, std::less for SBxxx types). A few minor concerns:

a. if we keep things like SBModule::operator==() unchanged, it’s not going to be the same as the equiv(a, b) for the case where a and b have null opaque pointers (not sure if this breaks anything, but I wouldn’t want to be the first to debug a case where this matter)
b. defining just the minimum set of operations may be technically enough but it may look a bit weird to have a type define < but none of the other relational operators.
c. if some of the hash/compare implementation end up going through multiple layers (the execution context with thread, frame IDs example) the performance characteristics can be unpredictable, right?

For context, the use case that brought this to my attention is managing a set of data structures that contain custom data associated with modules, frames, etc. It’s easy to create, let’s say a MyModule from a SBModule, but if later on I get the module for a particular frame, SBFrame::GetModule() will return a SBModule, which I would like to map to the corresponding MyModule instance. Logically this would require a SBModule → MyModule map. The standard associative containers (map or unordered_map) would make this trivial if SBxxx types satisfy the key requirements.

Another option for maintaining such a mapping, suggested by Mark Mentovai, is to use provision for an “user data” tag associated with every SBxxx object (this tag can simply be a void*, maybe wrapped with type safe accessors). This would be extremely convenient for the API users (since they don’t have to worry about maintaining any maps themselves) but implementing it would hit the same complications around the synthesized instances (like SBFrame) and it may carry a small price - one pointer per SBxxx instance even if this facility is not used. I personally like this approach and in this particular case it has the additional benefit of being additive (we can graft it on with minimal risk of breaking existing stuff), although it still seems nice to have consistent identity semantics for the SBxxx types.

Reviving this old thread and +Joshua Peraza who’s also interested in this.

It would be easy to add the equality or comparison and other operators to any items that don’t have them. I am not a fan of adding user data. It can’t live be in the SBXXX object itself, because we can’t change the member variables in the objects for API backward compat issues. If it is inside the object itself, and we have only 1 user data “void *” per object, I can see many clients wanting to use this user data. I would rather people construct their own maps that don’t live in the object, but that gets back to adding hashing to the items. So I would vote to allow the items to be hashed where needed. I am not sure we need hashing for SBFrame or SBThread since they have really well defined IDs in their APIs. SBFrame has the index ID and CFA (call frame address or SP at start of the function). SBThread has the index ID (a unique number that starts at 1 and increments as new threads come along). But if you feel we need them it is fine to add them to any objects that need it. I always use the file path for the SBModule as the key for any maps when I use SBModule objects in a map. Let me know what you think after these comments

Adding a hashes where we need them SGTM.

The embedded Python interpreter and the IDE or command line that are running LLDB are two I can think of. Also if the baton lives in the lldb_private::* objects, then we can have internal clients as well as clients through the API.

I think that's a reasonable approach. We can decide on the best hash and equality functions on a case-by-case basis.