Advice on debugging DSP and Harvard architectures

Hi folks,

I have been researching using lldb and a custom gdbserver stub to debug some of our processors. I have already played with ArchSpec.h/.cpp/Elf.h to add a processor definition, and since our tools output DWARF information, have already managed to use lldb to dump line tables.

On the gdbserver side of things I'm managing to get register reads and writes to work, but I fear an issue may arise in memory reads, since the DSP's Harvard architecture dictates separate address spaces. Therefore, when we attempt to read 1) code memory to disassemble, and 2) data memory (for decoding variables etc.), the stub will receive an 'm' request, but the interpretation of the address field is ambiguous, as it could refer to either the CODE or DATA bus.

It seems that the commonly adopted approach so far (i.e. with gdb) is to produce a larger single address space by adding an offset to the memory address and arranging for the stub to interpret the presence/absence of the offset and act accordingly. (I have indeed read of this approach being employed for AVR processors.) This technique is workable (provided the DSPs continue to have fairly small physical memories), but certainly has drawbacks: i) changes to the debugger to add the offset, ii) increased packet size (e.g. all code read addresses having the highest bit set), iii) increased compression/decompression due to RLE on the response.
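
A minimal sketch of the "add an offset" technique, assuming a 32-bit wire address whose top bit selects the code bus. The flag value and the names `kCodeSpaceFlag`, `EncodeFlatAddress` and `DecodeFlatAddress` are hypothetical and do not come from gdb, lldb or any existing stub:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical convention: the top bit of the flat address selects the code bus.
constexpr std::uint32_t kCodeSpaceFlag = 0x80000000u;

enum class Bus { Data, Code };

struct BusAddress {
    Bus bus;
    std::uint32_t address;  // address on the selected bus, with the flag removed
};

// Debugger side: fold a (bus, address) pair into one flat address for the 'm' packet.
inline std::uint32_t EncodeFlatAddress(Bus bus, std::uint32_t address) {
    // Only workable while physical addresses never reach the flag bit.
    assert((address & kCodeSpaceFlag) == 0);
    return bus == Bus::Code ? (address | kCodeSpaceFlag) : address;
}

// Stub side: recover the bus and the physical address from the flat address.
inline BusAddress DecodeFlatAddress(std::uint32_t flat) {
    if (flat & kCodeSpaceFlag)
        return BusAddress{Bus::Code, flat & ~kCodeSpaceFlag};
    return BusAddress{Bus::Data, flat};
}
```

The assertion makes the "small physical memories" caveat explicit: once a real address needs the flag bit, the two spaces can no longer share one flat range.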

So is the "add an offset" technique still the best way forward to solve this problem? How about adding a new request (along with a query request to interrogate the stub for support) for code reads - is this an option? (If so, I'd be happy to do the work...)

Another issue I'm looking at is that some of our DSPs have 24-bit bytes. (That is, a single data address reads back 24 bits of data.) At the moment, I'm not altogether sure just how problematic this will be for lldb. I've looked into the g_core_definitions table, and I can't see an entry for this (presumably it would either be a 1 or an 8, depending on whether it's measured in host bytes or bits). I assume that all the architectures in the table so far have 8-bit bytes. Is anyone else out there looking at using lldb to debug targets with non-8-bit bytes?

So, summarising, I'm wondering if anyone has any ideas/advice on the above questions, that is, using lldb on harvard architectures and on non-standard-byte-size architectures.

All comments welcome,
Matthew Gardiner

Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom
More information can be found at www.csr.com. Keep up to date with CSR on our technical blog, www.csr.com/blog, CSR people blog, www.csr.com/people, YouTube, www.youtube.com/user/CSRplc, Facebook, www.facebook.com/pages/CSR/191038434253534, or follow us on Twitter at www.twitter.com/CSR_plc.
New for 2014, you can now access the wide range of products powered by aptX at www.aptx.com.

So is the "add an offset" technique still the best way forward to solve this
problem? How about adding a new request (along with a query request to
interrogate the stub for support) for code reads - is this an option?
(If so, I'd be happy to do the work...)

If you do this, then you will probably have to add the knowledge of whether you
are reading code memory or data memory into the remote protocol layer and
its callers. I think it would be a more invasive change.

One more issue that you may face is a pointer residing in data memory
but pointing to entities in code memory. The DWARF attribute DW_AT_address_class
can be used to provide the required debug information. But you may need to
add/improve the address space handling in lldb.

So, summarising, I'm wondering if anyone has any ideas/advice on the above
questions, that is, using lldb on harvard architectures and on non-standard-
byte-size architectures.

Do you have a clang port for your dsp? I wonder if you would have issues with
expression parsing without clang knowing the details of your dsp. I may be
completely wrong though.

Regards,
Abid

Abid, Hafiz wrote:

So is the "add an offset" technique still the best way forward to solve this
problem? How about adding a new request (along with a query request to
interrogate the stub for support) for code reads - is this an option?
(If so, I'd be happy to do the work...)

If you do this, then you will probably have to add the knowledge of whether you
are reading code memory or data memory into the remote protocol layer and
its callers. I think it would be a more invasive change.

One more issue that you may face is a pointer residing in data memory
but pointing to entities in code memory. The DWARF attribute DW_AT_address_class
can be used to provide the required debug information. But you may need to
add/improve the address space handling in lldb.

Yes, the change would be invasive. I was assuming that all regular accesses for memory reads/writes would be targeting the DATA bus, except for those reads required for the disassembler, which would be CODE reads. However, your mention of a function pointer breaks this assumption somewhat.

So, summarising, I'm wondering if anyone has any ideas/advice on the above
questions, that is, using lldb on harvard architectures and on non-standard-
byte-size architectures.

Do you have a clang port for your dsp? I wonder if you would have issues with
expression parsing without clang knowing the details of your dsp. I may be
completely wrong though.

We don't have such a clang port as yet. There is some talk here of providing an LLVM backend for our dsp at some stage, though.

thanks
Matt

Hi folks,

I have been researching using lldb and a custom gdbserver stub to debug some of our processors. I have already played with ArchSpec.h/.cpp/Elf.h to add a processor definition, and since our tools output DWARF information, have already managed to use lldb to dump line tables.

On the gdbserver side of things I'm managing to get register reads and writes to work, but I fear an issue may arise in memory reads, since the DSP's Harvard architecture dictates separate address spaces. Therefore, when we attempt to read 1) code memory to disassemble, and 2) data memory (for decoding variables etc.), the stub will receive an 'm' request, but the interpretation of the address field is ambiguous, as it could refer to either the CODE or DATA bus.

Addresses in LLDB are represented by

class lldb_private::Address {
    lldb::SectionWP m_section_wp; ///< The section for the address, can be NULL.
    std::atomic<lldb::addr_t> m_offset; ///< Offset into section if \a m_section_wp is valid...
};

The section class:

class lldb_private::Section {
    ObjectFile *m_obj_file; // The object file that data for this section should be read from
    lldb::SectionType m_type; // The type of this section
    lldb::SectionWP m_parent_wp; // Weak pointer to parent section
    ConstString m_name; // Name of this section
    lldb::addr_t m_file_addr; // The absolute file virtual address range of this section if m_parent == NULL,
                              // offset from parent file virtual address if m_parent != NULL
    lldb::addr_t m_byte_size; // Size in bytes that this section will occupy in memory at runtime
    lldb::offset_t m_file_offset; // Object file offset (if any)
    lldb::offset_t m_file_size; // Object file size (can be smaller than m_byte_size for zero filled sections...)
    SectionList m_children; // Child sections
    bool m_fake:1,            // If true, then this section only can contain the address if one of its
                              // children contains an address. This allows for gaps between the children
                              // that are contained in the address range for this section, but do not produce
                              // hits unless the children contain the address.
         m_encrypted:1,       // Set to true if the contents are encrypted
         m_thread_specific:1; // This section is thread specific
};

The section type "m_type" is one of:

    typedef enum SectionType
    {
        eSectionTypeInvalid,
        eSectionTypeCode,
        eSectionTypeContainer, // The section contains child sections
        eSectionTypeData,
        eSectionTypeDataCString, // Inlined C string data
        eSectionTypeDataCStringPointers, // Pointers to C string data
        eSectionTypeDataSymbolAddress, // Address of a symbol in the symbol table
        eSectionTypeData4,
        eSectionTypeData8,
        eSectionTypeData16,
        eSectionTypeDataPointers,
        eSectionTypeDebug,
        eSectionTypeZeroFill,
        eSectionTypeDataObjCMessageRefs, // Pointer to function pointer + selector
        eSectionTypeDataObjCCFStrings, // Objective C const CFString/NSString objects
        eSectionTypeDWARFDebugAbbrev,
        eSectionTypeDWARFDebugAranges,
        eSectionTypeDWARFDebugFrame,
        eSectionTypeDWARFDebugInfo,
        eSectionTypeDWARFDebugLine,
        eSectionTypeDWARFDebugLoc,
        eSectionTypeDWARFDebugMacInfo,
        eSectionTypeDWARFDebugPubNames,
        eSectionTypeDWARFDebugPubTypes,
        eSectionTypeDWARFDebugRanges,
        eSectionTypeDWARFDebugStr,
        eSectionTypeDWARFAppleNames,
        eSectionTypeDWARFAppleTypes,
        eSectionTypeDWARFAppleNamespaces,
        eSectionTypeDWARFAppleObjC,
        eSectionTypeELFSymbolTable, // Elf SHT_SYMTAB section
        eSectionTypeELFDynamicSymbols, // Elf SHT_DYNSYM section
        eSectionTypeELFRelocationEntries, // Elf SHT_REL or SHT_RELA section
        eSectionTypeELFDynamicLinkInfo, // Elf SHT_DYNAMIC section
        eSectionTypeEHFrame,
        eSectionTypeOther
        
    } SectionType;

So we see we have eSectionTypeCode and eSectionTypeData.

This could be used to help make the correct reads if addresses fall within known address ranges that fall into sections within a binary. I am guessing that there are code and data reads that are not found within any sections from files, right?

If so, we would need to change all functions that take a load address ("lldb::addr_t load_addr") into something that takes a load address + segment, which should be a new struct type, something like:

struct ResolvedAddress {
  lldb::addr_t addr;
  lldb::segment_t segment;
};

Then all things like Read/Write memory in the process would need to be switched over to use the ResolvedAddress.

The lldb_private::Address function that is:

    lldb::addr_t
    Address::GetLoadAddress (Target *target) const;

Would now need to be switched over to:

    ResolvedAddress
    Address::GetLoadAddress (Target *target) const;

We would need an invalid segment ID (like UINT32_MAX or UINT64_MAX) to indicate that there is no segment.

So all in all this would be quite a big fix that would involve a lot of the code.

It seems that the commonly adopted approach so far (i.e. with gdb) is to produce a larger single address space by adding an offset to the memory address and arranging for the stub to interpret the presence/absence of the offset and act accordingly. (I have indeed read of this approach being employed for AVR processors.) This technique is workable (provided the DSPs continue to have fairly small physical memories), but certainly has drawbacks: i) changes to the debugger to add the offset, ii) increased packet size (e.g. all code read addresses having the highest bit set), iii) increased compression/decompression due to RLE on the response.

So is the "add an offset" technique still the best way forward to solve this problem? How about adding a new request (along with a query request to interrogate the stub for support) for code reads - is this an option? (If so, I'd be happy to do the work...)

Another issue I'm looking at is that some of our DSPs have 24-bit bytes. (That is, a single data address reads back 24 bits of data.) At the moment, I'm not altogether sure just how problematic this will be for lldb. I've looked into the g_core_definitions table, and I can't see an entry for this (presumably it would either be a 1 or an 8, depending on whether it's measured in host bytes or bits). I assume that all the architectures in the table so far have 8-bit bytes. Is anyone else out there looking at using lldb to debug targets with non-8-bit bytes?

This would be a big change. Not sure how other debuggers handle 24 bit bytes. It might be better to leave this as is and if you read 3 bytes of memory from your DSP, you get 9 bytes back.
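
A minimal sketch of the "3 target bytes come back as 9 host bytes" idea, assuming each 24-bit target word is transferred as three octets with the most significant octet first. The octet order is an assumption that the thread does not settle, and `OctetsForWords` and `UnpackWords` are hypothetical names rather than LLDB functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 24-bit target "byte" occupies three 8-bit octets on the host side.
constexpr std::size_t kOctetsPerWord = 3;

// Number of octets to request in order to read `word_count` target words.
inline std::size_t OctetsForWords(std::size_t word_count) {
    return word_count * kOctetsPerWord;
}

// Unpack a buffer of octets into 24-bit words held in uint32_t.
// Assumes the most significant octet of each word arrives first.
inline std::vector<std::uint32_t> UnpackWords(const std::vector<std::uint8_t> &octets) {
    assert(octets.size() % kOctetsPerWord == 0);
    std::vector<std::uint32_t> words;
    words.reserve(octets.size() / kOctetsPerWord);
    for (std::size_t i = 0; i < octets.size(); i += kOctetsPerWord) {
        words.push_back((std::uint32_t{octets[i]} << 16) |
                        (std::uint32_t{octets[i + 1]} << 8) |
                        std::uint32_t{octets[i + 2]});
    }
    return words;
}
```

With this layout the debugger core keeps treating memory as 8-bit octets, and only the code that displays or interprets DSP data needs to know the word width.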

If a variable for your DSP is referenced in DWARF, what does the byte size show? The actual size in 8 bit bytes, or the size in 24 bit bytes?

So, summarising, I'm wondering if anyone has any ideas/advice on the above questions, that is, using lldb on harvard architectures and on non-standard-byte-size architectures.

It would be great to enable support for these kinds of architectures in LLDB, and it will take some work, but we should be able to make it happen.

Greg Clayton wrote:

So we see we have eSectionTypeCode and eSectionTypeData.

This could be used to help make the correct reads if addresses fall within known address ranges that fall into sections within a binary.

Thanks for your help with this Greg. I am currently trying to understand the above structures. Probably take some time before I get it all clear in my head, though.

I am guessing that there are code and data reads that are not found within any sections from files right?

I can't comment on your above question just yet, since I'm concentrating on figuring out how to get a "disassemble" command (from lldb) to read from the correct bus on our devices.
It matters to us that disassembly always reads from the device (not from the ELF file), since:

1. we prefer to always read from the device for disassembly, since it is then easy to spot if our users have chosen the wrong ELF file.
2. we may try to debug without symbol files. This is a corner case however.
3. we may encounter self-modifying code.

As a quick check I did try debugging a native 64-bit linux process on linux, and when I invoked a simple disassemble from address command (e.g. di -s 0x4004f0 -c 10), I did observe that the target's memory is read:

#0 lldb_private::Process::ReadMemoryFromInferior
#1 lldb_private::MemoryCache::Read
#2 lldb_private::Process::ReadMemory
#3 lldb_private::Target::ReadMemory
...
#5 lldb_private::Disassembler::Disassemble

(I'll try debugging using a remote target shortly, for comparison...)

Whilst debugging, I did observe that in the parameter:
"const Address &start_address" of #5 lldb_private::Disassembler::Disassemble
that the m_section_wp data is 0x0. In your reply, do you suggest that I arrange that this data is populated with a valid section pointer whose m_type is eSectionTypeCode?

If so, we would need to change all functions that take a load address ("lldb::addr_t load_addr") into something that takes a load address + segment, which should be a new struct type, something like:

struct ResolvedAddress {
   lldb::addr_t addr;
   lldb::segment_t segment;
};

Then all things like Read/Write memory in the process would need to be switched over to use the ResolvedAddress.

The lldb_private::Address function that is:

     lldb::addr_t
     Address::GetLoadAddress (Target *target) const;

Would now need to be switched over to:

     ResolvedAddress
     Address::GetLoadAddress (Target *target) const;

We would need an invalid segment ID (like UINT32_MAX or UINT64_MAX) to indicate that there is no segment.

I couldn't find segment_t in my checkout. So I assume that you're floating this as an idea for me to try out :-) I could certainly give it a try with my working copy... and let you know how I get on.

Were you suggesting that the value of segment_t for our Harvard case would be hard-coded somewhere in our Target code, and if the m_section_wp of the Address object is a valid code section, then we'd pull out this constant?

So all in all this would be quite a big fix that would involve a lot of the code.

Indeed, but from my perspective probably a good way for me to learn more of the code-base.

This would be a big change. Not sure how other debuggers handle 24 bit bytes. It might be better to leave this as is and if you read 3 bytes of memory from your DSP, you get 9 bytes back.

If a variable for your DSP is referenced in DWARF, what does the byte size show? The actual size in 8 bit bytes, or the size in 24 bit bytes?

I'm not sure on this one, Greg. I'm leaving it for one of my colleagues to research this further, then get back.

It would be great to enable support for these kinds of architectures in LLDB, and it will take some work, but we should be able to make it happen.

Indeed. I'll keep you posted with my progress on the above.

Matt

Greg Clayton wrote:

So we see we have eSectionTypeCode and eSectionTypeData.

This could be used to help make the correct reads if addresses fall within known address ranges that fall into sections within a binary.

Thanks for your help with this Greg. I am currently trying to understand the above structures. Probably take some time before I get it all clear in my head, though.

I am guessing that there are code and data reads that are not found within any sections from files right?

I can't comment on your above question just yet, since I'm concentrating on figuring out how to get a "disassemble" command (from lldb) to read from the correct bus on our devices.
It matters to us that disassembly always reads from the device (not from the ELF file), since:

1. we prefer to always read from the device for disassembly, since it is then easy to spot if our users have chosen the wrong ELF file.
2. we may try to debug without symbol files. This is a corner case however.
3. we may encounter self-modifying code.

As a quick check I did try debugging a native 64-bit linux process on linux, and when I invoked a simple disassemble from address command (e.g. di -s 0x4004f0 -c 10), I did observe that the target's memory is read:

#0 lldb_private::Process::ReadMemoryFromInferior
#1 lldb_private::MemoryCache::Read
#2 lldb_private::Process::ReadMemory
#3 lldb_private::Target::ReadMemory
...
#5 lldb_private::Disassembler::Disassemble

You are correct, we always use the memory from the device because relocations might have been performed on data and code references.

(I'll try debugging using a remote target shortly, for comparison...)

Whilst debugging, I did observe that in the parameter:
"const Address &start_address" of #5 lldb_private::Disassembler::Disassemble
that the m_section_wp data is 0x0. In your reply, do you suggest that I arrange that this data is populated with a valid section pointer whose m_type is eSectionTypeCode?

No, some addresses will resolve to a section that is "eSectionTypeCode" + offset, but others might not resolve this way. It is like variables: a global variable will exist in a section whose type is eSectionTypeData and it will have an offset, but a lot of data, like anything on the stack or heap, won't resolve to a section + offset.

So it is probably safe to say that your data might be on the stack or heap, and in that case it can't be resolved to a section in an lldb_private::Address. It will have no section, and its offset will be absolute, i.e. the address itself.

If so, we would need to change all functions that take a load address ("lldb::addr_t load_addr") into something that takes a load address + segment, which should be a new struct type, something like:

struct ResolvedAddress {
  lldb::addr_t addr;
  lldb::segment_t segment;
};

Then all things like Read/Write memory in the process would need to be switched over to use the ResolvedAddress.

The lldb_private::Address function that is:

    lldb::addr_t
    Address::GetLoadAddress (Target *target) const;

Would now need to be switched over to:

    ResolvedAddress
    Address::GetLoadAddress (Target *target) const;

We would need an invalid segment ID (like UINT32_MAX or UINT64_MAX) to indicate that there is no segment.

I couldn't find segment_t in my checkout. So I assume that you're floating this as an idea for me to try out :-)

Yes. segment_t would be a uint32_t or a uint64_t. A uint64_t is probably best in case a segment identifier on a system is actually a pointer to a segment structure.

I could certainly give it a try with my working copy... and let you know how I get on.

Were you suggesting that the value of segment_t for our Harvard case would be hard-coded somewhere in our Target code, and if the m_section_wp of the Address object is a valid code section, then we'd pull out this constant?

If you have an address that was in a code or data section, you could just use the lldb_private::Address as is, but when we are asked to resolve it into a ResolvedAddress, you would look up the lldb::segment_t for code or data and return it:

ResolvedAddress
Address::GetLoadAddress (Target *target) const
{
    ResolvedAddress resolved_addr; // Initialize with invalid value
    SectionSP section_sp (GetSection());
    if (section_sp)
    {
        if (target)
        {
            addr_t sect_load_addr = section_sp->GetLoadBaseAddress (target);

            if (sect_load_addr != LLDB_INVALID_ADDRESS)
            {
                resolved_addr.addr = sect_load_addr + m_offset;
                // New function for Section, which knows its segment based on the section type
                resolved_addr.segment = section_sp->GetSegment();
            }
        }
    }
    else if (!SectionWasDeletedPrivate())
    {
        // We don't have a section, so the offset is the load address
        resolved_addr.addr = m_offset;
        // Given a raw address, how could we ever determine the right segment???
        resolved_addr.segment = ???;
    }
    return resolved_addr;
}

Notice above the "resolved_addr.segment = section_sp->GetSegment();"

This would be a new function that you would add to lldb_private::Section. As you build your sections you can probably set the segment ID correctly.

The big problem is in the "else if()" clause, there is no way to take a raw address and set its segment correctly. And this is the biggest drawback of the current attempted solution. There is not a 1 to 1 mapping from a "load" address to a section + offset address. This poses a huge problem for debuggers.

The only way to solve this would be to replace all "lldb::addr_t" in all of the sources, which is defined currently as:

namespace lldb
{
    typedef uint64_t addr_t;
}

To be:

namespace lldb
{
    struct addr_t {
        uint64_t addr;
        uint64_t segment;
    };
}

Now everywhere that used to take or return a lldb::addr_t would return this new struct.

So all in all this would be quite a big fix that would involve a lot of the code.

Indeed, but from my perspective probably a good way for me to learn more of the code-base.

Yes, so probably the best way is to replace lldb::addr_t with the struct I showed above and add all sorts of operators (+, -, +=, -=, <, <=, etc.) to the struct so it can behave just like an integer when needed.
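
A minimal sketch of such a struct, under a hypothetical name (`SegmentedAddr`) so that it does not pretend to be the real lldb::addr_t. The invalid-segment constant, the implicit constructor and the ordering rule (segment first, then address) are illustrative assumptions, not decisions made in this thread:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "no segment" marker, following the UINT64_MAX suggestion.
constexpr std::uint64_t kInvalidSegment = UINT64_MAX;

struct SegmentedAddr {
    std::uint64_t addr = 0;
    std::uint64_t segment = kInvalidSegment;

    SegmentedAddr() = default;
    // Implicit on purpose, so code that passes a raw integer still compiles
    // and ends up with an invalid segment.
    SegmentedAddr(std::uint64_t a, std::uint64_t s = kInvalidSegment)
        : addr(a), segment(s) {}

    SegmentedAddr &operator+=(std::uint64_t delta) { addr += delta; return *this; }
    SegmentedAddr &operator-=(std::uint64_t delta) {
        assert(delta <= addr);  // catch accidental wrap-around below zero
        addr -= delta;
        return *this;
    }
};

// Arithmetic moves the address only; the segment is carried along unchanged.
inline SegmentedAddr operator+(SegmentedAddr a, std::uint64_t delta) { return a += delta; }
inline SegmentedAddr operator-(SegmentedAddr a, std::uint64_t delta) { return a -= delta; }

inline bool operator==(const SegmentedAddr &a, const SegmentedAddr &b) {
    return a.segment == b.segment && a.addr == b.addr;
}
inline bool operator!=(const SegmentedAddr &a, const SegmentedAddr &b) { return !(a == b); }

// Order by segment first, then by address within the segment.
inline bool operator<(const SegmentedAddr &a, const SegmentedAddr &b) {
    return a.segment != b.segment ? a.segment < b.segment : a.addr < b.addr;
}
inline bool operator<=(const SegmentedAddr &a, const SegmentedAddr &b) { return !(b < a); }
```

Because a raw integer converts to a value with an invalid segment, existing call sites would keep their current behaviour until a target starts filling in real segment IDs.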

This would be a big change. Not sure how other debuggers handle 24 bit bytes. It might be better to leave this as is and if you read 3 bytes of memory from your DSP, you get 9 bytes back.

If a variable for your DSP is referenced in DWARF, what does the byte size show? The actual size in 8 bit bytes, or the size in 24 bit bytes?

I'm not sure on this one, Greg. I'm leaving it for one of my colleagues to research this further, then get back.

It would be great to enable support for these kinds of architectures in LLDB, and it will take some work, but we should be able to make it happen.

Indeed. I'll keep you posted with my progress on the above.

Yes, I would start with redefining lldb::addr_t to the struct and get that compiling and passing the test suite. All instances of lldb::addr_t would always contain an invalid segment ID and thus all memory read/write calls would do what they do now.

Then you start to try and get your DSP debugger to start resolving addresses correctly with the right segment ID. I believe the GDB remote protocol has memory read/write packets that can take a segment ID.

Greg

Greg:

Thanks for all your code suggestions. I can't comment on them just now as I'm still single-stepping the existing code and trying to think it all through.

lldb-dev:

I can't help but think, though, that I'm going to have problems getting the lldb disassembler to work against a traditional Harvard architecture. The kind of architecture I'm describing is one in which code and data have completely separate address spaces, so address 0 on the code bus is different from address 0 on the data bus.

So somewhere in my debugserver, I need to either invoke:

device_read_dm(buffer, address, length)
or
device_read_pm(buffer, address, length)

Where dm means "data bus" and pm means "program/code bus".

The problem, in my mind, is in this interface:
"disassemble --start-address <addr> --end-address <addr>"

(For a unified address space model, e.g. when debugging an Intel x86, there is no need to disambiguate between code and data.)

In my company's current debugger, a request to disassemble *always* results in a read of the code bus. However, were I to port lldb to debug our architectures, this approach would not be ideal. In a generic debugger, I imagine that for the most part the developer would want to disassemble real running code, but there is a corner case (e.g. they are working on an interpreter) where they may want to disassemble a piece of data into which they previously copied some code.

So I think I am stuck here. How do you see disassemble working in this scenario? Should a disassemble command always expect to read from memory originally mapped in from a code section? Or is that definition too restrictive?

On further thought, we'd have a similar issue with the "memory read" commands, since there, too, there is no distinction between code and data.

So it seems that the best way for me to debug our architectures with lldb is to form a unified 64-bit address space (our chips currently have 16-bit, 24-bit, 32-bit address spaces) and to set the top bit for code bus access.

Therefore if one of our users wants to disassemble from the code bus, they'd say
(lldb) di -s 0x80000000004004f0

but for data they'd say:
(lldb) di -s 0x4004f0

The following challenges then arise:

1. When the user disassembles using a function_name, the address discovered from the ELF file would need the code bit set before the memory access is made.
2. When the PC is read from the chip, it would need this bit set before its value is presented to the user, or passed to a memory read function, to be consistent.
(3. I'm also imagining some issues affecting stack unwinds, since the return address of a frame, read from the stack, will of course require the offset to be applied before that frame is disassembled.)
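To make the top-bit scheme and the challenges above concrete, here is a minimal sketch of both directions: tagging an address as "code" (as would be needed for ELF symbol addresses, the PC and return addresses) and decoding it again in the stub. All names here (Bus, kCodeBusBit, encode_unified, decode_unified, handle_memory_read) are hypothetical. The two reader callables stand in for device_read_pm and device_read_dm, whose real signatures I am only assuming to be (buffer, address, length).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Bus { Data, Code };

// ASSUMPTION: bit 63 of the unified 64-bit address selects the code bus.
const uint64_t kCodeBusBit = 1ULL << 63;

struct BusAddress {
    Bus bus;
    uint64_t address; // address on the selected bus, flag bit removed
};

// Debugger side: tag a raw bus address (ELF symbol, PC, return address).
inline uint64_t encode_unified(Bus bus, uint64_t address) {
    assert((address & kCodeBusBit) == 0); // raw addresses must not use bit 63
    return bus == Bus::Code ? (address | kCodeBusBit) : address;
}

// Stub side: split a unified address back into bus + raw address.
inline BusAddress decode_unified(uint64_t unified) {
    BusAddress result;
    result.bus = (unified & kCodeBusBit) ? Bus::Code : Bus::Data;
    result.address = unified & ~kCodeBusBit;
    return result;
}

// Stub side: route an 'm' style read to the right bus reader.
// read_pm / read_dm are placeholders for device_read_pm / device_read_dm.
template <typename ReadPm, typename ReadDm>
void handle_memory_read(uint64_t unified, uint8_t *buffer, std::size_t length,
                        ReadPm read_pm, ReadDm read_dm) {
    const BusAddress target = decode_unified(unified);
    if (target.bus == Bus::Code)
        read_pm(buffer, target.address, length);
    else
        read_dm(buffer, target.address, length);
}
```

With this layout, "di -s 0x80000000004004f0" would end up as a code bus read of 0x4004f0, and "di -s 0x4004f0" as a data bus read of the same offset.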

An internet search revealed similar issues when producing a debugger for AVR processors using gdb and eclipse:

http://avr-eclipse.sourceforge.net/wiki/index.php/Debugging#Harvard_Architecture

A similar solution was applied in this case, but this time the offset was applied to data addresses.

With the kind of issues/challenges I outlined above I think I may need to make some big changes in lldb. I'm wondering whether there is a convenient pre-defined abstraction layer. At first I thought that the "Target" class would be the right place to subclass, but it does not have the "pure virtuals" that I would expect to see. So I looked at the "Process" layer, which has the expected "pure virtuals", but unfortunately having both

class ProcessPOSIX : public Process
and
class ProcessGDBRemote : public Process

here, makes me think that either 1) this is the wrong place or 2) that the exact current positioning of ProcessGDBRemote in the hierarchy is wrong.

If you/anyone-in-the-list have any more input into my dilemma, I'd greatly appreciate your thoughts.

thanks
Matt

Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom
More information can be found at www.csr.com. Keep up to date with CSR on our technical blog, www.csr.com/blog, CSR people blog, www.csr.com/people, YouTube, www.youtube.com/user/CSRplc, Facebook, www.facebook.com/pages/CSR/191038434253534, or follow us on Twitter at www.twitter.com/CSR_plc.
New for 2014, you can now access the wide range of products powered by aptX at www.aptx.com.

Matthew Gardiner wrote:

Therefore if one of our users wants to disassemble from the code bus, they'd say
(lldb) di -s 0x80000000004004f0

Actually a colleague suggested that rather than adding an offset/setting a bit, we could augment the address with a decorator, e.g. "c"

(lldb) di -s 0x4004f0:c

and this would denote a code bus read. This is perhaps similar to the offset approach, but it does give a more abstract approach to addressing, and could possibly fit in with Greg's previous proposal of:

ResolvedAddress {
   lldb::addr_t addr;
   lldb::segment_t segment;
};

whereupon "4004f0:c" is parsed as meaning "this addr is in the code segment/bus".
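A minimal sketch of how such a decorated address might be parsed is below. Only the ":c" suffix comes from the suggestion above; the ":d" suffix for data, the Segment enum and the function name are my own assumptions for illustration, and the struct here is a simplified stand-in for the ResolvedAddress idea rather than real LLDB code.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

enum class Segment { Unspecified, Code, Data };

// Simplified stand-in for the proposed ResolvedAddress.
struct ResolvedAddress {
    uint64_t addr;
    Segment segment;
};

// Parses "<hex>[:c|:d]", e.g. "0x4004f0:c". Returns false on malformed input.
// ":d" is a hypothetical counterpart to the suggested ":c".
inline bool parse_decorated_address(const std::string &text, ResolvedAddress &out) {
    std::string number = text;
    Segment segment = Segment::Unspecified;

    const std::size_t colon = text.find(':');
    if (colon != std::string::npos) {
        const std::string suffix = text.substr(colon + 1);
        if (suffix == "c")
            segment = Segment::Code;
        else if (suffix == "d")
            segment = Segment::Data;
        else
            return false;
        number = text.substr(0, colon);
    }

    // Reject empty input, leading whitespace and sign characters.
    if (number.empty() || !std::isxdigit(static_cast<unsigned char>(number[0])))
        return false;

    errno = 0;
    char *end = nullptr;
    const unsigned long long value = std::strtoull(number.c_str(), &end, 16);
    if (end == number.c_str() || *end != '\0' || errno == ERANGE)
        return false;

    out.addr = static_cast<uint64_t>(value);
    out.segment = segment;
    return true;
}
```

An undecorated address is reported as Segment::Unspecified, so the caller can decide on a default (for example, code for "disassemble" and data for "memory read").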

Matt
