DWARF .debug_aranges data objects and address spaces

Hello

I've been looking at a debuginfo issue on an out-of-tree target which uses
DWARF aranges.

The problem is that aranges are generated for both data and code objects, and
the debugger gets confused when program addresses overlap data addresses. The
target is a Harvard Architecture CPU, so the appearance of overlapping address
ranges is not in itself a bug as they reside in different address spaces.

During my investigations, I found that:

    - gcc appears to never generate an entry in the `.debug_aranges` table for
      data objects. I did a cursory read over gcc's source and history and it is
      my understanding that aranges are deliberately only emitted for text and
      cold text sections[1].
    - However, the DWARF v5 specification[2] for `.debug_aranges` does not
      restrict aranges to text addresses, and the wording strongly suggests
      that their use is general:

          6.1.2:
          > This header is followed by a variable number of address range descriptors.
          > Each descriptor is a triple consisting of a segment selector, the
          > beginning address within that segment of a range of text or data covered
          > by some entry owned by the corresponding compilation unit, followed by the
          > non-zero length of that range

      As such, llvm is not doing anything generally wrong by emitting aranges
      for data objects.

    - llvm unconditionally sets the `.debug_aranges.segment_selector_size` to
      zero[3]. GCC does this too[4]. I think this is a bug if the target can
      have overlapping ranges due to multiple code/data address spaces, as in
      my case of a Harvard machine.
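For reference, the segment selector size is a per-set header field that changes the shape of every descriptor that follows, so a consumer has to read it before it can read any tuple. A minimal Python sketch of reading one set header (32-bit DWARF v2 only, hand-built bytes, not real producer output):

```python
import struct

def parse_aranges_header(buf):
    """Parse one 32-bit DWARF v2 .debug_aranges set header.

    Illustrative only: real consumers also handle 64-bit DWARF,
    multiple sets per section, and the descriptor list itself.
    """
    (unit_length, version, cu_offset,
     addr_size, seg_size) = struct.unpack_from("<IHIBB", buf, 0)
    header = {
        "unit_length": unit_length,
        "version": version,
        "debug_info_offset": cu_offset,
        "address_size": addr_size,
        "segment_selector_size": seg_size,  # always 0 from llvm/gcc today
    }
    # Each descriptor is segment (seg_size bytes) + start + length, and
    # the first descriptor is aligned to twice the address size.
    pos, align = 12, 2 * addr_size
    if pos % align:
        pos += align - pos % align
    return header, pos

# Bytes shaped like the AVR example below: 2-byte addresses, no segment.
raw = (struct.pack("<IHIBB", 20, 2, 0, 2, 0)
       + struct.pack("<5H", 0x0060, 0x3FFF, 0x0000, 0x0020, 0x0000))
hdr, first_tuple = parse_aranges_header(raw)
```

Note that a nonzero `segment_selector_size` silently shifts every subsequent field, which is why a consumer that ignores it misparses the whole set.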

As far as I can tell, the only upstream backend that is of a similar
configuration is AVR. I can reproduce the same `.debug_aranges` table as my
target with the following simple example:

    $ clang -target avr -mmcu=attiny104 -S -o - -g -gdwarf-aranges -xc - <<'EOF'
    char char_array[16383] = {0};
    int main() {
      return char_array[0];
    }
    EOF
    # ...
    .section .debug_aranges,"",@progbits
    .long 20 ; Length of ARange Set
    .short 2 ; DWARF Arange version number
    .long .Lcu_begin0 ; Offset Into Debug Info Section
    .byte 2 ; Address Size (in bytes)
    .byte 0 ; Segment Size (in bytes)
    .short char_array
    .short .Lsec_end0-char_array
    .short .Lfunc_begin0
    .short .Lsec_end1-.Lfunc_begin0
    .short 0 ; ARange terminator

...but I cannot see documentation anywhere on what a consumer is expected to do
with such information, nor on how, *in general*, multiple address spaces are
expected to work for llvm and gcc when DWARF aranges are generated without a
segment selector in the tuple.

A cursory grep of lldb shows that the segment size is set from the
`.debug_aranges` header, but never checked. If it *is* nonzero, lldb will
silently read incorrect data and possibly crash. I have provided a patch on
the lldb mailing list[5]. My patch brings lldb in line with gdb, which throws
an error in case of a nonzero segment selector size[6].
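The guard itself is tiny; roughly (a sketch of the behaviour described above, not lldb's or gdb's actual code, and the function name is mine):

```python
def check_segment_selector_size(seg_size):
    """Refuse .debug_aranges sets whose segment selector size we cannot
    parse, rather than silently misreading every descriptor after the
    header."""
    if seg_size != 0:
        raise NotImplementedError(
            "segment selector size %d in .debug_aranges is unsupported"
            % seg_size)
    return True
```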

My question is: Should LLVM have some logic to emit `segment_selector_size != 0`
for targets without a flat address space? An alternative formulation: do we
need to limit the emission of arange info to code objects only, 1) just in
the non-flat address-space case, or 2) for all targets unconditionally?

My intuition is that we should limit emission of aranges to objects in the main
text section. Neither GDB nor LLDB handles aranges for targets without flat
address spaces, and significant work might be needed in downstream DWARF
consumers. The usefulness of address ranges for data objects is not obvious
to me, as the uses of this section in DWARF consumers seem mostly to be
PC lookup.

Any insight would be appreciated. I can likely provide patches if we conclude
that changes are needed in LLVM.

All the Best

Luke

[1] GCC only emits aranges for text:
    https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637
[2] DWARF Debugging Information Format Version 5; 6.1. http://dwarfstd.org/Dwarf5Std.php
[3] LLVM segment selector size is always zero: https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749
[4] GCC segment selector size is always zero:
    https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624
[5] lldb patch to gracefully error on nonzero segment selector size: https://reviews.llvm.org/D75925
[6] GDB implementation of [5]:
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779

If you only want code addresses, why not use the CU’s low_pc/high_pc/ranges - those are guaranteed to be only code addresses, I think?

In the common case, for most targets LLVM supports I think you're right,
but for my case, regrettably, not. Because my target is a Harvard
Architecture, any code address can have the same ordinal value as any
data address: the code and data reside on different buses so the whole
4GiB space is available to both code, and data. `DW_AT_low_pc` and
`DW_AT_high_pc` can be used to find the range of the code segment, but
given an arbitrary address, cannot be used to conclusively determine
whether that address belongs to code or data when both segments contain
addresses in that numeric range.

All the Best

Luke

Sorry I’m not following, partly probably due to my not having worked with such machines before.

But how are the code addresses and data addresses differentiated then (eg: if you had segment selectors in debug_aranges, how would they be used? The addresses taken from the system at runtime have some kind of segment selector associated with them, that you can then use to match with the addr+segment selector in aranges?).

Actually, coming at it from a different angle: It sounds like in the original email you’re suggesting if debug_aranges did not contain data addresses, this would be good/sufficient for you? So somehow you’d be ensuring you only query debug_aranges using things you know are code addresses, not data addresses? So why would the same solution/approach not hold to querying low/high/ranges on a CU that’s already guaranteed not to contain data addresses?

> > If you only want code addresses, why not use the CU's
> > low_pc/high_pc/ranges
> > - those are guaranteed to be only code addresses, I think?
> >
> In the common case, for most targets LLVM supports I think you're right,
> but for my case, regrettably, not. Because my target is a Harvard
> Architecture, any code address can have the same ordinal value as any
> data address: the code and data reside on different buses so the whole
> 4GiB space is available to both code, and data. `DW_AT_low_pc` and
> `DW_AT_high_pc` can be used to find the range of the code segment, but
> given an arbitrary address, cannot be used to conclusively determine
> whether that address belongs to code or data when both segments contain
> addresses in that numeric range.

> Sorry I'm not following, partly probably due to my not having worked
> with such machines before.
>
> But how are the code addresses and data addresses differentiated then
> (eg: if you had segment selectors in debug_aranges, how would they be
> used? The addresses taken from the system at runtime have some kind of
> segment selector associated with them, that you can then use to match
> with the addr+segment selector in aranges?).

Yes. This. The system mostly provides us the ability to disambiguate
addresses because the device's simulator / debugger makes this
unambiguous, but the current .debug_aranges does not allow us to do this
because it's missing such info.

> Actually, coming at it from a different angle: It sounds like in the
> original email you're suggesting if debug_aranges did not contain data
> addresses, this would be good/sufficient for you? So somehow you'd be
> ensuring you only query debug_aranges using things you know are code
> addresses, not data addresses? So why would the same solution/approach
> not hold to querying low/high/ranges on a CU that's already guaranteed
> not to contain data addresses?

That's the root of the issue: the .debug_aranges section emitted by llvm
*does* contain data addresses by default and therefore can be ambiguous.
I've worked around this locally by hacking llvm to only emit aranges for
text objects, but I was wondering if it's something that's valuable to
fix upstream. My guess is that it's probably too niche to worry about
for the moment, but if there's interest I can propose a design (probably
a target hook to ask if segment selectors are required and how to get
their number from an object).

Thanks for your help

Luke

> If you only want code addresses, why not use the CU’s
> low_pc/high_pc/ranges - those are guaranteed to be only code addresses,
> I think?

> In the common case, for most targets LLVM supports I think you’re right,
> but for my case, regrettably, not. Because my target is a Harvard
> Architecture, any code address can have the same ordinal value as any
> data address: the code and data reside on different buses so the whole
> 4GiB space is available to both code, and data. DW_AT_low_pc and
> DW_AT_high_pc can be used to find the range of the code segment, but
> given an arbitrary address, cannot be used to conclusively determine
> whether that address belongs to code or data when both segments contain
> addresses in that numeric range.

> Sorry I’m not following, partly probably due to my not having worked
> with such machines before.
>
> But how are the code addresses and data addresses differentiated then
> (eg: if you had segment selectors in debug_aranges, how would they be
> used? The addresses taken from the system at runtime have some kind of
> segment selector associated with them, that you can then use to match
> with the addr+segment selector in aranges?).

> Yes. This. The system mostly provides us the ability to disambiguate
> addresses because the device’s simulator / debugger makes this
> unambiguous, but the current .debug_aranges does not allow us to do this
> because it’s missing such info.

> Actually, coming at it from a different angle: It sounds like in the
> original email you’re suggesting if debug_aranges did not contain data
> addresses, this would be good/sufficient for you? So somehow you’d be
> ensuring you only query debug_aranges using things you know are code
> addresses, not data addresses? So why would the same solution/approach
> not hold to querying low/high/ranges on a CU that’s already guaranteed
> not to contain data addresses?

> That’s the root of the issue: the .debug_aranges section emitted by llvm
> *does* contain data addresses by default and therefore can be ambiguous.
> I’ve worked around this locally by hacking llvm to only emit aranges for
> text objects,

Sorry, but I’m still not understanding why “aranges for only text objects” is more usable for your use case than “high/low/ranges on the CU”? Could you help me understand how those are different in your situation?

> but I was wondering if it’s something that’s valuable to
> fix upstream. My guess is that it’s probably too niche to worry about
> for the moment, but if there’s interest I can propose a design (probably
> a target hook to ask if segment selectors are required and how to get
> their number from an object).

Added a few debug info folks in case they’ve got opinions. I don’t really mind if we removed data objects from debug_aranges, though as you say, it’s arguably correct/maybe useful as-is. Supporting it properly - probably using address segment selectors would be fine too, I guess AVR uses address spaces for its pointers to differentiate data and code addresses? In which case we could encode the LLVM address space as the segment selector (& probably would need to query the target to decide if it has non-zero address spaces and use that to decide whether to use segment selectors in debug_aranges)

But in general, I’m mostly just discouraging people from using aranges - the data is duplicated in the CU’s ranges anyway (there’s some small caveats there - a producer doesn’t /have/ to produce ranges on the CU, but I’d just say lower performance on such DWARF would be acceptable) & makes object files/executables larger for minimal value/mostly duplicate data.

- Dave

I’ve encountered this kind of architecture before, a long time ago (academically). In a flat-address-space machine such as X64, there is still an instruction/data distinction, but usually only down at the level of I-cache versus D-cache (instruction fetch versus data fetch). A Harvard architecture machine exposes that to the programmer, which effectively doubles the available address space. Code and data live in different address spaces, although the address space identifier per se is not explicit. A Move instruction would implicitly use the data address space, while an indirect Branch would implicitly target the code address space. An OS running on a Harvard architecture would require the loader to be privileged, so it can map data from an object file into the code address space and implement any necessary fixups. Self-modifying code is at least wicked hard if not impossible to achieve.

In DWARF this would indeed be described by a segment selector. It’s up to the target ABI to specify what the segment selector numbers actually are. For a Harvard architecture machine this is pretty trivial, you say something like 0 for code and 1 for data. Boom done.
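To make that concrete, here is a toy consumer-side lookup under that convention (segment 0 = code, 1 = data); the descriptors and CU offsets are invented for illustration:

```python
# Hypothetical arange descriptors for a Harvard target, as
# (segment_selector, start, length, cu_offset) with 0 = code, 1 = data.
ARANGES = [
    (0, 0x0000, 0x0100, "cu@0x0b"),  # code: main() etc.
    (1, 0x0000, 0x3FFF, "cu@0x6b"),  # data: same numeric range as code
]

def cu_for_address(segment, addr):
    """Map a (segment, address) pair to its CU; without the segment
    selector the two overlapping entries above would be ambiguous."""
    for seg, start, length, cu in ARANGES:
        if seg == segment and start <= addr < start + length:
            return cu
    return None
```

The point is simply that the extra selector key makes the overlapping numeric ranges disjoint again.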

LLVM basically doesn’t have targets like this, or at least it has never come up before that I’m aware of. So, when we emit DWARF, we assume a flat address space (unconditionally setting the segment selector size to zero), and llvm-dwarfdump will choke (hopefully cleanly, but still) on an object file that uses DWARF segment selectors.

The point of .debug_aranges is to accelerate the search for the appropriate CU. Yes you can spend time trolling through .debug_info and .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs (or perhaps pointers to .debug_ranges) and effectively rebuild a .debug_aranges section yourself, if the compiler/linker isn’t kind enough to pre-build the table for you. I don’t understand why .debug_aranges should be discouraged; I shouldn’t think they would be huge, and consumers can avoid loading lots of data just to figure out what’s not worth looking at. Forcing all consumers to do things the slow way seems unnecessarily inefficient.

Thinking about Harvard architecture specifically, you need the segment selector only when an address could be ambiguous about whether it’s a code or data address. This basically comes up only in .debug_aranges, he said thinking about it for about 30 seconds. Within .debug_info you don’t need it because when you pick up the address of an entity, you know whether it’s for a code or data entity. Location lists and range lists always point to code. For .debug_aranges you would need the segment selector, but I think that’s the only place.

For an architecture with multiple code or data segments, then you’d need the segment selector more widely, but I should think this case wouldn’t be all that difficult to make work. Even factoring in the llvm-dwarfdump part, it has to understand the selector only for the .debug_aranges section; everything else can remain as it is, pretending there’s a flat address space.

Now, if your target is downstream, that would make upstreaming the LLVM support a bit dicier, because we’d not want to have that feature in the upstream repo if there are no targets using it. You’d be left maintaining that patch on your own. But as I described above, I don’t think it would be a huge deal.

HTH,

–paulr

> I’ve encountered this kind of architecture before, a long time ago (academically). In a flat-address-space machine such as X64, there is still an instruction/data distinction, but usually only down at the level of I-cache versus D-cache (instruction fetch versus data fetch). A Harvard architecture machine exposes that to the programmer, which effectively doubles the available address space. Code and data live in different address spaces, although the address space identifier per se is not explicit. A Move instruction would implicitly use the data address space, while an indirect Branch would implicitly target the code address space. An OS running on a Harvard architecture would require the loader to be privileged, so it can map data from an object file into the code address space and implement any necessary fixups. Self-modifying code is at least wicked hard if not impossible to achieve.
>
> In DWARF this would indeed be described by a segment selector. It’s up to the target ABI to specify what the segment selector numbers actually are. For a Harvard architecture machine this is pretty trivial, you say something like 0 for code and 1 for data. Boom done.
>
> LLVM basically doesn’t have targets like this, or at least it has never come up before that I’m aware of. So, when we emit DWARF, we assume a flat address space (unconditionally setting the segment selector size to zero), and llvm-dwarfdump will choke (hopefully cleanly, but still) on an object file that uses DWARF segment selectors.

FWIW Luke mentioned in the original email the AVR in-tree backend seems to have this problem with an ambiguous debug_aranges entries.

> The point of .debug_aranges is to accelerate the search for the appropriate CU. Yes you can spend time trolling through .debug_info and .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs (or perhaps pointers to .debug_ranges) and effectively rebuild a .debug_aranges section yourself, if the compiler/linker isn’t kind enough to pre-build the table for you. I don’t understand why .debug_aranges should be discouraged; I shouldn’t think they would be huge, and consumers can avoid loading lots of data just to figure out what’s not worth looking at. Forcing all consumers to do things the slow way seems unnecessarily inefficient.

If the producer has put ranges on the CU it’s not a lot of work - it’s parsing one DIE & looking for a couple of attributes. With Split DWARF the cost becomes a bit more prominent - Sema.o from clang, with split dwarf (v4 or v5, about the same) is about 3.5% larger with debug aranges (not sure about the overall data). It’s enough, at least at Google, for us to not use them & use CU ranges for the same purpose.

I thought I might be able to find some email history about why we turned it off by default, but it seems we never turned it /on/ by default to begin with & it wasn’t implemented until relatively late in the game (well, what I think of as relatively late - after I started on the project at least).

I’m not across most of this debug info stuff but I’ll stomp in here to confirm that AVR is a Harvard architecture, with separate addressing for the data and program buses via specialized instructions which will load from either one, or the other, but never both.

It makes sense that this particular problem would also affect AVR - the backend does have some issues with debug info generation.

With AVR being affected, upstreaming a patch to put segment selectors into .debug_aranges becomes completely reasonable. There would likely want to be a target hook somewhere to return a value saying what size to use, with the default implementation returning zero.

> If the producer has put ranges on the CU it’s not a lot of work - it’s parsing one DIE & looking for a couple of attributes.

It’s walking through all the CUs, picking up the associated abbrevs, trolling down the list of attributes… “not a lot” indeed, but not as trivial as running through a single section linearly, which is what .debug_aranges gets you. I’ve been lectured by @clayborg on what consumers really want for performance gains.

> It’s enough at least at Google for us to not use them & use CU ranges for the same purpose.

Google is much more seriously concerned about debug-info size than about debugger performance, IIRC. This is not universally the preferred tradeoff. Just sayin’.

–paulr

> With AVR being affected, upstreaming a patch to put segment selectors into .debug_aranges becomes completely reasonable. There would likely want to be a target hook somewhere to return a value saying what size to use, with the default implementation returning zero.

*nod* something along those lines

> > If the producer has put ranges on the CU it’s not a lot of work - it’s parsing one DIE & looking for a couple of attributes.

> It’s walking through all the CUs, picking up the associated abbrevs, trolling down the list of attributes… “not a lot” indeed, but not as trivial as running through a single section linearly, which is what .debug_aranges gets you. I’ve been lectured by @clayborg on what consumers really want for performance gains.

Sure enough - though I don’t believe aranges is used by default on any target/platform LLVM supports, so this time/space tradeoff doesn’t seem to have been important to any of them?

> > It’s enough at least at Google for us to not use them & use CU ranges for the same purpose.

> Google is much more seriously concerned about debug-info size than about debugger performance, IIRC. This is not universally the preferred tradeoff. Just sayin’.

Sure enough.

I’ve just had a couple of people ask about aranges recently (~year or so) & when pressing a little further, using the CU’s address ranges turned out to be sufficient for their needs without having to change Clang’s defaults or have their users specify extra flags to explicitly request them, etc.

Out of curiosity/for data/usage/etc - does Sony use aranges? (changing the default when targeting SCE or the like)

- Dave

SCE tuning does turn on the .debug_aranges section. Our debugger team really cares about startup cost. Turnaround time in general is huge for our licensees, to the point where we support edit-and-continue (minimal rebuild, live-patch the running process).

–paulr

> SCE tuning does turn on the .debug_aranges section. Our debugger team really cares about startup cost. Turnaround time in general is huge for our licensees, to the point where we support edit-and-continue (minimal rebuild, live-patch the running process).

Ah, good to know! I’d be curious to know about the performance tradeoff when they’re disabled if you ever happen to have data around that.
I guess a related question: Does SCE use the non-.text entries (or otherwise have an opinion on having them) in debug_aranges?

Oh, and yeah - I’m all for turnaround time, though different situations put the costs for that in different places - for Google a distributed build means file sizes are important because they delay sending content between builders and from the builders down to the developers machine.

I don’t know to what extent our debugger cares about non-.text entries. I can ask but those guys are slammed right now.

We care about debug-info size to the extent it can improve build (esp. link) times. I don’t have hard info about how our processes actually work, but I know we are smart about what sections get downloaded to the test console, and clearly we try to be smart about loading debug sections by the debugger. This suggests to me that our tools are tuned to optimize remote/slow access to the image files, while Google’s tactic of copying the entire image to a developer’s machine before even getting started is premised on debugging tools that aren’t really remote-filesystem-aware, with the traditional assumption of local/fast access to the image files. This is all me speculating, I don’t actually know anything & am likely wrong about the reasoning, but it fits the info I have.

–paulr

> I don’t know to what extent our debugger cares about non-.text entries. I can ask but those guys are slammed right now.

Yeah, nothing vital - just curiosity.

> We care about debug-info size to the extent it can improve build (esp. link) times. I don’t have hard info about how our processes actually work, but I know we are smart about what sections get downloaded to the test console, and clearly we try to be smart about loading debug sections by the debugger. This suggests to me that our tools are tuned to optimize remote/slow access to the image files, while Google’s tactic of copying the entire image to a developer’s machine before even getting started is premised on debugging tools that aren’t really remote-filesystem-aware, with the traditional assumption of local/fast access to the image files.

Certainly part of it - debugging hasn’t been a high priority/common use case at Google for a long time (some changes recently, but not a massive shift in priorities/usage), so it’s mostly about making adding debug info as unobtrusive as possible (as little overhead to compilation/linking), but without a lot of investment in/interference with the existing build/test/debugging tools. We also aren’t usually dealing with a need to remote debug, so downloading the whole executable to the local machine for the debugger to run is fairly normal/reasonable - though split DWARF keeps most of the debug info out on the slow remote file system to be pulled in as needed. The last tricky one we might tackle after we transition to DWARFv5 might be the index sections (debug_names, gnu_pubnames, gdb_index, whatever you might call it/whatever we end up using) - it currently needs to be in the objects/linked executable (so the debugger doesn’t have to pull down all the .dwo files to create such an index) but that adds a fair bit to the linker inputs, etc - so might end up with a 3rd intermediate file (.o, .dwo, .dwi(ndex)) or only a 3rd final output file (send the .o files to two different actions - a link action that ignores/skips the index sections, and an indexing action that ignores everything other than the index sections).

Does that mean putting the selector *only* into debug_aranges (and not
debug_line, debug_frame, etc.)?

Even though they are not really needed if the target only ever has one
code address space, it seems somewhat odd to have different values for
segment_selector_size in different sections.

In the DWARF spec these are described as "... containing the size in
bytes of a segment selector on the _target system_". I would interpret
the "target system" portion of that as meaning that the segment selector
size is a property of a target, and hence, it should be consistent
across all relevant sections.

pl

From: Pavel Labath <pavel@labath.sk>
Sent: Tuesday, March 17, 2020 3:02 AM
To: David Blaikie <dblaikie@gmail.com>; Robinson, Paul
<paul.robinson@sony.com>
Cc: llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] DWARF .debug_aranges data objects and address
spaces

>
> With AVR being affected, upstreaming a patch to put segment
> selectors into .debug_aranges becomes completely reasonable. There
> would likely want to be a target hook somewhere to return a value
> saying what size to use, with the default implementation returning
zero.
>
>
> *nod* something along those lines
>

Does that mean putting the selector *only* into debug_aranges (and not
debug_line, debug_frame, etc.)?

That was my thought, yes. It's the only section where there is no other
context to determine whether a raw address is for code or for data.

Even though they are not really needed if the target only ever has one
code address space, it seems somewhat odd to have different values for
segment_selector_size in different sections.

In the DWARF spec these are described as "... containing the size in
bytes of a segment selector on the _target system_". I would interpret
the "target system" portion of that as meaning that the segment selector
size is a property of a target, and hence, it should be consistent
across all relevant sections.

For a target with actual segments (like 80x86) the selector would always
have to be present.

For a Harvard target there is no explicit selector in the machine code,
and a strict reading of the DWARF spec would require the segment selector
size to be zero in all cases; but that leaves us where we are today, with
.debug_aranges being impossible to interpret correctly.

IMO, having a segment selector in .debug_aranges and nowhere else, for a
Harvard architecture, falls within the "permissive" aspect of DWARF. It
solves an actual problem using what is IMO a reasonable interpretation of
the existing DWARF feature set. If the AVR (+other Harvard-like) targets
prefer, I wouldn't stop them from adding a segment selector to all DWARF
sections, but it seems like a waste of space in other sections.

I'd be happy to propose a DWARF wiki item or even a non-normative bit of
text in the spec, to codify this. It would affect consumers that target
a Harvard architecture, but they have to contend with this somehow in any
case.
--paulr

Thanks for sharing your thoughts Paul. I agree that putting the unused
segment selectors everywhere is silly, but I also think that different
selector sizes in different sections could be confusing/surprising -- I
find it about equally surprising as using different values of
address_size in different sections. That too could potentially be useful
(if, e.g., the code address space is significantly smaller than the data
address space) -- but it would definitely raise some eyebrows.

I think that some bit of text to make this more "official" would be useful.

pl