DWARF .debug_aranges data objects and address spaces

Hi David, Paul

> I’ve encountered this kind of architecture before, a long time ago
> (academically). In a flat-address-space machine such as X64, there is
> still an instruction/data distinction, but usually only down at the level
> of I-cache versus D-cache (instruction fetch versus data fetch). A Harvard
> architecture machine exposes that to the programmer, which effectively
> doubles the available address space. Code and data live in different
> address spaces, although the address space identifier per se is not
> explicit. A Move instruction would implicitly use the data address space,
> while an indirect Branch would implicitly target the code address space.
> An OS running on a Harvard architecture would require the loader to be
> privileged, so it can map data from an object file into the code address
> space and implement any necessary fixups. Self-modifying code is at least
> wicked hard if not impossible to achieve.

Paul: On our target it's impossible to have self-modifying code. The loader is
the ROM flasher and the target jumps to a fixed address on core reset.
There are no special addressing modes that would allow a code section to
be overwritten.
Everything you say above is consistent with my experience on this
target.

>
> In DWARF this would indeed be described by a segment selector. It’s up to
> the target ABI to specify what the segment selector numbers actually are.
> For a Harvard architecture machine this is pretty trivial, you say
> something like 0 for code and 1 for data. Boom done.
>

Paul: I think I need to have a discussion with the people implementing
the debugger and work out the ABI a bit more formally.
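
For concreteness, the convention Paul describes ("0 for code and 1 for
data") probably amounts to little more than a single shared definition
like the sketch below; the names and values here are only illustrative,
not an agreed ABI:

  #include <cstdint>

  // Hypothetical segment selector numbering for a Harvard target ABI.
  // DWARF itself assigns no meaning to these values; the target ABI
  // (toolchain + debugger) has to.
  enum class DwarfSegment : uint8_t {
    Code = 0, // instruction address space
    Data = 1, // data address space
  };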

>
> LLVM basically doesn’t have targets like this, or at least it has never
> come up before that I’m aware of. So, when we emit DWARF, we assume a flat
> address space (unconditionally setting the segment selector size to zero),
> and llvm-dwarfdump will choke (hopefully cleanly, but still) on an object
> file that uses DWARF segment selectors.
>

FWIW Luke mentioned in the original email that the AVR in-tree backend
seems to have this problem with ambiguous debug_aranges entries.

David: I'm not sure what's expected for AVR, but it's the only target I
can see that looks like a Harvard machine. The default seems to be to
not emit arange information so I think if this is a bug, it's only in
very specific circumstances.
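
For reference, the segment selector size Paul mentions lives in each
.debug_aranges set header. A reader-side view of the DWARF v5 fields
might look like the sketch below (struct and field names are mine; the
layout follows the spec, and DWARF v4 and earlier call the last field
segment_size):

  #include <cstdint>

  // One .debug_aranges set header (32-bit DWARF form), as a consumer
  // decodes it field by field. It is not stored packed exactly like a
  // C struct, so treat this as an in-memory view only.
  struct ArangesSetHeader {
    uint32_t unit_length;           // set length, excluding this field
    uint16_t version;               // always 2 for .debug_aranges
    uint32_t debug_info_offset;     // owning CU's offset in .debug_info
    uint8_t  address_size;          // size of an address in bytes
    uint8_t  segment_selector_size; // 0 on flat targets (what LLVM emits
                                    // today); nonzero means every tuple
                                    // carries a segment value
  };
  // (segment, address, length) tuples follow, padded to tuple alignment
  // and terminated by an all-zero tuple.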

> The point of .debug_aranges is to accelerate the search for the
> appropriate CU. Yes you can spend time trolling through .debug_info and
> .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs (or
> perhaps pointers to .debug_ranges) and effectively rebuild a .debug_aranges
> section yourself, if the compiler/linker isn’t kind enough to pre-build the
> table for you. I don’t understand why .debug_aranges should be
> discouraged; I shouldn’t think they would be huge, and consumers can avoid
> loading lots of data just to figure out what’s not worth looking at.
> Forcing all consumers to do things the slow way seems unnecessarily
> inefficient.
>
> If the producer has put ranges on the CU it's not a lot of work - it's
> parsing one DIE & looking for a couple of attributes. With Split DWARF
> the cost becomes a bit more prominent - Sema.o from clang, with split
> dwarf (v4 or v5 about the same) is about 3.5% larger with debug aranges
> (not sure about the overall data). It's enough at least at Google for
> us to not use them & use CU ranges for the same purpose.
>
> I thought I might be able to find some email history about why we
> turned it off by default, but it seems we never turned it /on/ by
> default to begin with & it wasn't implemented until relatively late in
> the game (well, what I think of as relatively late - after I started
> on the project at least).

In my mind keeping debug_aranges is definitely a win for my use cases,
and as long as the linker is doing the right thing it is a useful
optimization.

I'm generally of the opinion that if you're hulking around *any*
debuginfo, you should include as much useful context as possible - and
because of that any extra info the debugger can use to shorten the parse
time is helpful. This is of course assuming I never have to debug
binaries that are on production machines (a clear distinction between
debug and release builds); reducing size makes sense there.
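
For a sense of scale of the "slow way" Paul describes above, the
fallback lookup is roughly the sketch below. The data structures are
hypothetical stand-ins for whatever a consumer's DWARF reader exposes
(not llvm-dwarfdump's or any debugger's actual API), but the shape of
the work is the same: scan every CU's ranges instead of consulting a
prebuilt table.

  #include <cstdint>
  #include <optional>
  #include <utility>
  #include <vector>

  // Hypothetical view of one compile unit's address ranges, recovered
  // from DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges on the CU DIE.
  struct CompileUnitRanges {
    uint64_t debug_info_offset;                        // identifies the CU
    std::vector<std::pair<uint64_t, uint64_t>> ranges; // [low, high) pairs
  };

  // Scan every CU's ranges to find the owner of `pc`. This is exactly
  // the query .debug_aranges pre-computes: with the table present it is
  // a lookup over (address, length) tuples instead of a full scan.
  std::optional<uint64_t>
  findOwningCU(const std::vector<CompileUnitRanges> &units, uint64_t pc) {
    for (const auto &cu : units)
      for (const auto &[low, high] : cu.ranges)
        if (pc >= low && pc < high)
          return cu.debug_info_offset;
    return std::nullopt;
  }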

> Thinking about Harvard architecture specifically, you **need** the
> segment selector only when an address could be ambiguous about whether it’s
> a code or data address. This basically comes up **only** in
> .debug_aranges, he said thinking about it for about 30 seconds. Within
> .debug_info you don’t need it because when you pick up the address of an
> entity, you know whether it’s for a code or data entity. Location lists
> and range lists always point to code. For .debug_aranges you would need
> the segment selector, but I think that’s the only place.
>

Agreed.
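
To spell out the consumer side of that: only the tuple loop for
.debug_aranges has to know about the selector, and once an address is
tagged as code or data everything downstream can keep pretending the
space is flat. A rough sketch (the cursor type and names are mine, not
LLVM's DWARF parser):

  #include <cstdint>
  #include <vector>

  struct ArangeEntry {
    uint64_t segment; // 0 when segment_selector_size == 0
    uint64_t address;
    uint64_t length;
  };

  // Hypothetical little-endian cursor over the raw section bytes.
  struct Cursor {
    const uint8_t *p;
    uint64_t readUnsigned(unsigned size) {
      uint64_t v = 0;
      for (unsigned i = 0; i < size; ++i)
        v |= uint64_t(p[i]) << (8 * i);
      p += size;
      return v;
    }
  };

  // Read (segment, address, length) tuples until the all-zero
  // terminator. The segment field is only present when the header's
  // segment_selector_size is nonzero.
  std::vector<ArangeEntry> readTuples(Cursor c, unsigned addressSize,
                                      unsigned segmentSelectorSize) {
    std::vector<ArangeEntry> entries;
    for (;;) {
      ArangeEntry e;
      e.segment =
          segmentSelectorSize ? c.readUnsigned(segmentSelectorSize) : 0;
      e.address = c.readUnsigned(addressSize);
      e.length = c.readUnsigned(addressSize);
      if (e.segment == 0 && e.address == 0 && e.length == 0)
        break;
      entries.push_back(e);
    }
    return entries;
  }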

>
> For an architecture with multiple code or data segments, then you’d need
> the segment selector more widely, but I should think this case wouldn’t be
> all that difficult to make work. Even factoring in the llvm-dwarfdump
> part, it has to understand the selector only for the .debug_aranges
> section; everything else can remain as it is, pretending there’s a flat
> address space.
>
> Now, if your target is downstream, that would make upstreaming the LLVM
> support a bit dicier, because we’d not want to have that feature in the
> upstream repo if there are no targets using it. You’d be left maintaining
> that patch on your own. But as I described above, I don’t think it would
> be a huge deal.
>

I think I agree, and for the moment maintaining a patch downstream is
perfectly fine. If and when LLVM gets an upstream target that needs to
use multiple segments, we can probably contribute that support.

>
> HTH,

This discussion has helped me greatly. Thanks all for your advice.

(hmm, unfortunately this broke gmail (& maybe other?) threading in some way :/)

> In my mind keeping debug_aranges is definitely a win for my use cases,
> and as long as the linker is doing the right thing it is a useful
> optimization.

I’d be curious to know how much of an advantage it is, compared to using CU ranges.

> I'm generally of the opinion that if you're hulking around *any*
> debuginfo, you should include as much useful context as possible - and
> because of that any extra info the debugger can use to shorten the parse
> time is helpful. This is of course assuming I never have to debug
> binaries that are on production machines (a clear distinction between
> debug and release builds); reducing size makes sense there.

Even just for link times - reading/writing more bytes does slow things
down - so some data about how much the extra data improves debugger load
times / costs link time / etc. would be interesting to me at least.