Are overlapping ELF sections problematic?

I'm working with an embedded platform that segregates memory between
executable code, RAM, and constant values. The three kinds occupy
three separate address spaces, accessed by specific instructions (e.g.
"load from RAM address #0" vs "load from constant ROM address #0")
with fairly small ranges for literal address values. So necessarily
all three address spaces start at zero.

We're using the LLVM toolchain with ELF32 files, mapping the three
spaces as .text, .data, and .crom sections, with a linker script
setting the address for all three sections to zero and so producing a
non-relocatable executable image (the .text section becomes a ROM for
an embedded device so final addresses are required). To support
debugging with LLDB (where the GDB server protocol presumes a single
flat memory space) the sections are mapped to address ranges in a
larger space (using the top two bits) and the debugger stub of the
platform then demuxes the memory accesses to the appropriate address
spaces.
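
To make the address muxing concrete, here is a minimal sketch of the top-two-bits scheme. The tag values are an assumption chosen to match the load addresses in the "target modules load" command quoted in this message (.data at 0, .crom at 0x40000000, .text at 0x80000000); all names (Space, Mux, Demux) are hypothetical and not part of LLDB or of the actual debugger stub.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address-space tags held in the top two bits of a flat address.
enum class Space : std::uint32_t { Data = 0, ConstRom = 1, Text = 2 };

struct DemuxedAddress {
  Space space;          // which physical address space to access
  std::uint32_t offset; // address within that space
};

constexpr std::uint32_t kSpaceShift = 30;
constexpr std::uint32_t kOffsetMask = (1u << kSpaceShift) - 1;

// Debugger side: fold (space, offset) into one flat 32-bit address.
constexpr std::uint32_t Mux(Space space, std::uint32_t offset) {
  return (static_cast<std::uint32_t>(space) << kSpaceShift) |
         (offset & kOffsetMask);
}

// Stub side: split a flat address back into (space, offset).
constexpr DemuxedAddress Demux(std::uint32_t flat) {
  return {static_cast<Space>(flat >> kSpaceShift), flat & kOffsetMask};
}

// Usage example: round-trip a .crom and a .text address.
void MuxExample() {
  assert(Mux(Space::ConstRom, 0) == 0x40000000u);
  DemuxedAddress a = Demux(0x80000010u);
  assert(a.space == Space::Text && a.offset == 0x10u);
}
```

The cost of this scheme is that each space is limited to 30 bits of offset, which is not a constraint for the small address ranges described above.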

Until recently this was done by loading the ELF file in LLDB, e.g.:
"target modules load --file test.elf .data 0 .crom 0x40000000 .text
0x80000000". However the changes introduced through
https://reviews.llvm.org/D55998 removed support for overlapping
sections, with a remark "I don't anticipate running into this
situation in the real world. However, if we do run into it, and the
current behavior is not suitable for some reason, we can implement
this logic differently."

Our immediate coping strategy was implementing the remapping in the
file parsing of ObjectFileELF, but this LLDB change makes us
apprehensive that we may start encountering similar issues elsewhere
in the LLVM tooling. Are ELF sections with overlapping addresses so
rare (or even actually invalid) that ongoing support will be fragile?

Hi Thomas,

I can't say what's the situation in the rest of llvm, but right now lldb has zero test coverage for the flow you are using, so the fact that this has worked until now was pretty much an accident.

The reason I chose to disallow the overlapping sections in the patch you quote was that it was very hard to say what they would mean to the upper layers of lldb. For instance, a lot of things in lldb work with "file addresses" (that is, virtual addresses as they are known in the file, without any remapping). This means that the overlapping sections become ambiguous even though you have remapped them to non-overlapping "load addresses" with the "target modules load" command. For instance, the result of a query like "SectionList::FindSectionContainingFileAddress(lldb::addr_t)" would depend on exactly how the search algorithm was implemented.
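
The ambiguity can be illustrated with a small stand-alone sketch. This is not LLDB's implementation: SectionInfo and FindFirstContaining are hypothetical stand-ins, and the linear "first match wins" search is just one assumed strategy. The point is only that, with three sections all starting at file address zero, the answer depends on iteration order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a section record.
struct SectionInfo {
  std::string name;
  std::uint64_t file_addr; // address as recorded in the ELF file
  std::uint64_t size;
};

// Returns the first section whose file-address range contains addr,
// or nullptr if none does.
const SectionInfo *FindFirstContaining(const std::vector<SectionInfo> &sections,
                                       std::uint64_t addr) {
  for (const SectionInfo &s : sections)
    if (addr >= s.file_addr && addr - s.file_addr < s.size)
      return &s;
  return nullptr;
}

// Usage example: the same query gives different answers for the same
// set of overlapping sections, depending only on their order.
void AmbiguityExample() {
  std::vector<SectionInfo> a = {{".text", 0, 0x100}, {".data", 0, 0x80}};
  std::vector<SectionInfo> b = {{".data", 0, 0x80}, {".text", 0, 0x100}};
  assert(FindFirstContaining(a, 0x10)->name == ".text");
  assert(FindFirstContaining(b, 0x10)->name == ".data");
}
```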

I believe that a long term solution here would be to introduce some concept of address spaces to lldb. Then these queries would no longer be ambiguous as the function FindSectionContainingFileAddress would (presumably) take an additional address-space identifier as an argument. I know this is what some downstream users are doing to make things like this work. However, this is a fairly invasive change, so doing something like this upstream would require a lot of previous discussion.
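
As a rough sketch of what such a query might look like (purely hypothetical: AddressSpaceId, SpacedSection and this function signature do not exist in lldb, and a real design would need the discussion mentioned above), the extra identifier simply becomes part of the lookup key:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using AddressSpaceId = std::uint32_t; // hypothetical identifier type

// Hypothetical section record that also carries its address space.
struct SpacedSection {
  std::string name;
  AddressSpaceId space;
  std::uint64_t file_addr;
  std::uint64_t size;
};

// Lookup keyed by (address space, file address). Sections in different
// spaces may share file addresses without making the result ambiguous.
const SpacedSection *
FindSectionInSpace(const std::vector<SpacedSection> &sections,
                   AddressSpaceId space, std::uint64_t addr) {
  for (const SpacedSection &s : sections)
    if (s.space == space && addr >= s.file_addr && addr - s.file_addr < s.size)
      return &s;
  return nullptr;
}

// Usage example: address 0x10 resolves differently per address space.
void SpacedLookupExample() {
  std::vector<SpacedSection> sections = {
      {".data", 0, 0, 0x80}, {".crom", 1, 0, 0x40}, {".text", 2, 0, 0x100}};
  assert(FindSectionInSpace(sections, 1, 0x10)->name == ".crom");
  assert(FindSectionInSpace(sections, 2, 0x10)->name == ".text");
}
```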

In the meantime, I believe you can just patch out the part which drops the overlapping sections from the section list and get behavior which is more-or-less identical to the old one. However, I can't guarantee that nothing else will break in this scenario. I also wouldn't be opposed to making some change to this logic upstream too, if we can come up with some consistent story as to what exactly this means.

regards,
pl

Hi Pavel, Thomas,

Just a note that this topic comes up every now and then. It'd be nice to at least have a concept. We can go with an additional argument, or enhance addr_t, or enhance Address, or create a new type for it. So, some sort of discussion that would clarify the concept a little bit is welcome, I think.

Best regards.

Hi Pavel

I can't say what's the situation in the rest of llvm, but right now lldb
has zero test coverage for the flow you are using, so the fact that this
has worked until now was pretty much an accident.

It was a pleasant surprise that it worked at all, since flat memory
maps have become near-ubiquitous. But it's good to at least know that
the conceptual ice hasn't become any thinner through the patch, i.e.
it refines the existing state rather than reflecting a more explicit
policy change.

In the meantime, I believe you can just patch out the part which drops
the overlapping sections from the section list and get behavior which
was more-or-less identical to the old one.

I think this also requires reverting the use of the IntervalMap as the
VM address container, since that relies upon non-overlapping
intervals? That smells like a bigger fork than I would like to
keep indefinitely alive.

I believe that a long term solution here would be to introduce some
concept of address spaces to lldb. Then these queries would no longer be
ambiguous as the function FindSectionContainingFileAddress would
(presumably) take an additional address-space identifier as an argument.
I know this is what some downstream users are doing to make things like
this work. However, this is a fairly invasive change, so doing something
like this upstream would require a lot of previous discussion.

Would this also extend the GDB remote protocol, where the single flat
address space seems the only current option? (at least the common
solution in various GDB discussions of DSP targets is address muxing
of the sort we're using)

I imagine such changes are hampered by the lack of in-tree targets
that require them, both to motivate the change and to keep it testable
(the recent "removing magic numbers assuming 8-bit bytes" discussion
in llvm-dev features the same issue). Previously Embecosm was
attempting to upstream an LLVM target for its demonstration AAP
architecture (features multiple address spaces), e.g.
http://lists.llvm.org/pipermail/llvm-dev/2017-February/109776.html .
However their public forks on GitHub only reveal GDB support rather
than LLDB, and that implementation is by an address mux.

Unfortunately the architecture I'm working with is (yet another) poor
candidate for upstreaming, since it lacks general availability, but
hopefully one of the exotic architectures lurking in the LLVM shadows
someday steps forth with a commitment to keep it alive in-tree.

Cheers,
Tom

Hi Zdenek

In an ideal world LLVM and LLDB would support a common approach for
address spaces. Currently our LLVM backend doesn't yet support address
spaces anyway, e.g. access to a variable declared as constant data:

const int my_val __attribute__((section (".crom"))) = { 42 };

is only possible from assembler code. Since current code already
features a blend of C and assembler this limitation is cumbersome
rather than catastrophic, but of course we expect to add proper
lowering for such addresses, so we're certainly interested in this
domain.

The work already done in coupling LLDB more closely to LLVM is obvious
(e.g. migrating from duplicated utility code) but the backends still
seem a little disjoint, e.g. some targets specify the same
architectural attributes such as registers in both projects. It would
be nice if a new (?) feature like address spaces was added in a way
that minimised redundancy.

Cheers,
Tom

Hi Pavel

I can't say what's the situation in the rest of llvm, but right now lldb
has zero test coverage for the flow you are using, so the fact that this
has worked until now was pretty much an accident.

It was a pleasant surprise that it worked at all, since flat memory
maps have become near-ubiquitous. But it's good to at least know that
the conceptual ice hasn't become any thinner through the patch, i.e.
it refines the existing state rather than reflecting a more explicit
policy change.

Yes, I didn't mean to do anything drastic with this patch. However, I would say that independently of this patch, in the past few years, lldb has gotten more strict in accepting features/fixes which don't have test coverage and/or are useful in only some peculiar downstream use case (see removal of ocaml/go/java language support, etc.).

In the meantime, I believe you can just patch out the part which drops
the overlapping sections from the section list and get behavior which
was more-or-less identical to the old one.

I think this also requires reverting the use of the IntervalMap as the
VM address container, since that relies upon non-overlapping
intervals? That smells like a bigger fork than I would like to
keep indefinitely alive.

It sounds like you might be able to just skip adding some (all?) of the sections into the interval map, which should result in all of them being created, like they used to be.

Or maybe you could fudge their "file addresses" and remap them into non-overlapping regions at this level too. It would break lookups by file addresses for the remapped sections, but this is something that didn't work already when the addresses overlapped. I'm not sure what else could be broken by this. We already do some fudging like this for relocatable (.o) files, which have all addresses starting at zero, so it seems like at least something can work here.
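
One way to picture the fudging idea is the sketch below. It is an assumption-laden illustration, not what lldb does for .o files: RawSection, FudgeOverlaps and the 0x1000 alignment are all hypothetical. Any section that collides with an already placed one is moved to an aligned address past everything placed so far, so the resulting file addresses never overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified section record.
struct RawSection {
  std::string name;
  std::uint64_t file_addr;
  std::uint64_t size;
};

// Returns a copy of `sections` in which every section that overlaps an
// earlier one has been moved to the next `align`-aligned address beyond
// the highest end address placed so far. `align` must be non-zero.
std::vector<RawSection> FudgeOverlaps(const std::vector<RawSection> &sections,
                                      std::uint64_t align = 0x1000) {
  std::vector<RawSection> placed;
  std::uint64_t next_free = 0; // highest end address used so far
  for (RawSection s : sections) {
    bool collides = false;
    for (const RawSection &p : placed)
      if (s.file_addr < p.file_addr + p.size &&
          p.file_addr < s.file_addr + s.size)
        collides = true;
    if (collides)
      s.file_addr = (next_free + align - 1) / align * align; // round up
    next_free = std::max(next_free, s.file_addr + s.size);
    placed.push_back(s);
  }
  return placed;
}

// Usage example: three sections that all start at zero end up disjoint.
void FudgeExample() {
  std::vector<RawSection> out = FudgeOverlaps(
      {{".text", 0, 0x100}, {".data", 0, 0x80}, {".crom", 0, 0x40}});
  assert(out[0].file_addr == 0x0);
  assert(out[1].file_addr == 0x1000);
  assert(out[2].file_addr == 0x2000);
}
```

As noted above, lookups by the original file address stop working for the moved sections, so anything that consumes file addresses from other parts of the file (symbols, debug info) would need the same adjustment.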

For my own education, would you be able to send me one of your files with these overlapping sections (or maybe just the output of "readelf -e" or something)? I don't know much about these more exotic platforms, so being aware of things like these might be of help when making future changes.

Incidentally, I was just made aware that this change also breaks thread-local sections, which can appear to have overlapping file addresses with other sections. So I will probably be revisiting this piece of code soon. However, right now my thinking is to simply stop putting thread-local sections into the address range map while simultaneously starting to ignore them for file address lookups (as thread-local sections need to be handled in a more complex manner anyway). This won't help your use case much...

I believe that a long term solution here would be to introduce some
concept of address spaces to lldb. Then these queries would no longer be
ambiguous as the function FindSectionContainingFileAddress would
(presumably) take an additional address-space identifier as an argument.
I know this is what some downstream users are doing to make things like
this work. However, this is a fairly invasive change, so doing something
like this upstream would require a lot of previous discussion.

Would this also extend the GDB remote protocol, where the single flat
address space seems the only current option? (at least the common
solution in various GDB discussions of DSP targets is address muxing
of the sort we're using)

I would say "hopefully yes", but I'm not very familiar with these kinds of targets.

I imagine such changes are hampered by the lack of in-tree targets
that require them, both to motivate the change and to keep it testable
(the recent "removing magic numbers assuming 8-bit bytes" discussion
in llvm-dev features the same issue). Previously Embecosm was
attempting to upstream an LLVM target for its demonstration AAP
architecture (features multiple address spaces), e.g.
http://lists.llvm.org/pipermail/llvm-dev/2017-February/109776.html .
However their public forks on GitHub only reveal GDB support rather
than LLDB, and that implementation is by an address mux.

Unfortunately the architecture I'm working with is (yet another) poor
candidate for upstreaming, since it lacks general availability, but
hopefully one of the exotic architectures lurking in the LLVM shadows
someday steps forth with a commitment to keep it alive in-tree.

Yeah, the lack of in-tree targets is one of the causes of (but also a consequence of?) the lack of address space support. I've been following the non-8-bit thread from a distance, and FWIW, I would be fine with having some kind of a mock target supporting these things in lldb. I might even prefer debugging things against a simple mock instead of some complicated-but-real target.

The other causes are the main contributors not knowing enough about these architectures to help drive this, and just being generally busy with other stuff.

cheers,
pavel