Inconsistencies in CIE pointer in FDEs in .debug_frame

Hi,

I'm looking into something that seems like an inconsistency in handling of the CIE pointer in FDEs in .debug_frame, between how debug info is generated in LLVM and consumed in LLDB.

For FDEs in .eh_frame, the CIE pointer/cie_id field is interpreted as an offset from the current FDE - this seems to be consistent.

But in .debug_frame, the field is treated differently. In LLDB, the cie_id field is assumed to be relative to the beginning of the .debug_frame section: https://github.com/llvm/llvm-project/blob/master/lldb/source/Symbol/DWARFCallFrameInfo.cpp#L482-L495

However, when this field is produced in LLVM, it can, depending on MCAsmInfo flags, end up written as a plain absolute address to the CIE: https://github.com/llvm/llvm-project/blob/master/llvm/lib/MC/MCDwarf.cpp#L1699-L1705

That code in MCDwarf.cpp hasn't been touched in many years, so I would expect that the info it generates has actually been used since then and found to be correct. Or are most builds done with -funwind-tables or similar enabled by default, so that this code path is only exercised in untested cases?

In the case where I'm running into this, LLDB reports "error: Invalid cie offset" when running executables with such .debug_frame sections.
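
To make the two interpretations concrete, here is a minimal Python sketch of how a consumer might turn the raw CIE pointer field into a section offset and reject out-of-range values. It is not LLDB's code: the function name resolve_cie_offset, its parameters, and the section size of 0x2c are assumptions chosen for illustration only.

```python
def resolve_cie_offset(kind, field_offset, cie_pointer, section_size):
    """Return the section offset of the CIE that an FDE refers to.

    kind          -- "eh_frame" or "debug_frame"
    field_offset  -- section offset of the FDE's CIE pointer field
    cie_pointer   -- raw value read from that field
    section_size  -- size of the section in bytes (hypothetical here)
    """
    if kind == "eh_frame":
        # .eh_frame: the value is a distance back from the field itself.
        cie_offset = field_offset - cie_pointer
    elif kind == "debug_frame":
        # .debug_frame: the value is an offset from the start of the section.
        cie_offset = cie_pointer
    else:
        raise ValueError("unknown section kind: " + kind)
    if not 0 <= cie_offset < section_size:
        raise ValueError("Invalid cie offset: 0x%x" % cie_offset)
    return cie_offset


# An FDE at 0x14 has its CIE pointer field at 0x18 (after the 4-byte length).
print(hex(resolve_cie_offset("debug_frame", 0x18, 0x0, 0x2c)))  # 0x0
print(hex(resolve_cie_offset("eh_frame", 0x18, 0x18, 0x2c)))    # 0x0

try:
    # An absolute address in the field falls far outside the section.
    resolve_cie_offset("debug_frame", 0x18, 0x00404000, 0x2c)
except ValueError as err:
    print(err)  # Invalid cie offset: 0x404000
```

With a section-relative value the lookup succeeds; with an absolute virtual address in the same field, the range check fails in the way the error message suggests.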

By adding a ", true" to the end of the EmitSymbolValue call in MCDwarf.cpp, the symbol reference is made section relative and the code seems to do what LLDB expects. Is that correct, or should LLDB learn the cases (which?) where the cie_id is an absolute address instead of a section relative one?

// Martin

What's the target you're encountering this behavior on? Can you maybe provide a short example of what the CIE/FDE entries in question look like?

I could be wrong (I'm not really an expert on this), but my understanding is that "asmInfo->doesDwarfUseRelocationsAcrossSections()" is basically equivalent to "is target MachO", and the reason that we don't emit section relative addresses there is that MachO does not link debug info sections. This means there will only ever be a single debug_frame contribution in one file, and so we can just put offsets directly, instead of relying on the linker to patch things up. Doing anything like this in a format which links (concatenates) debug info sections would certainly result in irreparably corrupted unwind info, since you have no idea what will be present at a certain absolute address (offset) once the linker has finished its thing.
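
As a rough illustration of why concatenating formats need the linker to patch these offsets, here is a small Python sketch. The function link_debug_frame and the contribution sizes are hypothetical; it models only the base adjustment that a section-relative relocation lets the linker perform, not any real linker.

```python
def link_debug_frame(contributions):
    """Concatenate per-object .debug_frame contributions.

    Each contribution is (size, [cie_pointer, ...]), where every cie_pointer
    is an offset relative to the start of that contribution.
    Returns the patched CIE pointers, relative to the merged section.
    """
    patched = []
    base = 0
    for size, cie_pointers in contributions:
        # The relocation lets the linker add the base of this contribution
        # within the output section to each compiler-emitted offset.
        patched.extend(base + p for p in cie_pointers)
        base += size
    return patched


# Two hypothetical objects, each with one CIE at offset 0 and one FDE.
print(link_debug_frame([(0x2c, [0x0]), (0x30, [0x0])]))  # [0, 44]
```

Without the adjustment, the FDE from the second object would still point at offset 0, which is the first object's CIE rather than its own.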

That said, if that is all there is here, then it does not seem to me like there's any special support in lldb needed, as the cie offset will always be a correct absolute offset from the start of the section by the time lldb gets to see it (and so it shouldn't matter if the offset was put there by the compiler or the linker). This makes me think that I am missing something, but I have no idea what that could be.

Anyway, I hope this helps somehow..

pl

What's the target you're encountering this behavior on? Can you maybe provide a short example of what the CIE/FDE entries in question look like?

I'm seeing this behaviour for mingw targets. GCC produces debug_frame sections where the CIE pointer is a section relative address (with a SECREL relocation), while LLVM produces debug_frame sections with absolute (global/virtual) addresses.

LLDB seems to expect the format that GCC produces here.

I could be wrong (I'm not really an expert on this), but my understanding is that "asmInfo->doesDwarfUseRelocationsAcrossSections()" is basically equivalent to "is target MachO"

Yes, that's pretty much my take on it as well. The BPF target also has an option for setting this flag in asminfo, but other than that, it's not modified.

That said, if that is all there is here, then it does not seem to me like there's any special support in lldb needed, as the cie offset will always be a correct absolute offset from the start of the section by the time lldb gets to see it (and so it shouldn't matter if the offset was put there by the compiler or the linker). This makes me think that I am missing something, but I have no idea what that could be.

This wasn't the inconsistency I'm looking into.

I'm looking into an inconsistency between section relative and absolute addresses. The default case in MCDwarf.cpp calls EmitSymbolValue(&cieStart, 4).

By default EmitSymbolValue emits _absolute_ addresses (or more precisely, relocations that make the linker produce absolute addresses), i.e. the full address of the CIE, instead of section relative ones.

The EmitSymbolValue function, declared at https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/MC/MCStreamer.h#L669-L670, takes an IsSectionRelative parameter, which defaults to false here (as it isn't specified). I would expect that it should be true, as LLDB expects a section relative address here.

I think this is a bug in LLVM's MCDwarf.cpp, but it puzzles me how it can have gone unnoticed.

But now I tested this a bit more with ELF setups, and realized that it somehow does seem to do the right thing. It might have something to do with how ELF linkers handle this kind of section that isn't loaded at runtime (and thus perhaps doesn't really have a virtual address assigned).

So that pretty much clears the question regarding inconsistency, and raises more questions about how this really works in ELF and MCDwarf.

A test procedure that shows off the issue is this:

$ cat test.c
void entry(void) { }

$ bin/clang -fno-unwind-tables test.c -c -g -o test.o -target i686-linux-gnu
$ bin/llvm-objdump -r test.o

test.o: file format ELF32-i386

<redacted>

RELOCATION RECORDS FOR [.debug_frame]:
00000018 R_386_32 .debug_frame
0000001c R_386_32 .text

# As far as I know, these two R_386_32 relocations both indicate that the
# full, absolute address of the referenced symbol should be inserted at
# each of these two locations.

$ bin/ld.lld test.o -o exe -e entry
$ bin/llvm-dwarfdump --eh-frame exe

exe: file format ELF32-i386

.debug_frame contents:

00000000 00000010 ffffffff CIE
<redacted for brevity>

00000014 00000018 00000000 FDE cie=00000000 pc=004010c0...004010c5
                   ^
# The CIE offset, the third field, is set as zero (the offset where the
# CIE starts, even though the relocation indicated absolute address),
# but the R_386_32 for the .text address gave a correct absolute pc range.
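
For readers unfamiliar with the layout behind the dump columns (offset, length, CIE id/pointer), here is a minimal Python sketch that decodes the first two fields of an entry. It assumes 32-bit DWARF and little-endian data; read_entry_header is a hypothetical helper, and the buffer holds only the header words from the ELF dump above, with zero padding standing in for the real entry contents.

```python
import struct

DEBUG_FRAME_CIE_ID = 0xFFFFFFFF  # marks a CIE in 32-bit DWARF .debug_frame


def read_entry_header(data, offset):
    """Decode the first two 4-byte little-endian fields of a CIE or FDE."""
    length, ident = struct.unpack_from("<II", data, offset)
    kind = "CIE" if ident == DEBUG_FRAME_CIE_ID else "FDE"
    return kind, length, ident


# Only the header words are filled in; everything else is zero padding.
section = bytearray(0x30)
struct.pack_into("<II", section, 0x00, 0x10, 0xFFFFFFFF)  # CIE at 0x00
struct.pack_into("<II", section, 0x14, 0x18, 0x00000000)  # FDE at 0x14

print(read_entry_header(section, 0x00))  # ('CIE', 16, 4294967295)
print(read_entry_header(section, 0x14))  # ('FDE', 24, 0)
```

The second value of the FDE is the field under discussion: here it is 0, the section offset of the CIE.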

Now if I repeat the same steps but for a mingw target, this ends up different:

$ bin/clang -fno-unwind-tables test.c -c -g -o test.o -target i686-mingw32
$ bin/llvm-objdump -r test.o

test.o: file format COFF-i386

<redacted>

RELOCATION RECORDS FOR [.debug_frame]:
00000018 IMAGE_REL_I386_DIR32 .debug_frame
0000001c IMAGE_REL_I386_DIR32 .text

# Same thing here, absolute addresses for .debug_frame and .text

$ bin/lld-link test.o -out:exe -entry:entry -subsystem:console -debug:dwarf
$ bin/llvm-dwarfdump --eh-frame exe
exe: file format COFF-i386

.debug_frame contents:

00000000 00000010 ffffffff CIE
<redacted>

00000014 00000014 00404000 FDE cie=00404000 pc=00401000...00401005
                   ^
# Here the CIE offset, the third column, ended up as an absolute address,
# 0x00404000, which LLDB rejects.

So, if I make the call to EmitSymbolValue() set the IsSectionRelative parameter to true, I get the correct, expected relocations for this section:

RELOCATION RECORDS FOR [.debug_frame]:
00000018 IMAGE_REL_I386_SECREL .debug_frame
0000001c IMAGE_REL_I386_DIR32 .text

This matches what GCC produces in similar cases as well.

But with this in place, ELF targets misbehave severely; there's no relocation produced at all for the .debug_frame symbol, and the second relocation gets written at the wrong offset.

In any case, it's clearly only an LLVM/MC issue, and no issue with LLDB.

// Martin

OK, it turns out that there's already a flag that indicates exactly this, asmInfo->needsDwarfSectionOffsetDirective(); it just isn't used here, where it should be. It seems to encapsulate whether a dedicated type of relocation needs to be used (as in COFF), or whether, as in ELF, the different section types handle it automatically with just one kind of relocation. I had seen it before but didn't really understand its role until I saw how ELF behaved.
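
The decision described above can be modelled roughly as follows. This is only a Python sketch of the logic as discussed in this thread, not the actual MCDwarf.cpp code or the eventual patch; the function name cie_pointer_emission and the returned strings are made up for illustration, and the two parameters stand in for the MCAsmInfo flags of similar names.

```python
def cie_pointer_emission(uses_relocations_across_sections,
                         needs_section_offset_directive):
    """Model which form the FDE's CIE pointer takes, per the discussion."""
    if not uses_relocations_across_sections:
        # MachO-like: debug sections are not linked, write the offset directly.
        return "plain offset, no relocation"
    if needs_section_offset_directive:
        # COFF-like: an explicit section-relative relocation is required.
        return "section-relative relocation (e.g. IMAGE_REL_I386_SECREL)"
    # ELF-like: the ordinary relocation is enough to end up with an offset.
    return "ordinary relocation (e.g. R_386_32)"


print(cie_pointer_emission(False, False))
print(cie_pointer_emission(True, True))
print(cie_pointer_emission(True, False))
```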

So this is clearly a closed case, and I'll be sending a patch for MCDwarf soon.

// Martin

I'm seeing this behaviour for mingw targets. GCC produces debug_frame sections where the CIE pointer is a section relative address (with a SECREL relocation), while LLVM produces debug_frame sections with absolute (global/virtual) addresses.

Right. That's the part I was missing. Thanks.

Thanks for the detailed explanation.

So, what ELF linkers do is link non-loadable sections (those without SHF_ALLOC) as if they were loaded at address zero. I think it's possible to change that via a linker script, but I think doing that would cause pretty much everything to blow up.

This means that the whole absolute vs. section-relative inconsistency is irrelevant there (and I would expect the ELF folks would not even consider that an inconsistency/bug).
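
The arithmetic behind this can be sketched in a few lines of Python. The helper names apply_abs32 and apply_secrel are hypothetical and ignore everything a real linker does beyond the basic "section address plus symbol offset plus addend" computation; the address 0x00404000 is the value seen in the COFF dump earlier in the thread.

```python
def apply_abs32(section_address, symbol_offset, addend=0):
    """Absolute 32-bit relocation (R_386_32 / IMAGE_REL_I386_DIR32 style)."""
    return (section_address + symbol_offset + addend) & 0xFFFFFFFF


def apply_secrel(symbol_offset, addend=0):
    """Section-relative relocation (IMAGE_REL_I386_SECREL style)."""
    return (symbol_offset + addend) & 0xFFFFFFFF


CIE_OFFSET = 0x0  # the CIE is the first entry in .debug_frame

# ELF: .debug_frame is not loaded, so it is linked as if at address 0.
print(hex(apply_abs32(0x0, CIE_OFFSET)))         # 0x0
# COFF: .debug_frame was assigned a virtual address.
print(hex(apply_abs32(0x00404000, CIE_OFFSET)))  # 0x404000
# COFF with a section-relative relocation instead.
print(hex(apply_secrel(CIE_OFFSET)))             # 0x0
```

So the same "absolute" relocation yields a usable section offset on ELF only because the section address happens to be zero.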

In any case, I agree with your assessment that this is an llvm/mc bug, and so we'll probably need to open this issue on llvm-dev. I guess the reason that this wasn't discovered is that llvm tools (and lldb in particular) are not so widely used/tested on Windows. It might be interesting to see what happens if you feed the llvm generated file to gdb, or maybe link it with the GNU linker...

pl

Ah, thanks - that does explain it.

Yeah, in COFF, all sections, even non-loaded ones (IMAGE_SCN_MEM_DISCARDABLE) are assigned virtual addresses as if they actually were loaded.

// Martin