STABS

Hello everyone,

I recently found myself looking at ObjectFileMachO.cpp. I noticed that nearly half of that file (2700 LOC) is taken up by the ParseSymtab function, and that maybe one third of that is taken up by what appears to be STABS parsing code.

Is anyone still using STABS debug info? If not, can we remove it?

pl

I have now found just:
  Re: stabs changes for 64 bit targets
  David Edelsohn - Re: stabs changes for 64 bit targets
  "AIX continues to use STABS with its XCOFF file format"

And then IBM is porting LLVM to AIX:
  [llvm-dev] [RFC] Adding LLVM Support for AIX
  https://lists.llvm.org/pipermail/llvm-dev/2019-February/130175.html

But who knows whether AIX is still using STABS nowadays (CCed)?

Jan

STABS nlist entries are the way the linker leaves breadcrumbs for the debugger to find the object files containing debug info. I haven't checked whether the code you see is this one, but it seems likely.

Fred

Amplifying Fred's comments:

Most of the code in ParseSymtab is parsing the nlist records in the binary. Only a tiny subset of those nlist records are "stabs". Most are just the format by which MachO expresses its symbol table. So all that needs to be there.

Over the past couple of years, the linker on MachO has switched from using nlist records to using the dyld trie data structure. You can also see evidence of that in ParseSymtab. At this point the nlist records are there because there are lots of analysis tools that haven't been updated to support the new dyld trie. At some point, everything will be updated and the linker will switch over to only emitting the dyld trie, and not emitting the symbol table in nlist form. When that is done and we convince ourselves we no longer need to support older binaries that still use nlist records, we can then remove the nlist parsing code. But until then, this is how the symbol table is expressed. The symbol parsing is actually the majority of the code in ParseSymtab.

Not all nlist records are stabs. Stabs, per se, are the nlist records that have the is_debug flag set. As Fred said, MachO uses the debug nlist records as the format for its "debug map", which tells us where symbols from .o files ended up in the final linked product. This is definitely NOT a stabs parser; we only support a tiny subset of the full stabs debug format, just what is needed for the debug map. We've talked on and off about coming up with a dedicated format for the debug map, but so far there's been no strong motivation to actually do that, so we've continued to borrow a subset of stabs for the purpose.
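To make the "is_debug" distinction concrete, here is a rough sketch of how a debug nlist entry can be told apart from an ordinary symbol. The constant values are the ones from <mach-o/nlist.h> and <mach-o/stab.h>, repeated so the snippet builds anywhere. The NList64 struct, the DebugMapKind enum and the function names are made up for illustration and are not what ObjectFileMachO.cpp actually uses.

```cpp
#include <cstdint>

// Values as defined in <mach-o/nlist.h> and <mach-o/stab.h>.
constexpr uint8_t N_STAB = 0xe0; // if any of these bits are set, it is a debug entry
constexpr uint8_t N_FUN = 0x24;  // function name
constexpr uint8_t N_SO = 0x64;   // source file name
constexpr uint8_t N_OSO = 0x66;  // object file name

// Illustrative stand-in with the same field layout as struct nlist_64.
struct NList64 {
  uint32_t n_strx;  // offset of the name in the string table
  uint8_t n_type;   // type flags
  uint8_t n_sect;   // section number
  uint16_t n_desc;  // extra description bits
  uint64_t n_value; // symbol value
};

// Hypothetical classification, only covering a few debug map entries.
enum class DebugMapKind { NotDebug, SourceFile, ObjectFile, Function, Other };

// An entry is a "stab" when any of the N_STAB bits are set in n_type.
inline bool IsDebugEntry(const NList64 &entry) {
  return (entry.n_type & N_STAB) != 0;
}

inline DebugMapKind Classify(const NList64 &entry) {
  if (!IsDebugEntry(entry))
    return DebugMapKind::NotDebug; // regular symbol table entry
  switch (entry.n_type) {
  case N_SO:
    return DebugMapKind::SourceFile;
  case N_OSO:
    return DebugMapKind::ObjectFile;
  case N_FUN:
    return DebugMapKind::Function;
  default:
    return DebugMapKind::Other; // some stab type this sketch does not handle
  }
}
```

Everything that is not a debug entry goes down the regular symbol table path, which is why most of ParseSymtab has to stay regardless of what happens to the debug map handling.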

There is one bit of ugliness, which is that the debug map parsing is essentially duplicated. Look for "if (is_debug)" and you will see two very similar blocks (around lines 2860 and 3826 in the current sources). Jason will remember the details better, but there was something gnarly about how libraries in the "shared cache" on Darwin systems work that made it awkward to use the same code for them and for normal libraries. Some ambitious person could probably go through and unify the two sections, but this is code that doesn't see much active development and it pretty much does what's needed, so it's not clear what the benefit would be at this point.

Jim

Thanks for the detailed explanation, Jim. I've found it very useful, as it plugs a large gap in my knowledge of how debug info works on Apple platforms.

The reason I was looking at this code in the first place is that I'm trying to add unwinding support on Windows platforms. It is mostly straightforward, but there is one large hiccup in the form of the __stdcall calling convention on x86. This is a callee-cleanup convention, which AFAICT is a new thing to lldb.

The interesting bit here is that it becomes important to know the size of the arguments to a function during unwinding. This size is encoded in the symbol names (e.g. "_foo@4"). Due to the way that unwind info is represented (the argument pushes aren't represented in the caller), it may be necessary to look at the argument size of one function (the callee) when unwinding another function (the caller).
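For illustration, pulling the argument size out of such a name could look roughly like the sketch below. StdcallArgumentBytes is a hypothetical helper, not an existing lldb function, and it assumes the plain "_name@N" decoration with a decimal byte count after the last '@'. It needs C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the number of argument bytes encoded in a
// stdcall-decorated name such as "_foo@4", or std::nullopt if the name does
// not follow the assumed "_name@N" pattern.
inline std::optional<uint32_t> StdcallArgumentBytes(std::string_view name) {
  if (name.empty() || name.front() != '_')
    return std::nullopt;
  const std::size_t at = name.rfind('@');
  if (at == std::string_view::npos || at + 1 == name.size())
    return std::nullopt; // no '@', or nothing after it
  const std::string_view digits = name.substr(at + 1);
  if (digits.size() > 9)
    return std::nullopt; // refuse values that could overflow uint32_t
  uint32_t bytes = 0;
  for (char c : digits) {
    if (c < '0' || c > '9')
      return std::nullopt; // suffix is not a decimal number
    bytes = bytes * 10 + static_cast<uint32_t>(c - '0');
  }
  return bytes;
}
```

With this, StdcallArgumentBytes("_foo@4") yields 4, i.e. the callee pops 4 bytes of arguments on return, which is the adjustment the unwinder would have to account for when computing the caller's frame.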

I was hoping I could represent this information in the Symbol class without increasing its size. Hence I was looking at the various other bits of information stored in there, and seeing if any of those can be removed.

Anyway, it looks like the "stabs" code and the various Symtab bits associated with it are going to stay. I'm not sure yet what this means for my unwinding effort, as I am still in the process of learning how this stuff actually works and whether the Symtab stuff is really needed, but I figured it would be good to at least explain my motivations here.

On 26/07/2019 22:57, Chris Bowler wrote:
> IBM is currently adding support for AIX to LLVM and we still have
> customers that use STABS for debug. I expect customers to try to move
> to DWARF but I think the DWARF support on AIX needs some improvement
> before we can fully transition. I kindly request that we defer removal
> of the STABS support until IBM has a better handle on whether or not
> we'll want it for AIX.

Chris,

It seems that the code in question is going to stay for a while. Unfortunately, it looks like it won't be of much use to you, should you decide to add STABS support to lldb (it lives in the MachO-specific parts of the code, and is not a real STABS parser anyway).

Cheers,
pavel