Linux ELF header e_ident[EI_OSABI] value

I’ve been trying to understand why some Linux core files fail to load in lldb.

The problem seems to be that Linux uses the ELFOSABI_NONE (0x0) value in the ELF header rather than ELFOSABI_LINUX (0x3). If I either change the e_ident[EI_OSABI] byte to 0x3 in the header, or change the code in ArchSpec.cpp to treat ELFOSABI_NONE as Linux, then LLDB will open these core files perfectly.

The Linux core dumps that are opened successfully seem to work because lldb uses extra, optional information in the notes section: either the core contains notes “owned” by Linux, or lldb recognises the file names contained in the FILE note type. A lot of core dumps (it appears to be those created by the kernel following a crash rather than by gcore) don’t contain the “LINUX” notes, and the file paths in the FILE note can vary a lot by Linux distribution. (For example, Ubuntu cores work but the Red Hat cores I’ve tried don’t, as the libraries are in different locations.)

Linux doesn’t seem to use the ELFOSABI_LINUX value (0x3) but sticks to the ELFOSABI_NONE (0x0) value. This appears to be true for both executables and core dumps. LLVM was changed to follow this convention (see: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150921/301607.html ), but lldb still looks for ELFOSABI_LINUX in ELF headers even though executables and core files seem to contain ELFOSABI_NONE in practice. If I compile code with clang, the resulting executable uses ELFOSABI_NONE in the e_ident[EI_OSABI] byte. (If I change the byte manually Linux doesn’t appear to care. I think it’s probably ignoring the byte.)
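
For anyone who wants to check this on their own files, here is a minimal sketch of reading the byte in question. The index and the three OSABI values are the standard ones from the ELF specification; the function and constant names (OsabiFromIdent, OsabiFromFile, OsabiName, kEiOsabi and so on) are made up for this example and are not lldb or LLVM APIs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Standard values from the ELF specification (see <elf.h>).
constexpr std::size_t kEiNident = 16;       // size of e_ident
constexpr std::size_t kEiOsabi = 7;         // index of the OS/ABI byte
constexpr std::uint8_t kElfOsabiNone = 0;   // historically ELFOSABI_SYSV
constexpr std::uint8_t kElfOsabiLinux = 3;
constexpr std::uint8_t kElfOsabiFreeBsd = 9;

// Return e_ident[EI_OSABI], or -1 if the bytes do not start with the
// ELF magic number. (Hypothetical helper, not part of lldb.)
int OsabiFromIdent(const std::array<std::uint8_t, kEiNident> &ident) {
  if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F')
    return -1;
  return ident[kEiOsabi];
}

// Convenience wrapper: read e_ident from the start of a file on disk.
// Returns -1 if the file is too short or is not an ELF file.
int OsabiFromFile(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  std::array<std::uint8_t, kEiNident> ident{};
  if (!in.read(reinterpret_cast<char *>(ident.data()),
               static_cast<std::streamsize>(ident.size())))
    return -1;
  return OsabiFromIdent(ident);
}

// Name for the handful of values discussed in this thread.
const char *OsabiName(int osabi) {
  switch (osabi) {
  case kElfOsabiNone:    return "ELFOSABI_NONE";
  case kElfOsabiLinux:   return "ELFOSABI_LINUX";
  case kElfOsabiFreeBsd: return "ELFOSABI_FREEBSD";
  default:               return "other";
  }
}
```

Calling OsabiName(OsabiFromFile(path)) on a core file or executable should show which value the file actually carries.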

I’d like to submit a patch to change lldb to treat ELF files with ELFOSABI_NONE set as Linux, as a) it would allow lldb to open Linux cores reliably and b) it would match how clang treats e_ident[EI_OSABI]. The code to detect whether lldb is opening a Linux core has changed a lot recently, and I don’t know the history or whether any other ELF platforms leave this byte set to 0x0, in which case this would be confusing. As this value is currently unused, though, the change seems safe.

Does anyone know of any reason not to change this? If not I’ll submit a patch for review.

Howard Hellyer
IBM Runtime Technologies, IBM Systems
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

I would change it so that the "os" doesn't get set to anything when it detects this in the core file. When an OS is not specified, the llvm::Triple will return OSUnknown as the enumeration value for the OS _and_ the llvm::StringRef value will return an empty string. This is known in LLDB terms as an "unspecified unknown". This means the triple will be "x86_64-*-<vendor>". An unspecified unknown is a wildcard. Any plugins that are trying to load a core file should watch for this and use it accordingly.

So the answer is not "treat ELF files with ELFOSABI_NONE set as Linux", but "treat ELF files with ELFOSABI_NONE set as *". Please submit a patch that implements this when you get the chance. Let me know if you have any questions.
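
To illustrate the wildcard idea, here is a rough sketch using a simplified stand-in for a triple. SimpleTriple, ComponentMatches and IsCompatible are hypothetical names for this example only; they are not the llvm::Triple or ArchSpec APIs, and the real matching rules in lldb may differ. An empty string plays the role of the "unspecified unknown".

```cpp
#include <string>

// Hypothetical, simplified stand-in for an arch/vendor/OS triple.
// An empty vendor or os means "unspecified unknown", i.e. a wildcard.
struct SimpleTriple {
  std::string arch;
  std::string vendor;
  std::string os;
};

// A component matches if either side is unspecified or both are equal.
static bool ComponentMatches(const std::string &a, const std::string &b) {
  return a.empty() || b.empty() || a == b;
}

// The architecture must match exactly; vendor and OS may be wildcards.
bool IsCompatible(const SimpleTriple &core, const SimpleTriple &candidate) {
  return core.arch == candidate.arch &&
         ComponentMatches(core.vendor, candidate.vendor) &&
         ComponentMatches(core.os, candidate.os);
}
```

With this, a core whose OS is unspecified is compatible with both a Linux and a FreeBSD candidate of the same architecture, while a core explicitly marked as Linux is not compatible with a FreeBSD one.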

Greg Clayton

We don't want to make ELFOSABI_NONE mean Linux. ELFOSABI_NONE is historically ELFOSABI_SYSV, and used by a lot of things. So not all core files identified as ELFOSABI_NONE are Linux.

When lldb loads a core file with a target binary, it will merge the two triples. If it can't identify the OS from the core file, it will use the OS from the target file. For example, I just loaded a Hexagon Linux core file, which lldb didn't identify as Linux, and a Hexagon Linux target, which lldb did identify as Linux. The final triple is correct - hexagon-*-linux:
(lldb) file u:\hexagon-linux\crash -c u:\hexagon-linux\core
Core file 'u:\hexagon-linux\core' (hexagon) was loaded.
(lldb) tar list
Current targets:
* target #0: u:\hexagon-linux\crash ( arch=hexagon-*-linux, platform=remote-linux, pid=13818, state=stopped )
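
The merge described above can be sketched roughly as follows. This is only an illustration with a made-up SimpleTriple type and made-up MergeFromTarget/ToString helpers, not the actual ArchSpec merging code in lldb; it assumes an empty string means "unspecified".

```cpp
#include <string>

// Hypothetical, simplified stand-in for an arch/vendor/OS triple.
// An empty vendor or os means the component was not specified.
struct SimpleTriple {
  std::string arch;
  std::string vendor;
  std::string os;
};

// Fill in any component the core file left unspecified from the
// target's triple. Components the core did specify are kept.
SimpleTriple MergeFromTarget(SimpleTriple core, const SimpleTriple &target) {
  if (core.vendor.empty())
    core.vendor = target.vendor;
  if (core.os.empty())
    core.os = target.os;
  return core;
}

// Render the triple, showing unspecified components as "*".
std::string ToString(const SimpleTriple &t) {
  auto part = [](const std::string &s) {
    return s.empty() ? std::string("*") : s;
  };
  return t.arch + "-" + part(t.vendor) + "-" + part(t.os);
}
```

Merging a core triple of hexagon with no OS and a target triple of hexagon with OS linux gives hexagon-*-linux, matching the output above.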

ObjectFileELF::RefineModuleDetailsFromNote looks for a note with type NT_FILE, then looks in that for a path that starts with "/lib/x86_64-linux-gnu". If it finds that, it will set the core file's OS to Linux. Teaching that to speak the Linux dialect you're interested in is probably the right way to go.
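
For reference, here is a rough sketch of pulling the paths out of an NT_FILE note and applying that kind of prefix test. It assumes the layout the Linux kernel writes for a 64-bit little-endian core: a count, a page size, then count (start, end, file offset) entries, then count NUL-terminated path strings. PathsFromNtFile and AnyPathHasPrefix are hypothetical names for this example; this is not the code in ObjectFileELF.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Read a 64-bit little-endian value at `off`; caller guarantees bounds.
static std::uint64_t ReadU64LE(const std::vector<std::uint8_t> &d,
                               std::size_t off) {
  std::uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | d[off + i];
  return v;
}

// Extract the path strings from a 64-bit NT_FILE note descriptor.
// Returns an empty vector if the descriptor is truncated or malformed
// (or if it simply lists no files).
std::vector<std::string> PathsFromNtFile(const std::vector<std::uint8_t> &desc) {
  std::vector<std::string> paths;
  if (desc.size() < 16)
    return paths;
  const std::uint64_t count = ReadU64LE(desc, 0);
  // Bytes 8..15 hold the page size, which is not needed here.
  if (count > (desc.size() - 16) / 24)
    return paths;  // not enough room for the (start, end, offset) entries
  std::size_t pos = 16 + static_cast<std::size_t>(count) * 24;
  for (std::uint64_t i = 0; i < count; ++i) {
    const void *nul = pos < desc.size()
        ? std::memchr(desc.data() + pos, 0, desc.size() - pos)
        : nullptr;
    if (!nul) {  // missing string terminator: treat as malformed
      paths.clear();
      return paths;
    }
    const char *begin = reinterpret_cast<const char *>(desc.data() + pos);
    paths.emplace_back(begin, static_cast<const char *>(nul));
    pos += paths.back().size() + 1;
  }
  return paths;
}

// True if any path starts with `prefix`, e.g. "/lib/x86_64-linux-gnu".
bool AnyPathHasPrefix(const std::vector<std::string> &paths,
                      const std::string &prefix) {
  for (const std::string &p : paths)
    if (p.compare(0, prefix.size(), prefix) == 0)
      return true;
  return false;
}
```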

Indeed, and that's true for binaries and libraries too. For one
specific example, FreeBSD/arm64 binaries have ELFOSABI_NONE (as
specified by the AArch64 ABI).

LLDB's OS detection from binaries and core files is (or was?) rather
awkward and I hope we can clean it up, but treating ELFOSABI_NONE as
Linux is a nonstarter.

> I would change it so that the "os" doesn't get set to anything when
> it detects this in the core file. When an OS is not specified, the
> llvm::Triple will return OSUnknown as the enumeration value for the
> OS _and_ the llvm::StringRef value will return an empty string. This
> is known in LLDB terms as an "unspecified unknown". This means the
> triple will be "x86_64-*-<vendor>". An unspecified unknown is a
> wildcard. Any plugins that are trying to load a core file should
> watch for this and use it accordingly.
>
> So the answer is not "treat ELF files with ELFOSABI_NONE set as
> Linux", but "treat ELF files with ELFOSABI_NONE set as *". Please
> submit a patch that implements this when you get the chance. Let me
> know if you have any questions.

I think that’s the current behaviour in ArchSpec.cpp. Setting it deliberately to UnknownOS prevents the code later on from refining it any further from the notes section of the core file. (Unless you meant somewhere else in which case I’ll take a look.)

Howard Hellyer
IBM Runtime Technologies, IBM Systems

> We don't want to make ELFOSABI_NONE mean Linux. ELFOSABI_NONE is
> historically ELFOSABI_SYSV, and used by a lot of things. So not all
> core files identified as ELFOSABI_NONE are Linux.

I agree that other OSes may use it or have used it in the past, but I don't know if any of those are supported by LLDB at the moment. (If they are then they probably have the same problem.)
It's definitely annoying that Linux doesn't seem to use the value that makes sense, but as it stands the case statement in ArchSpec.cpp won't actually hit its Linux case at the moment (which is quite confusing). I guess I just didn't want to bypass the trivial fix if it didn't affect anything else in practice.

> ObjectFileELF::RefineModuleDetailsFromNote looks for a note with
> type NT_FILE, then looks in that for a path that starts with "/lib/
> x86_64-linux-gnu". If it finds that, it will set the core file's OS
> to Linux. Teaching that to speak the Linux dialect you're interested
> in is probably the right way to go.

The problem with that is the Red Hat cores I have to hand (from various test machines) have the FILE note section, but the library files are in /usr/lib (32 bit) or /usr/lib64 (64 bit). That looks sufficiently generic that identifying the OS as Linux based on those would probably have the same effect as using ELFOSABI_NONE. The paths LLDB currently knows about (and which match my Ubuntu box) are /lib/i386-linux-gnu and /lib/x86_64-linux-gnu. Since they have "linux" in them they are a much safer bet.

I also have some other cores taken from Ubuntu running in a containerised environment where the library path in the core is actually the full path from outside the container, so the library directory only ends in /lib/x86_64-linux-gnu; the full path is /packages/rootfs_cflinuxfs2/[very long UID value]/rootfs/lib/x86_64-linux-gnu/[library].so. (This may be a container problem though, I'm not sure if using core dumps to discover this path is actually a bug.)
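
To make the difference between these cases concrete, here is a small sketch comparing a prefix test with a looser "contains" test. HasLinuxLibPrefix and ContainsLinuxLibDir are hypothetical helpers for illustration, not what lldb does today, and the container path in the usage note uses a placeholder where the real UID would be.

```cpp
#include <string>

// The two multiarch library directories mentioned above.
static const char *const kLinuxLibDirs[] = {
    "/lib/i386-linux-gnu/",
    "/lib/x86_64-linux-gnu/",
};

// Strict test: the path must start with one of the known directories.
bool HasLinuxLibPrefix(const std::string &path) {
  for (const char *dir : kLinuxLibDirs)
    if (path.rfind(dir, 0) == 0)  // match anchored at position 0
      return true;
  return false;
}

// Looser test: one of the known directories appears anywhere in the
// path, which also covers paths recorded from outside a container.
bool ContainsLinuxLibDir(const std::string &path) {
  for (const char *dir : kLinuxLibDirs)
    if (path.find(dir) != std::string::npos)
      return true;
  return false;
}
```

A path such as /lib/x86_64-linux-gnu/libc.so.6 passes both tests. A container-style path like /packages/rootfs_cflinuxfs2/UID/rootfs/lib/x86_64-linux-gnu/libc.so.6 only passes the looser one, and /usr/lib64/libc.so.6 passes neither, which is the Red Hat problem described above.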

Howard Hellyer
IBM Runtime Technologies, IBM Systems

FWIW, I’ve taken a few whacks at getting Linux detected better over the last few years, and haven’t yet found a reliable way to detect it from quite a few samples of cores from a number of different systems. We can spend more time looking into it, but that stone has been turned over several times.

> FWIW, I've taken a few whacks at getting Linux detected better over
> the last few years, and haven't yet found a reliable way to detect
> it from quite a few samples of cores from a number of different
> systems. We can spend more time looking into it, but that stone has
> been turned over several times.

I spent quite a lot of time looking at the output of readelf too. I was kind of hoping Linux was the only platform not using its OSABI value, which would have worked.

The only other thing I thought of suggesting was having the ELFOSABI_NONE case ifdef'd so that lldb defaults to the platform that it was built for, on the assumption that you are probably opening a core from the machine you are on. (So on Linux ELFOSABI_NONE would mean Linux, on FreeBSD it would mean FreeBSD.) That would have meant lldb behaved differently depending on where it was compiled, which seems wrong and would introduce awkward-to-debug behaviour, so I ended up talking myself out of it.

Howard Hellyer
IBM Runtime Technologies, IBM Systems

That works fine for host debug, but not so much for embedded. On Hexagon, we support 2 OS cases – standalone (which means no OS, or an OS that lldb doesn’t know anything about) and Linux. Both our standalone simulator and our Linux generate core dumps using ELFOSABI_NONE. We run lldb on both x86 Linux and Windows, and our core dumps need to work on both.

Doesn’t lldb get the correct OS from the target when you load them together?

It is ok for a core file to not pledge allegiance to an OS; it is ok for the OS to be set to "*" or any OS. Linux core files are useless without the main executable anyway, so these two things should be used together to do the right thing. When creating the core files you use:

lldb::ProcessSP
ProcessElfCore::CreateInstance (lldb::TargetSP target_sp, lldb::ListenerSP listener_sp, const FileSpec *crash_file)
{
    // ... (body omitted in the original message)
}

So you are handed the target. You can get the executable file from the target and also check the target's architecture or the main executable's architecture. There shouldn't be a problem figuring this out right?

Greg

After the core file is loaded in ProcessElfCore::DoLoadCore, the logic under "target create" will merge the ArchSpec of the target and the core, replacing the "unknown" OS in the core ArchSpec with "linux" from the target ArchSpec.

Howard, are you loading a target executable, or just the core?

> That works fine for host debug, but not so much for embedded. On
> Hexagon, we support 2 OS cases – standalone (which means no OS, or
> an OS that lldb doesn’t know anything about) and Linux. Both our
> standalone simulator and our Linux generate core dumps using
> ELFOSABI_NONE. We run lldb on both x86 Linux and Windows, and our
> core dumps need to work on both.

Ah, cross compiling/debugging makes that an even worse idea.

> Doesn’t lldb get the correct OS from the target when you load them together?

Yes, I'm just used to getting dumps from customers which don't always come with the executable, or the right executable, and was testing that scenario when I started looking into this.

Howard Hellyer
IBM Runtime Technologies, IBM Systems

Unfortunately there isn't enough of any ELF file in memory to allow us to load any symbols or make sense of anything without having all of the binaries. Linux core files must have the executables, otherwise they are not very useful. For instance, there are no symbol tables in memory when using ELF, so there is no way to create symbols that would let LLDB find the function bounds and backtrace correctly.

Many Linux systems have a shell script that can be run on the Linux system that is creating the core file, and these shell scripts are often tasked with using the core file _and_ all the files on the machine itself to make a useful crash log. These tools often get the core file streamed to them, and the core file goes away afterward. That is the place to actually do the hard work of symbolicating if you are not going to have access to the exact machine later. One could create a shell tool that links against LLDB.framework, loads the core file on that machine, symbolicates the crash for you, and dumps it out in some nice structured data format (JSON, XML, YAML, etc.).