File offset usage on ObjectFileELF

Hi all,

Why is ObjectFileELF using the file offset to parse the ELF headers?

For the changes I'm suggesting, skip to the end of the message (3 last
paragraphs).

From what I can tell, ObjectFileELF's constructor will get a

FileSpec+file_offset and a DataBuffer+data_offset. CreateInstance made
sure the DataBuffer had all the data we needed.

Then ObjectFileELF's constructor calls ObjectFile's constructor with
those arguments too. ObjectFile's constructor sets its private m_data
field to point to the argument DataBuffer's contents (+ offset), and
simply stores the FileSpec and file_offset.

When ObjectFileELF::ParseHeader() is called, it gets the file_offset
and then parses the header from m_data with that offset. This makes no
sense, for me, since the offset is only for the file, not the
DataBuffer.

I couldn't find any other reference to ObjectFile::GetFileOffset on
either ObjectFile, ObjectFileELF, or ObjectFileMachO. There is one
reference to m_file_offset in ObjectFileMachO, though. Which seems
legit, but is weird since CreateInstance makes sure that the
DataBuffer has all the file, before calling the constructor.

I'm proposing changing ObjectFileELF::ParseHeader() to just create a
variable that is 0 as the offset and use that as the offset for the
ELFHeader::Parse() method.

Either ObjectFileMachO calls the GetFileOffset() or it's not doing
anything and we can delete it. Otherwise bug can appear out of nowhere
if anything is changed in one of those.

Is it possible that ObjectFileMachO actually needs to read the file
again? When can this happen? (ObjectFileMachO's constructor is not
private, but I think it's only called from CreateInstance)

Should I prepare a patch, or did I miss something?

Thank you,

  Filipe

P.S: I found this while trying to make a reader for a format based on
ELF which has an extra header. I just bumped the offsets by that
header's size, on CreateInstance, and created an ObjectFileELF, that's
why I bumped into this bug.

This is fallout from changes over the years.

If indeed ObjectFileELF creates its own data that is zero based, then the file offset should be ignored in other areas. It used to be each object file would lazily parse data from the file, but that led to crashes when someone would modify the file during a debug session. We would go to a file offset and read bytes that didn't make sense based off of the data we parsed when the first opened the file (like program and section headers).

To fix this correctly, we now always mmap the entire object file when it first gets parsed so that we don't ever run into this case. During these changes, I switched all object file parsers over to always make their file data correct and zero based (go to file_offset, and read entire file up front).

More answers below.

Hi all,

Why is ObjectFileELF using the file offset to parse the ELF headers?

It shouldn't if the m_data member is filled with the ELF file data starting at file_offset to begin with.

For the changes I'm suggesting, skip to the end of the message (3 last
paragraphs).

From what I can tell, ObjectFileELF's constructor will get a
FileSpec+file_offset and a DataBuffer+data_offset. CreateInstance made
sure the DataBuffer had all the data we needed.

Indeed, changes which happened slowly over the years as LLDB evolved.

Then ObjectFileELF's constructor calls ObjectFile's constructor with
those arguments too. ObjectFile's constructor sets its private m_data
field to point to the argument DataBuffer's contents (+ offset), and
simply stores the FileSpec and file_offset.

When ObjectFileELF::ParseHeader() is called, it gets the file_offset
and then parses the header from m_data with that offset. This makes no
sense, for me, since the offset is only for the file, not the
DataBuffer.

It probably shouldn't. All ELF files start at file offset zero currently as we have no clients that are parsing ELF files from within .a files (BSD archives). I am sure things will fail badly when and if we put ELF files into .a files.

I couldn't find any other reference to ObjectFile::GetFileOffset on
either ObjectFile, ObjectFileELF, or ObjectFileMachO. There is one
reference to m_file_offset in ObjectFileMachO, though. Which seems
legit, but is weird since CreateInstance makes sure that the
DataBuffer has all the file, before calling the constructor.

I'm proposing changing ObjectFileELF::ParseHeader() to just create a
variable that is 0 as the offset and use that as the offset for the
ELFHeader::Parse() method.

That is correct as long as m_data is filled in to have the ELF data start at offset zero which I currently believe it is.

Either ObjectFileMachO calls the GetFileOffset() or it's not doing
anything and we can delete it. Otherwise bug can appear out of nowhere
if anything is changed in one of those.

Is it possible that ObjectFileMachO actually needs to read the file
again? When can this happen? (ObjectFileMachO's constructor is not
private, but I think it's only called from CreateInstance)

GetFileOffset() needs to stay for universal files and for BSD archives. It is mainly needed right now for comm page binaries (see DynamicLoaderMacOSXDYLD::AddModulesUsingImageInfos()). These binaries are mach-o files that are embedded inside a mach-o file section. So please do not remove it.

Should I prepare a patch, or did I miss something?

The flexibility is currently needed for the universal mach-o files and BSD archives. Even worse you can have a universal BSD archive, or a BSD archive full of universal mach-o files. The m_file_offset is useless for ELF as it is always zero, unless you have a BSD archive full of ELF files.

Thank you,

Filipe

P.S: I found this while trying to make a reader for a format based on
ELF which has an extra header. I just bumped the offsets by that
header's size, on CreateInstance, and created an ObjectFileELF, that's
why I bumped into this bug.

Gotcha. Please do fix the ELF parser to not take the file offset into account more than once.

Greg