ELFObjectFile changes, llvm-objdump showing 'wrong' values?

Hi all,

I'm using the MC framework for a project, and while updating to the latest
trunk (r148672) I encountered the following issue:

It seems that SymbolRef::getAddress and SymbolRef::getFileOffset have
been changed to add the symbol's offset to the offset of the
containing section?

This has the following implications:

To get the /actual/ fileoffset, I now need to do:
Symbol.getFileOffset() - ContainingSection.getFileOffset()
And to get the address relative to the section, I do:
Symbol.getFileOffset() - 2*ContainingSection.getFileOffset()

I suspect this isn't the desired functionality. (What use is the
original value?)

You can also see the impact of this on the tool llvm-objdump (as well
as llvm-nm), as shown below:

Normal objdump: Normal objdump on 'test' - Pastebin.com
vs llvm-objdump: llvm-objdump on 'test' - Pastebin.com

I believe r148653 caused this, but haven't verified directly. This
didn't happen as of r148100.

Am I missing something (my code borrows a good deal from llvm-objdump
and llvm-nm, so if they are doing something wrong with respect to
these new changes, so am I), or is this something that should be
fixed?

Thanks for your time!

~Will

Hi Will,

I committed the recent change to ELFObjectFile (r148653). It was supposed to add new functionality, not break existing behavior. I'll take a look at this and keep you updated.

Eli

Hi,

I would like to examine the implications you mention in more detail.

(1) Symbol address
According to the ELF standard, in a symbol table entry st_value means: "In relocatable files, st_value holds a section offset for a defined symbol. That is, st_value is an offset from the beginning of the section that st_shndx identifies." (*)

Therefore, when queried about a symbol's address what would the right answer be? In ELFObjectFile::getSymbolAddress, previously, it was simply symb->st_value (which is the relative offset to the section). Now, Section->sh_addr is added to reflect the actual address of the symbol.

Ignoring for the moment the change this imposes on objdump & nm (which can be amended), what would the expected address be for clients of getSymbolAddress?

(2) Symbol offset
Again, referring to the definition of the "st_value" field above, the file offset of the symbol is the section offset plus the symbol's offset in the section, which is reflected in the new code:

    Result = symb->st_value +
             (Section ? Section->sh_offset : 0);

The old code subtracted Section->sh_addr from that for reasons that are not entirely clear to me.
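
To make the new computation concrete, here is a minimal, self-contained sketch of the same arithmetic. The SectionHeader and Symbol structs and the symbolFileOffset helper are hypothetical stand-ins that model only the fields involved; they are not the actual ELFObjectFile types.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the ELF structures.
struct SectionHeader {
  uint64_t sh_addr;   // section address in memory (typically 0 in relocatable files)
  uint64_t sh_offset; // offset of the section's first byte within the file
};

struct Symbol {
  uint64_t st_value;  // in a relocatable file: offset from the section start
};

// Same arithmetic as the r148653 expression: section file offset plus
// st_value. A null section contributes 0.
uint64_t symbolFileOffset(const Symbol &Sym, const SectionHeader *Section) {
  return Sym.st_value + (Section ? Section->sh_offset : 0);
}
```

With made-up numbers, a symbol at st_value 0x10 in a section whose sh_offset is 0x40 ends up at file offset 0x50. This only holds if st_value really is a section-relative offset, which is the relocatable-file assumption stated above.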

I'm not sure where this creates a problem for you? AFAICS, neither llvm-objdump nor llvm-nm uses the symbol's file offset. It's also not clear from your pastes of llvm-objdump and objdump what the significant differences are.

Eli

(*) ELFObjectFile represents a relocatable file

> Hi,
>
> I would like to examine the implications you mention in more detail.

Thank you!

> (1) Symbol address
> According to the ELF standard, in a symbol table entry st_value means: "In relocatable files, st_value holds a section offset for a defined symbol. That is, st_value is an offset from the beginning of the section that st_shndx identifies." (*)
>
> Therefore, when queried about a symbol's address what would the right answer be? In ELFObjectFile::getSymbolAddress, previously, it was simply symb->st_value (which is the relative offset to the section). Now, Section->sh_addr is added to reflect the actual address of the symbol.
>
> Ignoring for the moment the change this imposes on objdump & nm (which can be amended), what would the expected address be for clients of getSymbolAddress?

I trust your interpretation and implementation of the relevant specs,
and don't mean to suggest a mistake there. I apologize if I did so
previously.

What I do know is that now ELFObjectFile doesn't seem to work on
executables, as it did before. Accordingly the tools that use
ELFObjectFile (llvm-objdump, llvm-nm) no longer accurately display
symbol information on such files (and my project, using code from
these tools, doesn't either). Since these tools used to do this
"correctly", as do their non-llvm counterparts, and because they made
use of ELFObjectFile for this purpose, I assumed that was a supported
use case. It appears that's incorrect, and the output working for
executables was always a coincidence. I wish this wasn't the case,
but I understand things change and will update my project accordingly
(or move away from MC if that's not possible, I suppose). I assume
there's no somewhat-equivalent class/etc that will enable a client to
reason about non-relocatable ELF files now that ELFObjectFile doesn't
support them?

> (2) Symbol offset
> Again, referring to the definition of the "st_value" field above, the file offset of the symbol is the section offset plus the symbol's offset in the section, which is reflected in the new code:
>
>     Result = symb->st_value +
>              (Section ? Section->sh_offset : 0);
>
> The old code subtracted Section->sh_addr from that for reasons that are not entirely clear to me.
>
> I'm not sure where this creates a problem for you? AFAICS, neither llvm-objdump nor llvm-nm uses the symbol's file offset. It's also not clear from your pastes of llvm-objdump and objdump what the significant differences are.

The difference in the pastes, and my apologies for not explicitly
pointing this out originally, is that the symbol addresses (see
'main') now seem to double-include the section address in their value.
Notice how llvm-objdump gives address of 00800850 for main while
objdump shows 004004a0. Note that before your changes llvm-objdump's
output was aligned with that of normal objdump in this regard.
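
As a quick check of the numbers, the sketch below works out what section address would account for the gap between the two tools. It assumes (this is an inference, not something the pastes state) that the surplus is exactly the sh_addr of the section containing 'main'; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Addresses reported for 'main', as quoted above.
constexpr uint64_t LlvmObjdumpAddr = 0x00800850; // llvm-objdump after the change
constexpr uint64_t ObjdumpAddr     = 0x004004a0; // normal objdump

// If st_value is already a virtual address and sh_addr is added on top of
// it, the surplus between the two outputs equals sh_addr.
constexpr uint64_t impliedSectionAddr() {
  return LlvmObjdumpAddr - ObjdumpAddr;
}

// Recomputes the doubled value from a virtual address and a section address.
constexpr uint64_t doubledAddr(uint64_t StValue, uint64_t ShAddr) {
  return StValue + ShAddr;
}
```

Under that assumption the implied section address is 0x4003b0, and 0x004004a0 + 0x4003b0 gives back the 0x00800850 that llvm-objdump prints.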

> Eli
>
> (*) ELFObjectFile represents a relocatable file

It appears 100% of the/my problem is thinking ELFObjectFile was
suitable for use on non-relocatable files such as executables. Since
this appears to be wrong (it gives the wrong results for such files as
detailed above, and probably others), and because this is by design,
not by mistake, might I suggest something like updating
Binary::createBinary (in lib/Object/Binary.cpp) to reflect this and
avoid future confusion (it presently uses ELFObjectFile for all ELF
file types, not just relocatables)? I don't know who the correct
person to bug about this is; hopefully addressing llvmdev@ is
sufficient here.

Thank you for your time Eli, your detailed explanation, and your
continued work. Have a good one :)

~Will

>> (1) Symbol address
>> According to the ELF standard, in a symbol table entry st_value means:
>> "In relocatable files, st_value holds a section offset for a defined
>> symbol. That is, st_value is an offset from the beginning of the
>> section that st_shndx identifies." (*)
>>
>> Therefore, when queried about a symbol's address what would the right
>> answer be? In ELFObjectFile::getSymbolAddress, previously, it was simply
>> symb->st_value (which is the relative offset to the section). Now,
>> Section->sh_addr is added to reflect the actual address of the symbol.
>>
>> Ignoring for the moment the change this imposes on objdump & nm (which
>> can be amended), what would the expected address be for clients of
>> getSymbolAddress?

> I trust your interpretation and implementation of the relevant specs, and
> don't mean to suggest a mistake there. I apologize if I did so previously.
>
> What I do know is that now ELFObjectFile doesn't seem to work on
> executables, as it did before. Accordingly the tools that use ELFObjectFile
> (llvm-objdump, llvm-nm) no longer accurately display symbol information on
> such files (and my project, using code from these tools, doesn't either).
> Since these tools used to do this "correctly", as do their non-llvm
> counterparts, and because they made use of ELFObjectFile for this purpose, I
> assumed that was a supported use case. It appears that's incorrect, and the
> output working for executables was always a coincidence. I wish this wasn't
> the case, but I understand things change and will update my project
> accordingly (or move away from MC if that's not possible, I suppose). I
> assume there's no somewhat-equivalent class/etc that will enable a client to
> reason about non-relocatable ELF files now that ELFObjectFile doesn't
> support them?

I did not mean to make a sweeping claim that ELFObjectFile doesn't support anything but relocatable files. ELF is flexible enough to allow the same class to support several types of objects, but I don't know if ELFObjectFile actually attempts this. I *assumed* it was meant to mainly support relocatable files, due to the intentions of the libObject library (linker, etc).

In any case, if the intention is to support both relocatable and executable files, then perhaps more sophistication is required. Take st_value, for example. For relocatable files, it's the offset from the section the symbol points to (in st_shndx). For executables (and .so), however, it's just the virtual address. So, for ELFObjectFile::getSymbolAddress to support both, it would probably first need to decide which kind of object it deals with (information ELF makes available in the e_type field of the header).

Looking at it this way, the old code (r148652) assumed executable (since it simply returns st_value for the address), and the new code (r148653) assumes relocatable (since it adds st_value to the section address).

At this point, I would really like to hear more from others on llvmdev@. What would the best approach be? I have no problem changing the code to move our new address calculation into DyldELFObject, where we really need it for dynamic loading in MC-JIT, but maybe something can be done to accommodate both directions (e.g. going the old way for e_type = ET_EXEC or ET_DYN and the new way for ET_REL)?
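
A rough sketch of what accommodating both directions could look like, dispatching on e_type. This is only an illustration under simplified assumptions: ElfType, SectionHeader, Symbol, and symbolAddress are hypothetical names, not the actual ELFObjectFile interface. The numeric values of ET_REL, ET_EXEC, and ET_DYN are the ones defined by the ELF specification.

```cpp
#include <cstdint>

// e_type values from the ELF header (ET_REL = 1, ET_EXEC = 2, ET_DYN = 3).
enum class ElfType : uint16_t { Rel = 1, Exec = 2, Dyn = 3 };

// Hypothetical, simplified stand-ins for the ELF structures.
struct SectionHeader { uint64_t sh_addr; };
struct Symbol { uint64_t st_value; };

// Hypothetical helper: picks the address computation from the file type.
uint64_t symbolAddress(ElfType Type, const Symbol &Sym,
                       const SectionHeader *Section) {
  switch (Type) {
  case ElfType::Rel:
    // Relocatable: st_value is section-relative, so add the section address.
    return Sym.st_value + (Section ? Section->sh_addr : 0);
  case ElfType::Exec:
  case ElfType::Dyn:
    // Executable or shared object: st_value is already a virtual address.
    return Sym.st_value;
  }
  // Other file types: return st_value unchanged (an arbitrary choice here).
  return Sym.st_value;
}
```

With this shape, llvm-objdump and llvm-nm would keep printing st_value unchanged for executables, while a client that assigns addresses to the sections of a relocatable file would get st_value plus the section address.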

>> (2) Symbol offset
>> Again, referring to the definition of the "st_value" field above, the
>> file offset of the symbol is the section offset plus the symbol's offset
>> in the section, which is reflected in the new code:
>>
>>     Result = symb->st_value +
>>              (Section ? Section->sh_offset : 0);
>>
>> The old code subtracted Section->sh_addr from that for reasons that are
>> not entirely clear to me.
>>
>> I'm not sure where this creates a problem for you? AFAICS, neither
>> llvm-objdump nor llvm-nm uses the symbol's file offset. It's also not
>> clear from your pastes of llvm-objdump and objdump what the significant
>> differences are.

> The difference in the pastes, and my apologies for not explicitly
> pointing this out originally, is that the symbol addresses (see
> 'main') now seem to double-include the section address in their value.
> Notice how llvm-objdump gives address of 00800850 for main while
> objdump shows 004004a0. Note that before your changes llvm-objdump's
> output was aligned with that of normal objdump in this regard.

Now I see it, thanks. However, I still don't see where llvm-objdump uses the file offset at all. It prints the symbol address in the first column, calling SymbolRef::getAddress, which delegates to ELFObjectFile::getSymbolAddress. Is the file offset an actual problem for you, or only the address?

Nor can I understand the computation done in the old code to obtain the offset:

  case ELF::STT_FUNC:
  case ELF::STT_OBJECT:
  case ELF::STT_NOTYPE:
    Result = symb->st_value +
             (Section ? Section->sh_offset - Section->sh_addr : 0);

Why is the address subtracted?
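
One possible reading, offered as an assumption rather than as the original author's intent: if st_value holds a virtual address (as it does in executables and shared objects), subtracting sh_addr turns it into an offset within the section, and adding sh_offset turns that into a file offset. The sketch below reproduces the old arithmetic with hypothetical stand-in types and a made-up helper name.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the ELF structures.
struct SectionHeader {
  uint64_t sh_addr;   // virtual address of the section when loaded
  uint64_t sh_offset; // offset of the section's first byte within the file
};

struct Symbol {
  uint64_t st_value;  // assumed here to be a virtual address
};

// Same arithmetic as the old expression. The intermediate
// (sh_offset - sh_addr) may wrap around in unsigned arithmetic, but the
// final sum is still correct modulo 2^64.
uint64_t oldSymbolFileOffset(const Symbol &Sym, const SectionHeader *Section) {
  return Sym.st_value +
         (Section ? Section->sh_offset - Section->sh_addr : 0);
}
```

For example, with a made-up section at sh_addr 0x4003b0 and sh_offset 0x3b0, a symbol whose st_value is 0x4004a0 maps to file offset 0x4a0. When sh_addr is 0, as is typical for sections in relocatable files, this expression and the new one give the same result.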

Eli