> (1) Symbol address
> According to the ELF standard, in a symbol table entry st_value means:
> "In relocatable files, st_value holds a section offset for a defined
> symbol. That is, st_value is an offset from the beginning of the
> section that st_shndx identifies." (*)
> Therefore, when queried about a symbol's address what would the right
answer be? In ELFObjectFile::getSymbolAddress, previously, it was simply
symb->st_value (which is the relative offset to the section). Now,
Section->sh_addr is added to reflect the actual address of the symbol.
> Ignoring for the moment the change this imposes on objdump & nm (which
can be amended), what would the expected address be for clients of
I trust your interpretation and implementation of the relevant specs, and
don't mean to suggest a mistake there. I apologize if I did so previously.
What I do know is that now ELFObjectFile doesn't seem to work on
executables, as it did before. Accordingly the tools that use ELFObjectFile
(llvm-objdump, llvm-nm) no longer accurately display symbol information on
such files (and my project, using code from these tools, doesn't either).
Since these tools used to do this "correctly", as do their non-llvm
counterparts, and because they made use of ELFObjectFile for this purpose, I
assumed that was a supported use case. It appears that's incorrect, and the
output working for executables was always a coincidence. I wish this wasn't
the case, but I understand things change and will update my project
accordingly (or move away from MC if that's not possible, I suppose). I
assume there's no somewhat-equivalent class/etc that will enable a client to
reason about non-relocatable ELF files now that ELFObjectFile doesn't
I did not mean to make a sweeping claim that ELFObjectFile doesn't support anything but relocatable files. ELF is flexible enough to allow the same class to support several types of objects, but I don't know if ELFObjectFile actually attempts this. I *assumed* it was meant to mainly support relocatable files, due to the intentions of the libObject library (linker, etc).
In any case, if the intention is to support both relocatable and executable files, then perhaps more sophistication is required. Take st_value, for example. For relocatable files, it's the offset from the beginning of the section that st_shndx identifies. For executables (and .so files), however, it's just the virtual address. So, for ELFObjectFile::getSymbolAddress to support both, it would probably first need to decide which kind of object it is dealing with (information ELF makes available in the e_type field of the header).
Looking at it this way, the old code (r148652) assumed executable (since it simply returns st_value for the address), and the new code (r148653) assumes relocatable (since it adds st_value to the section address).
At this point, I would really like to hear more from others at @llvmdev. What would the best approach be? I have no problem changing the code to move our new calculation of the address into DyldELFObject, where we really need it for dynamic loading in MC-JIT, but maybe something can be done to accommodate both directions (e.g. going the old way for e_type = ET_EXEC or ET_DYN and the new way for ET_REL?).
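As a minimal sketch of the e_type dispatch suggested above: the types and the function name below are hypothetical stand-ins for illustration, not the ones ELFObjectFile uses. Only the e_type constants (ET_REL = 1, ET_EXEC = 2, ET_DYN = 3) come from the ELF specification.

```cpp
#include <cstdint>

// Hypothetical, stripped-down stand-ins for the ELF structures.
// These are NOT the types used inside ELFObjectFile.
struct MiniSection {
  uint64_t sh_addr;    // address of the section in memory (if any)
  uint64_t sh_offset;  // offset of the section within the file
};

struct MiniSymbol {
  uint64_t st_value;   // meaning depends on the object's e_type
};

// e_type values as defined by the ELF specification.
constexpr uint16_t kET_REL = 1;   // relocatable file
constexpr uint16_t kET_EXEC = 2;  // executable file
constexpr uint16_t kET_DYN = 3;   // shared object

// Hypothetical helper: choose the address computation based on e_type.
inline uint64_t symbolAddress(uint16_t e_type, const MiniSymbol &sym,
                              const MiniSection *sec) {
  if (e_type == kET_REL) {
    // Relocatable: st_value is an offset from the start of the section,
    // so the section address has to be added (the "new way").
    return sym.st_value + (sec ? sec->sh_addr : 0);
  }
  // ET_EXEC / ET_DYN: st_value already is a virtual address (the "old way").
  return sym.st_value;
}
```

With this shape, a symbol in an executable is returned untouched, while a symbol in a relocatable object gets the section address added exactly once.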
> (2) Symbol offset
> Again, referring to the definition of the "st_value" field above, the file
offset of the symbol is the section offset plus the symbol's offset in the
section, which is reflected in the new code:
> Result = symb->st_value +
> (Section ? Section->sh_offset : 0);
> The old code subtracted Section->sh_addr from that for reasons that are
not entirely clear to me.
> I'm not sure where this creates a problem for you? AFAICS, neither llvm-
objdump nor llvm-nm use the symbol's file offset. It's also not clear from
your pastes of llvm-objdump and objdump what the significant difference
The difference in the pastes, and my apologies for not explicitly pointing this
out originally, is that the symbol addresses (see 'main') now seem to
double-include the section address in their value.
Notice how llvm-objdump gives an address of 00800850 for main while
objdump shows 004004a0. Note that before your changes llvm-objdump's
output was aligned with that of normal objdump in this regard.
Now I see it, thanks. However, I still don't see where llvm-objdump uses the file offset at all. It prints the symbol address in the first column, calling SymbolRef::getAddress, which delegates to ELFObjectFile::getSymbolAddress. Is the file offset an actual problem for you, or only the address?
Nor do I understand the computation done in the old code to obtain the offset:
Result = symb->st_value +
(Section ? Section->sh_offset - Section->sh_addr : 0);
Why is the address subtracted?
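One possible reading, assuming the st_value semantics from (1): if st_value is a virtual address (as in an executable), subtracting sh_addr yields the offset within the section, and adding sh_offset turns that into a file offset. The sketch below puts the two formulas side by side; the struct, the function names, and the sample numbers are hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two section header fields involved.
struct SecHdr {
  uint64_t sh_addr;    // address of the section in memory
  uint64_t sh_offset;  // offset of the section in the file
};

// Old formula: treats st_value as a virtual address.
// (st_value - sh_addr) is the offset inside the section.
inline uint64_t fileOffsetFromAddress(uint64_t st_value, const SecHdr *sec) {
  return st_value + (sec ? sec->sh_offset - sec->sh_addr : 0);
}

// New formula: treats st_value as an offset from the start of the section.
inline uint64_t fileOffsetFromSectionOffset(uint64_t st_value,
                                            const SecHdr *sec) {
  return st_value + (sec ? sec->sh_offset : 0);
}
```

For a section with sh_addr = 0x4003b0 and a made-up sh_offset = 0x3b0, the old formula maps the address 0x4004a0 to file offset 0x4a0. When sh_addr is 0, as is typical for an unlinked relocatable file, the two formulas give the same result.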