Object File Library; llvm-nm changed to match binutils-nm for COFF and ELF (32-bit, little-endian).

A few days ago we were discussing object files and linking on IRC. I
had been thinking about working on this for a while, and this
discussion finally got me to do it.

Attached are patches of a preliminary implementation of a generic
object file library, and a few changes to llvm-nm to make use of it.

The main goal in the design of the API is to allow a fast
implementation by avoiding memory allocation and repeated or unneeded
parsing of the object file. This is currently achieved in part by
allowing symbol iteration to be a simple pointer increment. Most
object file formats will also support random access to the symbol and
section tables.
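To make the pointer-increment idea concrete, here is a minimal C++ sketch. It assumes a 32-bit little-endian ELF file whose .symtab and .strtab contents are already in memory. The names `SymbolEntry`, `SymbolTableView` and `appendSym` are hypothetical illustrations, not the API in the attached patches. Because ELF32 symbol entries have a fixed size of 16 bytes, moving to the next symbol only advances a pointer, and symbol i can be reached directly by computing its offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical sketch, not the API in the patches: a read-only view over the
// raw bytes of an ELF32 .symtab section plus its .strtab string table.
// Assumes a little-endian file read on a little-endian host, so fields are
// copied out without byte swapping.
constexpr std::size_t kElf32SymSize = 16; // sizeof(Elf32_Sym)

class SymbolEntry {
public:
  SymbolEntry(const std::uint8_t *entry, std::string_view strtab)
      : entry_(entry), strtab_(strtab) {}

  // st_name (bytes 0-3) is an offset into the string table.
  std::string_view name() const {
    std::uint32_t off;
    std::memcpy(&off, entry_, sizeof(off));
    if (off >= strtab_.size())
      return {};
    std::string_view rest = strtab_.substr(off);
    return rest.substr(0, rest.find('\0'));
  }

  // st_value (bytes 4-7).
  std::uint32_t value() const {
    std::uint32_t v;
    std::memcpy(&v, entry_ + 4, sizeof(v));
    return v;
  }

  // Advancing to the next symbol is a single pointer increment:
  // no allocation and no re-parsing.
  SymbolEntry &operator++() {
    entry_ += kElf32SymSize;
    return *this;
  }
  bool operator!=(const SymbolEntry &other) const {
    return entry_ != other.entry_;
  }
  const SymbolEntry &operator*() const { return *this; }

private:
  const std::uint8_t *entry_;
  std::string_view strtab_;
};

class SymbolTableView {
public:
  SymbolTableView(const std::uint8_t *symtab, std::size_t bytes,
                  std::string_view strtab)
      : base_(symtab), count_(bytes / kElf32SymSize), strtab_(strtab) {}

  std::size_t size() const { return count_; }
  SymbolEntry begin() const { return {base_, strtab_}; }
  SymbolEntry end() const { return {base_ + count_ * kElf32SymSize, strtab_}; }

  // Random access: entries are fixed-size, so symbol i sits at a computed
  // offset.
  SymbolEntry operator[](std::size_t i) const {
    assert(i < count_);
    return {base_ + i * kElf32SymSize, strtab_};
  }

private:
  const std::uint8_t *base_;
  std::size_t count_;
  std::string_view strtab_;
};

// Demo helper only: appends one Elf32_Sym with the given st_name and
// st_value, leaving the remaining fields zero.
void appendSym(std::vector<std::uint8_t> &buf, std::uint32_t name,
               std::uint32_t value) {
  std::uint8_t entry[kElf32SymSize] = {};
  std::memcpy(entry, &name, sizeof(name));
  std::memcpy(entry + 4, &value, sizeof(value));
  buf.insert(buf.end(), entry, entry + kElf32SymSize);
}
```

A client can loop with `for (const SymbolEntry &s : table)` or index with `table[i]`; neither copies the table nor allocates, since names are returned as views into the string table.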

The API needs lots of work. Some of the current problems include:

* Error handling.
* Symbols only.
* Can only access one symbol table efficiently.
* Read only.
* Weird interface between SymbolRef and ObjectFile.

My current plan to support modifying and creating new object files is
to have a generic internal representation that has the same external
API as everything else. When the API client calls any function that
modifies the object file, a "changes" object is created that stores
all of the changes required when outputting the file. This changes
object would be transparent to the client, and would make the API calls
required to write an object file out in a different format simple. It
would also allow an optimal implementation when the file is written out
in the same format, because the format-specific class knows exactly
what is already in the file and where.
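One way to read the "changes" design is as a copy-on-write overlay. The C++ sketch below covers symbol names only and uses a hypothetical `SymbolNames` class: the names parsed from the file are kept as read-only views, and a change set is allocated only on the first call that modifies something, so read-only clients never pay for it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of a "changes" overlay: the original data stays
// read-only, edits are recorded in a side object that is only allocated on
// the first modification, and readers see the merged view transparently.
// Views returned by name() are only valid until the next modification.
class SymbolNames {
public:
  explicit SymbolNames(std::vector<std::string_view> original)
      : original_(std::move(original)) {}

  std::size_t size() const {
    return original_.size() + (changes_ ? changes_->added.size() : 0);
  }

  std::string_view name(std::size_t i) const {
    assert(i < size());
    if (changes_) {
      if (i >= original_.size())
        return changes_->added[i - original_.size()];
      auto it = changes_->renamed.find(i);
      if (it != changes_->renamed.end())
        return it->second;
    }
    return original_[i]; // untouched: read straight from the file's data
  }

  void rename(std::size_t i, std::string newName) {
    assert(i < size());
    if (i >= original_.size())
      ensureChanges().added[i - original_.size()] = std::move(newName);
    else
      ensureChanges().renamed[i] = std::move(newName);
  }

  void add(std::string newName) {
    ensureChanges().added.push_back(std::move(newName));
  }

  // True once any modifying call has been made.
  bool isModified() const { return changes_ != nullptr; }

private:
  struct Changes {
    std::unordered_map<std::size_t, std::string> renamed;
    std::vector<std::string> added;
  };

  // Lazily creates the change set on the first modification.
  Changes &ensureChanges() {
    if (!changes_)
      changes_ = std::make_unique<Changes>();
    return *changes_;
  }

  std::vector<std::string_view> original_;
  std::unique_ptr<Changes> changes_;
};
```

A same-format writer could check `isModified()` and copy the original bytes unchanged when it returns false, which is the kind of shortcut the format-specific class is in a position to take.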

An alternative to this is to fully parse every object file into an
intermediate representation on load. This would simplify the library,
but would come at a steep performance cost, and tools modify object
files far less often than they simply read them.

I currently envision this library being used in the following ways:

* An ld and link.exe compatible linker.
* A loader.
* Add support to lli for loading dynamic libraries referenced from .bc
files when JITing.
* Executable compression, encryption.
* llvm-ar
* llvm-nm
* llvm-ranlib
* llvm-objdump
* etc...

I decided to make a new library instead of adding this to MC because I
felt that MC is not really designed for generic object file handling.
It is, and should be, designed for working with machine code.

Below is a timing test on the largest object file generated by clang -g
while compiling LLVM.

bigcheese@CHIBISERV /tmp> nm --version
GNU nm (GNU Binutils for Ubuntu) 2.20.1-system.20100303
Copyright 2009 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.

bigcheese@CHIBISERV /tmp> llvm-nm -version
Low Level Virtual Machine (http://llvm.org/):
  llvm version 2.8git
  Optimized build.
  Host: i386-pc-linux-gnu
  Host CPU: k8-sse3

  Registered Targets:
    (none)

bigcheese@CHIBISERV /tmp> time nm -a /home/bigcheese/open-source/llvm-build/debug/lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86ISelLowering.cpp.o > big-nm.dump

real 0m0.054s
user 0m0.028s
sys 0m0.016s

bigcheese@CHIBISERV /tmp> time llvm-nm -a /home/bigcheese/open-source/llvm-build/debug/lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86ISelLowering.cpp.o > big-llvm-nm.dump

real 0m0.024s
user 0m0.012s
sys 0m0.008s

- Michael Spencer

object-file-library-patches.zip (25.9 KB)


I don't have time to review this now, but I do think it is a very nice
idea. Having LLVM implementations of the basic tools in a toolchain
would clear up the confusion about whether the llvm-* tools are
replacements for the system ones :)

I also agree that a library other than MC is a good thing. You
probably don't want llvm-nm linking with MC since it handles only
symbols, not machine code.


Cheers,

I have continued to work on this and now have a working llvm-objdump
-d (object file disassembler). It currently supports both x86 and
x86-64 COFF and ELF, and it would be trivial to add support for other
architectures. The output does not yet exactly match
binutils-objdump -d, and it doesn't display relocation info in the
instructions (so you end up with output like "call 0").
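The "call 0" output arises because, for a call to a symbol the assembler could not resolve, the 32-bit displacement stored in the instruction is only a placeholder; a relocation entry pointing at that field names the real target. The following C++ sketch shows the lookup a disassembler could do, assuming the relocations have already been collected into a map from section offset to symbol name. `formatCall` is a hypothetical helper that recognizes only the `E8 rel32` form of call and ignores any addend.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats the instruction at `off` in a code section if
// it is an x86 `call rel32` (opcode 0xE8, 5 bytes); returns "" otherwise.
// `relocsByOffset` maps the section offset of a relocated field to the name
// of the symbol it refers to. Addends are ignored for simplicity, and a
// little-endian host is assumed when reading the displacement.
std::string formatCall(const std::vector<std::uint8_t> &text, std::size_t off,
                       const std::map<std::uint64_t, std::string> &relocsByOffset) {
  if (off + 5 > text.size() || text[off] != 0xE8)
    return "";
  std::ostringstream out;
  out << "call ";
  // The relocation, if any, applies to the 4-byte field after the opcode.
  auto it = relocsByOffset.find(off + 1);
  if (it != relocsByOffset.end()) {
    out << it->second;
  } else {
    // Without relocation info only the stored placeholder is available,
    // which is how output like "call 0" arises.
    std::int32_t disp;
    std::memcpy(&disp, &text[off + 1], sizeof(disp));
    out << disp;
  }
  return out.str();
}
```

binutils objdump prints relocation entries interleaved with the disassembly when given -r together with -d, which is roughly the information this lookup recovers.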

This includes all previous patches because I changed most of them.
Keeping the patches in sync isn't difficult because I use git;
however, it's getting harder to review all this code to get it
committed. I'm going to post the first 3 patches directly to
llvm-commits, but I would appreciate any help getting development of
this library moved to trunk; even a brief look to point out any
obvious blockers would help.

- Michael Spencer

object-file-library-patches.zip (59 KB)