llvm-objdump related patch

Hi,
     I am new to llvm, not familiar with c++, after some use with llvm-objdump, and finding the broken output, I try to debug and fix the code so it can become usable. Please help review the patch, so that they can be merged.
     And there's still two major problem I have found about arm disassembler:
1. arm instruction decoder cannot recognise bx series instructions.
2. As gcc will generate thumb and arm instruction mixed binary, we have to switch from each other.
we can tell if a function is thumb or arm code by looking at the symbol table entry, when in thumb code, the lowest bit of the symbol value will be set to '1'.
so how these logic can be implemented while still adapt to the structure of the code?

Songmao

0001-Implement-sectionContainsSymbol-preparing-for-the-ob.patch (1.15 KB)

0002-Fix-objdump-various-problem.patch (6 KB)

For the first patch. The code is only valid for executable files, not
relocatable files. st_shndx should be used to determine if the symbol
is in the given section. Also st_value can hold the offset into
st_shndx, not the actual address. Also it doesn't handle non-function
symbols.

For the second patch. Could you explain what exactly you are trying to
fix? I see some stuff that I know is wrong, but it would help if I
knew the intent. As for what I do know.
* The error function already prints out the error. If you want to
print additional info, add an overload of error that allows that.
* Please use spaces instead of tabs. Lots of the code doesn't line up
properly for me.
* Setting the size to 4 to skip bytes is arbitrary, and won't always
give decent results on different platforms.

Thank you for working on this.

As for your comments on the arm disassembler.

1) I am not familiar with ARM, but I do know the decoder is currently
being worked on.

2) We were just discussing this in IRC. The idea is to simply handle
ARM disassembly as a special case and inspect the bit to decide how to
disassemble the symbol.

- Michael Spencer

Neo,

I have just found out that arm disassembler use tblgen -gen-disassembler not -gen-arm-decoder, so I have looked at the wrong code, but can anyone explain what the arm-decoder is for?
The llvm-objdump failed on bx lr insn(0xe12fff1e),because the condition(Bits & ARM::HasV4TOps) has failed, the Bits is 0, so it failed, but I haven't found out why.

Songmao

Michael,
     I have rework the patch according to your suggestion. And I have read binutil/objdump source code and found that it has a logic that if there's no symtab, it will use dynsym, which is missing in llvm-objdump.

Songmao

0002-Fix-the-address-calculation-for-llvm-objdump.patch (2.96 KB)

0001-Implement-sectionContainsSymbol-preparing-for-the-ob.patch (1.29 KB)

Owen,

Add -triple="armv7-unknown-unknown" can fix the problem.

Songmao

@@ -747,12 +747,28 @@ error_code ELFObjectFile<target_endianness, is64Bits>
template<support::endianness target_endianness, bool is64Bits>
error_code ELFObjectFile<target_endianness, is64Bits>
                           ::sectionContainsSymbol(DataRefImpl Sec,
                                                   DataRefImpl Symb,
                                                   bool &Result) const {
- // FIXME: Unimplemented.

Sorry for the late reply, please help question below

@@ -747,12 +747,28 @@ error_code ELFObjectFile<target_endianness, is64Bits>
  template<support::endianness target_endianness, bool is64Bits>
  error_code ELFObjectFile<target_endianness, is64Bits>
                            ::sectionContainsSymbol(DataRefImpl Sec,
                                                    DataRefImpl Symb,
                                                    bool&Result) const {
- // FIXME: Unimplemented.
+
    Result = false;
+ const Elf_Sym *sym = getSymbol(Symb);
+ const Elf_Shdr *sec = reinterpret_cast<const Elf_Shdr *>(Sec.p);
+
+ if (getSection(sym->st_shndx) != sec) {

This version of getSection will not handle extended section indices
properly. Use the version that takes a const Elf_Sym*. This also fails
when there is no section table, which is not required for executables
or shared libraries.

how can I make object file that has extended section indices, and that has no section table,
so that I can verify the code

+ Result = false;
+ return object_error::success;
+ }
+
+ uint64_t Address;
+ getSymbolOffset(Symb, Address);
+
+ if (sym->getType() == ELF::STT_FUNC

Again, this will only work for functions. Symbols without this type
can also be in a section.

how to make some test cases?

+ && Address>= sec->sh_addr
+ && Address< (sec->sh_addr + sec->sh_size))
+ Result = true;
+
    return object_error::success;
  }

  template<support::endianness target_endianness, bool is64Bits>
  relocation_iterator ELFObjectFile<target_endianness, is64Bits>

@@ -232,11 +235,11 @@ static void DisassembleObject(const ObjectFile *Obj) {
      if (error(i->getSize(SectSize))) break;

      // Disassemble symbol by symbol.
      for (unsigned si = 0, se = Symbols.size(); si != se; ++si) {
        uint64_t Start = Symbols[si].first;
- uint64_t End = si == se-1 ? SectSize : Symbols[si + 1].first - 1;
+ uint64_t End = si == se-1 ? SectionVMA + SectSize : Symbols[si
+ 1].first - 1;

This goes over 80 col.

        outs()<< '\n'<< Symbols[si].second<< ":\n";

  #ifndef NDEBUG
          raw_ostream&DebugOut = DebugFlag ? dbgs() : nulls();
  #else
@@ -244,22 +247,22 @@ static void DisassembleObject(const ObjectFile *Obj) {
  #endif

        for (Index = Start; Index< End; Index += Size) {
          MCInst Inst;

- if (DisAsm->getInstruction(Inst, Size, memoryObject, Index,
+ outs()<< format("%8x:\t", Index);
+ if (DisAsm->getInstruction(Inst, Size, memoryObject, Index -
SectionVMA,
                                     DebugOut, nulls())) {
- uint64_t addr;
- if (error(i->getAddress(addr))) break;
- outs()<< format("%8x:\t", addr + Index);
- DumpBytes(StringRef(Bytes.data() + Index, Size));
+ DumpBytes(StringRef(Bytes.data() + Index - SectionVMA, Size));
            IP->printInst(&Inst, outs(), "");
            outs()<< "\n";
          } else {
            errs()<< ToolName<< ": warning: invalid instruction encoding\n";
            if (Size == 0)
- Size = 1; // skip illegible bytes
+ Size = 4; // skip illegible bytes

This isn't right for most instruction sets. This value needs to be
computed based on the current instruction set being disassembled.

we can add a virtual method in the target disassembler class, so that the decision is made by the target code.

+ DumpBytes(StringRef(Bytes.data() + Index - SectionVMA, Size));
+ outs()<< "\n";
          }
        }
      }
    }
  }

Once you fix these, I think it will be good to commit. Thanks for
working on this!

- Michael Spencer

Thank you for your help.

Songmao