Distinguish between ARM and Thumb


I am currently using LLVM to do ARM binary analysis. I was wondering whether LLVM can provide any debugging information about the ARM instruction mode (ARM or Thumb).

For example, llvm-dwarfdump can dump some debugging information about instructions. Can it tell the mode of each instruction? Or could we write an LLVM pass to determine the instruction mode? Any suggestions are welcome. Many thanks.


Hello Muhui,

If you are disassembling a non-stripped ELF binary you can find out
the Arm/Thumb state by looking at the mapping symbols $a (Arm code),
$t (Thumb code) and $d (data). Alternatively, each ELF symbol of type
STT_FUNC has bit 0 of its value set to 0 for Arm state and 1 for
Thumb state. Hence with the symbol table you can reconstruct the state
at each address by finding the nearest preceding mapping symbol.
More information is available in ELF for the Arm Architecture [1].
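
As a rough illustration of the symbol-table approach, here is a minimal
Python sketch. It assumes you have already extracted (address, name)
pairs for one section with whatever ELF reader you prefer; the function
names build_state_map, state_at and func_state are made up for this
example and are not part of LLVM or any library.

```python
from bisect import bisect_right

# Mapping symbol prefixes defined by ELF for the Arm Architecture.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}

def build_state_map(symbols):
    """Return sorted (address, kind) marks from (address, name) pairs.

    Assumption: all symbols come from the same section, since mapping
    symbols only describe the section they are defined in.
    """
    marks = []
    for addr, name in symbols:
        base = name.split(".", 1)[0]  # "$t.1" is treated like "$t"
        if base in KINDS:
            marks.append((addr, KINDS[base]))
    marks.sort()
    return marks

def state_at(marks, addr):
    """Kind of the nearest mapping symbol at or before addr, else None."""
    addrs = [a for a, _ in marks]
    i = bisect_right(addrs, addr) - 1
    return marks[i][1] if i >= 0 else None

def func_state(st_value):
    """Decode an STT_FUNC symbol value: bit 0 set means Thumb.

    Returns (state, address with bit 0 cleared).
    """
    return ("thumb" if st_value & 1 else "arm", st_value & ~1)
```

For instance, with marks built from [(0x8000, "$a"), (0x8010, "$t")],
state_at(marks, 0x8004) gives "arm" and state_at(marks, 0x8010) gives
"thumb".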

If you have got a stripped binary without any symbolic information
then life gets a lot more difficult. There are some encoding rules [2]
that can help you find out whether a Thumb instruction is 2 or 4 bytes
long but in general you'll at least need to know whether you are
starting on an Arm or Thumb instruction and will need to trace control
flow instructions to track state changes and to avoid interpreting
literal data as instructions.
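
To make the 2-versus-4-byte rule concrete, here is a small Python
sketch of the length check described in the Thumb instruction set
encoding section of [2]: a first halfword whose top five bits are
0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, anything else
is a 16-bit instruction. The helper names are made up for this example,
and it assumes little-endian code that you already know is Thumb; it
does nothing to tell instructions apart from literal data.

```python
def thumb_insn_length(first_halfword):
    """Return the length in bytes (2 or 4) of a Thumb instruction.

    Only the top five bits of the first halfword are inspected.
    """
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

def thumb_insn_length_at(code, offset):
    """Same check on raw bytes, assuming little-endian instructions."""
    halfword = int.from_bytes(code[offset:offset + 2], "little")
    return thumb_insn_length(halfword)
```

For example, 0x4770 (BX LR) is reported as 2 bytes, while 0xF000 (the
first halfword of a BL) is reported as 4 bytes.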

For the non-stripped case I don't think you need to do much beyond
reading the symbol table. I don't think LLVM has passes to reconstruct
binaries; that logic would usually lie in a tool like objdump.

Hope this helps


[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf
(search for mapping symbols)
[2] https://developer.arm.com/products/architecture/a-profile/docs/ddi0406/latest/arm-architecture-reference-manual-armv7-a-and-armv7-r-edition
(search for Thumb instruction encoding)

Hi Peter

Thank you so much for your detailed and quick reply.

I think I already know how to do it for a non-stripped binary.