Bring features of Arm’s fromelf to llvm-objcopy

Hi All,

Currently, objcopy’s ELF-to-binary conversion can’t be used directly to generate plain binary files that are ready to be flashed onto ARM devices.

This discussion is about supporting the features of fromelf, from the ARM toolchain, in llvm-objcopy.

We can start by adding support for fromelf’s --bin as --dump-segments= in llvm-objcopy.

Below is a basic test case showing the current behaviour of fromelf vs llvm-objcopy:

cat > 1.c <<!
int foo() { return 0; }
int bar() { return 0; }

int main() { return foo() + bar(); }
!

cat > script.t <<!

.foo : {
.bar 0x2000000 : AT(0x2000000) {
.main : {
.text : {

arm-clang -c 1.c -o 1.o -ffunction-sections
arm-link 1.o -o 1.out -T script.t

llvm-objcopy -O binary 1.out b_1.out
fromelf --bin --output fromelf_b_1 1.out

Number of binaries generated:
llvm-objcopy: 1, size: ~820 MB
fromelf: 3, sizes: [8, 8, 64]

Note: the number of binaries generated equals the number of PT_LOAD segments in 1.out.

Note: each binary is named after the first section in the section-to-segment mapping of its segment.

The proposed --dump-segments= option aims to replicate this behaviour in llvm-objcopy, so that it also creates 3 binaries with the same names under

I agree that it would be nice for llvm-objcopy to be able to extract data based on the ELF program-header segment view rather than the section view. The segment view is what you’re supposed to use to determine what’s loaded into memory where, after all. But currently llvm-objcopy focuses 100% on the section view. Proof: if you remove the segment header table completely from an image via llvm-objcopy --strip-sections, and feed the output to another llvm-objcopy, it doesn’t see that there’s anything in the file at all to be copied, and will generate (for example) a trivial empty ihex output file.

I looked into this last year, and found that the most immediate problem is that yaml2obj also doesn’t understand the idea of an ELF file with only a segment view, so that would need fixing first in order to write test cases for the new mode of llvm-objcopy!

But I must admit I’m less enthusiastic about copying fromelf’s policy for naming the output files. I can tell you that that confuses actual fromelf users fairly often. I think a better (in particular, more stable) approach would be to name the output files after their base address.
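To make the address-based naming idea concrete, here is a minimal sketch, not a proposed implementation. It assumes a 32-bit little-endian ELF, uses p_paddr as the "base address", and the names load_segments, dump_segments and the "prefix.address.bin" pattern are all hypothetical.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf: bytes):
    """Return (paddr, data) for each PT_LOAD segment that has file contents.

    Assumption: ELF32 little-endian only; anything else is rejected.
    """
    if len(elf) < 52 or elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF file")
    # Elf32_Ehdr: e_phoff at offset 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", elf, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)
    segments = []
    for i in range(e_phnum):
        # Elf32_Phdr starts with p_type, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, p_offset, _p_vaddr, p_paddr, p_filesz = struct.unpack_from(
            "<5I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_filesz > 0:
            segments.append((p_paddr, elf[p_offset:p_offset + p_filesz]))
    return segments


def dump_segments(elf: bytes, prefix: str):
    """Map a hypothetical output name (prefix + load address) to contents."""
    return {f"{prefix}.{paddr:08x}.bin": data
            for paddr, data in load_segments(elf)}
```

Names derived this way depend only on where a segment is loaded, so they would not change if the first section in a segment were renamed or reordered.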

As I understand it the main difference between fromelf --bin and objcopy is that fromelf by default will output one file per PT_LOAD segment (called a Load Region in the fromelf documentation on Arm Developer). I think llvm-objcopy and GNU objcopy will produce one binary that is the concatenation of all the PT_LOAD segments.
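A note on why a single flat output can be much larger than the sum of the per-segment files: as far as I understand the binary output format, it is laid out as a memory image starting at the lowest load address, with the gaps between regions padded, rather than as a back-to-back concatenation. A minimal sketch of that layout follows; the helper name flat_image and the addresses in the usage line are made up for illustration.

```python
def flat_image(chunks, fill=b"\x00"):
    """Lay out (load_address, data) chunks as one contiguous memory image.

    The image starts at the lowest load address; any gap between chunks
    is padded with the fill byte.
    """
    if not chunks:
        return b""
    base = min(addr for addr, _ in chunks)
    end = max(addr + len(data) for addr, data in chunks)
    image = bytearray(fill * (end - base))
    for addr, data in chunks:
        start = addr - base
        image[start:start + len(data)] = data
    return bytes(image)
```

For example, flat_image([(0x100, b"AB"), (0x108, b"CD")]) is 10 bytes long even though only 4 bytes are real data, so two small regions placed far apart in memory yield a very large file.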

I think that the user is expected to use --only-section if they want something different, but this isn’t ideal for PT_LOAD segments that contain multiple output sections. A possible alternative (or additional) option would be --only-segment although this would need to be an index as segments do not have a name.

As statham-arm suggests, fromelf’s current naming convention, which is a bit of a hack as it tries to find the first section in the segment to use for the name, can be confusing.

P.S. I’m from the same team in Arm that maintains fromelf and we also contribute to llvm-objcopy. If there is agreement that fromelf features are useful we may be able to help with patches, clarifications and/or reviews.

I don’t know if that’s expected behaviour for ihex output compared to other variations of objcopy (e.g. GNU objcopy), but you should see that if you feed a stripped object into llvm-objcopy without any options, you’ll get a more-or-less identical copy in the output, because data covered by program headers should be preserved as-is. So it’s not quite true that llvm-objcopy focuses 100% on the section view.

yaml2obj for ELF has the concept of “Fill” sections, that is sections that don’t have a corresponding entry in the section header table. By having a program header consisting entirely of such Fill sections, and by omitting the section header table itself (I forget the exact incantation to do that off the top of my head, but I know it’s possible), you’ll get a valid ELF without sections, but with contents in program headers. The alternative option is simply to use --strip-sections first to create an object with only program headers and their contents.

As a regular reviewer of llvm-objcopy, I’ve got nothing against additional features being added if they are shown to be useful. Dumping segments seems like a reasonable option to me, and I wouldn’t expect it to be too hard to implement.

“Proof: if you remove the segment header table completely from an image via llvm-objcopy --strip-sections”

(Of course, I meant “remove the section header table”. I often wish ELF had not chosen quite such similar names for two things that are important not to mix up!)

“I don’t know if that’s expected behaviour for ihex output compared to other variations of objcopy (e.g. GNU objcopy)”

In a quick test, GNU objcopy also doesn’t do anything useful with a sectionless ELF image, but it at least knows it, and prints an error message:

$ llvm-objcopy --strip-sections unstripped.elf stripped.elf
$ llvm-objcopy -O ihex unstripped.elf unstripped-llvm.ihex
$ llvm-objcopy -O ihex stripped.elf stripped-llvm.ihex
$ objcopy -O ihex unstripped.elf unstripped-gnu.ihex
$ objcopy -O ihex stripped.elf stripped-gnu.ihex
objcopy: error: the input file 'stripped.elf' has no sections
$ wc -l *.ihex
     2 stripped-llvm.ihex
  3002 unstripped-gnu.ihex
  3002 unstripped-llvm.ihex
  6006 total

LLVM and GNU objcopy have both output an equivalent sensible ihex file from the original ELF object with a section header table. Remove the section header table, and GNU objcopy knows it can’t handle the result, whereas LLVM objcopy doesn’t notice, and sees the file as empty.

But surely both are wrong. A sectionless, segments-only ELF image can be loaded and run successfully by an ELF loader, which should only be looking at the segments in any case. ihex output is part of one possible loading process (its purpose is to outsource ELF parsing to something more capable than the ultimate target device), so that ought to be based on the segment view too.
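For illustration, a tool could recognise the "segments only" case up front instead of silently producing empty output. The sketch below assumes a 32-bit little-endian ELF and assumes that a stripped image has both e_shoff and e_shnum set to zero; the function name is hypothetical.

```python
import struct


def has_only_segment_view(elf: bytes) -> bool:
    """True if the ELF has program headers but no section header table.

    Assumption: ELF32 little-endian; a missing section header table is
    detected as e_shoff == 0 and e_shnum == 0.
    """
    if len(elf) < 52 or elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF file")
    # Elf32_Ehdr: e_shoff at offset 32, e_phnum at 44, e_shnum at 48.
    (e_shoff,) = struct.unpack_from("<I", elf, 32)
    (e_phnum,) = struct.unpack_from("<H", elf, 44)
    (e_shnum,) = struct.unpack_from("<H", elf, 48)
    return e_phnum > 0 and e_shoff == 0 and e_shnum == 0
```

A converter that sees True here still has everything a loader would need, since the program headers describe what goes where in memory.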

A more subtle problem occurs if an image does have a section header table, but the sections do not wholly cover the contents of a PT_LOAD segment. In this situation both llvm-objcopy and GNU objcopy will only output ihex for the parts of the segment covered by sections, and miss out a piece in the middle.
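The "piece in the middle" can be found with simple interval arithmetic. This is only a sketch: uncovered_ranges is a hypothetical helper, and the ranges are plain (start, size) pairs rather than anything read from a real ELF file.

```python
def uncovered_ranges(seg_start, seg_size, sections):
    """Return [start, end) ranges of a segment not covered by any section.

    sections is an iterable of (start, size) pairs in the same address
    (or file offset) space as the segment.
    """
    seg_end = seg_start + seg_size
    gaps = []
    cursor = seg_start
    for start, size in sorted(sections):
        end = start + size
        if end <= cursor:
            continue  # section lies entirely before the current position
        if start >= seg_end:
            break  # section lies entirely after the segment
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < seg_end:
        gaps.append((cursor, seg_end))
    return gaps
```

Any range this returns is data that a purely section-driven writer would leave out, even though a loader following the program headers would copy it into memory.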

I think that is also wrong, and that conversion of ELF to any loading-oriented format such as ihex or srec or plain binary files should be based entirely on the segment rather than section view, perhaps unless the user has a good reason to ask for some non-default treatment. That’s what Arm fromelf will do. But making that change in llvm-objcopy would introduce a significant difference from GNU objcopy.

Another difficulty of adding segment-based operation to llvm-objcopy is that it has so many other options that are section-focused, so you’d have to diagnose all the incompatibilities when using those options in segment mode, or else develop a complicated hybrid handling for combinations like “we’re mostly in segment mode but the user said --remove-section=foo”.

I think it would be much easier to write a completely separate tool that handles conversion from ELF to binary/hex formats, whose entire job is to look only at the segment view. Then it could have a completely separate set of command-line options appropriate to that different functionality.

In fact when I ran into this problem last year (in a context where fromelf wasn’t available) I solved it by writing a fresh tool of this kind called elf2bin from scratch. We’d be happy to upstream it as a separate llvm-elf2bin if you’d like!

Our compatibility policy is that we mirror GNU objcopy’s behaviour, but only insofar as it makes sense. If there’s a valid reason to diverge, we are happy to accept differences. It could also be considered that GNU’s behaviour is broken and should be reported as a GNU bug.

Actually, llvm-objcopy’s ELF writing algorithm is segment-based these days, falling back to sections to cover data that is outside segments. This is how it manages to preserve data that falls outside sections within segments. As such, there isn’t really any need for “segment mode” or “section mode”. Options like --remove-section have either a well-defined meaning (in the case of section removal it’s “write null bytes in the space where the section was within the segment”) or are rejected where there might be a conflict.
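The section-removal semantics described above can be illustrated on raw bytes. This is a toy model, not llvm-objcopy code: the function name and the offset parameters are assumptions made up for the example.

```python
def remove_section_in_segment(segment: bytes, seg_offset: int,
                              sec_offset: int, sec_size: int) -> bytes:
    """Zero the bytes of a section that lives inside a segment.

    The segment keeps its size and layout; only the space the removed
    section occupied is overwritten with null bytes.
    """
    start = sec_offset - seg_offset
    if start < 0 or start + sec_size > len(segment):
        raise ValueError("section is not fully contained in the segment")
    out = bytearray(segment)
    out[start:start + sec_size] = bytes(sec_size)
    return bytes(out)
```

Because the segment length is unchanged, everything after the removed section stays at the same load address, which is why no separate "segment mode" is needed for this option.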

ihex and binary output formats at least in theory already handle this combination. Individual segment dumping shouldn’t be significantly different. If you wanted to write segments as the output, you’d probably want to define a new output format, and implement it in a similar manner to those two, specifically by implementing a SeparateSegmentWriter or something to that effect that derives from the Writer class.

Thank you all for the valuable input.

Summarizing the key takeaways from the discussion so far:

  • Having the features of fromelf is beneficial
  • Existing tools for reference: fromelf, elf2bin

Suggested implementations:

  1. Add a new tool called llvm-elf2bin
  2. Add a new output format by implementing a SegmentWriter for llvm-objcopy

IMO adding a new output format to llvm-objcopy would be the way to go.
Everyone, please feel free to post your opinions.
The implementation decision can then be taken based on the opinions shared.

This would be very easily confused with llvm-objcopy -I elf -O binary, so if you opt to add a new tool please make sure its name is very clear in how it differs from that. But I agree that augmenting llvm-objcopy seems like the way to go.