compatibility with gnu binutils

From: James Henderson via llvm-dev <llvm-dev@lists.llvm.org>
To: Oliver Stannard <oliver.stannard@linaro.org>
Cc: LLVM Dev <llvm-dev@lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Case insensitive assembly directives for
all targets

+1 to all of what Oliver said. We aim for compatibility with GNU in most
(all?) of our other binutils, so why should the assembler be any different?

This doesn’t sound right. GNU binutils have a large quantity of legacy cruft, not least the redundancy between tools like readelf and objdump which are capable of doing the same task in exchange for different command line arguments.

Our from-scratch binutils suite has the opportunity to be much easier to use than GNU’s tooling. Where was this policy, which sounds like replicating their design mistakes bug-for-bug, agreed upon and documented?

Aside from the above query, case-insensitive asm doesn’t sound like a good feature to me either.

Thanks

Many tools (readelf, objdump, nm, objcopy etc) are used in many people’s build systems, quite often in deep configure scripts that are hard to maintain or update for whatever reason. There have been multiple talks on this sort of topic at various LLVM conferences (examples include Bernhard Rosenkränzer and Jordan Rupprecht’s talks at Brussels 2019), where people have highlighted pain points. Additionally, I have been part of or run both BoFs (again Brussels 2019) and round tables on the topic. The overwhelming consensus from everyone there was that people wanted compatibility with GNU to make it easier for them to switch over to using the LLVM tools. These discussions were written up on the mailing list (see http://lists.llvm.org/pipermail/llvm-dev/2019-April/132032.html and https://lists.llvm.org/pipermail/llvm-dev/2019-April/132033.html for two recent examples). The principle is discussed on multiple reviews of changes for the tools too.

To be clear, if there is a bug in the GNU tool, we don’t try to match that. We’ve also made multiple extensions and improvements over what GNU does in some tools, some of which were also adopted in the GNU equivalent afterwards.

Note that there are some tools (llvm-readobj, llvm-symbolizer) which are not GNU compatible, and go their own way in output styles and command-line processing. These both have switches and tool aliases that allow them to be used in a GNU-like manner though too.

Thanks for your response. I read through the links but haven’t gone looking for diff reviews.

Yes, I see why people presently using GNU tools would want LLVM tools with corresponding names to behave identically. My concern is that meeting this goal takes time from the very few binutils developers, time that could otherwise be spent producing new binary manipulation tooling. Programmers wanting to rewrite their binaries doesn’t necessarily imply a determination to stick with the GNU API; your example of generating JSON instead of semi-arbitrarily delimited text is a good one.

To be clear, if there is a bug in the GNU tool, we don’t try to match that. We’ve also made multiple extensions and improvements over what GNU does in some tools, some of which were also adopted in the GNU equivalent afterwards.

Bugs vs features are a bit context dependent but I’m glad to hear dev effort is also going on improving matters. I’m not in the binutils space (as a dev or as a user) anymore so haven’t been paying much attention to it.

Note that there are some tools (llvm-readobj, llvm-symbolizer) which are not GNU compatible, and go their own way in output styles and command-line processing. These both have switches and tool aliases that allow them to be used in a GNU-like manner though too.

Taking it on faith that the LLVM binutils are implemented as a relatively thin layer on top of libraries, perhaps we should ship ‘objdump’, which takes the same arguments as GNU objdump and makes a best effort at matching the semantics, and also ship llvm-objdump, which is under no obligation to match arguments or the precise semantics. Optionally these could be the same file, which checks the name it was invoked as. That would avoid a proliferation of strip-all-gnu and similar.

Thanks,

Jon

Where was this policy, which sounds like replicating their design mistakes bug-for-bug, agreed upon and documented?

James responded already, but just to add my perspective: on the subject of LLVM vs GNU binutils compatibility, I’ve heard everything in the range from “let’s do our own completely separate thing” to “let’s be byte-for-byte compatible”. The general consensus leans towards closer compatibility, so the happy medium we’ve tried to apply is “be GNU compatible, except when it doesn’t make sense” – support for ancient platforms, bugs, weird formatting, etc. We definitely take things on a case-by-case basis; there’s no firm policy that we replicate all the bugs.

And to be clear, this is only for tools with the same name as GNU tools. As an example: we have llvm-readobj.cpp, which prints things in an LLVM way, accepts LLVM flags, etc.; and we have llvm-readelf (which is just a symlink), which adds GNU-specific flag aliases, prints in a different mode, etc. Generally speaking, we don’t change the llvm-readobj format for the sake of GNU compatibility.

Taking it on faith that the llvm binutils are implemented as a relatively thin layer on top of libraries,

I can confirm that they are all essentially wrappers around libObject (except objcopy/strip, since it mutates) – although not as thin as we’d like them to be, sadly. Any work towards that is greatly appreciated.

perhaps we should ship ‘objdump’ which takes the same arguments as gnu objdump and does our best effort at matching the semantics, and also ship llvm-objdump which is under no obligation to match arguments or the precise semantics. Optionally as the same file which checks the name it was invoked as. That would avoid a proliferation of strip-all-gnu and similar.

There may be some subtleties that prevent this. The tools actually inspect the filename – for instance, take a look at llvm-ar: https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-ar/llvm-ar.cpp#L1205
Based on the filename, the same tool might be dlltool, ranlib, lib, or ar. There’s no guarantee that the tool will be named exactly “ar” or “llvm-ar” in someone’s toolchain – it may be “${triple}-ar”, “llvm-ar-11”, “archiver”, or whatever someone wants to use. We’re already playing games with the filenames, I think playing further games is bound to raise too many surprises/bugs.
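To make the fragility concrete, here is a minimal Python sketch of name-based dispatch. It is not the llvm-ar code: the function name `detect_personality`, the candidate list, and the matching rule (the tool name must appear in the file stem without being glued to other letters or digits) are all assumptions made for illustration.

```python
import os
import re

# Hypothetical personalities, checked in this order.
PERSONALITIES = ("dlltool", "ranlib", "lib", "ar")


def detect_personality(argv0):
    """Return the personality implied by the invocation name, or None."""
    # Drop the directory and any extension such as ".exe".
    stem = os.path.splitext(os.path.basename(argv0))[0].lower()
    for name in PERSONALITIES:
        # Assumed rule: match only when the name is not adjacent to
        # other letters or digits inside the stem.
        pattern = r"(?<![a-z0-9])" + re.escape(name) + r"(?![a-z0-9])"
        if re.search(pattern, stem):
            return name
    return None


for tool in ("llvm-ar", "/usr/bin/x86_64-linux-gnu-ar", "llvm-ar-11",
             "llvm-ranlib", "archiver"):
    print(tool, "->", detect_personality(tool))
```

Under this assumed rule, triple-prefixed and version-suffixed names still resolve, but a renamed binary such as "archiver" matches nothing, which is the kind of surprise described above.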

I feel your pain with --strip-all-gnu though. Often the failures I see there are what I’d call “EFS” (ELF filesystem) – where, as part of a build process, a step will build an object, shove something in an ELF section, and strip it, and a later step consumes the result. Since llvm-strip is more aggressive (yay smaller binaries), these sections get dropped. Fortunately, the --keep-section flag, which originated in llvm-strip, was added to fix this and was also ported to GNU strip, so you can keep the same command line irrespective of which toolchain you use.

Aside from the above query, case-insensitive asm doesn’t sound like a good feature to me either.

FTR – I don’t have any opinion on this. But we do have the right to say “that’s weird, don’t do that”. Others more familiar with the situation may explain why it’s not weird. [I didn’t read the original thread]. As I said at the beginning, we should take these all on a case-by-case basis.

So given that we got around with this for years, how much use do
non-lower case assembler pseudops actually see?

Joerg

So given that we got around with this for years, how much use do
non-lower case assembler pseudops actually see?

The original issue was in newlib [0]. I also went through the GNU AS docs [1] and found a few targets that list directives in non-lowercase form (ARC, MMIX, V850). However, they would accept any case. “.ABORT” is called out specifically, mostly as a compatibility note. Again, it accepts any case.

I’m not holding this up as a super high priority issue, the bug above is the only one I know of. I just happened across it and thought it would be good to make it consistent.

[0] https://bugs.llvm.org/show_bug.cgi?id=39527
[1] https://sourceware.org/binutils/docs/as/
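
For readers who want to see what "accepts any case" amounts to, the following is a rough Python sketch, not LLVM's or GNU as's implementation. The table contents, the action strings, and the names `lookup_directive` and `parse_line` are hypothetical. The idea is that only the directive name is lower-cased before lookup, while operands such as symbol names are left untouched.

```python
# Hypothetical directive table; the values only describe the action.
DIRECTIVES = {
    ".abort": "stop assembly",
    ".global": "export symbol",
    ".section": "switch section",
}


def lookup_directive(token, case_insensitive=True):
    """Return the action for a directive token, or None if unknown."""
    key = token.lower() if case_insensitive else token
    return DIRECTIVES.get(key)


def parse_line(line, case_insensitive=True):
    """Split a line into (action, operands); operands keep their case."""
    parts = line.split(None, 1)
    if not parts:
        return (None, "")
    operands = parts[1] if len(parts) > 1 else ""
    return (lookup_directive(parts[0], case_insensitive), operands)


print(parse_line(".GLOBAL Foo"))
print(parse_line(".GLOBAL Foo", case_insensitive=False))
```

With case folding enabled, ".GLOBAL Foo" resolves to the same action as ".global Foo" and the symbol stays "Foo"; with it disabled, the upper-case spelling is simply an unknown directive.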


I agree with Joerg and Jon regarding case-insensitive assembly
directives: they can be categorized as legacy cruft. The portability fix
on the newlib side shouldn't be too difficult.