Accept --long-option but not -long-option for llvm binary utilities

Many llvm utilities use cl::ParseCommandLineOptions() (include/llvm/Support/CommandLine.h) to parse command-line options. The cl library accepts both the -long-option and --long-option forms, with the single-dash form (-long-option) being more popular.

We also have many binary utilities (llvm-objcopy, llvm-objdump, llvm-readobj, llvm-size, …) whose names reflect what they imitate. For compatibility with GNU binutils (and, transitively, some Unix utilities), these utilities accept many short options. People who use llvm utilities as replacements for GNU binutils may use the grouped option syntax (POSIX Utility Conventions), e.g. -Sx => -S -x, -Wd => -W -d, -sj.text => -s -j.text

The problem is that grouped short options don’t play well with -long-option: the same single-dash argument can sometimes be read either as one long option or as a group of short options. The issue is more prominent if a short option accepts an argument.
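
To make the ambiguity concrete, here is a minimal C++ sketch. It is not how cl:: is implemented; the OptionTable type, the expandGroup and isAmbiguous helpers, and the option names in the note below are all hypothetical. It checks whether the text after a single dash is both a known long option and a valid group of short options.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical option tables for an imaginary binutils-like tool.
struct OptionTable {
  std::set<std::string> Long;  // long option names, without dashes
  std::set<char> ShortFlags;   // short options that take no argument
  std::set<char> ShortWithArg; // short options that take an argument
};

// Try to read Body (the text after a single '-') as grouped short options.
// A short option that takes an argument consumes the rest of the group.
// Returns false if Body is empty or contains an unknown short option.
bool expandGroup(const OptionTable &T, const std::string &Body,
                 std::vector<std::string> &Out) {
  Out.clear();
  if (Body.empty())
    return false;
  for (std::size_t I = 0; I < Body.size(); ++I) {
    const char C = Body[I];
    if (T.ShortFlags.count(C)) {
      Out.push_back(std::string("-") + C);
    } else if (T.ShortWithArg.count(C)) {
      // Everything after C is its argument, e.g. "j.text" -> "-j.text".
      Out.push_back(std::string("-") + C + Body.substr(I + 1));
      return true;
    } else {
      Out.clear();
      return false;
    }
  }
  return true;
}

// True if "-<Body>" can be read both as a long option and as a group.
bool isAmbiguous(const OptionTable &T, const std::string &Body) {
  std::vector<std::string> Ignored;
  return T.Long.count(Body) != 0 && expandGroup(T, Body, Ignored);
}
```

With a hypothetical table that has a long option syms and short flags s, y and m, isAmbiguous(T, "syms") is true: -syms could mean --syms or -s -y -m -s. If the parser only accepts long options with two dashes, the single-dash form can only be a group and the ambiguity disappears.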

An approach to prevent the ambiguity is to just disallow -long-option.
In D60439, I plan to make llvm-objcopy accept --long-option but not -long-option.
It will make its command line option parsing behave more like GNU objcopy and less like a regular llvm utility. What do people think of the divergence?

Further, can we make similar changes to other llvm binary utilities (their names lead people to expect GNU-like behavior), especially those with many short options such as llvm-objdump and llvm-readobj? llvm-readobj behaves like GNU readelf if you name it “llvm-readelf”. (I don’t suggest disallowing -long-option for utilities other than binutils.)

(Note, llvm-objcopy is a new member of the family and it uses the TableGen-based include/llvm/Option/Option.h, instead of cl:: as other utilities do.)

That change makes sense to me. I’d like to note that the policy should be set on a per-command basis, as some commands are required to accept both -- and - for long option names. lld is one such command.

For binutils compatibility, and in general for any new tools, this sounds reasonable to me. But I’d worry that things like llvm-readobj have existed for a long time and people are used to flags like “-sections”, and it may be complicated to change that now. (I guess this RFC is a check to see if this is true for anyone on the mailing list).

What happens if you make this change and someone does use “-sections”: will the command line parser suggest “--sections”, or will it just fail because one of -s, -e, -c, etc. is not a valid option?
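
For illustration only, the friendlier of those two outcomes could look like the C++ sketch below. The suggestDoubleDash helper and its message wording are hypothetical; nothing here describes what the existing parsers actually do.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: given an argument such as "-sections" that a strict
// parser has rejected, return a hint if it matches a known long option.
// Returns an empty string when there is nothing to suggest.
std::string suggestDoubleDash(const std::set<std::string> &LongOptions,
                              const std::string &Arg) {
  // Only handle exactly one leading dash followed by at least two characters.
  if (Arg.size() < 3 || Arg[0] != '-' || Arg[1] == '-')
    return "";
  // Drop any "=value" suffix before looking the name up.
  const std::size_t Eq = Arg.find('=');
  const std::string Name =
      Arg.substr(1, Eq == std::string::npos ? std::string::npos : Eq - 1);
  if (LongOptions.count(Name) == 0)
    return "";
  return "unknown argument '" + Arg + "', did you mean '-" + Arg + "'?";
}
```

A parser could call such a helper before falling back to a generic error about an unknown short option.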

As I think I said elsewhere, I find it weird that LLVM tools accept long arguments with a single dash, and I’d be happy for the binary utilities at least to move away from this approach, if it improves compatibility/reduces gotchas etc. One of the points from the BoF on the LLVM binutils at the recent Euro LLVM developers’ meeting was that if we are going to break compatibility with previous versions of LLVM, we’re better off doing it now, rather than leaving it a long time. The longer it gets left, the more users, and therefore the more likely somebody has come to rely on it in a non-trivial-to-fix way.

If we are particularly concerned with llvm-readobj, we could (if it is practical) try handling it differently to llvm-readelf. An approach as suggested by Michael Spencer at the BoF could be to migrate away from using cl::opt and follow the same route as llvm-objcopy. That would allow us to have different option sets for the two versions of that tool, if we wanted.

It’s actually a bit weirder than you might think. The CommandLine parser will happily eat as many dashes as you care to write, e.g., ----sections is the same as -sections.

That seems true, and I’m a bit surprised. Accepting three or more dashes is definitely a bug.

It does this in a few places:

    // Eat leading dashes.
    while (!ArgName.empty() && ArgName[0] == '-')
      ArgName = ArgName.substr(1);
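
As a rough comparison, the C++ sketch below puts that lenient behaviour next to a stricter variant that accepts only one or two dashes. The function names (countLeadingDashes, stripLenient, stripStrict) are hypothetical and none of this is code from the CommandLine library.

```cpp
#include <cstddef>
#include <string>

// Count the dashes at the start of Arg.
std::size_t countLeadingDashes(const std::string &Arg) {
  std::size_t N = 0;
  while (N < Arg.size() && Arg[N] == '-')
    ++N;
  return N;
}

// Lenient: same effect as the loop quoted above, any number of leading
// dashes is removed.
std::string stripLenient(const std::string &Arg) {
  return Arg.substr(countLeadingDashes(Arg));
}

// Strict (hypothetical): accept exactly one or two dashes followed by a
// non-empty name. On success, stores the name in Name and returns true.
bool stripStrict(const std::string &Arg, std::string &Name) {
  const std::size_t N = countLeadingDashes(Arg);
  if (N < 1 || N > 2 || N == Arg.size())
    return false;
  Name = Arg.substr(N);
  return true;
}
```

Here stripLenient maps both "-sections" and "----sections" to "sections", while stripStrict rejects "----sections".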

Are you proposing to make this the new style across all LLVM utilities? That seems needlessly disruptive. There are plenty of scripts that call opt and llc directly with single dash long options, regardless of how much we claim that they are not public facing, and are only developer tools.

If you want to add a flag to ParseCommandLineOptions so that individual LLVM tools can opt into the new behavior gradually, I think that would be reasonable.
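
One possible shape for such an opt-in is sketched below in C++. The LongOptionStyle enum and the matchesLongOption function are hypothetical names invented for this example; they are not an existing ParseCommandLineOptions parameter.

```cpp
#include <string>

// Hypothetical per-tool setting; not an existing LLVM API.
enum class LongOptionStyle {
  SingleOrDoubleDash, // accept -name and --name, as cl:: does today
  DoubleDashOnly      // stricter style a binutils replacement could opt into
};

// Return true if Arg spells the long option Name under the given style.
bool matchesLongOption(LongOptionStyle Style, const std::string &Arg,
                       const std::string &Name) {
  if (Arg == "--" + Name)
    return true;
  return Style == LongOptionStyle::SingleOrDoubleDash && Arg == "-" + Name;
}
```

Tools such as opt and llc would keep the default style, so existing scripts that pass single-dash long options would be unaffected.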

> Are you proposing to make this the new style across all LLVM utilities?

No. Only drop --long-option for GNU binutils replacements (people sometimes call them LLVM binary utilities): llvm-objcopy (D60439), llvm-ar, llvm-size, llvm-nm, etc. llvm-objdump (not sure what to do with mach-o specific dump options), llvm-readelf (not sure what to do with llvm-readobj)

Did you mean --long-option (double-dash), or -long-option (single dash)?

Single-dash variants look *very* unusual; I have personally never seen them
used elsewhere.
It would be good to try to converge on the double-dash variant.

Dropping support for double-dash variants instead seems rather disruptive,
especially since that will prevent usage of llvm tools as drop-in
replacements for GNU tools.

--
宋方睿
_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
llvm-dev Info Page

Roman

> Did you mean --long-option (double-dash), or -long-option (single dash)?

Sorry, I made a typo.

In these GNU binutils replacements (llvm-objcopy, llvm-readelf, llvm-objdump, …) we have many short options that can be grouped (cl::Grouping). I meant to keep --long-option but drop -long-option. The scope of this RFC is restricted to these utilities.

In other (regular) utilities (opt, llc, lli, llvm-as, …) we don’t use the grouped syntax, so there is no confusion at all. People (me included) who are used to double-dash options may find -long-option weird. However, changing those utilities would be too disruptive.

Fāng-ruì,

This change has been made for llvm-objdump but not for other binutils, is that right? Is there a plan to make all of them switch by a certain point? It would be great for us to avoid this change spanning a release, for example.

-Brian