[RFC] Frontend driver: action vs non-action options

Hi All,

[Flang = LLVM Flang]

I wanted to dive a bit deeper into one aspect of the design of our *frontend driver*, `flang-new -fc1`. It is intended as the main tool for Flang frontend developers and we should make sure that it meets our requirements. If not, perhaps we can identify how to make it work better? More specifically, does the current split into action and non-action options makes sense?

Please note - this discussion does not involve the compiler driver, `flang-new`.

# *Action vs non-action options*
In the frontend driver, you can split options into two groups:
     * action options (e.g. `-emit-llvm`)
     * non-action options (feature/configuration flags, e.g. `-ffixed-form`)
Action options specify "what" the compiler will do, e.g. "parse the input file and lower it to LLVM IR" for `-emit-llvm`. Non-action options allow users to configure the frontend to behave in a certain way, e.g. treat all input files as fixed-form, regardless of the file extension. Another example for the latter is `-Werror` - make all warnings into errors. This tells the compiler "how" the deal with certain scenarios, but does not specify "what" to do.

AFAIK, this design is consistent with `clang -cc1`.

# *Default action*
The frontend driver will run `-fsyntax-only` by default, i.e. the following two invocations are identical:

$ flang-new -fc1 input.f95
$ flang-new -fc1 -fsyntax-only input.f95

If users want a different action to be run, they need to specify it explicitly, e.g.

$ flang-new -fc1 -fdebug-unparse input.f95

This behaviour is consistent with `clang -cc1`.

# *Selected Implementation Details*
For every action flag, there's a dedicated specialisation of the `FrontendActions` class in FrontendActions.h [1]. Right now, the only way to identify which flag is an action flag, is to look up the implementation. For example, options decorated with `Action_Group` in Options.td [2] are action flags. Alternatively, you can check the switch statements in CompilerInvocation.cpp [3] that selects the frontend action based on the input flags.

# *One action at a time*
The frontend driver will only run the rightmost frontend action specified on the command line. This is illustrated below:

# Runs -fdebug-unparse, -fdebug-pre-fir-tree is ignored
$ flang-new -fc1 -fdebug-pre-fir-tree -fdebug-unparse input.f95
# Runs -fdebug-pre-fir-tree, -fdebug-unparse is ignored
$ flang-new -fc1 -fdebug-unparse  -fdebug-pre-fir-tree input.f95

This approach means that the design is relatively simple. Allowing multiple actions per invocation would require the driver to schedule and to keep track of the requested actions. That would complicate the implementation.

`flang-new -fc1` does not warn about action flags being ignored. This is counter-intuitive, but I wasn't able to identify an easy fix for this. We've recently discussed this in Bugzilla [4].

AFAIK, this behaviour is consistent with `clang -cc1`.

# *Does this work for you*?

With the "one action at a time" rule, it is crucial to make sure that the split into action and non-action options is optimal. So far we have used a few heuristics:
* if a flag requires something unique to happen, it's an action (e.g. `-fdebug-unparse` requires a call to `Fortran::parser::Unparse` that's otherwise not required)
* if something is an action in `clang -cc1`, it should also be an action in `flang-new -fc1` (e.g. `-emit-llvm`)
* if a flag configures some state, it's not an action (e.g. `-ffixed-form`)

To make this a bit easier to visualise, I've compiled the list of options in `flang-new -fc1` in a spreadsheet [5]. I've also added actions options from `clang -cc1` for comparison. Note that out of ~820 options in `clang -cc1` (extracted from `clang -cc1 -help`), only 36 are action options.

Note: I've tried to make my spreadsheet [5] as accurate as possible, but wasn't able to generate it automatically and might have missed something.

So, does the current split make sense to you? Should any of the options be re-implemented? I may not have the time to implement the suggested changes myself, but will be available to help you and to review your patches. And I will also be adding more documentation upstream. In fact, first patch is already available [6].

Thank you for taking a look!
-Andrzej

[1] llvm-project/FrontendActions.h at main · llvm/llvm-project · GitHub
[2] llvm-project/Options.td at main · llvm/llvm-project · GitHub
[3] llvm-project/CompilerInvocation.cpp at main · llvm/llvm-project · GitHub
[4] 52095 – Front end driver does not allow multiple dump options
[5] Flang frontend driver - action vs non-action options - Google Sheets
[6] ⚙ D111573 [Flang][driver] Update docs

One way to see it is that an action controls what the output of the
invocation is, i.e. what is written to the file specified by -o (or
stdout). Specifying multiple actions would be a contradiction in what
would be written to the output file.

IMHO the compiler should error-out if multiple (different) actions are
specified. If one wants to have multiple outputs (such as the AST dump
as well as the assembly output) one needs to invoke the compiler
multiple times. An alternative would be to have options that specify
additional output file, as -MF does (i.e. -M/-MF is not an action but
an option).

In some contexts it may make sense to allow contradicting options on
the command line as a result of a build system combining multiple
sources for flags. For instance, CMAKE_BUILD_TYPE=Release will add -O3
by default, but I can override this with e.g. CMAKE_CXX_FLAGS=-O2. I
don't think this makes sense with actions as the output what the build
system expects would be completely different.

Thanks to Andrej's list, there are quite some dump options. Maybe it
makes sense to have a single dump action (llvm.org/PR52095) and an
additional option for selecting what is to be dumped to avoid multiple
invocations. Eg. `-fdump=unparse,symbols -o dump.txt`. Alternatively,
like `-M`, have options that specify where the dump output goes. These
could be added e.g. CMAKE_CXX_FLAGS to have dumps written next to the
main output while using the main build system.

As Andrej mentioned, the clang/flang option parsing system behaviour
to ignore all but the last for exclusive options. Maybe we should
extend it such that in the .td file we can specify that it is an
error/warning if mutually exclusive options are specified?

Michael

Michael, that's a very re-assuring reply, thank you! Also, these are very helpful and much appreciated suggestions.

IMHO the compiler should error-out if multiple (different) actions are
specified.

Here's my naive attempt to implement this behaviour: ⚙ D111781 [flang] Make the frontend driver error out when requesting multiple actions. This extends the ArgList API rather than the TableGen logic. It has its limitations, so I'm not sure whether that's the right approach.

I really like your idea of `-fdump=unparse,symbols -o dump.txt`. It would nicely extend the current approach, so might be the easiest path towards more flexible "dumping" options.

-Andrzej

Since we can only have one action option at a time, any option that might reasonably appear in combination with another option should not be an action option. As Michael noted, we have many dump options, and I personally use them routinely in various combinations. This, to me, argues that none of them should be action options.

Things in the compiler proceed in a linear series of operations -- preprocessing, parsing, semantic analysis, code generation. It make sense to me to have the options that control this sequence to be action options. I would think that everything else would be flag options.

I agree with Michael that conflicting options should produce error messages. I think that this should apply to options generally, not just action options.

Pete

Some dump options require extra work that otherwise is not needed, e.g. `-fdebug-measure-parse-tree` or `-fget-definition`. So I think that these should remain as actions.

I guess that `-fdebug-dump-parse-tree` could be replaced with an option flag that would make the compiler dump the parse tree in every action that generates one. But what do we do with the option for specifying the output (i.e. `-o`)? Currently, this prints to stdout:

flang-new -fc1 -emit-mlir input.f90 -o -

If `-fdebug-dump-parse-tree` is an option (rather than an action), should everything be dumped to stdout in the following case?

flang-new -fc1 -emit-mlir -fdebug-dump-parse-tree input.f90 -o -

And should `flang-new -fc1` print everything to `output.o` in this case:

flang-new -fc1 -emit-mlir -fdebug-dump-parse-tree input.f90 -o output.o

I also played with `gfortran`. I didn't have much luck with their frontend driver (`f951` [1]), so I focused on `gfortran` itself. The following generates an object file *and* dumps the parse tree to stdout:

gfortran  -fdump-parse-tree  hello.f95

I believe that that's the behaviour that you'd like to see in `flang-new`, Pete? I've noticed some other confusing behaviour with `gfortran` though.

Btw, I've also raised this issue with Clang folks: [cfe-dev] clang -cc1: confusing behaviour with multiple actions

Thank you,
Andrzej

[1] About GNU Fortran - The GNU Fortran Compiler

I'm not an expert on this, but I believe that the "-o" option just specifies the name of the file for MLIR output. So if you say "-o foo.mlir", it will write it to "foo.mlir". Furthermore, I think that there's a special notation specific to the MLIR infrastructure that uses the character "-" to denote standard out when "-" is used in conjunction with the "-o" option. Thus, "-o -" tells the compiler to output the MLIR to standard out. I believe that the default location for dumping MLIR is the base name of the source file with ".mlir" attached. So if you don't use the "-o" option at all, the MLIR output of compiling "hello.f90" when MLIR gets generated will go to "hello.mlir".

I'm pretty sure that the other dump options, for example "-fdebug-dump-parse-tree" just use standard out, and that there's no option to specify an alternate file. I personally don't see a need to have a way to specify a file other than standard out for dumping the parse tree.

So, in the current state of things, if you specify "-fdebug-dump-parse-tree -o -" for an invocation of the compiler that generates MLIR, both the parse tree dump and the MLIR output will go to standard out. As long as the two outputs doesn't get mixed, that seems OK to me.

Pete

This may break build systems that assume you can do that. E.g. user
options are just appended to the command line allowing to override any
flags that the build system has set.
When compiling clang, the command line generated by CMake contains

cc ... -DNDEBUG -UNDEBUG

Michael

You are correct.

And from your example, it looks like changing this behavior for clang might not be worth the effort.

But we're just starting out with flang. I personally would opt for more error checking in flang's option specifications. I don't know enough about how options processing is implemented to know how difficult it would be to make flang's behavior different from clang's.

Pete

I'm not an expert on this, but I believe that the "-o" option just specifies the name of the file for MLIR output. So if you say "-o foo.mlir", it will write it to "foo.mlir". Furthermore, I think that there's a special notation specific to the MLIR infrastructure that uses the character "-" to denote standard out when "-" is used in conjunction with the "-o" option. Thus, "-o -" tells the compiler to output the MLIR to standard out. I believe that the default location for dumping MLIR is the base name of the source file with ".mlir" attached. So if you don't use the "-o" option at all, the MLIR output of compiling "hello.f90" when MLIR gets generated will go to "hello.mlir".

AFAIK, this behaviour is standard across most tools in LLVM and Clang (including "-o -"), not only MLIR. We should make sure that `flang-new` and `flang-new -fc1` don't diverge from this.

I'm pretty sure that the other dump options, for example "-fdebug-dump-parse-tree" just use standard out, and that there's no option to specify an alternate file.

That's currently the case for `-fdebug-dump-parse-tree`, but not for e.g. `-fdebug-unparse`. That's for the sake of consistency with `f18`. Since that constraint is gone, we can refine this.

So, in the current state of things, if you specify "-fdebug-dump-parse-tree -o -" for an invocation of the compiler that generates MLIR, both the parse tree dump and the MLIR output will go to standard out. As long as the two outputs doesn't get mixed, that seems OK to me.

Perhaps the output from "-fdebug-dump-parse-tree" should go to stderr so that it's easier to filter it out?

All this sounds good to me, but violates the one-invocation-one-output rule. Perhaps that's fine if we consider Flang's parse-tree as strictly for debugging purposes? For comparison, Clang's AST can be used as input for `clang` and is also the primary representation for various tools to work on (e.g. static analysis tool). If we change how Flang's parse tree is generated/dumped, we might limit our options in the future. But this is very hypothetical at this point.

-Andrzej

We would like Flang's driver to be a drop-in replacement for e.g. `gfortran`. If we change this particular behaviour, we will be making the transition for our users harder.

-Andrzej

Let me just quickly summarise.

The behaviour of `flang-new -fc1` is consistent with `clang -cc1`, but slightly different to Gfortran (in particular, with respect to the `-fdump-parse-tree`). Also, the current implementation works as expected, i.e. there are no known bugs that would require immediate attention.

One could (and perhaps should?) refactor the dump options - currently there's quite a few of them. There are a few options here - `-fdump=unparse,symbols -o dump.txt` looks particularly promising.

I'm not sure whether any of Flang's dump options should be refactored as regular flags, but it seems like the most natural way to replicate GFortran's behaviour here. I won't block any efforts in this direction, but please keep in mind that we will likely need Flang's parse tree for future static analysis tools/frameworks (e.g. sanitizers?). This means that the parse tree needs to be considered as regular compiler input/output (as opposed to a plain debug dump).

Please let me know if you want to work on any of the above - I'll be available for your questions and to review your patches. As for making `flang-new -fc1` error out when multiple actions are specified, there's ⚙ D111781 [flang] Make the frontend driver error out when requesting multiple actions. I've reached out to cfe-dev to see what they think [1], but there seemed to be little appetite for discussion. Let me add some reviewers from LLVM instead. Pete has already accepted it (thank you!), but since it touches both Flang *and* LLVM, I want to make sure that developers from LLVM are also OK with this.

Thank you for all your feedback!

-Andrzej

[1] [cfe-dev] clang -cc1: confusing behaviour with multiple actions