Flang driver - next steps

Hello,

[Flang == "LLVM Flang"]

Until recently we were focusing on replacing `f18` (the old driver) with `flang-new`, the new driver. This was completed in [1]. The natural next step is to add support for code-generation and linking phases. Although we have been discussing most of this in our community calls, not everyone has been able to attend them. Here's a quick overview and some open questions.

# *Code generation*
This will almost entirely be implemented inside the frontend driver (i.e. llvm-project/flang). It will support the following stages:
   1. ParseTree --> MLIR
   2. MLIR --> LLVM IR
   3. LLVM IR --> assembly/object code
This might be better expressed in terms of compiler options. Basically, we plan to add the following options to the frontend driver:
   * -emit-mlir (ParseTree --> MLIR)
   * -emit-llvm (ParseTree --> MLIR --> LLVM IR)
   * -emit-obj (ParseTree --> MLIR --> LLVM IR --> object code)
   * -S (ParseTree --> MLIR --> LLVM IR --> assembly)
   * -save-temps/-save-temps=<value>
   * --target=<value>
   * -print-target-triple
   * -print-targets
Do you see any key code-generation related options missing here? We would like to keep this list short and focus on the bare minimum that will best inform the design. Note that adding these options will lead to some duplication in implementation _and_ functionality between `flang-new -fc1` and `bbc`/`tco`. We will need to make sure that these tools stay in sync and that the level of testing for all is similar.
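As a rough illustration of the option list above, each proposed flag can be read as selecting how far the lowering chain runs. The sketch below is plain Python rather than driver code. The flag names and the order of the representations come from the list; the `PIPELINES` table and the `describe` helper are made up for this example.

```python
# Illustrative only: maps each proposed frontend option to the chain of
# representations it would produce. The flag names come from the proposal;
# the table and the helper are hypothetical, not Flang code.
COMMON = ("ParseTree", "MLIR", "LLVM IR")

PIPELINES = {
    "-emit-mlir": COMMON[:2],
    "-emit-llvm": COMMON,
    "-emit-obj": COMMON + ("object code",),
    "-S": COMMON + ("assembly",),
}


def describe(option):
    """Return the lowering chain for an option as a single string."""
    return " --> ".join(PIPELINES[option])


if __name__ == "__main__":
    for opt in PIPELINES:
        print(opt.ljust(11), describe(opt))
```

Seen this way, `-emit-obj` and `-S` share everything up to LLVM IR and differ only in the final step, which is where most of the overlap with `bbc`/`tco` would sit.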

This work has already started and a patch implementing `-emit-mlir` is currently under review [2]. Note that we are only able to work on the above in the `fir-dev` branch of f18-llvm-project [3].

# *Linking*
This phase will be managed by `clangDriver`, so it will likely require some changes there. Compared to Clang, we do have one additional intermediate representation to support (MLIR). We may want to/need to make `clangDriver` aware of it. Otherwise, do you see anything that we might be missing here? Or any specific options that may require extra work within LLVM Flang?

We have not started working on this yet. Instead, we have just been using `clang` manually to drive the linking.

# *Implementation details*
Quite a few of us have been using glue scripts to achieve an end-to-end workflow (i.e. Fortran source --> executable). We want to focus on supporting this workflow in `flang-new` as soon as we can. We will try to achieve this without major design changes, and that's why we want to focus on the bare minimum of options required here.

However, we probably want to introduce `ParseTreeConsumer` and `FIRConsumer` (or something similar) at some point. These abstraction layers would be similar to `ASTConsumer` in Clang and would allow a cleaner separation between the driver and various consumers of the intermediate representations in LLVM Flang. This could be a good time to introduce them, but it's an "implementation detail" that we can refine/revisit later.
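To make the idea a bit more concrete, here is a minimal sketch of what such consumer layers could look like. It is written in Python for brevity and is not a proposed API: the method names (`handle_parse_tree`, `handle_fir`), the `LowerToFIR` and `CollectFIR` classes, and `run_frontend` are all hypothetical, and the "parsing" and "lowering" steps are string stand-ins.

```python
from abc import ABC, abstractmethod


class ParseTreeConsumer(ABC):
    """Hypothetical interface: receives the parse tree from the driver."""

    @abstractmethod
    def handle_parse_tree(self, parse_tree):
        ...


class FIRConsumer(ABC):
    """Hypothetical interface: receives a lowered FIR/MLIR module."""

    @abstractmethod
    def handle_fir(self, fir_module):
        ...


class LowerToFIR(ParseTreeConsumer):
    """Lowers the parse tree and forwards the result to a FIRConsumer."""

    def __init__(self, next_consumer):
        self.next_consumer = next_consumer

    def handle_parse_tree(self, parse_tree):
        fir_module = "fir({})".format(parse_tree)  # stand-in for lowering
        self.next_consumer.handle_fir(fir_module)


class CollectFIR(FIRConsumer):
    """Terminal consumer that just records what it was given."""

    def __init__(self):
        self.modules = []

    def handle_fir(self, fir_module):
        self.modules.append(fir_module)


def run_frontend(source, consumer):
    """The driver side: it only sees the ParseTreeConsumer interface."""
    parse_tree = "parse({})".format(source)  # stand-in for the parser
    consumer.handle_parse_tree(parse_tree)
```

With this shape, `run_frontend("hello.f90", LowerToFIR(CollectFIR()))` chains the stages without the driver knowing what happens after the parse tree is handed over, which is the kind of separation `ASTConsumer` provides in Clang.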

# *Bash script*
Once `flang-new` is capable of generating executables, shall we rename it to `flang` and then rename the `flang` [4] bash script to e.g. `flang-to-gfortran` (or something similar)?

Thank you for reading,
Andrzej

[1] D105811 [flang][driver] Delete `f18` (i.e. the old Flang driver)
[2] https://github.com/flang-compiler/f18-llvm-project/pull/1008
[3] GitHub - flang-compiler/f18-llvm-project: Fork of llvm/llvm-project for f18. In sync with f18-mlir and f18.
[4] https://github.com/llvm/llvm-project/blob/main/flang/tools/f18/flang

From a user perspective, I think the proposed renaming is great. I think most new users are surprised to find that invoking flang leads to gfortran compiling the code. Renaming flang-new to flang will solve that – albeit at the cost of potentially breaking existing build workflows that rely on the current behavior, but the solution seems simple for such situations (such a user could just replace their flang invocation with gfortran if that’s what they actually want).

Damian

Hi,

Something annoying with the clang driver is the lack of reproducibility of the optimization pipeline (`clang -O3` and `opt -O3` give different results).
It'd be great if flang was set up from the beginning to avoid this kind of issue!

Otherwise your plan looks good to me :)

Best,

Hi Damian,

> From a user perspective, I think the proposed renaming is great. I think most new users are surprised to find that invoking `flang` leads to `gfortran` compiling the code.

Thank you, I also find this very confusing. However, somebody pointed out in one of our calls that users might expect `flang` to be capable of generating executables. Hence the delay in renaming (you cannot generate code with `flang-new` yet). Personally, I am quite keen to get there sooner rather than later.

Hi Mehdi,

> Something annoying with the clang driver is the lack of reproducibility of the optimization pipeline (`clang -O3` and `opt -O3` give different results).
> It'd be great if flang was set up from the beginning to avoid this kind of issue!

Thanks for pointing this out! This should be relatively easy to achieve within LLVM Flang if we have a replacement for LLVM's `opt`. IIUC, that's what `tco` [1] is for. We just need to make sure that `flang-new` and `tco` share pass-pipeline definitions. Are there any other reasons for `clang -O3` and `opt -O3` to give different results?
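One way to read "share pass-pipeline definitions" is that both tools call a single function that builds the pipeline, so they cannot drift apart. The sketch below is Python and purely illustrative: `build_pipeline`, `parse_opt_level`, the two wrapper functions and the pass names are all placeholders, not the real `flang-new` or `tco` pipelines.

```python
def build_pipeline(opt_level):
    """Single source of truth for the pipeline (placeholder pass names)."""
    passes = ["verify"]
    if opt_level >= 1:
        passes += ["pass-a", "pass-b"]
    if opt_level >= 3:
        passes.append("pass-c")
    return passes


def parse_opt_level(args):
    """Return N from the last -ON flag in args, defaulting to 0."""
    level = 0
    for arg in args:
        if arg.startswith("-O") and arg[2:].isdigit():
            level = int(arg[2:])
    return level


def driver_pipeline(args):
    """What the compiler driver would run (hypothetical wrapper)."""
    return build_pipeline(parse_opt_level(args))


def standalone_tool_pipeline(args):
    """What the standalone optimizer tool would run (hypothetical wrapper)."""
    return build_pipeline(parse_opt_level(args))
```

Because neither wrapper adds or reorders passes on its own, the same `-O` flag gives the same pipeline in both tools by construction.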

-Andrzej

[1] https://github.com/flang-compiler/f18-llvm-project/blob/fir-dev/flang/tools/tco/tco.cpp

Thanks for working on this, Andrzej.

Eventually, we'll need to be able to specify whether to generate debug information, and we'll need an option for that.

Also, I don't understand why we have two drivers, one for the compiler and another for the front end which is invoked by the "-fc1" option. Are we planning to simplify this in the future?

Pete

Hi Pete,

Thank you for your feedback!

> Eventually, we'll need to be able to specify whether to generate debug information, and we'll need an option for that.

Perhaps we could introduce `-g` (and other debug options) once we start discussing optimisation pipelines and `-O{0|1|2|3|s|z}` flags? That would come once basic code-generation is available.

> Also, I don't understand why we have two drivers, one for the compiler and another for the front end which is invoked by the "-fc1" option. Are we planning to simplify this in the future?

TL;DR: The plan is to keep the two drivers separate.

In Clang's "toolchain driver" model, the frontend driver is just another specialised tool, next to a linker, an assembler or a sanitizer. Hence the separation. We could probably try to implement it as one monolith (i.e. together with the compiler driver), but that would go against the grain of `clangDriver`.

We decided to re-use Clang's painstakingly-constructed code for a compiler driver so that in LLVM Flang we don't have to re-write the logic to e.g. find system libraries or to link programs on various targets/platforms. This will also simplify the path towards a compiler that can work with mixed C/Fortran and C++/Fortran applications. Currently we are not benefiting much from re-using `clangDriver`: first we need to teach the new driver how to generate code that can be linked. Once we are there, we should be able to see the benefits of this approach.

I think that we could achieve a bit cleaner design (both in LLVM Flang and Clang) by extracting `clangDriver` out of Clang and by making various drivers share less. Currently it can be tricky to identify bits that belong strictly to the compiler driver and bits that belong to the frontend driver. One could start by creating a dedicated Options.td file for `clang`, `clang -cc1`, `flang-new` and `flang-new -fc1`. That's just an idea. I'm not aware of anyone working on this.

Thank you,
-Andrzej

Thanks, Andrzej. This clears up a lot.

What's included in the front end that's controlled by the "-fc1" option? Does it include the preprocessor?

Also, I notice that the front end driver documentation specifies a limitation:

What's the correspondence between the command line options and the front end actions?

Pete

> Thanks, Andrzej. This clears up a lot.
>
> What's included in the front end that's controlled by the "-fc1" option? Does it include the preprocessor?

The front-end driver is aware of everything that the Flang front end is capable of, but not more. This includes the preprocessor (which in Flang cannot be separated from the front-end anyway). But linking requires a linker, which is a separate tool. That is not part of the front-end.

Conversely, the compiler driver will know how to generate object files (by calling the frontend driver) and how to link them (and what libraries to use) by calling a linker. But it should only have a limited view of the front-end internals. It's the interface for the end-users.
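The split described above can be pictured as the compiler driver turning one user command into a list of jobs: one frontend-driver invocation per source file, followed by a single link job. This is a simplified Python model, not how `clangDriver` is implemented. `build_jobs` is a made-up helper, `<linker>` is a placeholder for whatever linker the toolchain would select, and `-emit-obj` is the option proposed earlier in this thread.

```python
def build_jobs(sources, output):
    """Model the compiler driver: plan frontend jobs, then one link job."""
    jobs = []
    objects = []
    for src in sources:
        obj = src.rsplit(".", 1)[0] + ".o"
        # The frontend driver knows how to get from source to object code.
        jobs.append(["flang-new", "-fc1", "-emit-obj", src, "-o", obj])
        objects.append(obj)
    # The linker is a separate tool; "<linker>" is a placeholder name.
    jobs.append(["<linker>"] + objects + ["-o", output])
    return jobs
```

For two inputs, `build_jobs(["a.f90", "b.f90"], "app")` yields three jobs: two `-fc1` compilations and one link step. The compiler driver only needs to know the inputs and outputs of each job, not what happens inside the frontend.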

Currently, the compiler driver delegates everything to the frontend driver. That's because we have to stop after the semantic checks, i.e. we are not able to generate the output required to move to the linking phase (or to consider calling an external assembler). This probably makes things a bit confusing, but should start making sense once we move to linking.

Going back to your question on the preprocessor - was it inspired by the output from `-ccc-print-phases` (mentioned in the Flang driver documentation)? Personally, I find it very enlightening, but it can be a bit confusing too. In particular, `preprocessing` is listed as a separate phase. However, in Flang we cannot control it, i.e. it is always "on". The `-ccc-print-phases` option displays the `clangDriver` view. In theory, `clangDriver` allows one to use a dedicated tool for each of the compilation phases, e.g. preprocessing, object code generation, assembling, linking. Hence these are listed as separate phases.

> Also, I notice that the front end driver documentation specifies a limitation:

Which limitation are you referring to?

> What's the correspondence between the command line options and the front end actions?

Options with `Group<Action_Group>`, e.g. [1] for `-fdebug-unparse`, correspond to some frontend action and in all cases have a dedicated instance of `FrontendAction` (e.g. [2] for `-fdebug-unparse`). The mapping between options and actions happens in `ParseFrontendArgs` (in CompilerInvocation.cpp, see [3]). Specialisations of `ExecuteAction` (see [4] for `DebugUnparseAction::ExecuteAction()`) implement the steps that are unique to the corresponding action. Any shared logic is implemented in other methods/hooks that are shared across actions (e.g. `PrescanAndSemaAction::BeginSourceFileAction()`). Note that all this is happening inside the frontend driver. Somewhat confusingly, there are other types of "actions" inside the compiler driver. I try to be specific and usually refer to `FrontendAction` (i.e. the class inside the frontend driver).
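The relationship can be sketched in a few lines of Python. This is a toy model of the structure, not the actual C++: the class and method names mirror the ones mentioned in the paragraph above, while the `ACTIONS` table, the `run` method, the dictionary standing in for the compiler instance and the log strings are invented for illustration.

```python
class FrontendAction:
    """Base class: one instance per action-selecting option."""

    def begin_source_file_action(self, ci):
        return True

    def execute_action(self, ci):
        raise NotImplementedError

    def run(self, ci):
        # Shared hook first, then the action-specific step.
        if self.begin_source_file_action(ci):
            self.execute_action(ci)


class PrescanAndSemaAction(FrontendAction):
    """Shared logic for actions that need prescanning and semantic checks."""

    def begin_source_file_action(self, ci):
        ci["log"].append("prescan+parse+sema")
        return True


class DebugUnparseAction(PrescanAndSemaAction):
    """Only the step that is unique to -fdebug-unparse lives here."""

    def execute_action(self, ci):
        ci["log"].append("unparse")


# Hypothetical table standing in for the option-to-action mapping.
ACTIONS = {"-fdebug-unparse": DebugUnparseAction}


def parse_frontend_args(args):
    """Pick the action selected on the command line."""
    for arg in args:
        if arg in ACTIONS:
            return ACTIONS[arg]()
    raise ValueError("no action option given")
```

Running `parse_frontend_args(["-fdebug-unparse", "file.f90"]).run({"log": []})` executes the shared hook followed by the action-specific step, which is the division of labour described above.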

I should probably write this down in the driver doc. Thanks for asking!

-Andrzej

[1] https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Driver/Options.td#L4607-L4609
[2] https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Frontend/FrontendActions.h#L98-L100
[3] https://github.com/llvm/llvm-project/blob/main/flang/lib/Frontend/CompilerInvocation.cpp#L129-L131
[4] https://github.com/llvm/llvm-project/blob/main/flang/lib/Frontend/FrontendActions.cpp#L132-L149

> Going back to your question on the preprocessor - was it inspired by the output from `-ccc-print-phases` (mentioned in the Flang driver documentation)?

I don't know the answer to this. It all happened before I started working on this project.

Pete