Howdy! I have a question coming from the Clang side of the house – is there a reason flang doesn’t share the preprocessor with Clang? I noticed preprocessor.cpp in llvm/llvm-project on GitHub (main branch) and was a bit surprised to see it existed. Nothing is on fire, but I’m wondering how much flang folks worry about Clang folks working on preprocessor extensions or other such things (does flang intend to use Clang’s preprocessor eventually)?
AFAIU there are no plans to share the preprocessor with Clang. This was covered in a previous discussion: see “RFC: refactoring clangDriver - diagnostics classes”, post #9 by Timothy_Keith.
Thank you, that’s good to know! I think that means it’s safe for folks to extend the Clang preprocessor with features without worrying about impact on Fortran.
Out of curiosity, should flang support newer preprocessor features that have been standardized in C and C++? Specifically, I’m wondering about `#elifdef` and `#elifndef` (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2645.pdf), but this will also matter for things like `#embed` (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm).
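For readers who haven't met these directives: `#elifdef X` is shorthand for `#elif defined(X)`, and `#elifndef X` for `#elif !defined(X)`. The following is only a rough Python sketch of that branch-selection logic, not code from Clang or flang. The function name `select_lines` and the one-directive-per-line input format are assumptions made for illustration, and only the `#ifdef` family is modelled (no `#if` expressions, no macro expansion).

```python
def select_lines(lines, defined):
    """Keep the lines that survive #ifdef/#elifdef/#elifndef/#else/#endif.

    `defined` is the set of macro names considered defined.
    Hypothetical helper for illustration only.
    """
    out = []
    stack = []  # one frame per open conditional group
    for line in lines:
        parts = line.split()
        directive = parts[0] if parts and parts[0].startswith("#") else None
        if directive in ("#ifdef", "#ifndef"):
            parent = not stack or stack[-1]["active"]
            cond = (parts[1] in defined) == (directive == "#ifdef")
            stack.append({"parent": parent,
                          "taken": parent and cond,
                          "active": parent and cond})
        elif directive in ("#elifdef", "#elifndef"):
            frame = stack[-1]
            cond = (parts[1] in defined) == (directive == "#elifdef")
            # A later branch is live only if no earlier branch was taken.
            frame["active"] = frame["parent"] and not frame["taken"] and cond
            frame["taken"] = frame["taken"] or frame["active"]
        elif directive == "#else":
            frame = stack[-1]
            frame["active"] = frame["parent"] and not frame["taken"]
            frame["taken"] = True
        elif directive == "#endif":
            stack.pop()
        elif not stack or stack[-1]["active"]:
            out.append(line)
    return out


example = ["#ifdef A", "a", "#elifdef B", "b",
           "#elifndef C", "c", "#else", "d", "#endif", "e"]
print(select_lines(example, {"B"}))
print(select_lines(example, set()))
```

With only `B` defined the `#elifdef B` branch is kept; with nothing defined the `#elifndef C` branch is kept instead.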
I would leave it for the experts to answer this.
I think the preprocessor in Fortran is not standardised and it mostly follows the C/C++ preprocessor while ensuring it works with Fortran tokens and with some additions. I would guess that these directives would not be supported unless people start using them in some Fortran applications.
I can comment more on this topic if the question has not yet been resolved.
I think I’m all set, but thank you! I didn’t know that Fortran’s preprocessor wasn’t standardized (I was largely wondering if it was based on the C preproc in terms of standards).
Well, this got long, but it was mostly finished when I saw you didn’t want it, so I’ll reply anyway in case this material is of interest to others.
The Fortran standard committee defined an optional “CoCo” conditional compilation feature back in the 90’s that no production compiler implemented. It was withdrawn later. Every Fortran compiler today makes something available to programs that looks a lot like C’s preprocessing capabilities, sometimes implemented by calling an external K&R or tokenizing C89-style cpp program. Results therefore vary a lot, which makes it hard to write truly portable (though non-standard) code with conditional compilation or macro replacement, especially for “fixed-form” Fortran where spaces are optional. (See this Flang document for the details.)
Flang’s Fortran parser’s design takes a novel approach to the weirdness of Fortran source code – two source forms with their own line continuation methods between which one can switch on a line-by-line basis with non-standard but near-universal directives, insignificance of spaces, no reserved words, case insensitivity, legacy horrors like Hollerith, widespread syntactic ambiguity resolved from semantics – that depends on an unusual phase structure. The first phase, the “prescanner”, handles the job of reading source files, implementing each source form’s comment conventions and line continuation, normalizing the results, and building one big contiguous character string in memory that is the “cooked source stream”. That becomes the input to the backtracking recursive descent parser, which is its own story, and relieves the parser from having to care about Fortran source oddities.
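To make the prescanner's job a bit more concrete, here is a toy Python sketch of the fixed-form part only: dropping comment lines, ignoring columns past 72, and joining continuation lines (a character other than blank or `0` in column 6). It is not Flang's code, and the function name `cook_fixed_form` is made up for illustration. It also ignores statement labels, tabs, free form, blank handling inside character literals, and every other normalization a real prescanner has to do.

```python
def cook_fixed_form(lines):
    """Join fixed-form card images into logical statements (illustrative only)."""
    statements = []
    for raw in lines:
        card = raw.rstrip("\n")[:72]      # columns 73 and up are commentary
        if not card.strip():
            continue                       # blank line
        if card[0] in "Cc*!":
            continue                       # comment marker in column 1
        prefix = card[:6].ljust(6)         # columns 1-5: label, column 6: continuation
        body = card[6:]
        is_continuation = prefix[5] not in " 0"
        if is_continuation and statements:
            statements[-1] += body         # splice onto the previous statement
        else:
            statements.append(body)
    return "\n".join(s.strip() for s in statements)


cards = [
    "      PRINT *, 'HEL",
    "     &LO'",
    "C this whole line is a comment",
    "      END".ljust(72) + "SEQ00010",
]
print(cook_fixed_form(cards))
```

Note how the character literal split across the continuation comes back as a single token in the joined statement, which is what lets later stages stop caring about card boundaries.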
Flang doesn’t have a preprocessor per se, nor does it call an external C preprocessor. Preprocessing is just something that the prescanner phase does along the way while generating the “cooked character stream”. This way, replacement of function-like macro invocations behaves like function calls as well as it can, even when the macro arguments span multiple lines using both of Fortran’s line continuation methods, skipping text past column 72 on fixed-form card images and other Fortran comments along the way. The interaction between the prescanner and the code that implements various preprocessing actions is somewhat complicated – each can call the other – and it’s best to view the preprocessing code as a nested module.
Flang’s the only Fortran compiler I know of that gracefully handles the problem of having both a C-like `#include` directive as well as Fortran’s own standard `INCLUDE` line in the language. Results are strange to users when `#include` is processed first and then the `INCLUDE` lines are expanded later without further preprocessing.
There’s a complicated mechanism in Flang’s frontend that maps each byte in the “cooked character stream” back to the means by which it got there, allowing the error message facility to give long source location messages that explain that an error was found in a macro expansion from an include file from a source file, for example, showing all the intermediate steps. This provenance facility is used when Flang is asked to produce `-E` output – we have to reconstruct what might have come from a stand-alone Fortran-aware C preprocessor.
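The general idea of mapping cooked characters back to where they came from can be sketched in a few lines of Python. This is a deliberately naive illustration, not Flang's data structure: it stores one record per character, and the names `Origin`, `CookedSource`, and `describe` are invented here. Each record can point at the place that caused it (for example, the macro use site), which gives the "expanded from ... in ..." chain for diagnostics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Origin:
    source: str                      # file name or macro name (hypothetical labels)
    line: int
    column: int
    via: Optional["Origin"] = None   # where this text was pulled in from, if anywhere


class CookedSource:
    """Cooked text plus a parallel list recording each character's origin."""

    def __init__(self):
        self.chars = []
        self.origins = []

    def append(self, text, source, line, column, via=None):
        for i, ch in enumerate(text):
            self.chars.append(ch)
            self.origins.append(Origin(source, line, column + i, via))

    def text(self):
        return "".join(self.chars)

    def describe(self, offset):
        """Return the chain of locations that led to the character at `offset`."""
        chain = []
        origin = self.origins[offset]
        while origin is not None:
            chain.append(f"{origin.source}:{origin.line}:{origin.column}")
            origin = origin.via
        return chain


cooked = CookedSource()
cooked.append("x = ", "main.f90", 3, 1)
use_site = Origin("main.f90", 3, 5)
cooked.append("42", "macro N", 1, 11, via=use_site)  # text that came from a macro body
print(cooked.text())
print(cooked.describe(5))
```

A production implementation would presumably record ranges rather than individual characters; the per-character list is used here only to keep the lookup obvious.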
The Fortran standard revision expected in 2028 or 2029 may have a standardized C-like preprocessing capability. We hope that it’s informed by our experience but goodness only knows what they’ll come up with.
TLDR: To get the best results, Flang implements preprocessing as part of a larger initial phase that sets up the character stream for the parser, not as a distinct phase or external tool. Adding support for new preprocessing directives from the C world could be easily done by extending the code that deals with `#if` et al. in those preprocessing functions, if we are ever asked to do so.
Thank you for the detailed explanation, that’s very helpful!
Standardizing the preprocessor was one of the most-requested features for Fortran 202y (2028 or 2029, as @klausler mentioned). The plan is to standardize sensible Fortran-aware behavior, much like what he outlined for Flang. The last formal paper I can find that has any detail is the preprocessor section of paper 23-124.pdf, quoted below:
fpp Update: Feb 2023 – JoR has decided to proceed with a “somewhat more Fortran Friendly” approach. More or less adopting what existing fpp’s do today, when the most Fortran friendly flags are used. This includes:
- Better handling of fixed form. Token expansion will not cause expanded lines to treat text beyond col 72 as commentary.
- Fortran tokens will be the recognized tokens for token replacement, not C language tokens. This makes the definition of the preprocessor easier. Case will be ignored when identifying tokens. This also implies insignificant blanks are ignored when scanning for tokens (fixed form).
- We believe this enhanced functionality will not adversely affect many users who are using a less Fortran friendly pre-processor now, and will provide a much more portable preprocessor that will be widely adopted by the user community. Comments are welcome. Send them to Lorri M.
- Gary K has gathered about 34 [gak: now 45] million lines of Fortran, with 400,000+ [gak: now 500,000] cpp-like preprocessor directives and is analyzing these codes.
Areas where we may go beyond existing practice to make the feature more Fortran-friendly include:
- Case-insensitivity, as Fortran is case-insensitive. `#define foo 1` followed by the use of `FOO` seems like it should replace `FOO` with `1`, but reasonable people disagree. (I hope looking at existing code bases will tell us if it matters.)
- Token-based replacement, not string-based. Fortran fixed-form causes the need for many more continuations than you see in C, which often breaks tokens across line boundaries. It would be sensible for the Fortran preprocessor to “see” identifier names that are broken across line continuations. Most C-like preprocessors don’t.
- Some form of Fortran expression syntax along with C-style expressions. (This is complicated slightly by the different operator precedences between C and Fortran for operators that look identical.)
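As a small illustration of the first two points, here is a Python sketch of case-insensitive, identifier-based replacement of object-like macros. It reflects only one possible reading of the proposal; the helper name `expand` is hypothetical, and the sketch does not skip character literals, handle function-like macros, or rescan replaced text.

```python
import re

# An identifier that is not the tail of a longer name or numeric literal.
IDENT = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")


def expand(line, macros):
    """Replace whole identifiers that match a macro name, ignoring case."""
    folded = {name.lower(): body for name, body in macros.items()}

    def substitute(match):
        word = match.group(0)
        return folded.get(word.lower(), word)

    return IDENT.sub(substitute, line)


macros = {"foo": "1"}           # as if from: #define foo 1
print(expand("x = FOO + foobar", macros))
```

Here `FOO` is replaced because names are compared without regard to case, while `foobar` is left alone because matching works on whole identifiers rather than on substrings.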
So far, though, I haven’t done anything but a cursory analysis of the existing code bases to understand the varieties of preprocessor experience.