Need for a new FORTRAN front-end for LLVM?

Hello All,

I believe that Clang is now on par with other well-known, industry-standard compiler front-ends in standards compliance, and far better than them in a few respects. Meanwhile, many industry compiler vendors have started using LLVM internally in their compilers and tool chains for one purpose or another, and many vendors of high-performance server compilers have started looking into LLVM for various reasons. As we all know, FORTRAN was and is the language of choice for developing high-performance (scientific) applications, for various well-known reasons.

In this light, do you think it would be acceptable for someone to seriously start supporting a new FORTRAN front-end for LLVM? I understand that Clang, being an umbrella for implementing the C family of languages for LLVM, may not cover non-C-family languages like FORTRAN well.

However, I am curious to know the thoughts (positive or negative) of the Clang community on this topic (the need for a new FORTRAN front-end for LLVM). If you agree that there is a need for it, what is the best way to go about implementing it? I mean, do we need to separate it completely from the Clang project? Or is it possible, and is it a good idea, to leverage some of the Clang implementation here? In any case, in your experience, how much effort and time is required to come up with a well-defined and well-designed basic infrastructure so that interested people can start contributing to it?

I welcome your suggestions and evaluations, be they positive or negative, regarding this topic.

A high-quality LLVM-backed FORTRAN frontend would be great. I do not think it'd be a good idea in the long run to implement it by translating FORTRAN into Clang's C ASTs, and there are probably relatively few things that you could re-use from Clang's source code. I recommend using Clang primarily as a design model, rather than as a shortcut to a working implementation.

If you're seriously interested in this, there are ways that we could suggest to *improve* on Clang as an implementation model. In particular, I would recommend introducing a third IR, so that the translation goes like so:
  source code -> AST -> high-level IR -> LLVM IR -> machine code
In high-performance scientific applications, it is going to be extremely valuable to be able to apply high-level optimizations to (e.g.) combine operations and eliminate unnecessary copies; those can be straightforward to do in a high-level representation, but quite difficult to do after lowering to LLVM IR.
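
To make the point about high-level optimizations concrete, here is a minimal sketch in Python (chosen only for brevity). The `Array`, `Add` and `Mul` node types and both evaluators are hypothetical, invented for this illustration; they are not part of LLVM, Clang, or any existing Fortran front-end. The sketch evaluates the whole-array expression `a * b + c` two ways: node by node, allocating a fresh array for every operation, and fused into a single loop, which is the kind of rewrite that is easy while the array operations are still visible and hard once they have been lowered to scalar loops.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical node types for a toy high-level IR (illustration only).
@dataclass
class Array:
    data: List[float]


@dataclass
class Add:
    lhs: object
    rhs: object


@dataclass
class Mul:
    lhs: object
    rhs: object


def eval_naive(e, stats):
    """Evaluate node by node, allocating a new array for every operation."""
    if isinstance(e, Array):
        return e.data
    l = eval_naive(e.lhs, stats)
    r = eval_naive(e.rhs, stats)
    stats["allocations"] += 1
    if isinstance(e, Add):
        return [x + y for x, y in zip(l, r)]
    return [x * y for x, y in zip(l, r)]


def eval_element(e, i):
    """Compute element i of the expression without any intermediate array."""
    if isinstance(e, Array):
        return e.data[i]
    l = eval_element(e.lhs, i)
    r = eval_element(e.rhs, i)
    return l + r if isinstance(e, Add) else l * r


def length(e):
    """Length of the result; assumes all operands have the same shape."""
    return len(e.data) if isinstance(e, Array) else length(e.lhs)


def eval_fused(e):
    """One loop over the elements; only the result array is allocated."""
    return [eval_element(e, i) for i in range(length(e))]


a = Array([1.0, 2.0, 3.0])
b = Array([4.0, 5.0, 6.0])
c = Array([2.0, 2.0, 2.0])
expr = Add(Mul(a, b), c)            # a * b + c

stats = {"allocations": 0}
print(eval_naive(expr, stats), stats["allocations"])
print(eval_fused(expr))
```

Both evaluators return the same values; the naive one allocates an array for `a * b` and another for the sum, while the fused one allocates only the result.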

John.

> A high-quality LLVM-backed FORTRAN frontend would be great.

I strongly agree (and I'd like to work on this as well). I started a
discussion about this a few months ago, see:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120430/057199.html

> I do not think it'd be a good idea in the long run to implement it by
> translating FORTRAN into Clang's C ASTs, and there are probably
> relatively few things that you could re-use from Clang's source
> code. I recommend using Clang primarily as a design model, rather
> than as a shortcut to a working implementation.

I think that there are a few things that can be shared; the two largest
pieces are probably:

- The driver (both the C family and Fortran share many of the same
   requirements for finding basic system libraries and tools and
   running them).

- The C preprocessor (any production Fortran compiler needs a C
   preprocessor with slightly modified tokenization rules). With some
   refactoring, this should also be shared.
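
As one illustration of why the tokenization rules would need to differ: in Fortran `//` is the string concatenation operator and `!` starts a comment in free-form source, whereas a C-style lexer treats `//` as the start of a comment. The function below is a toy sketch, not code from Clang's preprocessor; the name `strip_comment` and its `fortran` flag are made up for this example, and it ignores fixed-form comment rules and C escape sequences.

```python
def strip_comment(line: str, fortran: bool) -> str:
    """Remove a trailing comment from one source line.

    In C mode '//' starts a comment. In Fortran mode '!' does, and '//'
    is left alone because it is the concatenation operator. Text inside
    quoted strings is skipped in both modes.
    """
    marker = "!" if fortran else "//"
    quote = None  # the quote character currently open, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif line.startswith(marker, i):
            return line[:i].rstrip()
        i += 1
    return line.rstrip()


line = "name = first // last  ! join the two parts"
print(strip_comment(line, fortran=True))
print(strip_comment(line, fortran=False))
```

With C rules the line is cut at the concatenation operator and loses its right-hand operand; with Fortran rules only the `!` comment is removed.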

> If you're seriously interested in this, there are ways that we could
> suggest to *improve* on Clang as an implementation model. In
> particular, I would recommend introducing a third IR, so that the
> translation goes like so: source code -> AST -> high-level IR -> LLVM
> IR -> machine code

Do you view this high-level IR as being C-ish in scope (with better
aliasing rules)? For example, do you think that array slices will be
expanded at this high-level IR, or would that wait until CodeGen?

-Hal

* I'm willing to *pay* someone to work on the parsing and sema aspects of this. We (PathScale), however, require support for F90, 95, 2003 and 2008. So while "modern" Fortran may have a lot of the same OO design principles, there's still some legacy involved with the older standards. (Anyone thinking they don't have to support F95+ is kidding themselves: there's just too much legacy code out there if you want it to actually be used.)

* Have you looked at, or considered contributing to, Bill's work?
https://github.com/isanbard/flang/

* I fully agree with John that a higher-level IR is necessary, but I view AST->IR generation as a separate project in itself. Thinking of the high-level IR as C-ish is fundamentally flawed. Fortran treats arrays as first-class citizens, and trying to code Fortran C-style is just wrong (imho).

For anyone interested in working on sema/parsing: I have private test suites which extensively cover the various Fortran standards. Ping me off-list to coordinate.

Best,

./C

> * I'm willing to *pay* someone to work on the parsing and sema
> aspects of this. We (PathScale), however, require support for F90, 95,
> 2003 and 2008. So while "modern" Fortran may have a lot of
> the same OO design principles, there's still some legacy involved with
> the older standards. (Anyone thinking they don't have to support
> F95+ is kidding themselves: there's just too much legacy code out
> there if you want it to actually be used.)

There is a bunch of Fortran 77 out there too ;)

> * Have you looked at, or considered contributing to, Bill's work?
> https://github.com/isanbard/flang/

I think that this is a good start, but the highest-productivity
solution will require a greater reuse of clang components (driver,
preprocessor, etc.).

> * I fully agree with John that a higher level IR is necessary, but I
> view AST->IR generation to be a separate project in itself. Thinking
> of the high-level IR as C-ish is fundamentally flawed.

I wouldn't go that far ;) -- at some point the array slicing semantics
need to be lowered into explicit allocations, loops, etc., and so my
question is, specifically, where that happens. Regardless, I
certainly agree that we'll want higher-level transformations prior to
that lowering.
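
As a rough sketch of what "explicit allocations, loops, etc." can mean for a section assignment such as `a(2:4) = a(1:3)`: Fortran requires the right-hand side to be evaluated as if completely before any element is stored, so a conservative lowering copies the source section into a temporary first. The Python below only mimics that lowered form with 0-based lists; the function names and parameters are hypothetical, and a real front-end would emit IR rather than run loops like this.

```python
def lower_slice_assign(dst, dst_lo, src, src_lo, count, stride=1):
    """Conservative lowering of a section assignment of `count` elements.

    Element k of the section starting at src_lo is stored into element k
    of the section starting at dst_lo (0-based, both with `stride`).
    The source is copied into a temporary first, so overlapping
    sections behave as if the whole right-hand side were read up front.
    """
    tmp = [0.0] * count                      # explicit allocation
    for k in range(count):                   # loop 1: evaluate the RHS
        tmp[k] = src[src_lo + k * stride]
    for k in range(count):                   # loop 2: store into the LHS
        dst[dst_lo + k * stride] = tmp[k]


def naive_slice_assign(dst, dst_lo, src, src_lo, count, stride=1):
    """Single loop with no temporary; wrong when the sections overlap."""
    for k in range(count):
        dst[dst_lo + k * stride] = src[src_lo + k * stride]


a = [1.0, 2.0, 3.0, 4.0]
lower_slice_assign(a, 1, a, 0, 3)            # a(2:4) = a(1:3)
print(a)

b = [1.0, 2.0, 3.0, 4.0]
naive_slice_assign(b, 1, b, 0, 3)            # same assignment, no temporary
print(b)
```

The first call shifts the values as the language requires; the second overwrites elements before they are read. A pass that still sees the slice as a slice can decide when the sections cannot overlap and drop the temporary, which is much harder to recover once only the loops remain.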

-Hal

Dear All,

Thanks for your positive replies. I really value all your input so far. Product engineering in general can succeed only through the collective effort of, and co-operation among, right-minded and skilled people. That is why the projects under the LLVM compiler infrastructure have been so successful to date.

I have not yet taken an in-depth look at https://github.com/isanbard/flang/. I plan to spend some time going through it.

Moreover, I belong to a semiconductor company, where we currently need an LLVM-targeted FORTRAN compiler that complies with the latest standard and is backward compatible with older standards too. We are currently debating all possible options, as usually happens in any company within the framework of its business model.

So, please give me some time. I will get back to you on this topic. If it works out well, we can all put our collective efforts together to make this project a reality, one that serves the needs of different people: industry, academics, researchers, etc.

>> I do not think it'd be a good idea in the long run to implement it by
>> translating FORTRAN into Clang's C ASTs, and there are probably
>> relatively few things that you could re-use from Clang's source
>> code. I recommend using Clang primarily as a design model, rather
>> than as a shortcut to a working implementation.

> I think that there are a few things that can be shared; the two largest
> pieces are probably:
>
> - The driver (both the C family and Fortran share many of the same
>   requirements for finding basic system libraries and tools and
>   running them).
>
> - The C preprocessor (any production Fortran compiler needs a C
>   preprocessor with slightly modified tokenization rules). With some
>   refactoring, this should also be shared.

Okay, interesting.

>> If you're seriously interested in this, there are ways that we could
>> suggest to *improve* on Clang as an implementation model. In
>> particular, I would recommend introducing a third IR, so that the
>> translation goes like so: source code -> AST -> high-level IR -> LLVM
>> IR -> machine code

> Do you view this high-level IR as being C-ish in scope (with better
> aliasing rules)? For example, do you think that array slices will be
> expanded at this high-level IR, or would that wait until CodeGen?

I think it would be FORTRAN-ish in scope. :) I think you would
specifically try to avoid lowering out any potentially-useful language
abstractions. You'll probably find yourself wanting to reimplement a
few LLVM optimizations on top of it, like mem2reg and possibly GVN.

John.