LLVM + FORTRAN 95

Hi all,

I want to compile FORTRAN 95 code with LLVM. More specifically, I
would like to get an AST dump of the program I compile, statically
analyze the AST, make modifications to the AST and then feed it back to
LLVM. Do you have any hints as to how I should proceed? I noticed that
clang has an ast-dump option but don't know whether it supports
FORTRAN 95.

I am an LLVM newbie and I am trying to install it on Linux. I followed
the setup instructions carefully but I am still facing the following
error when trying to compile a simple hello.c:
llvm-gcc: error trying to exec 'cc1': execvp: No such file or directory

Can anybody help me here? Also, I am not sure whether I should post
these kinds of questions to a different mailing list.

Thanks,
Nilesh.

Your best bet is to use llvm-gfortran. I don't know what you mean by
"AST." Do you really want an AST or something else (LLVM IR, something
higher-level, etc.)? LLVM doesn't understand ASTs directly.

Longer term, it sure would be nice to have flang. :-)

                             -Dave

> Your best bet is to use llvm-gfortran. I don't know what you mean by
> "AST." Do you really want an AST or something else (LLVM IR, something
> higher-level, etc.)? LLVM doesn't understand ASTs directly.

Probably for high-level optimisations, or just to see if the parser is
good, as I do in my compiler.

But an AST is language- and compiler-specific, so I also recommend
that you transform everything to LLVM IR and do your work there. LLVM
IR is higher level and far more expressive (types and everything)
than GCC's IR, so you can probably get everything you want from
there.

> Longer term, it sure would be nice to have flang. :-)

That supports HPF!! Yeah!

cheers,
--renato

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

Hi David/Renato,

By AST I mean Abstract Syntax Tree. We are writing an optimization
pass for some FORTRAN95 + MPI code that requires us to analyze the
AST. We thought of 2 ways of doing this:
1. Compile the code using Clang/llvm-gfortran, get the textual AST
dump (somehow), analyze the AST dump using Ruby, modify it and then
feed back the modified AST to LLVM.
2. Do the analysis as an LLVM module.

From your comments, I get the feeling that the second option is the
better one.
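
As a rough illustration of working at the IR level instead of on an AST dump, here is a minimal Python sketch that scans textual LLVM IR (a .ll file) for direct calls to MPI routines. Everything specific in it is an assumption: the sample IR is made up, and the lowercase "mpi_" prefix with a trailing underscore is only a guess at how a Fortran front-end might name the symbols. A real analysis would more likely be written as an LLVM pass against the in-memory IR; this just shows that the call sites are easy to find once the program is in IR form.

```python
import re

# Matches direct calls in textual LLVM IR, e.g. "call void @mpi_init_(...)".
# Indirect calls through a register (call void %fp(...)) are not matched.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@([A-Za-z_.$][\w.$]*)\s*\(")


def find_mpi_calls(ir_text, prefix="mpi_"):
    """Return the names of directly called functions that start with prefix.

    The comparison is case-insensitive. The default prefix is an assumption
    about symbol naming, not something the front-end guarantees.
    """
    names = CALL_RE.findall(ir_text)
    return [name for name in names if name.lower().startswith(prefix)]


# Hypothetical, abridged IR used only to exercise the function.
SAMPLE_IR = """
define void @MAIN__() {
entry:
  %ierr = alloca i32
  call void @mpi_init_(i32* %ierr)
  %r = call i32 @compute(i32 1)
  call void @mpi_finalize_(i32* %ierr)
  ret void
}
"""

if __name__ == "__main__":
    for name in find_mpi_calls(SAMPLE_IR):
        print(name)
```

On the sample above this reports the two MPI calls and skips the call to the hypothetical @compute.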

Thanks for your responses!
Nilesh.

P.S. Does anybody have an idea about the "llvm-gcc: error trying to
exec 'cc1': execvp: No such file or directory" error?

And your analysis can be applied to other languages as well, maybe
even C/C++ using MPI bindings?

That'd be a great addition to LLVM! ;-)

cheers,
--renato


I was doing something similar last year and tried writing my own Fortran
lexer/parser and reusing some of the existing ones. I found it so hard that I
ended up rewriting the 800kLOC of Fortran code in a more modern language by
hand. Basically, the Fortran-related open source tools are so poorly written
and unreliable that they are not worth using. AFAIK, the llvm-gfortran
compiler is just an LLVM backend on GCC's Fortran front-end. GCC is awful so
I would not recommend trying to get anything sensible out of it.

One project I did have limited success with was g95-xml, which is a hacked
version of the GCC-based g95 compiler that can output the nearest thing
Fortran has to an AST as XML:

  http://g95-xml.sourceforge.net/

The "First attempts" version that I used was a Perl programmer's idea of a
parse tree though. ;-)

For example:

<fortran>
  <statement id="0xbdf7b30" type="PROGRAM" loc="[0,6,0,18]"/>
  <statement id="0xbdf8420" type="TYPE_DECLARATION" loc="[1,6,1,23]"
    decl_type="0x705820" decl_kind="0xbdf7fe0" decl_symbols="0xbdf8290"/>
  <statement id="0xbdf8f90" type="ASSIGNMENT" loc="[2,6,2,12]"
    expr1="0xbdf8100" expr2="0xbdf8b00"/>
  <expr id="0xbdf8100" type="VARIABLE" loc="[2,6,2,7]" symbol="0xbdf8290"/>
  <expr id="0xbdf8b00" type="CONSTANT" loc="[2,10,2,12]" value="1.E+0"/>
  <statement id="0xbdf9550" type="END_PROGRAM" loc="[3,6,3,17]"/>
</fortran>

The edges between nodes in the AST are represented by those hexadecimal values
(!). IIRC, after a lot of effort writing OCaml code to decipher that "XML", I
discovered that it did not, in fact, contain all of the information from the
source code and could not be used to perform the automated transformation
that I wanted.
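
To make the hexadecimal-edge scheme concrete, here is a minimal Python sketch (not the OCaml code mentioned above) that indexes the nodes of such a dump by their id attribute and follows the attributes that point at other nodes. The helper names are made up for illustration, the input is an excerpt of the example above, and the only assumption about the format is what that excerpt shows: any attribute whose value equals some node's id is an edge.

```python
import xml.etree.ElementTree as ET

# Excerpt of the g95-xml example quoted above.
G95_XML = """<fortran>
  <statement id="0xbdf8f90" type="ASSIGNMENT" loc="[2,6,2,12]"
    expr1="0xbdf8100" expr2="0xbdf8b00"/>
  <expr id="0xbdf8100" type="VARIABLE" loc="[2,6,2,7]" symbol="0xbdf8290"/>
  <expr id="0xbdf8b00" type="CONSTANT" loc="[2,10,2,12]" value="1.E+0"/>
</fortran>"""


def index_by_id(root):
    """Map each id attribute to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def resolve_edges(el, nodes):
    """Return {attribute name: target element} for attributes that name a known id."""
    return {k: nodes[v] for k, v in el.attrib.items() if k != "id" and v in nodes}


def dangling_refs(el, nodes):
    """Return hex-looking attribute values that do not match any node in the dump."""
    return {
        k: v
        for k, v in el.attrib.items()
        if k != "id" and v.startswith("0x") and v not in nodes
    }


if __name__ == "__main__":
    root = ET.fromstring(G95_XML)
    nodes = index_by_id(root)
    assignment = root.find("statement")
    for attr, target in resolve_edges(assignment, nodes).items():
        print(attr, "->", target.get("type"))
    # References with no matching node are where information goes missing.
    print(dangling_refs(nodes["0xbdf8100"], nodes))
```

In this excerpt the symbol reference on the VARIABLE node has no matching element, which is the kind of gap that makes such a dump hard to transform automatically.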

So my advice is certainly to compile your Fortran into LLVM IR because that is
a far more sane and malleable format.

You realize that LLVM doesn't have data dependence analysis, so supporting HPF (with any Fortran front-end) won't be out-of-the-box. :-)

-bw

The llvm-gcc executable is a driver-driver for the compiler. It "execs" the cc1 program. It's apparently not finding it. My first guess is that it's not in the place it expects it to be. On my system (Mac OS X), here's what I get from the `find' command:

./libexec/gcc/i386-apple-darwin10.0.0/4.2.1/cc1
./libexec/gcc/i386-apple-darwin9.2.2/4.2.1/cc1
./libexec/gcc/i386-apple-darwin9.4.0/4.2.1/cc1
./libexec/gcc/i386-apple-darwin9.5.0/4.2.1/cc1
./libexec/gcc/i386-apple-darwin9.6.0/4.2.1/cc1
./libexec/gcc/x86_64-apple-darwin10.0.0/4.2.1/cc1
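
If you would rather script the same check, here is a minimal Python sketch that walks an install prefix and lists every cc1 it finds. It assumes your install uses the same libexec/gcc/<target>/<version>/ layout as the listing above; the function name and the command-line handling are made up for illustration.

```python
import os
import sys


def find_cc1(prefix):
    """Return sorted paths of files named 'cc1' below <prefix>/libexec/gcc."""
    hits = []
    search_root = os.path.join(prefix, "libexec", "gcc")
    for dirpath, _dirnames, filenames in os.walk(search_root):
        if "cc1" in filenames:
            hits.append(os.path.join(dirpath, "cc1"))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python find_cc1.py /path/to/llvm-gcc/install
    prefix = sys.argv[1] if len(sys.argv) > 1 else "."
    found = find_cc1(prefix)
    if found:
        print("\n".join(found))
    else:
        print("no cc1 under", os.path.join(prefix, "libexec", "gcc"))
```

An empty result for the prefix where llvm-gcc lives would be consistent with the execvp error: the driver has nothing to exec.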

-bw

I was just pushing a bit further, there might be someone listening
that would like to do it... ;-)

But it is my opinion that any software written today that does not
contemplate intrinsic parallelism (any type, but preferably most
types) is a waste of time.

The era of one processing unit per box was abandoned decades ago and
yet, we seem to keep thinking serially, even when programming for
multiple processors...

Though, that's a long discussion for a completely different mailing list... ;-)

cheers,
--renato
