Google SOC - Idea

Hi,

I noticed that LLVM had signed up as a mentoring organization for
Google's summer of code. LLVM looks like an exciting project that
overlaps some of my interests.

I would be interested in developing an additional front end for a
language it does not currently support (I'm open to what language). I
do not know much about what this entails in regards to what LLVM
requires from its front ends. But I have experience with the ANTLR
parser generator that looks like it could be used to generate such a
front end.

Are you folks interested in this?

If so, please let me know ASAP and I'll put together an application.

-Scott

The LLVM Compiler Infrastructure Project :
"Write a new frontend for some other language"

So I expect they'd be most interested.

You may wish to do it using HLVM. I know Reid is going to restart
work on that in April or so and has expressed interest in having
people use it while he develops it, in order to be sure the feature
set is appropriate.

~ Scott McMurray

Yes, particularly if it is in a domain that LLVM isn't known to handle well currently. I'd suggest ocaml/haskell or python/ruby as representatives of the functional and scripting language communities. If you were to do a frontend port, I'd strongly suggest starting from the existing compiler (e.g. the python bytecode compiler) and modifying it to emit LLVM code.

-Chris

I think PyPy already generates LLVM bytecode.

Aaron

Hi Scott,

Hi,

I noticed that LLVM had signed up as a mentoring organization for
Google's summer of code. LLVM looks like an exciting project that
overlaps some of my interests.

I would be interested in developing an additional front end for a
language it does not currently support (I'm open to what language). I
do not know much about what this entails in regards to what LLVM
requires from its front ends. But I have experience with the ANTLR
parser generator that looks like it could be used to generate such a
front end.

Recently, there has been work done on a Fortran frontend. Unfortunately, it was abandoned because of the switch the LLVM team is currently making to a new bytecode format and, afaik, to a new GCC version in the near future. The rub is that the gfortran part of GCC 4.0 isn't really mature, but the ones in 4.1 and the upcoming 4.2 are.

I think getting a Fortran frontend working will greatly increase the relevance of LLVM for researchers. Quite a few of the SPEC CPU2000 and CPU2006 benchmarks are fully or partially written in Fortran, and the SPEC benchmarks remain by far the most important benchmarks used in computer architecture and compiler-related research. Allowing all of the SPEC CPU200x benchmarks to compile using LLVM will vastly increase its use imho.

greetings,

Kenneth

Some additional benefits which I forgot to mention:

- Creating a fully functional Fortran frontend would allow evaluation of the current LLVM optimizations on Fortran-based code. Maybe there are additional possibilities here which haven't been identified for C/C++-based code.

- Supporting Fortran would allow new possibilities for mixed-language programmed programs (for example, some of the SPEC CPU2006 benchmarks): bits from both languages can be mutually optimized. This might open gateways to faster code for these SPEC benchmarks, which no other compiler can reach...

K.

Getting the front end for Fortran finished is definitely something I
would be interested in working on. I will draft up a little proposal
and send it out to this list.

-Scott

I completely agree. This makes the most sense to do when we merge gcc 4.2 (as a whole) into llvm-gcc. I was talking to someone recently about doing it (Scott M?) but I don't remember who and I don't remember how far along it is. Anyone want to update us?

-Chris

Hi,

Here is a rough draft of the application -- a FORTRAN front-end to
LLVM. In accordance with the summer of code specifications it is split
into two portions: the abstract which describes the project, and the
details description which describes me and how I plan to complete the
project.

It's a little long, (but about half the max length the application
directions specify). Would someone be willing to read it in full
(Chris?, Kenneth?) and see if the project as specified would really be
useful to LLVM and something you guys would support?

I will submit the application on the 23rd of March, so if anyone would
like to offer feedback or suggest changes to the direction of the
project, please email them to me before then.

Cheers,
Scott

ABSTRACT

Some initial thoughts:
- Your abstract is almost half as long as the full proposal, so you
might want to move some of its content into the detailed description.
- Err on the side of being too formal. I doubt "stuffy",
"ridiculous", and "shit" will do much to help you.

~ Scott McMurray

P.S. How many Scotts are there on this list, anyways? :P

Hi Scott!

Some comments inlined:

ABSTRACT
----------------

The purpose of this project is to develop a FORTRAN front-end to the
Low Level Virtual Machine (LLVM) compiler infrastructure. LLVM is a
mature collection of tools which provide a powerful resource for
language developers and end-user programmers. LLVM consists of roughly
three components. The first is a front-end to a language (such as C,
C++, or virtually any other) that parsers the language and converts it

s/parsers/parses/

into LLVM's Intermediate Representation (IR). The LLVM IR is a
language- and target- independent representation. The next component
of LLVM is a collection of powerful optimization routines that operate
on the IR. The final module of LLVM is a backend that can compile
optimized code to a number of platforms including x86 and PowerPC, can
emit an optimized C representation of the original program, or use a
Just-In-Time compiler to interpret the program on a variety of
platforms.
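The three-stage structure described in the paragraph above (front-end, optimizer on the IR, backend) can be illustrated with a toy pipeline. This is only a sketch under loud assumptions: the "IR" here is a made-up stack code, not LLVM IR, and the function names `front_end`, `optimize`, and `back_end` are hypothetical. None of this is LLVM's API.

```python
import ast
import operator

# Hypothetical toy IR: ("push", n), ("load", name), ("add",), ("sub",), ("mul",)
OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def front_end(source):
    """Front-end: parse an arithmetic expression and lower it to the toy IR."""
    def lower(node):
        if isinstance(node, ast.Constant):
            return [("push", node.value)]
        if isinstance(node, ast.Name):
            return [("load", node.id)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return lower(node.left) + lower(node.right) + [(OPS[type(node.op)],)]
        raise ValueError("unsupported construct")
    return lower(ast.parse(source, mode="eval").body)


def optimize(ir):
    """Optimizer: constant folding, independent of source language and target."""
    out = []
    for ins in ir:
        foldable = (ins[0] in FUNCS and len(out) >= 2
                    and out[-1][0] == "push" and out[-2][0] == "push")
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", FUNCS[ins[0]](a, b)))
        else:
            out.append(ins)
    return out


def back_end(ir, env):
    """Backend: here simply an interpreter for the toy IR."""
    stack = []
    for ins in ir:
        if ins[0] == "push":
            stack.append(ins[1])
        elif ins[0] == "load":
            stack.append(env[ins[1]])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(FUNCS[ins[0]](a, b))
    return stack.pop()


ir = optimize(front_end("x * (2 + 3)"))
result = back_end(ir, {"x": 4})
```

The point of the split is that only `front_end` knows anything about the source language; a second language would need a new front-end but could reuse `optimize` and `back_end` unchanged.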

If you want, you could put a URL to LLVM's homepage in here somewhere.

To claim FORTRAN is mature is an understatement. In use for over 50
years, FORTRAN is utilized in a wide variety of legacy code bases.
While younger and flashier languages get all the press, FORTAN still
enjoys wide usage in many scientific fields and other
businesses—especially when legacy code is involved.

Quite true! I've worked at some of these places. :)

The implementation of a FORTAN front end would benefit both the
FORTRAN user-base and the LLMV project. The scientific uses of
FORTRAN—along with its other applications—are often heavily involved
simulations and calculations that are very resource demanding and
could greatly benefit from LLVM's powerful optimization mechanisms.
The LLVM project will, of course, benefit from having another
front-end language and the resulting larger "market" available that
can utilize LLVM. Additionally, the LLVM optimization team will have
another case study to explore the effectiveness of its optimization
routines. Especially the development of mixed-language optimization
routines involving FORTAN and one or more other language may be
explored and implemented.

Deliverables for this project are:
* FORTRAN front-end to LLVM
* Documentation and tools for using the front-end

Could you also work on a suite of tests? It doesn't have to be complete or full-featured, but it's good to have tests hanging around so that you make sure you get things done correctly.

DETAILED DESCRIPTION
----------------------------

The reader should refer to the abstract for a description of the goals
of this project and their justifications. This section is devoted to
the applicant's experience, interests, qualifications, and plan for
completing the project.

* Personal Background

I am a Junior, Environmental Engineering and Economics double major at
Swarthmore College and do not come from a traditional Computer Science
background. My story is the standard "taught himself to program at
twelve, no time for stuffy computer science courses" narrative. My
programming ideology is one of problem solving: I encounter a problem
in my life and solve it using whatever tools or resources are needed
to do so. To this end, I have dabbled in a wide range of fields from
databases, to statistical analysis, to GUI applications, to web
applications, and more. Examples of this work are an anti-censorship,
in-browser web-browser (it's ridiculous, I know, but darn useful)
[http://palary.com] and Longhand a calculator program for OS X
[http://longhand.palary.com]. I have also worked for a number of
clients developing GUI applications, data analysis applications, and
web applications.

I became interested in the area of language development as a result of
a desire for better tools to deal with the environmental modeling and
economical modeling issues that I came into contact with in my
studies. It seemed to me like these areas could benefit greatly from
domain specific languages that were tailored to their specific needs
(such as built in units in the environmental modeling case). I am
currently working toward developing such a language for my Senior
Thesis here at Swarthmore.

I don't know how much Google wants in this section. One question I come away with is, "How does Fortran relate to your language?"

* Motivation

To gain the background to carry out the complicated task of developing
a domain specific modeling language, I have since wiggled my way
around the pre-requisites and enrolled in an upper-level compiler
course here at Swarthmore. Additionally, I have actively immersed
myself in the field. In this immersion I came across the LLVM project
and I believe that its IR would be an excellent target for the
language I eventually create. LLVM is, of course, a complex tool and
I wish to gain much more familiarity with it.

This is my primary motivation for working on a FORTRAN front-end:
gaining experience and background. I am here to learn, and if my
learning allows both the LLVM community and the FORTRAN community to
receive an excellent tool, as it assuredly will, so much the better.

* The Plan

I will possess roughly three-months this summer to work on the
front-end, and I am very confident that it will be completed on time
(I am actually thinking that I could do much more in that period, but
I learned long ago not to stick my head out on thing like this :).

I am not as familiar with the technologies involved as I would like
to, so my planning is necessarily imprecise. My rough plan proceeds as
follows:

=====
- 2 weeks – Become familiar with the technologies: LLVM, FORTAN (I've
programmed a lot of languages, but never that), and GCC's FORTAN
implementation. Do not engage in any direct work on the projects but
gain experience with the tools. Ascertain what previous work has been
accomplished towards developing a FORTRAN front-end.

- 4 weeks – Build the FORTRAN front-end.

- 2 weeks – Smooth things out, unit tests, etc…

- 1 week – Documentation, make sure that the front-end can be
maintained by someone else.

I would suggest writing documentation along the way. Even if they are just notes to yourself. It will help this week go by faster (developers *hate* writing docs).

- 3 weeks – "Shit Happens"

I plan on first attempting to implement the FORTRAN front-end by
co-opting the GCC FORTRAN parser. If that fails, I will build a
front-end using ANTLR [http://antlr.org] a parser generator with which
I am familiar and for which a FORTRAN grammar is already available
(targeting an obsolete version of ANTLR, but it should not be too
difficult to update).

One thing you don't mention is which version of Fortran you want to do: Fortran 77, 90, 95, ??. Each has its own challenges. If you're going to do a subset of one of the languages, then define how much of it you want to do, and then provide a roadmap for future work. I think it would be a good idea to scour the web looking for examples of Fortran programs out there. Build your front-end using these as testcases. Start with the simplest one, get it going, then get more complex. It will then be easy to make these small programs into testcases to include in the LLVM tester.

* In Short

I'm psyched :)

I am confident that this is a very manageable project, I will complete
it on time, and I will learn a great deal in the process of
implementing it.

Awesome! Good luck on it!! :) A Fortran FE will be a most welcome addition.

-bw

Hi,

Here is a rough draft of the application -- a FORTRAN front-end to
LLVM. In accordance with the summer of code specifications it is split
into two portions: the abstract which describes the project, and the
details description which describes me and how I plan to complete the
project.

It's a little long, (but about half the max length the application
directions specify). Would someone be willing to read it in full
(Chris?, Kenneth?) and see if the project as specified would really be
useful to LLVM and something you guys would support?

Some (quite a few actually, some more important than others) remarks:

"that parsers the language and converts it"
=> "that PARSES the language and converts it"

"The next component of LLVM"
=> "The second component of LLVM"

"To claim FORTRAN is mature ... is involved."
=> mention SPEC CPU2000 and CPU2006 here as important examples of widely-used Fortran benchmarks for both the research community and industry

"My story is the standard "taught himself to program at twelve, no time for stuffy computer science courses" narrative."
=> as Scott McMurray pointed out, don't use words like "stuffy", it makes you sound like a script kiddie... Something like "My programming experience is self-taught, without major computer science courses."

"Examples of this work are an anti-censorship,
in-browser web-browser (it's ridiculous, I know, but darn useful)
[http://palary.com] and Longhand a calculator program for OS X
[http://longhand.palary.com]."
=> references are good, but I don't think there's a worse way of selling them... Better: "Examples of this work are an anti-censorship, in-browser web-browser (which may sound superfluous, but is quite useful; see http://palary.com), and a calculator program for OS X (Longhand, see http://longhand.palary.com)".

"I have since wiggled my way
around the pre-requisites and enrolled in an upper-level compiler
course here at Swarthmore"
=> don't use wiggled (but since I'm not a native English speaker, I can't think of anything suitable at this time)

"LLVM is, of course, a complex tool and
I wish to gain much more familiarity with it."
=> drop the "of course" part, makes LLVM sound too bloody complex (it is complex, but also well structured)

"This is my primary motivation for working on a FORTRAN front-end:
gaining experience and background. I am here to learn, and if my
learning allows both the LLVM community and the FORTRAN community to
receive an excellent tool, as it assuredly will, so much the better."
=> "My primary motivation for working on a FORTRAN front-end for LLVM is gaining experience and background in software development. I want to learn from this experience, and would like to contribute to both the LLVM and FORTRAN community doing so."

"I will possess roughly three-months this summer to work on the front-end,"
=> "I will be able to spend roughly three-months this summer to work on the front-end"

"LLVM, FORTAN (I've programmed a lot of languages, but never that), and GCC's"
=> drop the part between brackets, makes the project sound less likely to succeed

"- 1 week – Documentation, make sure that the front-end can be maintained by someone else."
=> I think it is considered better if you develop your documentation along with the implementation. Experience has taught me if you plan to comment your code afterwards, you won't. You could also state that the Documentation part will consist of hacking up examples or similar, but don't make it sound that you'll document your code afterwards, because you won't.

"- 3 weeks – "Shit Happens"
=> Don't use shit :) And maybe put this in a phrase: "The three remaining weeks will be used to solve problems related to the project which are not in the current plan"

"I'm psyched :)

I am confident that this is a very manageable project, I will complete
it on time, and I will learn a great deal in the process of
implementing it."

=> "I am thrilled to start working on this project, and feel that I will be able to complete it successfully and on time. While working on this project, I hope to learn a great deal."

Please don't hesitate to show us your updated proposal again before sending it.

greetings,

Kenneth

Hi Scott, I'm currently porting the Ada gcc front-end to LLVM.
This is similar to what you want to do, so here are some comments
from the trenches...

I plan on first attempting to implement the FORTRAN front-end by
co-opting the GCC FORTRAN parser.

Good plan. However the Fortran front-end that comes with gcc 4.0
is known to be weak (llvm-gcc is based on gcc 4.0). That's because
gcc 4.0 is based internally on a new infrastructure compared to gcc 3.0,
and it took the front-ends a version or two to catch up and sort out
the bugs. That was the case for Ada too, which is why the first thing
I did was to backport the Ada front-end from gcc 4.3 to llvm-gcc. I
advise you to backport the gcc 4.2 fortran front-end to llvm-gcc. For
Ada this was quite easy to do.

Last time I tried to build the fortran front-end in llvm-gcc it got
quite a long way before it died. This is a good sign.

I've found LLVM to be solid for code that can be produced by C, and
easily fixed for the rest. There are four classes of problems:

(1) Build failures because some tree code is not handled. CEIL_DIV_EXPR
is an example that is used by Fortran but isn't implemented yet, but these
are usually easy enough to implement (I need to implement this one for Ada,
so we'll see who gets there first!).

(2) Build failures because you're outside the world of C. I suspect
Fortran will be less problematic than Ada, but even in the Ada case the
LLVM design has proved sound, and each problem has individually been
simple to fix. The tricky ones I had were (a) resolution of forward
declarations; (b) handling of exotic packed bit fields.

(3) Build failures due to bugs in gcc 4.0. It can be tricky to tell if
problems are due to a bug in the front-end or a gcc 4.0 bug. I had an
example with the front-end producing non-constant constructors for global
variables. It was actually the fault of one of the gcc 4.0 helper
routines, and nothing to do with the front-end at all. These tend to be
fairly easy to fix, because they've usually been fixed in more recent
versions of gcc! So you have to muck around in gcc 4.2 to find out how
come it works there, then backport the fix.

(4) Wrong code. I haven't seen many of these. Having a good testsuite
helps to flush these out. I'm currently working my way through the Ada
testsuite, which is quite comprehensive. Hopefully the fortran one is too.
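To give a feel for what handling a missing tree code like the ceiling-division one mentioned above (CEIL_DIV_EXPR in GCC) involves: it denotes integer division with the quotient rounded toward positive infinity, while C-style division rounds toward zero. A minimal sketch of expressing one in terms of the other follows. The helper names `trunc_div` and `ceil_div` are hypothetical, and this is not the code llvm-gcc uses; it only illustrates the arithmetic a lowering has to get right.

```python
def trunc_div(a, b):
    """Signed division rounding toward zero, like C's `/` on integers."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def ceil_div(a, b):
    """Division rounding toward positive infinity, built on trunc_div."""
    q = trunc_div(a, b)
    r = a - q * b  # remainder left over by the truncating division
    # Truncation rounds down only when the exact quotient is positive,
    # i.e. when both operands have the same sign; bump the result then.
    if r != 0 and (a > 0) == (b > 0):
        q += 1
    return q
```

For example, `ceil_div(7, 2)` gives 4 while `ceil_div(-7, 2)` gives -3, where truncating division gives 3 and -3 respectively.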

If that fails, I will build a
front-end using ANTLR [http://antlr.org] a parser generator with which
I am familiar and for which a FORTRAN grammar is already available
(targeting an obsolete version of ANTLR, but it should not be too
difficult to update).

Bad plan. I doubt you can build a serious fortran compiler in this way
in the time-frame you are considering.

Good luck!

Duncan.

Duncan Sands wrote:

If that fails, I will build a
front-end using ANTLR [http://antlr.org] a parser generator with which
I am familiar and for which a FORTRAN grammar is already available
(targeting an obsolete version of ANTLR, but it should not be too
difficult to update).

Bad plan. I doubt you can build a serious fortran compiler in this way
in the time-frame you are considering.

I agree. I used to work on a Fortran compiler long, long ago in a former life. There is no way you’re going to write a usable front-end from scratch in a few months, even with the head start of a parser. The parser is the easy bit.

Also – and I’m surprised no one else noticed this – you misspell FORTRAN as FORTAN as often as not. That does not make a good impression :)

I don't think he's saying he'll build that Fortran front-end from scratch, but will use an ANTLR-based parser for the front-end if the GCC parser causes problems. Or am I mistaken?

And yes, fix the FORTAN typos :)

K.

Kenneth Hoste wrote:

Duncan Sands wrote:

If that fails, I will build a front-end using ANTLR [http://antlr.org] a parser generator with which I am familiar and for which a FORTRAN grammar is already available (targeting an obsolete version of ANTLR, but it should not be too difficult to update).

Bad plan. I doubt you can build a serious fortran compiler in this way in the time-frame you are considering.

I agree. I used to work on a Fortran compiler long, long ago in a former life. There is no way you're going to write a usable front-end from scratch in a few months, even with the head start of a parser. The parser is the easy bit.

I don't think he's saying he'll build that Fortran front-end from scratch, but will use an ANTLR-based parser for the front-end if the GCC parser causes problems. Or am I mistaken?

A parser only handles syntax. Most of a front-end deals with semantics. The GCC front-end does both syntax and semantics. The only ANTLR grammar I could find for FORTRAN does syntax only; it produces just an AST. It also only handles FORTRAN 77, which is very out of date. It didn't even have vector expressions, which were added in FORTRAN 90 and present in at least one commercial compiler in the late eighties.
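The syntax/semantics split can be shown with a small analogy, using Python's own `ast` module rather than anything Fortran- or ANTLR-specific. The parser happily accepts a program that uses a name nobody defined; catching that takes a separate pass over the AST. The function `undefined_names` is a hypothetical, deliberately simplistic check that assumes straight-line top-level code and ignores builtins, scopes, and control flow.

```python
import ast


def undefined_names(source):
    """Return names read before any assignment in straight-line code."""
    tree = ast.parse(source)  # syntax only: succeeds for any well-formed text
    defined = set()
    problems = []
    for stmt in tree.body:
        # Semantic pass: every name that is read must already be defined.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Name)
                    and isinstance(node.ctx, ast.Load)
                    and node.id not in defined):
                problems.append(node.id)
        # Only after checking the reads do this statement's targets count.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                defined.add(node.id)
    return problems


problems = undefined_names("x = 1\ny = x + z\n")
```

Here parsing succeeds, yet `problems` names `z` as undefined. A real front-end has to do this kind of work for types, declarations, array shapes, and much more, which is why an AST-producing grammar covers only a small part of the job.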

To be honest, I'm not sure he realizes what he's getting himself into.