[GSoC] "Improving Clang's AST source-fidelity" proposal

Hello everybody. This is my other proposal for the Google Summer of Code.
Since the student application deadline is very near, I’ve already submitted it
(and the other one about lambdas) to melange.
Comments and suggestions are really appreciated.

[...]

Title: Improving Clang's AST source-fidelity

[...]

This list of
issues has been obtained in strict cooperation with Roberto Bagnara,
Abramo Bagnara and Enea Zaffanella, who actively work with clang on
source-based applications and would be very happy to mentor this idea.

As Nicola said, we are really interested in seeing this project
completed, and we are ready to serve as mentors to help him
achieve the stated goals.

Abramo and Enea.

Hi Nicola,

I think this is a good proposal. My main concern about this project is the potential impact it will have on compile-time performance and memory footprint of the compiler. I'd like to see in the proposal a plan on measuring regressions on these fronts (which are bound to happen by retaining more information in the ASTs) as well as what is deemed an acceptable level for performance to slide. For example, if compile time regresses by 10%, I would strongly object to the changes. Even 5% would probably be too high.

The other thing to keep in mind is that any AST changes will also require changes to the PCH format. It's critical that PCH remain efficient, and that whatever changes we make to the AST do not adversely impact how much data is pulled from the PCH file (thus slowing down compile times). For most of the topics you outline I don't think this is an issue, but this is definitely something to keep in mind.

I think the proposal covers a bunch of areas, and it is entirely possible that you won't have time to do them all. If you were to tackle this project, my preference is that you work on one area to completion, and then proceed to the next one.

Cheers,
Ted

Hi Nicola,

Hello Ted.

I think this is a good proposal. My main concern about this project is the potential impact it will have on compile-time performance and memory footprint of the compiler. I’d like to see in the proposal a plan on measuring regressions on these fronts (which are bound to happen by retaining more information in the ASTs) as well as what is deemed an acceptable level for performance to slide. For example, if compile time regresses by 10%, I would strongly object to the changes. Even 5% would probably be too high.

The other thing to keep in mind is that any AST changes will also require changes to the PCH format. It’s critical that PCH remain efficient, and that whatever changes we make to the AST do not adversely impact how much data is pulled from the PCH file (thus slowing down compile times). For most of the topics you outline I don’t think this is an issue, but this is definitely something to keep in mind.

I agree with you. I’ve updated the proposal to reflect your concern. The new text is on melange at
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/gigabytes/1001

Since this proposal is made up of a number of almost independent issues, performance measurement can
clearly be done on a per-issue basis. What ‘acceptable loss’ means in this context will be discussed here.
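
As a rough illustration of what a per-issue check could look like, here is a minimal sketch that compares compile times measured before and after a patch. Everything in it is an assumption made for the example: the function names, the sample timings, and the threshold value are hypothetical, and the actual acceptable level is still to be agreed on in this thread.

```python
from statistics import median


def regression_percent(baseline_times, patched_times):
    """Return the slowdown of the patched build relative to the baseline, in percent.

    Medians are used so that a single noisy run does not dominate the result.
    """
    base = median(baseline_times)
    patched = median(patched_times)
    return (patched - base) / base * 100.0


def is_acceptable(baseline_times, patched_times, threshold_percent):
    """True if the measured slowdown does not exceed the agreed threshold."""
    return regression_percent(baseline_times, patched_times) <= threshold_percent


# Hypothetical wall-clock compile times in seconds (not real measurements).
baseline = [10.0, 10.2, 9.9]
patched = [10.3, 10.4, 10.2]

slowdown = regression_percent(baseline, patched)
# The 5.0 here is only a placeholder threshold for the example.
print(f"slowdown: {slowdown:.1f}%  acceptable: {is_acceptable(baseline, patched, 5.0)}")
```

The same comparison could be repeated for memory footprint and for PCH load time, once per issue, so that each patch is judged on its own numbers.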

I think the proposal covers a bunch of areas, and it is entirely possible that you won’t have time to do them all. If you were to tackle this project, my preference is that you work on one area to completion, and then proceed to the next one.

Your suggestion is entirely right. Every issue will be addressed completely before proceeding to the next. It’s better to have
n totally correct patches than 2*n approximate ones.

Bye,
Nicola