[GSoc 2016] Proposal - Capture Tracking Improvements

Hello,

Attached is the proposal that I have submitted. I would be grateful for any and all feedback provided.

Many Thanks,
Scott

GSoc Proposal 2016 - Scott Egerton.pdf (382 KB)

Adding Mehdi Amini and Phillip Reames to CC.

(I've copied the body of the text from the PDF into the email and am replying inline to particularly parts.)

Hello,

Attached is the proposal that I have submitted. I would be grateful for any and all feedback provided.

Many Thanks,
Scott

C APTURE T RACKING I MPROVEMENTS
ABSTRACT
Capture tracking analysis is an analysis pass used to determine which pointers are
“captured”. This means that a function has made a copy of a pointer that has outlived
the function that called it.

Subtle but important clarification: "has outlived" to "may potentially outlive". As with most compiler analysis, we have to be conservative when we can't fully analyze.

This information is useful during the optimisation process. I
believe that this is used to improve memory management.

Via aliasing primarily. Yes.

The capture tracking analysis is currently inefficient and inaccurate in cases due to
the fact that it returns false positives and expensive functions are unnecessarily
repeated.

We'll always have false positives. The trick is to get a reasonably small number with a reasonable impact on compile time. The universal problem balance in a compiler.

This is where I would make improvements. It could be improved in a
number of ways, as mentioned by Philip Reames on the mailing list. I would like to
use this opportunity to take my previous experience within LLVM and apply it to other
areas of LLVM.

BENEFITS

Improvements to pointer capture tracking would be greatly beneficial. Changes such
as the removal of false positives as well as the reduction to the cost through the
introduction of caching will be made. Caching the results of certain functions, storing
the result, will cause performance increases, should the same data be requested
again as the result will already be stored. As a result of this, it could become a more
valuable tool to have within the LLVM suite. This would also cause improvements to
the code optimisation process and potentially increase the quality of code compiled
with LLVM. It should also use up fewer resources during the analysis pass.

DELIVERABLES

There are cases where “potentially captured” is returned incorrectly. As a
starter task I would remove the known and unknown cases that cause this to
occur. This would serve as a good introduction to LLVM compiler analysis and
would be achievable in a short time span.
Approximately 2 weeks.

It would be good to gather a concrete list of known false positives. I can help with this, but I'd like you to do a first pass over the code and bug tracking system first. Another approach is to instrument the analysis, print each potentially captured result, and scan through the output when compiling a largish piece of code. Do you see a common pattern which looks worth analyzing further?

FYI, two weeks may be very optimistic. I suspect you'll iterate on this a couple of times finding more and more cases each time. Getting each change implemented and reviewed will take a couple of days. Thankfully, the analysis and review time should easily overlap. We can also overlap this with the design work for the next part.

By making changes to the current design, this could be made to be more cost
effective than it currently is. I would do this by caching the results as
previously suggested. This will be used to invalidate results when required.
Approximately 6 weeks.

You will definitely need a more concrete proposal for what caching and invalidation scheme you're using. I'd suggest reviewing callers of PointerMayBeCaptured to get a sense for what's going on. I can also help with this. (We should setup a skype call or something if your proposal gets approved.)

Once we have a proposal, we'll share this before you start implementation. That will give us a chance to find design problems early. :slight_smile:

The analysis could be made more accurate in order to recognise object sub-
graphs which do not escape.
Approximately 5 weeks.

I honestly doubt you'll get to this in the summer, but that's okay. Once we get closer to actually working on this, we'll need to drill in and make a more concrete proposal. There's a bunch of algorithms which could be used here and picking one will take some thought.

1 | S COTT E GERTONThis work will link in with other ongoing work within LLVM and will assist other
developers working on compiler analysis. I believe that I may be able to complete
this work within 11 weeks; however I have allocated the extra time to account for any
issues I may encounter.

I'm going to push strongly to have you contributing incrementally through the normal LLVM development process. This will mean that if something runs long and you don't get to finish, most of the supporting work will already have landed. (e.g. If we don't get to the object graph part, at least we'll have the caching improvements.)

BIOGRAPHICAL INFORMATION
I have been working on LLVM for the past year as an industrial placement student
within the MIPS compiler team and am keen to do more LLVM work. I am a BSc
Computer Science student at the University of Hull and will begin my final year of
study in September. In the past I have worked on a University project which required
me to be inventive with data structures in order to efficiently solve the given problem.
I thoroughly enjoyed this and would be grateful for another opportunity to exercise
my design skills.
Due to other commitments with my current industrial placement, unfortunately I will
be unable to work on this project during working hours (9-5 GMT) until the first week
of June. However, I am more than happy to make up this time as the summer
progresses.

June is fine by me. If anything, that works better for my schedule anyways.

Philip

Hi Phillip,

Thank you for the feedback. I will be updating this as I learn more.

Many thanks,
Scott