[GSoC 2016] Capture Tracking Improvements - Mid term report

Hello,

This is a more detailed overview of my progress than the weekly reports, which can also be found on the mailing list.

Over the past two weeks I have been learning a lot more about capture tracking. From this I was able to instrument the current implementation in order to identify some of the false positives in it. I was hoping to have more definitive results by now than what I currently have, but due to some unforeseen issues with the output becoming scrambled I cannot provide this just yet. I have now resolved this and plan to have some more concrete results within the coming week.

The current algorithm for Capture tracking will look through the uses of a pointer. If there are too many uses it will conservatively say that the value is captured to avoid taking up too much compile time. If not it will then determine whether or not the Value is captured in various ways based on its opcode.
However there are some deficiencies with the current implementation. Such as the false positives and the fact that it takes rather a lot of compile time to run.

I am currently compiling a large piece of code with an instrumented version of LLVM in order to identify false positives. I then plan to fix all the identified false positives. After this is complete I will be be moving onto adding the caching of results to the current design. This will make it more efficient. If I complete this, I will work on making the analysis more accurate in order to recognize object sub-graphs which do not escape.

Please let me know if you have any suggestions.

Many thanks,
Scott

Hi Scott,

Over the past two weeks I have been learning a lot more about capture
tracking. From this I was able to instrument the current implementation in
order to identify some of the false positives in it. I was hoping to have

How are you instrumenting the analysis to identify false positives?

more definitive results by now than what I currently have, but due to some
unforeseen issues with the output becoming scrambled I cannot provide this
just yet. I have now resolved this and plan to have some more concrete
results within the coming week.

The current algorithm for Capture tracking will look through the uses of a
pointer. If there are too many uses it will conservatively say that the
value is captured to avoid taking up too much compile time. If not it will
then determine whether or not the Value is captured in various ways based on
its opcode.
However there are some deficiencies with the current implementation. Such as
the false positives and the fact that it takes rather a lot of compile time
to run.

I am currently compiling a large piece of code with an instrumented version

Can you give some more details here? What is this large piece of
code?

of LLVM in order to identify false positives. I then plan to fix all the
identified false positives. After this is complete I will be be moving onto

This sounds reasonable, but I'd say it is better to avoid this kind of
workflow:

  for (every_false_positive)
    understand_false_positive;
  for (every_false_positive)
    fix_false_positive;

but have it be more like:

  for (every_false_positive) {
    understand_false_positive;
    fix_false_positive;
  }

i.e. you don't have to understand _all_ of the cases where our capture
tracking is too conservative to make it better. Find one _specific_
case where LLVM today is stupid today, and fix it; and iterate.
Making these kind of small changes will also increase the trust the
community has in you, which will be helpful when you start proposing
bigger changes (e.g. perhaps for non-escaping subgraphs).

-- Sanjoy

Hi Sanjoy,

Hi Scott,

Over the past two weeks I have been learning a lot more about capture
tracking. From this I was able to instrument the current implementation in
order to identify some of the false positives in it. I was hoping to have

How are you instrumenting the analysis to identify false positives?

I've instrumented it so that it will output the pointer value to llvm::dbgs() whenever MayBeCaptured or MayBeCapturedBefore returns true in order to identify all "positives", false or not. Then I am manually filtering out all of cases which appear to be the correct behaviour. Hopefully everything left over will be a false positive.

more definitive results by now than what I currently have, but due to some
unforeseen issues with the output becoming scrambled I cannot provide this
just yet. I have now resolved this and plan to have some more concrete
results within the coming week.

The current algorithm for Capture tracking will look through the uses of a
pointer. If there are too many uses it will conservatively say that the
value is captured to avoid taking up too much compile time. If not it will
then determine whether or not the Value is captured in various ways based on
its opcode.
However there are some deficiencies with the current implementation. Such as
the false positives and the fact that it takes rather a lot of compile time
to run.

I am currently compiling a large piece of code with an instrumented version

Can you give some more details here? What is this large piece of
code?

It's the code for Firefox. I saw someone on IRC was using it as well and thought it would be good as it is supported well enough to "just work".

of LLVM in order to identify false positives. I then plan to fix all the
identified false positives. After this is complete I will be be moving onto

This sounds reasonable, but I'd say it is better to avoid this kind of
workflow:

   for (every_false_positive)
     understand_false_positive;
   for (every_false_positive)
     fix_false_positive;

but have it be more like:

   for (every_false_positive) {
     understand_false_positive;
     fix_false_positive;
   }

i.e. you don't have to understand _all_ of the cases where our capture
tracking is too conservative to make it better. Find one _specific_
case where LLVM today is stupid today, and fix it; and iterate.
Making these kind of small changes will also increase the trust the
community has in you, which will be helpful when you start proposing
bigger changes (e.g. perhaps for non-escaping subgraphs).

I agree that this does seem like a better workflow, however I am finding it difficult to identify a single false positive without at least finding a few because of the process I'm using, filtering the instrumented output.

-- Sanjoy

Many thanks,
Scott

Hi Scott,

Scott Egerton wrote:
> I've instrumented it so that it will output the pointer value to llvm::dbgs() whenever MayBeCaptured or
> MayBeCapturedBefore returns true in order to identify all "positives", false or not. Then I am manually filtering out
> all of cases which appear to be the correct behaviour. Hopefully everything left over will be a false positive.

I'm sure you know this already, but for things like capture tracking a
"false positive" to the human eye can easily be wrong. :slight_smile: Have you
tried extracting out small C/C++ (or even LLVM IR) test cases that you
can point to and say "here we're clearly being too conservative"?
Once you have these, it'd be great to file a bug for each one of them:
that way even if you cannot get to some of these by the end of your
GSoC period, you or someone else can get to them later.

>> Can you give some more details here? What is this large piece of
>> code?
>
> It's the code for Firefox. I saw someone on IRC was using it as well and thought it would be good as it is supported
> well enough to "just work".

That sounds good, assuming Firefox has good internal unit test
coverage. I'd have gone with using clang/llvm itself, but any
code-base that has a good set of unit tests (so that you'll know if
you've been too aggressive in inferring nocapture) is okay.

> I agree that this does seem like a better workflow, however I am finding it difficult to identify a single false
> positive without at least finding a few because of the process I'm using, filtering the instrumented output.

That's fine, and I'm not asking you to stop trying to get a holistic
view of our weak points; *but* I'd urge you to _prioritize_ finding
and fixing a specific case, i.e. just pick any one false positive (at
random) and see why it is we're too conservative and fix the
underlying issue. That will give you a much better idea of what's
approachable, and what isn't; and will help build trust within the
community.

Thanks,
-- Sanjoy