RFC - Profile Guided Optimization in LLVM

[Splitting this out from the original thread to reduce noise in it]

Do you want to take over this effort or should I poke more at it?

Since you've already started, it's easier if you poke more at it. Thanks. I've got a whole bunch of other things to go through.

OK, will do.

Jakob, any comments on the patch? The only interesting part is in LiveIntervals::getSpillWeight. Do we have to scale the values somehow there?

Yes, BlockFrequency::getFrequency() is poorly named; it returns a fixed-point number. I think you should scale it to be relative to the entry block frequency.

+float
+LiveIntervals::getSpillWeight(bool isDef, bool isUse, BlockFrequency freq) {
+  return (isDef + isUse) * freq.getFrequency();
+}

This computation can overflow.

Yep, I went down the easy route and converted it to floating point arithmetic. Is that OK here?
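
[A minimal standalone sketch, not from the thread, of why the unsigned multiply can wrap and what the float conversion buys; the frequency value is made up:]

  #include <cstdint>
  #include <iostream>

  int main() {
    // BlockFrequency::getFrequency() returns a raw 64-bit fixed-point
    // value, so a hot block's frequency can occupy most of the uint64_t
    // range.
    uint64_t Freq = UINT64_MAX / 2 + 1;

    // (isDef + isUse) can be 2, so the product wraps around to 0.
    uint64_t Wrapped = 2 * Freq;

    // Doing the arithmetic in float trades precision for range and
    // avoids the wraparound.
    float Safe = 2.0f * static_cast<float>(Freq);

    std::cout << Wrapped << " vs " << Safe << "\n"; // prints "0 vs 1.84467e+19"
  }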

@@ -178,9 +180,10 @@ bool SpillPlacement::runOnMachineFunction(MachineFunction &mf) {

   // Compute total ingoing and outgoing block frequencies for all bundles.
   BlockFrequency.resize(mf.getNumBlockIDs());
+  MachineBlockFrequencyInfo &MBFI = getAnalysis<MachineBlockFrequencyInfo>();
   for (MachineFunction::iterator I = mf.begin(), E = mf.end(); I != E; ++I) {
-    float Freq = LiveIntervals::getSpillWeight(true, false,
-                                               loops->getLoopDepth(I));
+    float Freq =
+        LiveIntervals::getSpillWeight(true, false, MBFI.getBlockFreq(I));
     unsigned Num = I->getNumber();
     BlockFrequency[Num] = Freq;

I think you should leave LiveIntervals out of this and just use the block frequency directly, scaled as above. The getSpillWeight() function is only used here to get a non-overflowing frequency approximation.

Fixed.

Otherwise, looks good.

The attached patch also fixes the regression tests that failed. Mostly minor changes in register allocation. The SPARC one is a bit scary because an instruction now gets moved across inline assembly into a delay slot. Do you know if that is a safe transformation?

block-frequency-spilling.patch (25 KB)

+float
+LiveIntervals::getSpillWeight(bool isDef, bool isUse, BlockFrequency freq) {
+  return (isDef + isUse) * freq.getFrequency();
+}

This computation can overflow.

Yep, I went down the easy route and converted it to floating point arithmetic. Is that OK here?

Yes, that should be fine.

+float
+LiveIntervals::getSpillWeight(bool isDef, bool isUse, BlockFrequency freq) {
+  float EntryFreq = BlockFrequency::getEntryFrequency();
+  return (isDef + isUse) * (freq.getFrequency() / EntryFreq);
+}

Nitpick: the divisor is a compile-time constant, so the reciprocal can be constant folded and the division turned into a multiplication:

  const float Scale = 1.0f / BlockFrequency::getEntryFrequency();
  return (isDef + isUse) * (freq.getFrequency() * Scale);

I wouldn’t trust the compiler to do that without -fast-math.

The attached patch also fixes the regression tests that failed. Mostly minor changes in register allocation. The SPARC one is a bit scary because an instruction now gets moved across inline assembly into a delay slot. Do you know if that is a safe transformation?

That sounds OK, as long as it is safe to move that instruction across the inline asm.

<block-frequency-spilling.patch>

LGTM

Thanks,
/jakob

I really don't think this is fine. I would like LLVM not to rely on floating-point arithmetic on the host.

I tend to agree. But the piece of code I was referring to replaces existing floating-point code and interacts with other floating-point code, so this is not a regression.

- Ben

Apple folks are also gearing up to push on the PGO front. We are primarily interested in using instrumentation, rather than sampling, to collect profile info. However, I suspect the way the profile info ends up being used in the various optimization and codegen passes will be largely similar.

Excellent! We are initially interested in instrumentation as well; that is where we get most of our performance with GCC. Sampling is showing a lot of promise, however, and it really is not much different from instrumentation: most of what changes is the source of the profile data.

There is also some interest in pursuing profile-directed specialization, but that can wait. I think it makes sense for us to get together and discuss our plans to make sure there won't be duplication of effort.

Sure. My initial plan is fairly simple: triage the existing instrumentation code and see what needs fixing. I'm starting on this in the next week or so. What are your plans?

Your short-term plan sounds great. We have some tentative plans to build/improve the instrumentation tools and to start enhancing various passes to use profile data. Beyond that, our plan is still vague. Bob, comments?

Evan

I've been working on prototyping a new approach to instrumentation for both PGO and code-coverage testing. I want to use the same data for both purposes, and for code coverage we really need accurate source location info, including column positions and the starting and ending locations for every block of code. Working backward from the debug info to find source locations hasn't worked very well; the debug info just doesn't have enough detailed information. Instead, I am planning to insert the instrumentation code in the front-end, so the raw profile data in this scheme is tied to source locations.

One disadvantage is that this conflicts with the goal of graceful degradation: simply adding a blank line invalidates all the following source locations. My plan is to ignore any profile data whenever the source changes in any way. We had some discussions about this, and it is easy to come up with cases where even a trivial change to the source, e.g., enabling a debugging flag, causes massive changes in the profile data. I don't think graceful degradation should even be a goal; it is too hard to sort out insignificant changes from those that should invalidate all the profile data.
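
[A sketch of the shape such front-end instrumentation could take; the counter array, region numbering, and names are hypothetical illustrations, not the actual proposal:]

  #include <cstdint>

  // One counter per source region, with indices assigned while walking
  // the AST. The indices are only stable while the source is unchanged,
  // which is why any source change invalidates the collected data.
  static uint64_t ProfileCounters[3];

  int foo(int x) {
    ++ProfileCounters[0];   // region 0: function entry
    if (x > 0) {
      ++ProfileCounters[1]; // region 1: then-branch
      return x;
    }
    ++ProfileCounters[2];   // region 2: fall-through path
    return -x;
  }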

I am still at least a few weeks away from having any details to share about this new instrumentation scheme. I want to convince myself that it is viable before making a proposal to add it to LLVM.

I don't know if this new kind of instrumentation will fit everyone's needs for PGO. I hope that it will, but regardless the most important thing is that we standardize on the representation of profile data in the IR. We have a good start on that already with the branch probability metadata. The missing piece is adding per-function metadata to record the execution counts. We had planned to add that along with the branch probabilities but haven't gotten it done yet. You can see the lack of it in BlockFrequency::getEntryFrequency(), which just returns a constant value. We will want that to allow comparing profile data across different functions, which you can't easily get from the branch probabilities alone. I don't think anyone has yet come up with a specific proposal for that.
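
[For concreteness, here is roughly how the existing branch-weight metadata is attached through the C++ API against a current LLVM tree, plus a sketch of the missing per-function entry count. The annotateEntryCount helper and its metadata shape are assumptions for illustration, not an existing convention:]

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/MDBuilder.h"
  #include "llvm/IR/Metadata.h"

  using namespace llvm;

  // Existing convention: branch probabilities as !prof branch_weights
  // metadata on the conditional branch.
  void annotateBranch(BranchInst *BI, uint32_t Taken, uint32_t NotTaken) {
    MDBuilder MDB(BI->getContext());
    BI->setMetadata(LLVMContext::MD_prof,
                    MDB.createBranchWeights(Taken, NotTaken));
  }

  // Hypothetical missing piece: a per-function execution count, so block
  // frequencies can be compared across functions. The key name and shape
  // are illustrative only.
  void annotateEntryCount(Function *F, uint64_t Count) {
    LLVMContext &Ctx = F->getContext();
    Metadata *Ops[] = {
        MDString::get(Ctx, "function_entry_count"),
        ConstantAsMetadata::get(
            ConstantInt::get(Type::getInt64Ty(Ctx), Count))};
    F->setMetadata(LLVMContext::MD_prof, MDNode::get(Ctx, Ops));
  }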

As long as we agree on the canonical metadata, there should be no problem with supporting different kinds of profile generators, including things like Auto FDO.

I also agree about the importance of getting inlining to use the PGO data. Chandler had looked into that and found that it was really difficult to do it properly with the current pass manager. Has any progress been made on that front?

Working backward from the debug info to find source locations hasn't worked very well; the debug info just doesn't have enough detailed information.

I'm curious what problems you've had, since you haven't really mentioned them before. Note that I'm not saying that I think this is the best method; I'm just curious what you're running into.

My plan is to ignore any profile data whenever the source changes in any way.

Even a comment?

-eric

I'm curious what problems you've had, since you haven't really mentioned them before. Note that I'm not saying that I think this is the best method; I'm just curious what you're running into.

The main issue is that I want precise begin/end source locations for each statement. Debug info doesn't do that. Even if we enable column information, debug info just gives you one source location per statement. That's not a problem with debug info; it's just not what it was intended for.
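
[To make the contrast concrete, a small sketch, not from the thread, of the begin/end information a front-end can read off every statement via clang's SourceManager:]

  #include "clang/AST/Stmt.h"
  #include "clang/Basic/SourceManager.h"
  #include "llvm/Support/raw_ostream.h"

  // A clang front-end sees a full begin/end range, with columns, for every
  // statement; debug info only records a single location per statement.
  void dumpStmtRange(const clang::Stmt *S, const clang::SourceManager &SM) {
    clang::SourceRange R = S->getSourceRange();
    clang::PresumedLoc B = SM.getPresumedLoc(R.getBegin());
    clang::PresumedLoc E = SM.getPresumedLoc(R.getEnd());
    if (B.isInvalid() || E.isInvalid())
      return;
    llvm::outs() << B.getFilename() << ":" << B.getLine() << ":"
                 << B.getColumn() << " .. " << E.getLine() << ":"
                 << E.getColumn() << "\n";
  }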

There's also the issue that I want PGO instrumentation to work the same regardless of the debug info setting.

My plan is to ignore any profile data whenever the source changes in any way.

Even a comment?

Yes. Obviously you could do that, but it would add significant complexity for little real value.

The main issue is that I want precise begin/end source locations for each statement. Debug info doesn't do that. Even if we enable column information, debug info just gives you one source location per statement. That's not a problem with debug info; it's just not what it was intended for.

True... somewhat. Precise begin/end source locations exist, and the debug info will map them back to regions of code associated with a particular line and column; this should also change when the statement changes. Anything else should probably be a bug? The only difference I can see is that there's no concrete "done with this statement" in the code. It might take some processing work to get "all statements that executed between these source ranges" because of code motion effects, but you should be able to get a concrete range.

There's also the issue that I want PGO instrumentation to work the same regardless of the debug info setting.

Basically it's going to be "line-tables-only" :-)

Of course, I'm also not signing up to do the work, just curious what the limitations are in the infrastructure here.

Even a comment?

Yes. Obviously you could do that, but it would add significant complexity for little real value.

Mmm... I think you're underestimating how useful this would be, but we can revisit when there's some code. :-)

-eric

Our 64-bit fixed-point representation of block frequencies doesn’t have the dynamic range to do that. See PR16402.

/jakob
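
[For a sense of scale, assuming the entry frequency constant is 1 << 14, as BlockFrequency::getEntryFrequency() returned at the time: a 64-bit fixed-point value can only express relative frequencies between 2^-14 and roughly 2^50 of the entry frequency, so anything colder rounds to zero and anything hotter saturates.]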

Well, it took me much longer than a few weeks, and I completely abandoned the idea of tying the data to specific source locations and requiring it to be regenerated for any source change at all, but I finally have a proposal for doing PGO with instrumentation. Because the proposal is to do the instrumentation and the metadata generation in the front-end, I've sent it to the cfe-dev list. Please have a look if you're interested.