RFC: GLIBCXX_DEBUG ScheduleDAG Patch

Attached is a patch to fix a GLIBCXX_DEBUG error in ScheduleDAGRRList.

The problem is that calls to CapturePred may reprioritize elements in the
priority queue, violating streak weak ordering requirements.

To fix this, I introduced a reference wrapper for containers to obtain access
to the SUnitVec used by std::priority_queue. When CapturePred runs, it
calls updateNode which does a std::make_heap on the underlying SUnitVec.
This restores the correct ordering.

I believe this should also make the scheduler run more like it is supposed to.
Previously, the priority queue ordering was incorrect so we weren't necessarly
scheduling correctly.

The container_reference_wrapper part may be controversial. So I want to
get some opinions on this before I submit it.

Thanks.

                                           -Dave

ScheduleDAG.patch (7.8 KB)

Hi Dave,

This looks great to me!

My only concern is the potential compile time impact. Do you see any? Also, please update the license portion to match what Chris sent out a couple of days ago. I don't see any issue with bringing Boost code into llvm tree. However, does it make sense to move the license to the top of the file? Chris?

Evan

I don't notice any, but then I'm not particularly looking for that either.
I'll run some tests.

I also accidentally included some debugging code I added to track
down this prioritization problem (the queue dumping code). I'll remove
that before I commit the change.

I'll wait for Chris' comments re: licensing.

                                                  -Dave

My only concern is the potential compile time impact. Do you see any?

I don't notice any, but then I'm not particularly looking for that either.
I'll run some tests.

Thanks. 5% slowdowns are generally not noticable, but we care :slight_smile:

I also accidentally included some debugging code I added to track
down this prioritization problem (the queue dumping code). I'll remove
that before I commit the change.

I'll wait for Chris' comments re: licensing.

Re licensing, we have no problem slurping in boost code. Please do make it follow the LLVM standards (80 cols and a doxygen comment above each class saying what it is for). For an example of how to do the license, see include/llvm/ADT/scoped_ptr.h.

Thanks for finding these bugs David!

-Chris

>> My only concern is the potential compile time impact. Do you see any?
>
> I don't notice any, but then I'm not particularly looking for that
> either. I'll run some tests.

Thanks. 5% slowdowns are generally not noticable, but we care :slight_smile:

After some very simple testing, I see slowdowns of around 1.7%. I assume
this is ok, but want to check.

Re licensing, we have no problem slurping in boost code. Please do make
it follow the LLVM standards (80 cols and a doxygen comment above each
class saying what it is for). For an example of how to do the license,
see include/llvm/ADT/scoped_ptr.h.

No problem.

                                       -Dave

My only concern is the potential compile time impact. Do you see any?

I don't notice any, but then I'm not particularly looking for that
either. I'll run some tests.

Thanks. 5% slowdowns are generally not noticable, but we care :slight_smile:

After some very simple testing, I see slowdowns of around 1.7%. I assume
this is ok, but want to check.

Can you clarify? Is this 1.7% slowdown of scheduling time or overall codegen time? If it's the later, then it seems a bit too much. Also, please test it with all the MultiSource/Applications.

Thanks,

Evan

It's 1.7% overall.

Does the nightly test build provide compile-time information? I see from
the Makefile that it uses -time-passes. I assume the "Today's LLC compile"
header on the nightly page comes from that and reflects the compile time
for the test.

                                          -Dave

After some very simple testing, I see slowdowns of around 1.7%. I
assume
this is ok, but want to check.

Can you clarify? Is this 1.7% slowdown of scheduling time or overall
codegen time? If it's the later, then it seems a bit too much. Also,
please test it with all the MultiSource/Applications.

It's 1.7% overall.

That seems somewhat steep. Can you see how much of the scheduling slow down there is?

Does the nightly test build provide compile-time information? I see from
the Makefile that it uses -time-passes. I assume the "Today's LLC compile"
header on the nightly page comes from that and reflects the compile time
for the test.

The better way is to add a custom report for this. See TEST.llc.Makefile and TEST.llc.report under llvm-test as an example. You can do make TEST=llc to generate custom reports. If this change can somehow be optionally controlled, then you can set it as llcbeta option and write a custom report that compare the time spend in some particular passes (TEST.beta-compare is an example, but it comparesf number instructions).

The biggest problem is right now time spent in scheduling is lumped into "DAG->DAG Instruction Selection" time because it's part of the pass. It would be nice if we can somehow bring it out.

Evan

>>> After some very simple testing, I see slowdowns of around 1.7%. I
>>> assume
>>> this is ok, but want to check.
>>
>> Can you clarify? Is this 1.7% slowdown of scheduling time or overall
>> codegen time? If it's the later, then it seems a bit too much. Also,
>> please test it with all the MultiSource/Applications.
>
> It's 1.7% overall.

That seems somewhat steep. Can you see how much of the scheduling
slow down there is?

I got some times from the nightly report, so this is overall compile time.

The worst slowdown is on timberwolfmc "llc compile" which has a 5%
slowdown. The JIT slows down 6%.

Everything else looks to be 1% or less. In some cases the times with
the change are better, probably because this change gets rid of the "pop
everything off and push it back on" way of recreating the heap.[

The better way is to add a custom report for this. See
TEST.llc.Makefile and TEST.llc.report under llvm-test as an example.
You can do make TEST=llc to generate custom reports. If this change
can somehow be optionally controlled, then you can set it as llcbeta
option and write a custom report that compare the time spend in some
particular passes (TEST.beta-compare is an example, but it comparesf
number instructions).

I'll look at this. But it would be nice to get this change (or something
similar) in ASAP. The scheduler is just broken right now. So either we
have a compiler with a slight slowdown in some cases or we have a
compiler that exhibits undefined behavior.

The biggest problem is right now time spent in scheduling is lumped
into "DAG->DAG Instruction Selection" time because it's part of the
pass. It would be nice if we can somehow bring it out.

Dunno if I can dig that deeply into this right now. I'll have to look at
what's involved here.

                                             -Dave

Oops, I don't see these files. I just updated this morning.

                                         -Dave

The better way is to add a custom report for this. See
TEST.llc.Makefile and TEST.llc.report under llvm-test as an example.

Oops, I don't see these files. I just updated this morning.

They are in the test-suite module (aka llvm-test).

-Tanya

Yes, that's where I looked. Strange.

                                             -Dave

Oops, I was looking in the build directory. I found it now.

                                         -Dave

Just want to send a ping on this. This patch is still waiting to go in. Is
the compile time hit too much? Note that sometimes compile time
improves with this patch.

I'd like to get this in ASAP so I can start merging other things back to
upstream.

                                                      -Dave

Have you tested this on SPEC (especially spec2006)? JIT slow down of 6% is really too steep a price to pay.

I'll test this patch when I find some time.

Evan

David Greene wrote:

Just want to send a ping on this. This patch is still waiting to go in. Is
the compile time hit too much? Note that sometimes compile time
improves with this patch.

I'd like to get this in ASAP so I can start merging other things back to
upstream.
  

Please see bug #2155, I am seeing an additional testsuite failure with
ScheduleDAG patch.
I am not saying there is any problem with ScheduleDAG patch itself, it
may be just exposing a previously hidden bug.

[I tried to add you to the Cc: of the bugreport, but bugzilla didn't
find your email address]

Best regards,
--Edwin

This patch has been superseded by a patch from Roman Levenstein.
The backtrace in the bug doesn't give much help. I have seen a few more
GLIBCXX_DEBUG failures as I've updated my owrking copy. I've had
trouble running "make install" lately, though, so I haven't tested in a while.

                                           -Dave