[PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().

Hi,

I have revisited the issue in buildSchedGraph() I talked about previously, and attached a few patches. The first tries to fix the issue, and the other two try to illustrate associated issues, emerged from applying it.

Is it OK to commit the first patch?

    [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().

    Bugfix for missed dependency from store to load in buildSchedGraph().

    Background: When handling underlying objects for a store, the vector
    of previous mem uses, mapped to the same Value, is afterwards cleared
    (regardless of ThisMayAlias). This means that during handling of the
    next store using the same Value, adjustChainDeps() must be called,
    otherwise a dependency might be missed.

    For example, three spill/reload (NonAliasing) memory accesses using
    the same Value 'a', with different offsets:

        SU(2): store @a
        SU(1): store @a, Offset:1
        SU(0): load @a

    In this case we have:

    * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends up in
      RejectMemNodes and is removed from the mem-uses list (AliasMemUses
      or NonAliasMemUses), as this list is cleared.

    * SU(2) needs a dep against SU(0). Therefore, SU(2) must check
      RejectMemNodes by calling adjustChainDeps().

    Previously, for store SUs, adjustChainDeps() was only called if
    MayAlias was true, missing the S(2) to S(0) dependency in the case
    above. The fix is to always call adjustChainDeps(), regardless of
    MayAlias, since this applies both for AliasMemUses and
    NonAliasMemUses.

    No test case found for any in-tree target.

Jonas Paulsson

0001-Bugfix-for-missed-dependency-from-store-to-load-in-b.patch (2.3 KB)

0002-Assert-added-to-trigger-on-edges-between-NonAlias-an.patch (4.92 KB)

0003-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch (4.05 KB)

From: "Jonas Paulsson" <jonas.paulsson@ericsson.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Sanjin Sijaric" <ssijaric@codeaurora.org>, "Andrew Trick (atrick@apple.com)"
<atrick@apple.com>, llvm-commits@cs.uiuc.edu
Cc: "Mattias Eriksson V" <mattias.v.eriksson@ericsson.com>, llvmdev@cs.uiuc.edu, "Tom Stellard"
<thomas.stellard@amd.com>, "Sergei Larin" <slarin@codeaurora.org>, "Patrik Hägglund H"
<patrik.h.hagglund@ericsson.com>
Sent: Friday, January 30, 2015 4:23:50 AM
Subject: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().

Hi,

I have revisited the issue in buildSchedGraph() I talked about
previously, and attached a few patches. The first tries to fix the
issue, and the other two try to illustrate associated issues,
emerged from applying it.

Is it OK to commit the first patch?

    [PATCH] Bugfix for missed dependency from store to load in
    buildSchedGraph().

    Bugfix for missed dependency from store to load in
    buildSchedGraph().

    Background: When handling underlying objects for a store, the
    vector
    of previous mem uses, mapped to the same Value, is afterwards
    cleared
    (regardless of ThisMayAlias). This means that during handling of
    the
    next store using the same Value, adjustChainDeps() must be
    called,
    otherwise a dependency might be missed.

Yes, LGTM. However, I'd like an extensive comment added to that part of the code explaining what is going on. That function has far too little in the way of commentary (RejectMemNodes itself even has no description). It is crucial that we change this. As far as I'm concerned, a comment that is as long and explanatory as this patch description is perfectly appropriate. [I say this admitting to having touched this function myself].

Also, I'd really prefer you use reviews.llvm.org for these patches in the future, and upload them with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface), it makes it much easier to see what is going on.

Regarding the other two patches, can you please explain what is going on? It is not obvious to me from the patches.

Thanks again,
Hal

Hi,

I have committed the patch now (svn id 228686).

Regarding the commenting you requested, I attach a patch. Feel free to make changes.

I found it difficult to explain what the code does in isolated places, and thus kept my
commenting quite short. This makes me feel like the code needs a bit of refactorization
to make it more simple and understandable.

Looking at the possibility of refactorization / redesign, I wonder what are the main strong
points of this implementation right now, in your opinion? The mapping to underlying objects
looks nice to me, but I am not sure I understand why to go through the trouble of clearing
lists and using a set of RejectMemNodes with adjustChainDeps(). It is very complex code, and
I wonder what is gained in the end here. Does anyone have any thoughts about it? Is it to handle
huge regions with tons of memory accesses well?

Regarding the other two patches:

The bugfix I just commited makes sure that adjustChainDeps() is called on an SU regardless of
MayAlias. It occoured to me that RejectMemNodes contains a mix of SUs from both Alias and
NonAlias sets, and that the separation of the two domains is lost. I became worried
that some performance might therefore be lost because of my bugfix.

Since I was not sure how big this impact might be, I made a fix for it quickly just to illustrate
what the problem is (one of the previously attached patches). Basically, when an SU's underlying
objects are analyzed, the results are remembered by putting the SU into one or both of the (added)
sets AliasMemUseDefMIs and NonAliasMemUseDefMIs. Later, MIsNeedChainEdge() can return
false, if they were originally in different domains:

  // A NonAliasing node cannot alias an AliasingNode. This is the case
  // where MIa had only non-aliasing underlying objects and MIb had
  // only aliasing underlying objects, or vice versa.
  if ((AliasMemUseDefMIs.count(MIa) && !NonAliasMemUseDefMIs.count(MIa) &&
       !AliasMemUseDefMIs.count(MIb) && NonAliasMemUseDefMIs.count(MIb)) ||
      (!AliasMemUseDefMIs.count(MIa) && NonAliasMemUseDefMIs.count(MIa) &&
       AliasMemUseDefMIs.count(MIb) && !NonAliasMemUseDefMIs.count(MIb)))
    return false;

This should then be an improvement to MIsNeedChainEdge(), although I am not sure if there is
a more elegant solution, which I invite you to suggest.

I also supplied a patch with a lot of ugly code that had the original purpose of finding a test case
on an in-tree-target for the bug I had discovered. You could ignore that patch now, I think.
( An assert would trigger if there was - with the bugfix in place - an edge inserted in the NonAlias case,
which would not have been done before the bugfix.)

I attach the patch for MIsNeedChainEdge() again, and would appreciate some feedback.

/Jonas Paulsson

0001-Two-comments-in-ScheduleDAGInstrs.cpp.patch (1.45 KB)

0002-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch (4.05 KB)

Since you’ve taken the time to understand the algorithm, and are actively benchmarking, it would be great to take advantage of the opportunity to rewrite. Feel free to modify even the core DAG builder implementation, which hasn’t really changed since inception–way before my time. The alias-aware scheduling has been around long enough that it makes sense to redesign the core DAG builder code around it, rather than bolt it on. I agree, in its current form, the AA-aware logic is too hard to follow.

Andy

Hi,

I would be happy to give it a try :slight_smile:

The fact that AA was added at a later point explains the situation a bit, as much fewer SUs should end up in RejectMemNodes without it.

RejectMemNodes is bad in that it mixes all the SUs together again, after having gone through the work of separating them by analyzing their underlying objects.

It is also very confusing to have two “stages” of dependency checking, first by mapping an SU to one or more Values, and then to still

“check everything” that may have been missed.

I would like to try to remove RejectMemNodes (and adjustChainDeps() and iterateChainSucc()) and then simply not clear the MemUses list (while handling a store). “If an SU gets a Value mapping, keep it”.

If that list grows bigger than a certain limit, intelligent alias querying could stop, just as it stops now in iterateChainSucc when *Depth > 200.

But it would then at least be done against the right set of SUs, not against “all SUs that sometime did not need an edge”.

What do you think about this, anyone?

/ Jonas Paulsson

Hi,

I would be happy to give it a try :slight_smile:

The fact that AA was added at a later point explains the situation a bit, as much fewer SUs should end up in RejectMemNodes without it.

RejectMemNodes is bad in that it mixes all the SUs together again, after having gone through the work of separating them by analyzing their underlying objects.
It is also very confusing to have two "stages" of dependency checking, first by mapping an SU to one or more Values, and then to still
"check everything" that may have been missed.

I would like to try to remove RejectMemNodes (and adjustChainDeps() and iterateChainSucc()) and then simply not clear the MemUses list (while handling a store). “If an SU gets a Value mapping, keep it”.
If that list grows bigger than a certain limit, intelligent alias querying could stop, just as it stops now in iterateChainSucc when *Depth > 200.
But it would then at least be done against the right set of SUs, not against "all SUs that sometime did not need an edge".

What do you think about this, anyone?

My main requirements are
- AA-aware dependencies are still off by default
- default behavior is (obviously) not impacted
- default compile time does not significantly increase

I’m not heavily invested in the AA-aware part of the design, so I’ll leave that to you, Hal, Sergei, and whoever else is using it.

Andy

Hi,

http://reviews.llvm.org/D7850

I have now worked on this for a while and you can inspect my results by following the link above.

/Jonas

PS (It seems that phabricator failed to mail llvm-commits for some reason. I added llvm-commits as subscriber in the browser, but nothing happened).

PS (It seems that phabricator failed to mail llvm-commits for some reason. I added llvm-commits as subscriber in the browser, but nothing happened).

It’s because you didn’t write a comment at the same time. The LLVM Phabricator has some modifications so that it doesn’t send emails for comment-less changes like accepting a patch or adding subscribers.