Artificial deps and stores

Andy, et al.,

In ScheduleDAGInstrs::buildSchedGraph, the code for handling stores has this:

      if (!ExitSU.isPred(SU))
        // Push store's up a bit to avoid them getting in between cmp
        // and branches.
        ExitSU.addPred(SDep(SU, SDep::Artificial));

This code does not seem to be in any way specific to compares; and in any case, at least on the PPC A2, scheduling stores in between the compare and the branch would not be a bad thing (because the core is in order, and the compare has a 2-cycle latency, so if there is nothing else to do, a store would not be a bad thing to put there).

Can you explain the motivation for this (why or for what it is bad), and what else it might do (aside from the commented cmp/branch pairing)? I'm wondering if we should make this target dependent.

Thanks again,
Hal

Andy, et al.,

In ScheduleDAGInstrs::buildSchedGraph, the code for handling stores has this:

     if (!ExitSU.isPred(SU))
       // Push store's up a bit to avoid them getting in between cmp
       // and branches.
       ExitSU.addPred(SDep(SU, SDep::Artificial));

This code does not seem to be in any way specific to compares; and in any case, at least on the PPC A2, scheduling stores in between the compare and the branch would not be a bad thing (because the core is in order, and the compare has a 2-cycle latency, so if there is nothing else to do, a store would not be a bad thing to put there).

Can you explain the motivation for this (why or for what it is bad), and what else it might do (aside from the commented cmp/branch pairing)? I'm wondering if we should make this target dependent.

I don’t agree with the existing comment. It’s possible that somewhere, maybe in target specific code, we make use of the extra store->exit edge, but I can’t remember any reason for it now. You could

(a) Just nuke it and verify PPC and x86 benchmarks. If all looks good, then I’ll confirm by running arm.

(b) Make it target specific and add a FIXME to remove after performance analysis

-Andy

Do you have another mechanism for encouraging macro fusion?

Thanks,
/jakob

From: "Jakob Stoklund Olesen" <stoklund@2pi.dk>
To: "Andy Trick" <atrick@apple.com>
Cc: "Hal Finkel" <hfinkel@anl.gov>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, January 17, 2014 6:16:25 PM
Subject: Re: [LLVMdev] Artificial deps and stores

>
>
>> Andy, et al.,
>>
>> In ScheduleDAGInstrs::buildSchedGraph, the code for handling
>> stores has this:
>>
>> if (!ExitSU.isPred(SU))
>> // Push store's up a bit to avoid them getting in between
>> cmp
>> // and branches.
>> ExitSU.addPred(SDep(SU, SDep::Artificial));
>>
>> This code does not seem to be in any way specific to compares; and
>> in any case, at least on the PPC A2, scheduling stores in between
>> the compare and the branch would not be a bad thing (because the
>> core is in order, and the compare has a 2-cycle latency, so if
>> there is nothing else to do, a store would not be a bad thing to
>> put there).
>>
>> Can you explain the motivation for this (why or for what it is
>> bad), and what else it might do (aside from the commented
>> cmp/branch pairing)? I'm wondering if we should make this target
>> dependent.
>
> I don’t agree with the existing comment. It’s possible that
> somewhere, maybe in target specific code, we make use of the extra
> store->exit edge, but I can’t remember any reason for it now.

Do you have another mechanism for encouraging macro fusion?

I think that TII->shouldScheduleAdjacent, which is used by MacroFusion in MachineScheduler.cpp is supposed to do this.

-Hal

The MI scheduler does it now via a proper mechanism. However, for targets that haven’t switched over it could be a minor, temporary issue (the old mechanism shown above didn’t work well anyway).

For the time being it is probably best to leave it under a target option with a FIXME to avoid perturbing non-PPC/x86 targets.

-Andy

Since it doesn’t work well anyway, I’d say just remove it.

Thanks,
/jakob

From: "Andrew Trick" <atrick@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, January 17, 2014 6:03:46 PM
Subject: Re: Artificial deps and stores

> Andy, et al.,
>
> In ScheduleDAGInstrs::buildSchedGraph, the code for handling stores
> has this:
>
> if (!ExitSU.isPred(SU))
> // Push store's up a bit to avoid them getting in between
> cmp
> // and branches.
> ExitSU.addPred(SDep(SU, SDep::Artificial));
>
> This code does not seem to be in any way specific to compares; and
> in any case, at least on the PPC A2, scheduling stores in between
> the compare and the branch would not be a bad thing (because the
> core is in order, and the compare has a 2-cycle latency, so if
> there is nothing else to do, a store would not be a bad thing to
> put there).
>
> Can you explain the motivation for this (why or for what it is
> bad), and what else it might do (aside from the commented
> cmp/branch pairing)? I'm wondering if we should make this target
> dependent.

I don’t agree with the existing comment. It’s possible that
somewhere, maybe in target specific code, we make use of the extra
store->exit edge, but I can’t remember any reason for it now. You
could

(a) Just nuke it and verify PPC and x86 benchmarks. If all looks
good, then I’ll confirm by running arm.

r206795, thanks! Maybe one day I won't be so far behind on my TODO list :wink:

-Hal