Problem ScheduleDAG on PowerPC, X86 works fine.

Long story short: https://llvm.org/bugs/show_bug.cgi?id=31890

The backend fails to schedule a given DAG, the reason being that there is an instruction and it glue that needs to be broken apart as they can’t be scheduled consecutively. See attached file for a picture of the DAG.

Not sure what’s the best course of action is, and not sure why this isn’t a problem for the X86 backend either. I’m looking for advice on the best course of actions. As I see it, the option are:

1/ add extra logic in the DAGCombiner to make sure this doesn’t happen. I don’t see a way this could be done cheaply and overall I don’t think this is the best option/

2/ Have the ScheduleDAG machinery detect this case and break up the glue, for instance via breaking up (adde X, Y, Carry) into (add (add X, Y)n (adde 0, 0, Carry)) or something alike when the situation present itself.

3/ Do whatever the X86 backend does, which I’m not sure what it is.

Advice ?

Thanks in advance,

Amaury SECHET

That’s seems really odd that ADDC/ADDE uses glue there, instead of a plain value.

The x86 backend has code that converts the glue into a value, which is why it wasn’t affected… (LowerADDC_ADDE_SUBC_SUBE).

Would it not make sense to refactor the code so those don’t use glue rather than emitting them with glue and then getting rid of it. There are times when we would like to emit these in separate blocks but can’t (presumably because of the glue).

Making this a value instead of a glue looks like a good longer term solution, but it doesn’t quite cut it as a short term one. It would require to change a fair amount of code in the DAG stack, plus in each backend.

@James,
do you think doing something similar to what the X86 backend does in the PowerPC one would cut it ?

I don’t think that’d work, because it leaves all other backends broken. AFAICT, your transform is simply not a legal transform, with the way the ADDC/ADDE opcodes are currently defined, and to do it you really need to fix the opcode definitions to not involve glue, first.

I also note that your transform doesn’t actually trigger at all on this particular test case on x86, because the dag ends up looking like (uaddo X, (adde Y, 0, Carry)), which the transform doesn’t match. That is not really relevant to the issue here, because the x86 backend does work even if the transform gets triggered (replacing the final store with “store i64 %add37, i64* %r, align 8” will make this happen)

I agree with James here. If getting rid of the glue is required for functional correctness, then:

  1. all targets should be made to get rid of the glue
  2. we should not emit the glue to begin with

Solution 1. seems like the inferior of the two. But if we can’t do one or the other in the short term, I believe we shouldn’t add the transform that requires them.

Well I don’t think this would break most backend. The opcode is generate only if the backend allows so to boot, so many backend actually do not use it at all.

Anyway, changing these opcode is kind of much broader scope than what I anticipated, but I maybe can make it work. What do you think would be an appropriate type for the carry instead of glue ?

I’d think i1 would be the proper and correct choice for a carry flag for the generic instruction. I expect that would also make UADDO/USUBO redundant with ADDC/SUBC (which would seem a good outcome).

You’d need to make sure the right thing happened when converting from ADDC’s 1-bit carry in/out to X86ISD::AD[DC]'s EFLAGS i/o. Right now the conversion can get away with assuming that the only thing which can be used as input to ADDC is the glue-output of ADD/ADDC. That would no longer be the case once they’re using a normal value.

(Actually, I wonder if it would be possible to represent x86’s EFLAGS.CARRY as a 1-bit subreg of EFLAGS, and use that on input to X86ISD::ADC; that may allow cheaper spills/reloads if it only has to save the one bit?).

So the plan would be:

1/ Use UADDO instead of ADDC.
2/ Deprecate and remove ADDC.

3/ Have ADDE/SUBE accept a boolean as carry input instead of a glue.

There are still one thing I don’t really understand: how come the X86 backend end up avoiding turnning into the problem altogether to begin with. I think that’s an important piece of the puzzle.

More on that front. I commented out the part of DAGTypeLegalizer::ExpandIntRes_ADDSUB that legalize using ADDC/ADDE so it falls back on using UADDO. The generated DAG on X86 looks legit, but I end up with 66 test failures, most of them seems to be due to trunc i8 X to i32 ending up being generated. Help !

So after some hacking I came up with https://reviews.llvm.org/D29872

There are still 2 problems with it, one come from compute known bits not working on uaddo, and the second won is due to a sbb/and pair being generated instead of a setb . I don’t think these two are blockers, and I would to get feedback on the whole thing before I dig into getting these to work.

Thanks,

Amaury SECHET