[RFC] Project for GSoC: Unit/Regression testing for CodeGen

Hi everyone,

In response to yet another fix in CodeGen affecting only an out-of-tree target (r231186), our inability to properly unit test CodeGen components has been highlighted. It was suggested that improving this situation might be a good GSoC project, and I agree, provided that we can settle on the scope and basic design ahead of time.

I'd like to add that I feel this is a serious problem even for in-tree targets. We currently construct IR-level tests for CodeGen components, but
this is very fragile. Many of the IR-level CodeGen tests, especially "bug-triggering" regression tests, don't currently test the logic they were originally designed to cover.

Now, for a design:

One idea that I've had for some time is to develop a 'mock' target for testing. For this target, all of the various type/operation legality settings would be determined by some input configuration file. It would contain instructions, mostly in 1:1 correspondence to our SelectionDAG node types, and many different register classes of different sizes, different calling conventions, etc. (again, some input configuration file would determine which were active). We could then use this mock target to write regression tests for CodeGen components. We could also use it to write unit tests, especially at the MI level.
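To make this a bit more concrete, here's a very rough sketch of the kind of thing I have in mind (all of the Mock* names and the configuration interface below are made up purely for illustration, not a concrete design):

    // Illustrative only: a hypothetical MockTargetLowering whose register
    // classes and legality settings come from an input configuration file
    // instead of being hard-coded in the target.
    MockTargetLowering::MockTargetLowering(const TargetMachine &TM,
                                           const MockSubtarget &STI)
        : TargetLowering(TM) {
      // Hypothetical helper that exposes the parsed configuration file.
      const MockTargetConfig &Cfg = STI.getTestConfig();

      // Register classes of various sizes, enabled per configuration.
      if (Cfg.hasRegClass("GPR32"))
        addRegisterClass(MVT::i32, &Mock::GPR32RegClass);
      if (Cfg.hasRegClass("FPR64"))
        addRegisterClass(MVT::f64, &Mock::FPR64RegClass);

      // Type/operation legality driven by the same configuration, e.g. an
      // entry in the file saying that i32 sdiv should be expanded.
      for (const auto &OA : Cfg.operationActions())
        setOperationAction(OA.Opcode, OA.VT, OA.Action);

      computeRegisterProperties(STI.getRegisterInfo());
    }

The point being that a single mock target plus a handful of small configuration files could stand in for many differently-shaped targets without us having to maintain several full backends.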

Thoughts?

-Hal

I don't think a mock target would help much with the fragility of these
tests - given the need to create sufficiently convoluted test cases to
tickle particular corner cases of codegen (e.g. needing to max out register
usage to investigate particular spilling situations, etc.), it's easy for
these tests to break, either in the false-positive sense (failing when the
bug/fix under test has not regressed) or the false-negative sense (tests
silently becoming irrelevant because the codepath is no longer hit for this
input).

My understanding is that the most commonly suggested solution to this is a
textual machine IR so we can test different parts of codegen in relative
isolation - that way instruction selection improvements don't perturb
register allocation tests, etc. (I'm not suggesting it's the only or best
solution, but it seems to be the notion that gets bandied about so far.)

- David

I should say, in addition - textual machine IR would help with the fragility
you mentioned, but probably not with the "this out-of-tree usage needs
fixing" problem you described - so we might need both: textual IR and a fake
target (I imagine we'd probably need multiple fake targets, though - this
might become expensive).

An alternative to the fake target might be to use actual API-level unit
testing for parts of CodeGen, though I don't know exactly how that'd look -
what boundaries would be good to shore up/encapsulate so they could be used
from a unit test, etc.
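
Just to illustrate the kind of thing I mean (purely a sketch - none of these helpers exist, and the names are made up), such a test might end up looking roughly like:

    // Purely illustrative: the point is just that a well-encapsulated
    // CodeGen component could be driven directly from a gtest-style unit
    // test rather than from IR via llc.
    #include "gtest/gtest.h"
    #include <memory>

    TEST(BranchFoldingTest, MergesIdenticalSuccessors) {
      MockTargetMachine TM; // hypothetical mock/fake target

      // Hypothetical helper that hand-constructs a small MachineFunction
      // without going through instruction selection at all.
      std::unique_ptr<MachineFunction> MF = buildTestMachineFunction(TM);

      // Hypothetical wrapper that runs only the component under test.
      runBranchFolding(*MF);

      // Assert on the resulting MI structure rather than on asm output.
      EXPECT_EQ(MF->size(), 2u);
    }

The hard part, of course, is exactly the encapsulation question - most of these components can currently only be run from inside the full CodeGen pipeline.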

- David

> I'd like to add that I feel this is a serious problem even for in-tree
> targets. We currently construct IR-level tests for CodeGen components, but
> this is very fragile. Many of the IR-level CodeGen tests, especially
> "bug-triggering" regression tests, don't currently test the logic they
> were originally designed to cover.

Yes, yes! Doing something - anything - in the area would be great =)

> Now, for a design:
>
> One idea that I've had for some time is to develop a 'mock' target for
> testing. For this target, all of the various type/operation legality
> settings would be determined by some input configuration file. It would
> contain instructions, mostly in 1:1 correspondence to our SelectionDAG
> node types, and many different register classes of different sizes,
> different calling conventions, etc. (again, some input configuration file
> would determine which were active). We could then use this mock target to
> write regression tests for CodeGen components. We could also use it to
> write unit tests, especially at the MI level.

This is arguably a different and bigger issue ("Target" vs "CodeGen"),
but that only helps for generic (lib/CodeGen) bugs, not for
target-specific ones (I'm thinking ISel, or stuff like AnalyzeBranch),
right?

For the latter, the idea of serializing MI (or, more importantly, the
SelectionDAG) has been floating around for a while.
I think that would help a lot, but once you start serializing an
internal representation, you don't know if it's still possible to get
to it from IR, so you have the same staleness problem we currently
have.
That's solved by adding a companion test, written like we do
now (from IR), checking the serialized representation right before the
to-be-tested component. When someone makes a change that affects this
companion, they have an opportunity to re-evaluate the unit test as
well. A bit verbose, yes, but does it sound sensible?

Don't get me wrong, a mock target would be pretty simple and very
useful - enough for lib/CodeGen, so +1 for that by itself.

-Ahmed

> For the latter, the idea of serializing MI (or, more importantly, the
> SelectionDAG) has been floating around for a while.
> I think that would help a lot, but once you start serializing an internal
> representation, you don't know if it's still possible to get to it from
> IR, so you have the same staleness problem we currently have.
> That's solved by adding a companion test, written like we do now (from
> IR), checking the serialized representation right before the to-be-tested
> component. When someone makes a change that affects this companion, they
> have an opportunity to re-evaluate the unit test as well. A bit verbose,
> yes, but does it sound sensible?

Arguably that staleness problem already exists with middle-end
optimizations - we might write a single IR test for a particular
optimization only to later change canonicalization upstream such that that
particular IR construct is never seen.

If we were to continue to have the full IR tests as we do today, I don't
think a lot would be gained by having the more precise tests - we'd still
have many of the flaky issues we have today (I guess we'd lose the false
negatives I described - the narrow test would ensure the issue was always
covered - but we'd still have the false positives of the broad IR test).

> One idea that I've had for some time is to develop a 'mock' target for
> testing. For this target, all of the various type/operation legality
> settings would be determined by some input configuration file. It would
> contain instructions, mostly in 1:1 correspondence to our SelectionDAG
> node types, and many different register classes of different sizes,
> different calling conventions, etc. (again, some input configuration file
> would determine which were active). We could then use this mock target to
> write regression tests for CodeGen components. We could also use it to
> write unit tests, especially at the MI level.

One advantage of creating a mock target with a 1:1 correspondence of
SelectionDAG nodes to MachineInstrs is that it would give us a generic
MachineInstr instruction set, which is a requirement of Jakob's global
isel proposal. Maybe it would help kick-start work on this.

Another idea I've had for a 'mock' target would be to create a skeleton
target on which you can do 's/YourTarget/MockTarget/g' and instantly get a
target that builds.

This would have to be separate from any mock target used for testing,
but I think creating something like this as a side-effect of your
proposal would be nice.

-Tom

Hi,

Ultimately, I would love llc (or another tool) to work the same way as opt, but for target-specific IR-to-IR, IR-to-MIR, and MIR-to-MIR passes.

Even with such a tool, we would still need a mock target to exercise the tricky corner cases.

The bottom line is +1 for the mock target Hal suggested.

Thanks,
-Quentin