Loop Vectorize: Testing cost model driven transformations

Note: This is a continuation of a discussion over at https://reviews.llvm.org/D26869.

Hi all,

In a discussion over on llvm-commits, we are debating how best to test loop vectorization transformations that are guided by the cost model. The cost model is currently used primarily for determining the vectorization and interleave factors. Both of these parameters are easily overridden with command line flags, which enables us to create target-independent tests. Tests that rely on specific TTI hooks and instruction costs are placed under the target-specific directories. Target-independent tests are great because we don’t have to replicate them for all targets.
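
For illustration, here is a minimal sketch of what such a target-independent test looks like (the IR and CHECK lines are illustrative, not lifted from an existing test): no target triple is given, and the vectorization and interleave factors are forced on the command line so the cost model never decides whether to vectorize:

  ; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S | FileCheck %s
  ;
  ; a[i] = a[i] + b[i]; with VF forced to 4 we expect <4 x i32> operations
  ; in the vector body regardless of the host.
  ; CHECK-LABEL: @add_i32(
  ; CHECK: load <4 x i32>
  ; CHECK: add <4 x i32>
  ; CHECK: store <4 x i32>
  define void @add_i32(i32* noalias %a, i32* noalias %b) {
  entry:
    br label %loop

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %a.gep = getelementptr inbounds i32, i32* %a, i64 %i
    %b.gep = getelementptr inbounds i32, i32* %b, i64 %i
    %x = load i32, i32* %a.gep
    %y = load i32, i32* %b.gep
    %sum = add i32 %x, %y
    store i32 %sum, i32* %a.gep
    %i.next = add nuw nsw i64 %i, 1
    %cond = icmp eq i64 %i.next, 1024
    br i1 %cond, label %exit, label %loop

  exit:
    ret void
  }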

But we can write new transformations (e.g., https://reviews.llvm.org/D26083) that use the cost model to guide code generation in ways other than selecting the vectorization and interleave factors. That is, the code we generate can be different, depending on the target, even if we manually specify vectorization and interleave factors.

So the question is, if we perform some new transformation based on cost model results, how do we preserve the behavior of the existing target-independent tests (that only specify vectorization and interleave factors)? And also, how should we test the code that consumes the cost model results but doesn’t care about what those results are? Should the tests be target-specific or target-independent?

There is some precedent for using TTI hooks to enable some optimizations, with testing done in the target-specific directories. But if we expect an optimization to be enabled for all targets, does this make sense?

We currently have a “-force-target-instruction-cost” flag we can use to get consistency in the instruction costs. But as it’s currently implemented, it prevents us from using target-independent tests to exercise the code that computes derived costs. For example: are we adding a scalarization overhead to the cost of an instruction that can’t be vectorized? Are we correctly scaling the cost of a predicated instruction by block probability? None of that logic is target-specific. Our current flag just overrides the “final” cost of each instruction. But this is somewhat of an aside, since our existing target-independent tests don’t even use this flag to guarantee instruction costs.
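
To make that concrete (a purely illustrative sketch of the kind of arithmetic involved, not the vectorizer’s exact formula): a predicated instruction that has to be scalarized at VF=4 might be costed as roughly four scalar copies plus the insert/extract overhead for moving values between vectors and scalars, with the sum then scaled by the probability that its block executes. Because the flag overrides the final per-instruction cost, a test that relies on it never observes whether that derived arithmetic is right.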

Some alternatives that we’ve discussed so far include:

  • Adding a new command line option to use the default TTI. This would allow us to use default values for TTI queries to ensure consistency across targets for transformations driven by the cost model. The existing and future target-independent tests would have to be updated to use the new flag. They also may still choose to manually specify vectorization and interleave factors to force vectorization and interleaving regardless of what the cost model would compute using the default TTI.
  • Adding a command line option to enable/disable each cost model driven transformation we add. The existing and future target-independent tests would have to be updated to explicitly disable each optimization (or in some way force predictable behavior across targets like we do now when setting the vectorization and interleave factors). All tests that test a cost model driven optimization would be placed under a target-specific directory.

Does anyone have any thoughts or suggestions?

Thanks!

– Matt

Hi Matt,

Thanks for summarizing the discussion here.

  • Adding a new command line option to use the default TTI. This would allow us to use default values for TTI queries to ensure consistency across targets for transformations driven by the cost model. The existing and future target-independent tests would have to be updated to use the new flag. They also may still choose to manually specify vectorization and interleave factors to force vectorization and interleaving regardless of what the cost model would compute using the default TTI.

Do we need a new (loop-vectorizer-specific) command line option for this? Don’t we get the default TTI if the target is unspecified in the test?

Adam

I think you're right! It looks like I am getting the default TTI when the
target is left unspecified. I was assuming it would default to whatever the
host is, but this doesn't seem to be the case. I guess this is a non-issue
after all, as long as we don't specify a triple in the target-independent
tests. And it looks like Michael cleaned that all up in r283512.

Thanks!

-- Matt

Thanks Matt!

So, just to make sure I understand, what triggers llc to use a specific TTI? -mcpu?

Right, TTI would be determined by the target specified in the IR or set
explicitly with the -m flags. My understanding is that if the target is
left unspecified in the IR and not set with the -m flags, llc will generate
code for the default target listed in the output of "llc --version".

-- Matt

Right, let’s say what we get from llc --version is:

Default target: x86_64-unknown-linux-gnu
Host CPU: haswell

So, what we currently do is use the default target (which is normally the host target), but ignore the host cpu?

Michael

That’s right. In your example, if the target isn’t specified anywhere, an llc invocation would be equivalent to “llc -mtriple=x86_64-unknown-linux-gnu -mcpu=generic”. TTI queries (in, e.g., CodeGenPrepare) would be based on this. From opt, if the target triple is left unspecified, we will use the “base” TTI implementation (not the x86 one).
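
One quick way to see the difference from the command line (just a sketch; the function is an arbitrary example and the printed costs depend on the target) is the cost-model analysis printer:

  $ cat cost.ll
  define <4 x float> @fadd_v4f32(<4 x float> %a, <4 x float> %b) {
    %r = fadd <4 x float> %a, %b
    ret <4 x float> %r
  }

  # No triple anywhere: the cost queries are answered by the base TTI.
  $ opt < cost.ll -cost-model -analyze

  # With an explicit triple/cpu, the X86 TTI answers, so the printed costs
  # can differ from the run above.
  $ opt < cost.ll -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mcpu=generic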

– Matt

Yeah, this makes a lot of sense, -mcpu=generic (as opposed to -mcpu=native) is the sane default.
I guess I was just expecting an x86 host to get a “generic x86 TTI” (whatever that means), not a “generic TTI”.

I think using "generic" is the way to go here, but it's still *not*
target-independent.

-march=x86 -mcpu=generic -> i586? (SSE?)
-march=x86_64 -mcpu=generic -> Pentium what? (SSE? AVX?)
-march=arm -mcpu=generic -> ARM7TDMI (ARMv4T, no VFP, no NEON)
-march=aarch64 -mcpu=generic -> Cortex-A53 (ARMv8, VFP + SIMD)

So, if you don't specify -march but do use -mcpu=generic, then when you
run the tests on native ARM/AArch64 you'll get different results in your
target-independent tests, if they can support vector code at all.

We currently build *only* the ARM back-ends on the ARM native bots,
because building x86 takes a long time and that's being tested
elsewhere. That's why we get a lot of tests failing when they expect
the x86 back-end to exist (via -march=x86_64).

My point is, there is no safe way to do target-independent tests of
the vectorizer if you don't force "some" parameters.

However, it should be mostly fine, as we don't really need all bots to
test all things. If the x86_64 bots are testing the generic IR
transformations and the ARM bots are testing the ARM specific ones,
we're mostly covered.

We may let slip a thing or two in the cost models, but other tests
(like self-hosting, test-suite, benchmarks) will eventually pick them
up. In this case, we should add a new target-specific test on the ARM
side to make sure we don't regress again.

Does that make sense?

cheers,
--renato

Why is llc relevant to this thread? Is this just an aside? Target-independent tests for LV are formulated with opt.

It isn’t really relevant; Matt just brought up “llc --version” as a way to show the default triple and native CPU.
The same question (“Which TTI do/should we get with -mcpu=generic / when not providing -mcpu at all”) applies to opt.

And just to be clear in case there was any confusion, in opt when a target
is not specified we get the generic TTI, not one for the host or default
triple indicated by llc. I think this was Adam's original point/question.
As long as the tests we want to remain target-independent don't specify a
target triple, we should get the same generic TTI on any host.

-- Matt

Yes, opt and llc are completely different in this regard. llc always needs a target to generate code for. opt does not; it uses the data layout and TTI (the default if no target is specified).

That is why I think bringing llc into the discussion was confusing; at least I think it confused Renato. My whole point was that you can write target-independent tests with opt, but of course not with llc (which I think was Renato’s conclusion).

Adam

Yup. Thanks for clearing that up. :)

cheers,
--renato

Oh, ok, so the difference in practice is which triple llc and opt default to - llc defaults to LLVM_DEFAULT_TARGET_TRIPLE, and opt defaults to “”.

I never realized that - I thought opt also defaults to the, well, default.
Thanks!