Vectorization Cost Models and Multi-Instruction Patterns?

Hi all,

While tinkering with saturation instructions, I hit problems with the
cost model calculations.

The loop vectorizer cost model accumulates the individual TTI cost
model of each instruction. For saturating arithmetic, this is a gross
overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects
(for the saturation), and a truncate (output); these all fold alway.
With an intrinsic, you'd end up with a better estimate; however, I'm
trying to see what problems we would encounter without intrinsics, and
I think this is the biggest one.
Note that AFAICT, costs for min/max patterns (icmp+iselect) are also
overestimated, but not as much as saturate.

Proposal:

Add a method, part of the vector API of TargetTransformInfo, for
multi-instruction cost computation. It would take a scalar
Instruction, and a reference to a set of Instruction. If it's able to
match a min/max/saturate/.., it adds all the matched instructions to
the set, so the caller (say LoopVectorizationCostModel) can ignore
them.

But:
- this all seems icky: a very blunt hammer.
- what, if anything, should we do about legality checks? The
expanded IR equivalent of a saturate uses larger types than necessary,
so this might prevent vectorization. In practice it doesn't, because
only load/store/PHI types are checked there.
- is this useful in other cases, beyond min/max (maybe abs ?) and saturate?

Thanks!

-Ahmed

I guess it will help SAD generation also.

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Das, Dibyendu
Sent: Tuesday, January 20, 2015 2:42 PM
To: Ahmed Bougacha; LLVM Dev
Subject: Re: [LLVMdev] Vectorization Cost Models and Multi-Instruction
Patterns?

I guess it will help SAD generation also.

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Ahmed Bougacha
Sent: Tuesday, January 20, 2015 5:21 AM
To: LLVM Dev
Subject: [LLVMdev] Vectorization Cost Models and Multi-Instruction
Patterns?

Hi all,

While tinkering with saturation instructions, I hit problems with the cost
model calculations.

The loop vectorizer cost model accumulates the individual TTI cost model of
each instruction. For saturating arithmetic, this is a gross overestimate, since
you have 2 sexts (inputs), 2 icmps + 2 selects (for the saturation), and a
truncate (output); these all fold alway.
With an intrinsic, you'd end up with a better estimate; however, I'm trying to
see what problems we would encounter without intrinsics, and I think this is
the biggest one.
Note that AFAICT, costs for min/max patterns (icmp+iselect) are also
overestimated, but not as much as saturate.

Proposal:

Add a method, part of the vector API of TargetTransformInfo, for multi-
instruction cost computation. It would take a scalar Instruction, and a
reference to a set of Instruction. If it's able to match a min/max/saturate/..,

Are you trying to match the multi-instruction with library calls or Intrinsics?
If it is intrinsic, you may also need to check legality before matching.

it adds all the matched instructions to the set, so the caller (say
LoopVectorizationCostModel) can ignore them.

But:
- this all seems icky: a very blunt hammer.
- what, if anything, should we do about legality checks? The expanded IR
equivalent of a saturate uses larger types than necessary, so this might
prevent vectorization. In practice it doesn't, because only load/store/PHI
types are checked there.

IMO, it will still prevent vectorization because the value used by min/max/saturate
might have been defined by load/store/phi.

- is this useful in other cases, beyond min/max (maybe abs ?) and saturate?

Yes, Sum-Of-Absolute-Differece is another case.

Regards,
Shahid