Hi all,

While tinkering with saturation instructions, I hit problems with the

cost model calculations.

The loop vectorizer cost model accumulates the individual TTI cost

model of each instruction. For saturating arithmetic, this is a gross

overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects

(for the saturation), and a truncate (output); these all fold alway.

With an intrinsic, you'd end up with a better estimate; however, I'm

trying to see what problems we would encounter without intrinsics, and

I think this is the biggest one.

Note that AFAICT, costs for min/max patterns (icmp+iselect) are also

overestimated, but not as much as saturate.

Proposal:

Add a method, part of the vector API of TargetTransformInfo, for

multi-instruction cost computation. It would take a scalar

Instruction, and a reference to a set of Instruction. If it's able to

match a min/max/saturate/.., it adds all the matched instructions to

the set, so the caller (say LoopVectorizationCostModel) can ignore

them.

But:

- this all seems icky: a very blunt hammer.

- what, if anything, should we do about legality checks? The

expanded IR equivalent of a saturate uses larger types than necessary,

so this might prevent vectorization. In practice it doesn't, because

only load/store/PHI types are checked there.

- is this useful in other cases, beyond min/max (maybe abs ?) and saturate?

Thanks!

-Ahmed

I guess it will help SAD generation also.

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]

On Behalf Of Das, Dibyendu

Sent: Tuesday, January 20, 2015 2:42 PM

To: Ahmed Bougacha; LLVM Dev

Subject: Re: [LLVMdev] Vectorization Cost Models and Multi-Instruction

Patterns?

I guess it will help SAD generation also.

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]

On Behalf Of Ahmed Bougacha

Sent: Tuesday, January 20, 2015 5:21 AM

To: LLVM Dev

Subject: [LLVMdev] Vectorization Cost Models and Multi-Instruction

Patterns?

Hi all,

While tinkering with saturation instructions, I hit problems with the cost

model calculations.

The loop vectorizer cost model accumulates the individual TTI cost model of

each instruction. For saturating arithmetic, this is a gross overestimate, since

you have 2 sexts (inputs), 2 icmps + 2 selects (for the saturation), and a

truncate (output); these all fold alway.

With an intrinsic, you'd end up with a better estimate; however, I'm trying to

see what problems we would encounter without intrinsics, and I think this is

the biggest one.

Note that AFAICT, costs for min/max patterns (icmp+iselect) are also

overestimated, but not as much as saturate.

Proposal:

Add a method, part of the vector API of TargetTransformInfo, for multi-

instruction cost computation. It would take a scalar Instruction, and a

reference to a set of Instruction. If it's able to match a min/max/saturate/..,

Are you trying to match the multi-instruction with library calls or Intrinsics?

If it is intrinsic, you may also need to check legality before matching.

it adds all the matched instructions to the set, so the caller (say

LoopVectorizationCostModel) can ignore them.

But:

- this all seems icky: a very blunt hammer.

- what, if anything, should we do about legality checks? The expanded IR

equivalent of a saturate uses larger types than necessary, so this might

prevent vectorization. In practice it doesn't, because only load/store/PHI

types are checked there.

IMO, it will still prevent vectorization because the value used by min/max/saturate

might have been defined by load/store/phi.

- is this useful in other cases, beyond min/max (maybe abs ?) and saturate?

Yes, Sum-Of-Absolute-Differece is another case.

Regards,

Shahid