Cost model is missing in InstCombiner

Hi,

I think canEvaluateTruncated() in InstCombiner needs to use the cost model to decide whether or not to perform the optimization. Without cost information from TargetTransformInfo, aggressively narrowing vector IR according to the number of demanded bits may lead to scalarization of vector operations. For example, given this input IR:

%wide.load25 = load <32 x i8>, <32 x i8>* %231, align 1
%232 = zext <32 x i8> %wide.load25 to <32 x i16>
%233 = mul nuw nsw <32 x i16> %232, %164
%237 = trunc <32 x i16> %233 to <32 x i8>
store <32 x i8> %237, <32 x i8>* %236, align 1

InstCombine's debug output shows the multiply being narrowed to <32 x i8>:

ICE: EvaluateInDifferentType converting expression type to avoid cast: %9 = trunc <32 x i16> %6 to <32 x i8>
IC: ADD: %6 = mul <32 x i8> %wide.load25, %wide.load
IC: Replacing %10 = trunc <32 x i16> %7 to <32 x i8>
    with %6 = mul <32 x i8> %wide.load25, %wide.load

If the target does not support mul <32 x i8> natively, InstCombine ends up producing less profitable code.
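
To make this concrete, here is a minimal sketch (not a patch) of the kind of profitability check I have in mind, assuming a TargetTransformInfo reference were reachable from the combiner. The helper name is made up, and the sketch uses the FixedVectorType/InstructionCost spellings from recent LLVM headers rather than the exact API of the release I am working with:

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Sketch only: ask the target whether a <32 x i8> multiply is at least as
// cheap as the <32 x i16> one before letting canEvaluateTruncated() narrow it.
static bool narrowMulIsProfitable(const TargetTransformInfo &TTI,
                                  LLVMContext &Ctx) {
  auto *WideTy = FixedVectorType::get(Type::getInt16Ty(Ctx), 32);  // <32 x i16>
  auto *NarrowTy = FixedVectorType::get(Type::getInt8Ty(Ctx), 32); // <32 x i8>

  InstructionCost WideCost =
      TTI.getArithmeticInstrCost(Instruction::Mul, WideTy);
  InstructionCost NarrowCost =
      TTI.getArithmeticInstrCost(Instruction::Mul, NarrowTy);

  // Narrow only when the target does not have to scalarize the i8 multiply.
  return NarrowCost <= WideCost;
}

Whether InstCombine should consult TTI at all is of course part of the question.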

Cheers,

Shixiong (Jason) Xu


From: "Mehdi Amini via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Shixiong Xu" <shixiong@cadence.com>
Cc: llvm-dev@lists.llvm.org
Sent: Thursday, August 18, 2016 11:05:35 AM
Subject: Re: [llvm-dev] Cost model is missing in InstCombiner

+David M.

> Hi,

> I think canEvaluateTruncated() in InstCombiner needs to use the cost model
> to decide whether or not to perform the optimization.

I’ve always seen InstCombine as doing “canonicalization” of the IR
and not “optimization”. So the output of InstCombine should be in a
form that is the most suitable for further analyses and
transformations.

This is exactly our traditional view. Why can the backend not be fixed to generate better code for mul <32 x i8>? It looks like the widening in the IR is something natural to get from legalization (if you set up the correct promotion preferences in *ISelLowering).

-Hal

Thanks for your comments. I tried using promotion to handle operations on v32i8, and it seems to work well. Since you mentioned "canonicalization", I am wondering where I can find documentation on the canonical form of IR. When I saw the truncation to minimal bit width in inner-loop vectorization, I thought of it as an optimization rather than a canonicalization.
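
Concretely, the setup is roughly the usual promotion hooks in the target's *ISelLowering constructor; the class name below is a placeholder, and the exact set of actions a target needs is of course target-specific:

// Fragment of a hypothetical target's ISelLowering.cpp; MyTargetLowering is a
// placeholder name for the target's TargetLowering subclass.
MyTargetLowering::MyTargetLowering(const TargetMachine &TM)
    : TargetLowering(TM) {
  // ... register classes and other operation actions elided ...

  // Promote a v32i8 multiply to v32i16 instead of scalarizing it, so the
  // narrowed IR produced by InstCombine still lowers to a reasonable sequence.
  setOperationAction(ISD::MUL, MVT::v32i8, Promote);
  AddPromotedToType(ISD::MUL, MVT::v32i8, MVT::v32i16);
}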

Shixiong

I don’t believe we document anything on this aspect.