InstCombine wrongful (?) optimization on BinOp with SameOperands

Hi all,
I have been looking at the way LLVM optimizes code before forwarding it to the backend I develop for my company, and while building the following function:
define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 {
entry:
%conv = zext i32 %x to i64
%conv1 = zext i32 %y to i64
%mul = mul nuw i64 %conv1, %conv
%shr = lshr i64 %mul, 32
%xor = xor i64 %shr, %mul
%conv2 = trunc i64 %xor to i32
ret i32 %conv2
}

I came upon the following optimization (during instcombine):
IC: Visiting: %mul = mul nuw i64 %conv, %conv1
IC: Visiting: %shr = lshr i64 %mul, 32
IC: Visiting: %conv2 = trunc i64 %shr to i32
IC: Visiting: %conv3 = trunc i64 %mul to i32
IC: Visiting: %xor = xor i32 %conv3, %conv2
IC: ADD: %xor6 = xor i64 %mul, %shr
IC: Old = %xor = xor i32 %conv3, %conv2
New = <badref> = trunc i64 %xor6 to i32

which seems to be performed by SDValue DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *N)

In my backend's architecture truncate is free, but zext is not (and i64 is not a desirable type for xor or any binary operation in general), so I would expect this optimization to be bypassed. However, because of the following condition:
(N0.getOpcode() == ISD::TRUNCATE && (!TLI.isZExtFree(VT, Op0VT) || !TLI.isTruncateFree(Op0VT, VT)))
it is not (isZExtFree returns false for my architecture while isTruncateFree returns true). The comment on the binop simplification says that a binop over truncs should only be optimized if trunc is not free, so I do not understand the point of the !isZExtFree check here.
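For reference, this is roughly how my backend reports these costs; a simplified sketch with a placeholder class name ("MyTargetLowering"), not my real code:

```cpp
// Simplified sketch, placeholder names -- not my actual backend.
// trunc i64 -> i32 is a plain subregister read here, so it is free;
// zext i32 -> i64 costs a real instruction, and the default isZExtFree()
// already returns false, which matches that.
#include "llvm/Target/TargetLowering.h" // LLVM 3.7-era header location

using namespace llvm;

class MyTargetLowering : public TargetLowering {
public:
  explicit MyTargetLowering(const TargetMachine &TM) : TargetLowering(TM) {}

  // Only the i64 -> i32 truncation is free on this target.
  bool isTruncateFree(EVT SrcVT, EVT DstVT) const override {
    return SrcVT == MVT::i64 && DstVT == MVT::i32;
  }
  bool isTruncateFree(Type *SrcTy, Type *DstTy) const override {
    return SrcTy->isIntegerTy(64) && DstTy->isIntegerTy(32);
  }
  // isZExtFree is left at its default (false) for every type pair.
};
```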
Can someone enlighten me about this optimization?

best regards,
Nicolas Brunie

From: "Nicolas Brunie via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Wednesday, September 30, 2015 1:01:52 AM
Subject: [llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands

Hi all,
I have been looking at the way LLVM optimizes code before forwarding
it to the backend I develop for my company, and while building the following function:
define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 {
entry:
%conv = zext i32 %x to i64
%conv1 = zext i32 %y to i64
%mul = mul nuw i64 %conv1, %conv
%shr = lshr i64 %mul, 32
%xor = xor i64 %shr, %mul
%conv2 = trunc i64 %xor to i32
ret i32 %conv2
}

I came upon the following optimization (during instcombine):
IC: Visiting: %mul = mul nuw i64 %conv, %conv1
IC: Visiting: %shr = lshr i64 %mul, 32
IC: Visiting: %conv2 = trunc i64 %shr to i32
IC: Visiting: %conv3 = trunc i64 %mul to i32
IC: Visiting: %xor = xor i32 %conv3, %conv2
IC: ADD: %xor6 = xor i64 %mul, %shr
IC: Old = %xor = xor i32 %conv3, %conv2
New = <badref> = trunc i64 %xor6 to i32

which seems to be performed by SDValue
DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *N)

You might have figured this out by now, but no, InstCombine and DAGCombine are two completely different pieces of code. One is driven by the code in lib/Transforms/InstCombine/* and the other in lib/CodeGen/SelectionDAG/DAGCombiner.cpp. InstCombine's job is to move the IR toward our chosen canonical form, which is designed to simplify operations in a way that exposes further optimization opportunities (as well as being generally beneficial). It does not take target costs into account.

In my backend's architecture truncate is free, but zext is not (and
i64 is not a desirable type for xor or any binary operation in
general),

Why, then, have you listed i64 as a legal type?
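For reference, type legality is declared in the backend's TargetLowering constructor; a minimal sketch with hypothetical names ("MyTargetLowering", "MySubtarget", "MyTarget::GPR32/GPR64RegClass" come from the target's TableGen files) showing the line that makes i64 legal:

```cpp
// Minimal sketch, hypothetical names. A type becomes legal exactly when a
// register class is registered for it here. Dropping the i64 line makes i64
// illegal, so the type legalizer splits i64 operations into i32 pieces
// instead of keeping them as 64-bit nodes.
#include "llvm/Target/TargetLowering.h" // LLVM 3.7-era header location

using namespace llvm;

MyTargetLowering::MyTargetLowering(const TargetMachine &TM,
                                   const MySubtarget &STI)
    : TargetLowering(TM) {
  addRegisterClass(MVT::i32, &MyTarget::GPR32RegClass);
  addRegisterClass(MVT::i64, &MyTarget::GPR64RegClass); // this is what "lists"
                                                        // i64 as a legal type
  computeRegisterProperties(STI.getRegisterInfo());
}
```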

-Hal
