Advices Required: Best practice to share logic between DAG combine and target lowering?


** Problematic **
I am looking for advices to share some logic between DAG combine and target lowering.

Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine.

Let me know if there is another, better supported, approach for this kind of problems.

** Motivating Example **
The motivating example comes form the lowering of vector code on armv7.
More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types.

This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code.

Attached motivating_example.ll shows such a case:
llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -

ldr r0, [r1]
ldr r1, [r2]
vmov s1, r1
vmov s0, r0

Here each ldr, vmov sequences could have been replaced by a simple vld1.32.

** Proposed Solution **
Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free.
The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO).

Thanks for your help.



ARMISelLowering.patch (1.26 KB)

motivating_example.ll (1.09 KB)

I think you're approaching this backwards: the obvious thing to do is to
generate the insert_vector_elt sequence unconditionally, and DAGCombine
that sequence to a build_vector when appropriate.


Thanks Eli.

I will try that approach.


Hi Eli,

I have started to look into the direction you gave me.

I may have miss something but I do not see how the proposed direction solves the issue. Indeed to be able to DAGCombine a insert_vector_elt sequences into a ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert.

Since target specific DAGCombine are also done in TargetLowering I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the require information).

What did I miss?



Err, wait, sorry, my fault; I missed that you only insert the bitcasts on
the other side of the branch.

You should be able to do it the other way, though: generate the
build_vector unconditionally, and pull insert_vector_elts out of it in a
DAGCombine. (At this point, you know whether DAGCombine will remove the
bit casts because if it could, it would have already done it.)


Make sense.
I will try that.



Thanks again Eli for the direction.

That works great!

I have sent a proposal with that solution.