Is there an established way of disabling a DAG combine on a per-target basis when it appears to be detrimental to the generated code? Writing if (!mytarget) in DAGCombiner.cpp works, but the change tends to get lost in git merges and generally doesn’t look ideal. Writing the inverse transform in target-specific code doesn’t work in this instance, and in general creates an infinite loop because the two combines keep undoing each other.
Guidance would be very welcome!
For the curious, the specific instance I would like to avoid is reduceBuildVecToShuffle. It doesn’t seem to have any target-specific hooks. Exhaustive testing of x86-64 vector code doesn’t show the error. I think the other in-tree targets would notice the vector transform getting the answer wrong (it’s harder to confirm without hardware), so the bug is probably inert for in-tree targets.
Given a v4f16 instance t2, the DAG describes building a v2f16 vector from elements [0, 2]. The combine translates this to building a vector from elements [0, 0]. The problem seems to be treating extract_subvector with different constants as instances of the same value.
t14: v2f16 = extract_subvector t2, Constant:i32<2>
t15: f16 = extract_vector_elt t14, Constant:i32<0>
t16: v2f16 = extract_subvector t2, Constant:i32<0>
t17: f16 = extract_vector_elt t16, Constant:i32<0>
t9: v2f16 = BUILD_VECTOR t17, t15
… into:
t19: v2f16 = vector_shuffle<0,0> t16, undef:v2f16 // fail
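To make the index bookkeeping concrete, here is a minimal, self-contained C++ sketch of the arithmetic involved. It is not LLVM code and makes no claim about how reduceBuildVecToShuffle is actually implemented: Lane, lanesFromDump, maskOverSource and maskIgnoringOffset are hypothetical names invented for illustration. It only models the observation above, namely that each BUILD_VECTOR operand is an extract_vector_elt of an extract_subvector of t2, so its position in t2 is the subvector offset plus the element index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one BUILD_VECTOR operand of the form
//   extract_vector_elt (extract_subvector Src, SubvecOffset), EltIndex
struct Lane {
  int SubvecOffset; // constant operand of extract_subvector
  int EltIndex;     // constant operand of extract_vector_elt
};

// The operands of t9 in the dump above, in order: t17, then t15.
std::vector<Lane> lanesFromDump() {
  return {
      {0, 0}, // t17 = elt 0 of t16, t16 = subvector of t2 at offset 0
      {2, 0}, // t15 = elt 0 of t14, t14 = subvector of t2 at offset 2
  };
}

// Mask expressed over the wide source t2: offset + element index.
std::vector<int> maskOverSource(const std::vector<Lane> &Lanes) {
  std::vector<int> Mask;
  for (const Lane &L : Lanes) {
    assert(L.SubvecOffset >= 0 && L.EltIndex >= 0 && "indices are constants");
    Mask.push_back(L.SubvecOffset + L.EltIndex);
  }
  return Mask;
}

// What you get if both extract_subvector nodes are treated as the same
// value: the offset is dropped and only the element index survives.
std::vector<int> maskIgnoringOffset(const std::vector<Lane> &Lanes) {
  std::vector<int> Mask;
  for (const Lane &L : Lanes)
    Mask.push_back(L.EltIndex);
  return Mask;
}
```

For the lanes in the dump, maskOverSource gives the elements [0, 2] of t2 that the BUILD_VECTOR asks for, while maskIgnoringOffset gives [0, 0], which matches the mask on t19.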
There is no way to opt out of the DAG combiner. Some of what it does is actually a prerequisite for legalization, so it cannot be completely disabled. There has been some talk about the issue of having optimization and "canonicalization" intertwined with each other, and about potentially separating the two, but nothing concrete has been done about it (yet).
From your description it seems like you are seeing incorrect behavior. If that's the case, it should definitely be fixed. Could you provide the complete DAG before and after the erroneous transformation?
Indeed. The vector_shuffle replaces the BUILD_VECTOR shortly after visitBUILD_VECTOR returns. This dump was taken immediately before and after visitBUILD_VECTOR calls reduceBuildVecToShuffle. The vector_shuffle<0,0> then gets lowered as usual, to code that doesn’t implement the intended vector_shuffle<0,2>.
There are other cases where the indexing calculation works out correctly, but for some reason it misfires on this input DAG. The implementation involves chasing indices through intermediate data structures and I don’t have an adequate handle on how it is intended to work. The control flow constructs a single shuffle node with a vector mask calculated from previous information and then returns, so at least the shuffle combination code at the end of the function isn’t involved.
Disabling reduceBuildVecToShuffle fixes the problem completely for my back end, but I’d rather get to the root cause.