I have a custom lowering operation on ISD::BITCAST for the PowerPC/SPE
target, to convert 'f64 bitcast (i64 build_pair i32, i32)' into a 'f64
BUILD_SPE64 i32, i32' node, which can be seen at ⚙ D54583 PowerPC: Optimize SPE double parameter calling setup. However, when building compiler-rt's
lib/builtins/divdc3.c an assertion is triggered that BUILD_PAIR is not
legal on line 24. There should be no bitcast(buildpair) anywhere in
the generation, as it should all be lowered. However, this is not the
case when lowering to a libcall it seems. The relevant debug output,
is here:
It sounds like the legalizer is lowering fmaxnum to a libcall because it is not a legal node for f64 and in doing so, it is producing the build_pair to reassemble the results of the libcall. And presumably, it is assuming that the new nodes do not need legalization or something along those lines.
Justin, it would probably be good if you could provide the DAG before and after legalization both with and without your patch. Then we can see how it was handled before your patch and how it is handled now and the difference should allude to the problem.
I don't think this is a bug in ConvertNodeToLibCall - rather, it's
just a case where you can be tripped up by the very late lowering of
intrinsics calls. This takes place post-legalisation. The issue isn't
the build_pair, it's the fact that you're constructing an i64 prior to
bitcasting to f64, and i64 is not legal on your target. Because these
bitcasts can be introduced so late, a custom bitcast lowering strategy
won't work.
It might be helpful to look at how I handle passing f64 on RISC-V. See
BuildPairF32 and SplitF64 in lib/Target/RISCV. I can't guarantee this
is the _best_ way of handling this sort of problem, but I did explore
quite a lot of options including a similar approach to the custom
bitcast lowering strategy you use here. I'm following this thread
closely to see if there are suggestions for better ways of handling
this issue. The custom inserter shouldn't be necessary in your case -
for RV32FD we don't have an instruction to directly transfer from a
64-bit FPR to the 32-bit GPRs.
The behavior is the same prior to my patch and with my patch, with
regard to this specific behavior. The custom BITCAST lowerer doesn't
introduce any change with this. In fact, it was reported on https://reviews.llvm.org/D49754 as behavior introduced with the
original SPE additions, so is inherent in this target from the
beginning. So, it seems to me that the new nodes added in the libcall
lowering should themselves be lowered, unless I should do as Alex
alluded to, and add new legal nodes that themselves handle the argument
splitting and reforming. But it seems to me that the machinery to do
that already exists, but might need another pass?
Thanks for your replies. I spent a little more time hacking away at
this, and came with the following:
If
1) we allow custom actions on the legalizing node in LegalizeOp() and
2) I add a custom action for BITCAST on f64
Everything works swimmingly. I'm not sure if the custom action on
BITCAST f64 is correct, though, since it's not needed anywhere else.
Everywhere else it works fine with a BITCAST i64, so maybe I'm not
thinking of the problem correctly.
I'm attaching a patch that builds on D54583 and implements what we
discussed on IRC earlier today. Particularly:
* Make LowerCallTo() a virtual function, so it can be wrapped by a
subclass.
* Implement LowerCallTo() in PPCTargetLowering to wrap
TargetLowering::LowerCallTo() and legalize the return node when
targeting SPE.
* Augment PPCTargetLowering::LowerCall_32SVR4() to legalize MVT::f64
arguments that have been pre-processed into
EXTRACT_ELEMENT(i64 BITCAST f64, 0/1)
The purpose of this being to legalize intermediate illegal types
post-type legalization.
Is there a better approach? Comments from anyone else?
The changes seem fine to me. I don’t think this is excessively intrusive and it accomplishes what is needed by targets whose call lowering can introduce illegal types.
Adding Justin Bogner as the owner of SDAG and Hal Finkel as the PPC back end owner for their opinions.
Aside from the fact that you’re checking for i64 specifically instead of generally checking for illegal types, how much of this is really PPC specific? Would this be a reasonable enhancement to the SDAG logic in general?
It's really not at all PPC specific. The goal is to legalize DAGs that
have an intermediate illegal type to satisfy the calling convention. I
don't know if other ABIs have this, but the reason this is an issue for
SPE is that it uses 64-bit doubles in registers, but passes these
around in 32-bit GPR pairs, so necessitates having an intermediate
illegal i64 type. If this is specific to PPC, then it might make more
sense to stay PPC-specific with the trivial virtual override. But if
other ABIs have this requirement (does ARM EABI have this?) then
pushing it into the general SDAG logic is probably better. It really
just would mean adding a legalizing pass over the call and return nodes
immediately, or pushing it up to the Libcall logic, which started this
thread in the first place.
It's really not at all PPC specific. The goal is to legalize DAGs that
have an intermediate illegal type to satisfy the calling convention. I
don't know if other ABIs have this, but the reason this is an issue for
SPE is that it uses 64-bit doubles in registers, but passes these
around in 32-bit GPR pairs, so necessitates having an intermediate
illegal i64 type. If this is specific to PPC, then it might make more
sense to stay PPC-specific with the trivial virtual override. But if
other ABIs have this requirement (does ARM EABI have this?) then
pushing it into the general SDAG logic is probably better. It really
just would mean adding a legalizing pass over the call and return nodes
immediately, or pushing it up to the Libcall logic, which started this
thread in the first place.
Thanks. I'm wondering, in particular, whether adding a generalized
version of the logic in your LowerCallTo override into the base version
would be meaningful / cause behavioral changes for existing targets?
> It's really not at all PPC specific. The goal is to legalize DAGs
> that have an intermediate illegal type to satisfy the calling
> convention. I don't know if other ABIs have this, but the reason
> this is an issue for SPE is that it uses 64-bit doubles in
> registers, but passes these around in 32-bit GPR pairs, so
> necessitates having an intermediate illegal i64 type. If this is
> specific to PPC, then it might make more sense to stay PPC-specific
> with the trivial virtual override. But if other ABIs have this
> requirement (does ARM EABI have this?) then pushing it into the
> general SDAG logic is probably better. It really just would mean
> adding a legalizing pass over the call and return nodes
> immediately, or pushing it up to the Libcall logic, which started
> this thread in the first place.
Thanks. I'm wondering, in particular, whether adding a generalized
version of the logic in your LowerCallTo override into the base
version would be meaningful / cause behavioral changes for existing
targets?
-Hal
Justin Bogner would know better. I think it would probably be better
in the base version, but don't know what the best way to implement it
would be.
In general, we should not introduce nodes illegal types after type legalization; type legalization over a general DAG is complicated, so the type legalization code uses some specialized data structures. (Also, it's easier to reason about what operations can show up at various points in legalization.)
From your description, with your patch, the bitcast is still created, but it is cleaned up before any other code sees it. That sort of works, but it's a hack. We should avoid creating such nodes at all.
On ARM, we have special handling for f64 arguments on targets where f64 is legal, but the calling convention is soft-float. We use the existing argument lowering hooks to avoid generating nodes with illegal types, and instead generate target-specific nodes: ARMISD::VMOVRRD and ARMISD::VMOVDRR. ARMISD::VMOVRRD takes two i32 values and produces an f64, and ARMISD::VMOVDRR takes an f64 and produces two i32 values. This is implemented in ARMTargetLowering::LowerReturn, ARMTargetLowering::LowerFormalArguments, and ARMTargetLowering::LowerCall. (See ARMTargetLowering::PassF64ArgInRegs etc.)
As far as I can tell, the SPE target is similar, so a similar approach should fix the issue, without any changes to target-independent code.
It would be possible to add some target-independent code to make this sort of handling a little easier. We could define target-independent nodes which are equivalent to the ARMISD nodes I mentioned. And if we had those nodes, we could add a target-independent helper for argument lowering that would generate them. Not sure how much of that is actually worth doing; probably a lot of work to remove a small amount of redundant code.