Function Return Legalization

Hi All,

In the target we are implementing, function returns of type i64 and f64 are handled differently from the smaller types.

For types i8 to i32, and f32, the return value is placed in the designated return registers (as other targets do).

For i64 and f64 types, the caller pushes the function parameters onto the stack and then passes the address of the memory allocated for the return value in a 16-bit register (in our case, ER0). The address is formed by copying the frame pointer into ER0 and then adding the slot's offset to it. The output assembly would look roughly like the following:

Expected Target ASM output for i64 function call with i64 return type:

1 push qr0 ;; 64-bit parameter pushed into the stack

2 mov er0, fp ;; Assign the frame pointer into er0

3 add er0, #-32 ;; Add the frame pointer offset to er0

4 bl _fn ;; Function call fn

5 add sp, #8 ;; SP adjustment

In function fn:

1 ;; function prologue processing here…

2 mov er8, er0 ;; frame pointer location of retval is transferred to er8 (will be used at line 4)

3 ;; function processing here… Assuming that the i64 value to be returned is stored at qr0…

4 lea [er8] ;; i64 return processing starts here. Load effective address from [er8]

5 st qr0, [ea] ;; Store the i64 value into the effective address memory location.

6 ;; function epilogue processing here…
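The convention in the two listings above can be modeled in plain C++ as a hidden pointer argument. This is only a sketch: the names fn_lowered and call_fn are hypothetical, and the "+ 1" merely stands in for the function body.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical callee. The hidden first argument plays the role of er0:
// the address of the return slot that the caller allocated.
void fn_lowered(std::int64_t *retSlot, std::int64_t param) {
    assert(retSlot != nullptr);
    std::int64_t result = param + 1;  // stands in for "function processing"
    *retSlot = result;                // corresponds to: lea [er8] / st qr0, [ea]
}

// Hypothetical caller. The local variable stands in for the return space
// in the caller's frame (fp - 32 in the listing).
std::int64_t call_fn(std::int64_t param) {
    std::int64_t slot = 0;
    fn_lowered(&slot, param);         // mov er0, fp / add er0, #-32 / bl _fn
    return slot;                      // the caller reads the result from memory
}
```

Calling call_fn(41) returns the value that the callee wrote through the hidden pointer, with no 64-bit return register involved.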

My questions are:

  1. Are there any existing targets that also behave this way? If there are, please let me know so I can study them.

  2. Where should the assignment of frame pointer + offset (a frame index?) be done? I currently suspect that it belongs in the LowerCall function.

  3. If my assumption in question #2 is correct, how do I retrieve the frame index created in LowerCall from the LowerReturn function?

  4. I also suspect that the affected functions are not limited to LowerCall and LowerReturn. If you have suggestions on which ISelLowering class functions I should investigate, please let me know.

Thank you very much in advance for your help!

Sincerely,

Miguel Inigo J. Manalac

LLVM calls a pointer that’s used to return a value indirectly, like you’re describing, an “sret” argument. There are sort of two ways to go about generating one. First, you can edit the call lowering code in clang (clang/lib/CodeGen/TargetInfo.cpp) so the pointer is represented explicitly in IR. Second, you can make the target-independent backend code generate an sret argument in SelectionDAG, by making your target’s “CanLowerReturn” return false.

Probably making CanLowerReturn return false for i64 makes sense for your target. If you’re using TableGen’ed calling conventions (*CallingConv.td), the TableGen’ed code will handle this automatically; otherwise, you can write it out explicitly in C++. If you want an example of how this works in practice, try something like “echo 'define i128 @foo() { ret i128 3 }' | llc -mtriple=i686”.

If you need to stick the sret pointer into a special register, you can use CCIfSRet in TableGen. (Or if you’re not using a TableGen’ed calling convention, you can check IsSRet in C++.)
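The decision described above can be illustrated with a self-contained toy model. It does not use LLVM's real types or signatures: ValueType, canLowerReturnInRegs and pickStrategy are hypothetical names, and the rule "i64 and f64 do not fit in return registers" is taken from the original post.

```cpp
#include <cassert>
#include <vector>

// Toy value types; a stand-in for the backend's machine value types.
enum class ValueType { i8, i16, i32, f32, i64, f64 };

// Hypothetical stand-in for a target's CanLowerReturn hook: answer false
// when any returned value cannot be placed in the return registers.
bool canLowerReturnInRegs(const std::vector<ValueType> &outs) {
    for (ValueType vt : outs) {
        if (vt == ValueType::i64 || vt == ValueType::f64)
            return false;
    }
    return true;
}

enum class ReturnStrategy { InRegisters, SRetDemotion };

// Models the target-independent side: a "false" answer means the return
// value is demoted to a hidden sret pointer argument.
ReturnStrategy pickStrategy(const std::vector<ValueType> &outs) {
    return canLowerReturnInRegs(outs) ? ReturnStrategy::InRegisters
                                      : ReturnStrategy::SRetDemotion;
}
```

In this model an i32 return stays in registers, while an i64 or f64 return selects the sret path.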

-Eli

Hello Eli,

I think your second suggestion is the correct way of implementing this for our target. Thank you very much for your suggestions, this has greatly helped us in our development!

I am currently trying to implement it for our target.

Cheers,

Miguel

Hi,

After removing support for the i64 type in the *CallingConv.td, sret-demotion is performed and we now have a store<(store 8, align 1)> DAG node being generated. Please refer to the attached dag_funcret.pdf DAG visualization.

My understanding is that the second operand of the store node (CopyFromReg of Register %1, which is a back-up of Register %0) is the address of the memory allocated for the i64 return value. Unfortunately, Register %1 is killed during live variable analysis, which makes Register %0 the second operand. Register %0 is overwritten while the function body is processed, so the store node ends up with an incorrect address as its second operand.

I have also tried implementing sret processing similar to x86’s implementation. This saves the Register %0 value into another register, but the copy is placed after the store node (starting from the EntryToken). Refer to dag_sret_proc.pdf for the DAG visualization of this.

sret-demotion automatically marks er0 (a 16-bit register, copied into Register %0) as a live-in function argument holding the return memory location in LowerFormalArguments. er0 is a sub-register of qr0 (the i64 register) used in the processing. I think I have not managed to tell the backend that modifying qr0 also modifies er0. Is there a way to do this?

What classes/functions should I look into so that Register %1 will not be killed?

I am considering creating a custom select function for ISD::STORE, but I am worried that an i64 function return is probably not the only thing that generates this kind of store node.

Thank you very much for your help and time!

Cheers,

Miguel

dag_funcret.pdf (44.8 KB)

dag_sret_proc.pdf (48.9 KB)

The CopyFromReg->CopyToReg->CopyFromReg sequence doesn’t have the chains set correctly: the second CopyFromReg’s input chain isn’t connected to the CopyToReg’s output chain. (This appears to be the same problem in both graphs.)

Subregisters need to be listed in your RegisterInfo.td. If they are listed correctly, that should be enough to avoid allocating overlapping registers in the register allocator.

-Eli

Hi llvm-dev,

> The CopyFromReg->CopyToReg->CopyFromReg sequence doesn’t have the chains set correctly: the second CopyFromReg’s input chain isn’t connected to the CopyToReg’s output chain. (This appears to be the same problem in both graphs.)

The DAG mentioned was generated by the SelectionDAGBuilder and, since we try as much as possible to modify only the files within our target, I tried the next suggestion.

> Subregisters need to be listed in your RegisterInfo.td. If they are listed correctly, that should be enough to avoid allocating overlapping registers in the register allocator.

Although we had set the subregisters correctly in RegisterInfo.td, subregisters of subregisters were not automatically listed, which caused the overlap. This was solved by declaring the subregisters of subregisters as Aliases.
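To make the overlap issue concrete, here is a small standalone model (not LLVM code) in which two registers overlap if their transitive sub-register sets intersect. The helper names coveredRegs and overlaps, and any intermediate register names used with them such as xr0, are hypothetical; only qr0 and er0 come from this thread.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps a register to its directly listed sub-registers.
using SubRegMap = std::map<std::string, std::vector<std::string>>;

// Collects reg plus everything reachable through direct sub-register edges,
// so sub-registers of sub-registers are included.
std::set<std::string> coveredRegs(const SubRegMap &direct, const std::string &reg) {
    std::set<std::string> seen;
    std::vector<std::string> work{reg};
    while (!work.empty()) {
        std::string r = work.back();
        work.pop_back();
        if (!seen.insert(r).second)
            continue;  // already visited
        auto it = direct.find(r);
        if (it != direct.end())
            for (const std::string &sub : it->second)
                work.push_back(sub);
    }
    return seen;
}

// Two registers overlap when their covered sets share any register.
bool overlaps(const SubRegMap &direct, const std::string &a, const std::string &b) {
    const std::set<std::string> ca = coveredRegs(direct, a);
    const std::set<std::string> cb = coveredRegs(direct, b);
    for (const std::string &r : ca)
        if (cb.count(r) != 0)
            return true;
    return false;
}
```

With a map such as qr0 -> {xr0} and xr0 -> {er0}, overlaps reports that qr0 and er0 conflict even though er0 is not a direct sub-register of qr0, which is the relationship the allocator has to know about.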

With this solution, the target now generates assembly for the function as expected. Special thanks to Eli for the help!

The problem now is that, during a function call, a CopyToReg(ER0 (16-bit reg), FrameIndex) is generated, which saves the pointer to the i64 return value’s memory location into a register in integer form. The plan is to extract the frame address and offset from the FrameIndex node and perform CopyToReg(ER0, (frame address + offset)).

  • These DAG nodes are created in LowerCall when the isSRet flag is true for an argument. Is this the right place to do it, or are there better methods? The main goal is to expand the FrameIndex node into an add(frame address, offset) node.

  • I am currently having a hard time extracting the offset from the FrameIndex node. I have also read (in the “Using frameindex in a pattern” thread in the llvm-dev archive, referring to the Sparc target) that adding frameindex into the def addr : ComplexPattern<…, "SelectAddr", …> would only translate it into a TargetFrameIndex with an offset of 0.

  • In the AVR target, ISD::FrameIndex has a custom select that transforms it into an AVR::FRMIDX node, which is later processed in eliminateFrameIndex. Is this equivalent to what I am trying to do?

  • What is the expected result for a FrameIndex node when it is selected in ISelDAGToDAG? This node does not usually get selected on its own for load/store operations; my understanding is that for a load/store it carries the address to load from or store to.

Thanks in advance!

Cheers,

Miguel

The first two nodes in the CopyFromReg->CopyToReg->CopyFromReg sequence are generated by your target’s LowerFormalArguments. Maybe there’s a bug there?

Unfortunately, I don’t think we have any good documentation for frame indexes. Normally, an ISD::FrameIndex is lowered to an ISD::TargetFrameIndex, which is then used as an operand to some arithmetic or memory instruction. That instruction then ends up with a MachineOperand of type MO_FrameIndex. This is usually done with C++ code; see, for example, AArch64DAGToDAGISel::Select. In the general case, you just need to compute the address and return it in a register.

The FrameIndex gets converted to an actual number much later, in your target’s implementation of TargetRegisterInfo::eliminateFrameIndex.
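As a rough illustration of that last step, the toy model below replaces an abstract frame-index operand with frame pointer + offset once the frame layout is known. It is a simplified sketch rather than LLVM's API: Operand, AddInstr, FrameObject and kFramePointerReg are hypothetical, and a real eliminateFrameIndex works on MachineInstr operands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Final layout of one stack object, known only after frame finalization.
struct FrameObject { std::int64_t offsetFromFP; };

struct Operand {
    enum Kind { FrameIndex, Register, Immediate } kind;
    std::int64_t value;  // frame index, register number, or immediate
};

// Models "dst = base + offset", e.g. the add that materializes an address.
struct AddInstr { Operand base; Operand offset; };

constexpr std::int64_t kFramePointerReg = 1;  // hypothetical register number

// Rewrites an abstract frame-index operand into FP + concrete offset.
void eliminateFrameIndex(AddInstr &mi, const std::vector<FrameObject> &frame) {
    if (mi.base.kind != Operand::FrameIndex)
        return;  // nothing to rewrite
    assert(mi.base.value >= 0 &&
           static_cast<std::size_t>(mi.base.value) < frame.size());
    const std::int64_t off =
        frame[static_cast<std::size_t>(mi.base.value)].offsetFromFP;
    mi.base = {Operand::Register, kFramePointerReg};
    mi.offset = {Operand::Immediate, mi.offset.value + off};
}
```

Until this rewrite runs, the instruction carries only the index of the stack object, which is why instruction selection cannot see a concrete offset such as the #-32 in the original listing.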

-Eli

Hi Eli,

We’ll take note of the CopyFromReg->CopyToReg->CopyFromReg and investigate it in the future.

AArch64’s and AVR’s implementations were what we needed to fix the issue.

Thank you very much for your help! You significantly helped us in our development.

Cheers,

Miguel