From: Evan Cheng <evan.cheng@apple.com>
Date: 11 September 2007 19:26:39 GMT+02:00
To: LLVM Developers Mailing List <llvmdev@cs.uiuc.edu>
Subject: Re: [LLVMdev] RFC: Tail call optimization X86
Reply-To: LLVM Developers Mailing List <llvmdev@cs.uiuc.edu>
Hi Arnold,
Thanks for the patch. Some questions and commons:
1. Have you test it against the llvm test suite?
No not yet.
Does it work if fp
elimination optimization is turned off?
For my test cases - yes.
2. Please follow llvm coding convention and make sure every line fits
in 80 columns.
Okay dokey. i was looking through the files manually - must have missed some - sorry.
3.
enum NameDecorationStyle {
None,
StdCall,
- FastCall
+ FastCall,
+ FastCC // the normal fastcc calling convention
};
Why is FastCC necessary? Can't you just use FastCall?
FastCall is used to indicate a function has x86_FastCall semantics if i am not mistaken.
I needed to differentiate between a normal fastcc and the x86_fastcall semantics in an older version
of my code. I no longer depend on that so it can be removed as you suggest. sorry for the code corpse 
4.
def X86tailcall: SDNode<"X86ISD::TAILCALL", SDT_X86Call,
[SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
+def X86truetailcall: SDNode<"X86ISD::TRUETAILCALL", SDT_X86Call,
+ [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
+
Please use X86tailcall. It's not currently used so feel free to
change its patterns, etc.
Okay dokey.
5.
+// the following two instructions are used to adjust the stack pointer
+// in the case where the callee has more arguments than the caller
+// an area is created where the return addr can be safely moved to
+let isConvertibleToThreeAddress = 1 in { // Can transform into LEA.
+def TCADD32ri : Ii32<0x81, MRM0r, (outs GR32:$dst), (ins GR32:
$src1, i32imm:$src2),
+ "add{l}\t{$src2, $dst|$dst, $src2}",
+ []>;
+}
+
+def TCSUB32ri : Ii32<0x81, MRM5r, (outs GR32:$dst), (ins GR32:
$src1, i32imm:$src2),
+ "sub{l}\t{$src2, $dst|$dst, $src2}",
+ []>;
+
In an intermediate implementation (that did not work anyway) i need to differentiate
between the normal ADD32ri/SUB32ri that were in the code and the stack
adjustments i added. I no longer need them so i will remove them. again a code
corpse sorry for that.
+//let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
+//def TCRETURNmi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:
$offset),
+// "#TC_RETURN $dst $offset",
+// []>;
+
This node holds the stack adjustment delta and the jmp destination of the call.
The information is used in epilogue generation.
Why are these needed? They don't look any different from normal add
and subtraction instructions. Why not just use ADJCALLSTACKDOWN and
ADJCALLSTACKUP?
6.
+ } else if (RetOpcode == X86::TCRETURNdi) {
+ // a tailcall adjust the stack
...
+ } else if (RetOpcode == X86::TCRETURNri) {
+ MBBI = prior(MBB.end());
Seems like there are quite a bit of duplicate code between the 2
cases. Please refactor.
Okay dokey
7.
X86TargetLowering::X86TargetLowering(TargetMachine &TM)
: TargetLowering(TM) {
+ IsLastCallTailCall = false;
This is probably not a good idea. You are assuming nodes are lowered
in certain order, that is dangerous. It's probably better to check
whether the last call is a tail call on the fly as you are processing
the return node.
Will do so.
8.
-
- SDOperand Chain = Op.getOperand(0);
- SDOperand Flag;
-
- // Copy the result values into the output registers.
- if (RVLocs.size() != 1 || !RVLocs[0].isRegLoc() ||
- RVLocs[0].getLocReg() != X86::ST0) {
- for (unsigned i = 0; i != RVLocs.size(); ++i) {
- CCValAssign &VA = RVLocs[i];
- assert(VA.isRegLoc() && "Can only return in registers!");
- Chain = DAG.getCopyToReg(Chain, VA.getLocReg(), Op.getOperand
(i*2+1),
- Flag);
- Flag = Chain.getValue(1);
- }
+ if (IsLastCallTailCall) {
+ IsLastCallTailCall = false;
+ SDOperand TailCall = GetTailCall(Op);
+ SDOperand TargetAddress = TailCall.getOperand(1);
+ SDOperand StackAdjustment = TailCall.getOperand(2);
+ assert ( ((TargetAddress.getOpcode() == ISD::Register &&
+ cast<RegisterSDNode>(TargetAddress)->getReg() ==
X86::ECX) ||
+ TargetAddress.getOpcode() == ISD::TargetExternalSymbol ||
+ TargetAddress.getOpcode() == ISD::TargetGlobalAddress) &&
+ "Expecting an global address, external symbol, or
register");
+ assert( StackAdjustment.getOpcode() == ISD::Constant &&
"Expecting a const value");
+ // TODO: should we add flag from tail call as last operand?
+ return DAG.getNode(X86ISD::TC_RETURN, MVT::Other, Op.getOperand
(0), TargetAddress, StackAdjustment);
} else {
Since if (IsLastCallTailCall) { ... } always ends with an early
exist, you can just place the block before if (RVLocs.size() ...) and
there is no need to change indentation for the rest of the function.
Okay
9.
+// Check to see whether the next instruction following the call is a
return
+// A function is eligable if caller/callee calling conventions match
and the
+// function CALL is immediatly followed by a RET
+bool X86TargetLowering::IsEligibleForTailCallElimination(SDOperand
Call, SelectionDAG& DAG, unsigned CalleeCC, SDOperand Callee) {
+ bool IsEligible = false;
+ SDNode * CallNode = Call.Val;
...
+SDOperand X86TargetLowering::LowerX86_32FastCCCallTo(SDOperand Op,
+ SelectionDAG &DAG,
+ unsigned CC) {
+ DOUT << "LowerX86_32FastCCCallTo\n";
+ SDOperand Chain = Op.getOperand(0);
+ bool isVarArg = cast<ConstantSDNode>(Op.getOperand(2))-
getValue() != 0;
+ bool isTailCall = cast<ConstantSDNode>(Op.getOperand(3))-
getValue() != 0;
+ SDOperand Callee = Op.getOperand(4);
+ //unsigned NumOps = (Op.getNumOperands() - 5) / 2;
+
+ // Analyze operands of the call, assigning locations to each operand.
+ SmallVector<CCValAssign, 16> ArgLocs;
+ CCState CCInfo(CC, isVarArg, getTargetMachine(), ArgLocs);
+ CCInfo.AnalyzeCallOperands(Op.Val, CC_X86_32_TailCall);
+ if (isTailCall &&
+ IsEligibleForTailCallElimination(Op, DAG,CC, Callee) &&
+ PerformTailCallOpt) {
IsEligibleForTailCallElimination() should be a target hook. This way
TargetLowering::LowerCallTo() can determine whether a call is
eligible for tail call elimination and insert the current ISD::CALL
node.
Okay
X86TargetLowering::LowerX86_32FastCCCallTo() will not have to
handle non-tail calls.
I will then add code to LowerCCCCallTo() to check whether to use the normal
calling convention CC_X86_32_C (3 registers free for argument passing) or
the tail call (fastcc) calling convention CC_X86_32_TailCall (2 register free).
Expect to see a new patch after i have done the testing with the whole test suite.
regards arnold