Adding VARGS support for custom backend

Hey.

I’m trying to add vargs support to my backend.
I’m trying to figure out how to do it, and would like to get help.

I’m still learning about the implementation of VARGS; if anyone can help with that too, it would be great.
From what I’ve learned so far, there is a pointer on the stack that points at the first VARG, and va_arg fetches the value through the pointer and then advances it.
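
To make that mental model concrete, here is a minimal standalone sketch of a va_list that is nothing more than a pointer into the argument area. The names ToyVaList, toy_va_start, toy_va_arg and toy_sum3 are made up for illustration; this is not how any particular LLVM target defines va_list, only the pointer-and-advance idea described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a va_list: a bare pointer into the argument area.
struct ToyVaList {
    const unsigned char *next; // points at the next unread variadic argument
};

// Model of va_start: point the list at the first variadic slot.
inline void toy_va_start(ToyVaList &ap, const unsigned char *first_vararg) {
    ap.next = first_vararg;
}

// Model of va_arg: load a T from the current position, then advance past it.
template <typename T>
T toy_va_arg(ToyVaList &ap) {
    T value;
    std::memcpy(&value, ap.next, sizeof(T)); // memcpy sidesteps alignment issues
    ap.next += sizeof(T);
    return value;
}

// Reads three 32-bit slots from a fake argument area and returns their sum.
inline std::int32_t toy_sum3() {
    const std::int32_t slots[3] = {10, 20, 12};
    unsigned char area[sizeof(slots)];
    std::memcpy(area, slots, sizeof(slots));

    ToyVaList ap;
    toy_va_start(ap, area);
    std::int32_t total = 0;
    for (int i = 0; i < 3; ++i)
        total += toy_va_arg<std::int32_t>(ap);
    return total;
}
```

In a real backend, the part that differs per target is mostly where toy_va_start gets its address from.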

Either way.

My current calling convention is:

The first arguments are stored in registers; two 32-bit registers are reserved for this purpose.
The rest of the arguments go to the stack.

def BF_CCallingConv : CallingConv<[
  CCIfType<[i8], CCAssignToReg<[R_33_b, R_35_b, R_37_b, R_39_b, R_41_b, R_43_b, R_45_b, R_47_b]>>,
  CCIfType<[i16], CCAssignToReg<[R_33_w, R_37_w, R_41_w, R_45_w]>>,
  CCIfType<[i32], CCAssignToReg<[R_33_d, R_41_d]>>,

  CCIfType<[i8], CCAssignToStack<1, 1>>,
  CCIfType<[i16], CCAssignToStack<2, 2>>,
  CCIfType<[i32], CCAssignToStack<4, 4>>
]>;

I’d like to know if that is enough for VARGS, or should I change something.

As for the implementation of VARGS.
The location of the pointer to the current varg doesn’t really matter. It could be on the stack, but if possible, I’d rather use a register when available.
For the actual vargs, I guess it would be best to just throw them on the stack and let the pointer do its work.

Now, it seems that the most basic opcode I need to lower is ISD::VASTART.
I’ve looked at some other implementations, like SPARC and ARC. It seems that I can implement my own MachineFunctionInfo, store the location of the vargs pointer in it when lowering the call, and then use it to lower VASTART, letting it know the location of the pointer.

I still need to figure out how to find the location of the next available frame index when lowering CALL, but I guess it’s just a matter of understanding the call lowering process better than I do now.

Another thing is, it seems that VAARG could be just Expand and it should work on its own? I’d like to know the pros and cons.
Same for VACOPY and VAEND, which I’m not even sure what they do in the first place.
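
For reference, this is what the C-level counterparts of VACOPY and VAEND do, shown with the standard <cstdarg> macros. The function name sum_twice is made up for this sketch. va_copy duplicates the current read position so the list can be walked a second time, and va_end releases the list; when va_list is a plain pointer, va_end typically has nothing to do.

```cpp
#include <cassert>
#include <cstdarg>

// Sums `count` int arguments twice: once through the original va_list and
// once through a copy made with va_copy before anything was consumed.
inline int sum_twice(int count, ...) {
    va_list ap;
    va_start(ap, count);

    va_list saved;
    va_copy(saved, ap); // snapshot of the current read position

    int first = 0;
    for (int i = 0; i < count; ++i)
        first += va_arg(ap, int);
    va_end(ap);

    int second = 0;
    for (int i = 0; i < count; ++i)
        second += va_arg(saved, int);
    va_end(saved); // every va_start/va_copy needs a matching va_end

    return first + second;
}
```

Calling sum_twice(3, 1, 2, 3) walks the same three arguments through both lists.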

Any help would be appreciated.
Thanks :slight_smile:

I’ve looked further into ARC implementation.

I think I got a better understanding of the way it implements VARGS.

As I understand it, given a calling convention that uses R1…Rn, it copies those registers onto the stack and then puts the address of R1’s copy into the VARGS pointer.

The stack looks like this:

1. POINTER (= 2.)
2. R1
3. R2
...
4. STACK_ARG1
5. STACK_ARG2
...

I’m not sure if this implementation can handle my calling convention, which supports any arg size (unlike ARC, which supports only 32-bit arguments).

My args can be laid out like this (register memory illustration):

my_func((char) a, (short) b, (long) c);

32-bit register A
[ u8 arg1, u8 nothing, u16 arg2 ]

32-bit register B
[ u32 arg3 ]

which means, the stack will be filled with holes of random values…

It seems that the compiler may resolve those alignment issues by itself using the calling convention file.

My problem now is the structure of the stack.
When calling a function, the stack looks like so:

Arg N
Arg N-1
...
Arg 1
$IP
$BP

If I wanted to use registers in varg, I should push them right after arg1 and point to them, but $IP is there…

Is there a way for me to just forbid the use of registers in varg?

If you just want them to work, you can use a void * lowering via [RFC] Desugar variadics. Codegen for new targets, optimisation for existing provided by @JonChesterfield. That skips the backend by just doing the lowering in LLVM-IR, that’s what we use for the GPU targets currently.

Thank you for your help! :slight_smile:

To be honest, it seems like a lot to digest.
I tried to read the article, and even the PR, yet I don’t understand the idea nor the implementation.

I might use this solution, but I want to try and make it work the usual way before I try that.

It seems to me that if I could make the compiler pass the vargs argument straight to the stack and not use registers, it would be much easier to implement, given the work already done.

See [NVPTX] Implement variadic functions using IR lowering (#96015) · llvm/llvm-project@486d00e · GitHub. The values used here will depend on your ABI, for example alignment and handling of aggregates.

The most common happy path for variadic lowering is:

  • expand va_arg in clang (there’s a voidptr path several targets use)
  • the rest turn into intrinsics that pass through IR unchanged
  • the backend deletes va_end and turns va_copy into a memcpy
  • the backend turns va_start into something that digs values out of the stack

The alternative Joseph mentioned above can be found by looking at amdgpu or nvptx. That goes:

  • expand va_arg in clang, but don’t disassemble structs etc
  • run an IR pass from the backend that removes all traces of variadics (lib/Transforms/IPO/ExpandVariadics.cpp)

As a nice quirk that pass can be run by the optimiser as well where it removes known functions. To add that to your backend, there’s a small class to implement in the IR pass and a case to add to a switch that recognises your triple. And actually schedule the pass from the backend.

As the author of that thing I’d encourage going down that path. It’s what I wanted for Graphcore’s backend years ago. If you have specific architecture determined ABI constraints then it won’t help much, but if you’d just like variadics crossed off the todo list and think “stash them all in a struct, pass a pointer to that struct” sounds fine as a lowering strategy you should like it.

If the pass doesn’t make any sense to you ping me and we’ll make it better. I got sidetracked before landing some trailing refactors to the class abstraction - it is believed to be functionally correct, but the workflow for adding a new backend is not as pretty as it might be.

edit: The implementation is to make a struct alloca per call site, store all the variable arguments into that struct, pass it as the last argument. It’s very much replacing the … with a va_list and then fixing up the call sites to match.
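
A hand-written sketch of that rewrite, in plain C++ rather than IR, might look like the following. The names sum_ints_lowered, Frame and call_site_example are invented for illustration, and the struct layout the real pass picks depends on the target ABI (alignment, aggregates), which this sketch ignores by using only ints.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The callee after the rewrite: the "..." became a trailing pointer.
// Originally this would have been: int sum_ints(int count, ...);
inline int sum_ints_lowered(int count, const void *va) {
    const unsigned char *cursor = static_cast<const unsigned char *>(va);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        int value;
        std::memcpy(&value, cursor, sizeof(int)); // what va_arg(ap, int) becomes
        cursor += sizeof(int);
        total += value;
    }
    return total;
}

// A call site that was `sum_ints(3, 4, 5, 6)` before the rewrite.
inline int call_site_example() {
    struct Frame { int a0; int a1; int a2; }; // one struct per call site
    Frame frame = {4, 5, 6};                  // store the variadic arguments
    return sum_ints_lowered(3, &frame);       // pass its address as last argument
}
```

After this transformation nothing variadic is left for the backend to lower, which is the appeal of the approach.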


You can create your own calling convention for vararg functions, and assign everything to stack. Functions with only ... in the argument list may be a problem, but as long as you have at least one named parameter, it should be fine.

Hey.

First of all, thank you all for your comments! :smiley:

Now, before I answer your comments, I’ll say it is now working, and I successfully compiled variadic functions with my backend :slight_smile:

Now, as I said before, the approach of replacing va_start and va_arg in the LLVM IR was, and still is, a bit complicated for me.
Thanks for the explanation. I understood the idea of that implementation, but I don’t even know where to start applying it to my backend.

Eventually, I implemented basic support for varargs in the following way:

I created a second calling convention for varargs:

def BF_VarargCallingConv : CallingConv<[
  // In varargs functions the arguments are passed only on the stack

  CCIfType<[i8], CCAssignToStack<1, 1>>,
  CCIfType<[i16], CCAssignToStack<2, 1>>,
  CCIfType<[i32], CCAssignToStack<4, 1>>
]>;

And used it in LowerCall and LowerFormalArguments:

if (IsVarArg) {
  CCInfo.AnalyzeCallOperands(Outs, BF_VarargCallingConv);
} else {
  CCInfo.AnalyzeCallOperands(Outs, BF_CCallingConv);
}

One thing I noticed is that the variadic arguments will always be MVT::i32, even when I pass char or short.
I decided to rely on this behaviour, so first I enforced it (along with the use of the stack):

for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
    CCValAssign &VA = ArgLocs[i];
    if (IsVarArg) {
      assert(VA.isMemLoc() && "Varargs must be passed on the stack!");
      assert(VA.getLocVT() == MVT::i32 && "Varargs must be 32 bit!");
    }
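
The i32 observation is consistent with C’s default argument promotions: char and short values passed through ... are widened to int by the caller before the backend ever sees them. The sketch below shows this at the source level, assuming int is 32 bits on this target (the post suggests that but does not state it). The names sum_promoted and promoted_example are made up.

```cpp
#include <cassert>
#include <cstdarg>

// Returns the sum of `count` variadic arguments, each read back as int.
// Reading them as char or short would be undefined behaviour, because the
// caller has already promoted them.
inline int sum_promoted(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

inline int promoted_example() {
    char c = 1;
    short s = 2;
    int i = 3;
    return sum_promoted(3, c, s, i); // c and s are widened to int here
}
```

Promotion only covers the small integer types (and float to double), so other argument kinds can still show up with different value types.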

I also needed to implement my own class BFMachineFunctionInfo so I can store the offset of the varargs.
I added the logic to LowerFormalArguments to save the offset

// Handle vargs
if (isVarArg) {
  // Save a place on the stack to store the VARARG pointer
  int VarFI = MFI.CreateFixedObject(4, CCInfo.getNextStackOffset(), true);
  // Set the offset of the VARARG pointer
  BFI->setVarArgsFrameOffset(VarFI);
}
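
The reason the next stack offset is the right place: once all named parameters have been assigned, it is the first byte that no named parameter occupies. The following is a simplified standalone model of that bookkeeping, not LLVM’s own code. ToyStackState and varargs_start_example are hypothetical names, and the two named i32 parameters are just an example.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the stack side of a calling-convention state.
class ToyStackState {
    std::uint64_t offset_ = 0; // first byte not yet assigned to an argument
public:
    // Reserve `size` bytes at the next offset that is a multiple of `align`,
    // mirroring what a CCAssignToStack<size, align> rule asks for.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t align) {
        offset_ = (offset_ + align - 1) / align * align;
        std::uint64_t assigned = offset_;
        offset_ += size;
        return assigned;
    }
    // Where the first variadic argument would start.
    std::uint64_t nextStackOffset() const { return offset_; }
};

inline std::uint64_t varargs_start_example() {
    ToyStackState state;
    state.allocate(4, 1); // first named i32 parameter
    state.allocate(4, 1); // second named i32 parameter
    return state.nextStackOffset();
}
```

With two named 4-byte parameters and alignment 1, the variadic area in this model starts at offset 8.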

Now all I needed to do was to implement LowerVASTART and LowerVAARG.

In LowerVASTART I used BFMachineFunctionInfo to get the offset of the varargs pointer and store the address in it.

I also implemented LowerVAARG, mostly because my stack grows upwards instead of downwards, and it was not behaving the way I needed it to.
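
One plausible reason the default expansion does not fit: it advances the pointer towards higher addresses after each read. The sketch below assumes a layout in which each successive variadic argument sits at a lower address than the previous one, so the pointer has to step the other way. That layout is an assumption made for illustration, not something the post spells out, and DownwardVaList, downward_va_arg and downward_example are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical va_list for a layout where the next argument lives at a
// LOWER address than the previous one.
struct DownwardVaList {
    const unsigned char *current; // address of the next unread 4-byte slot
};

// Load the 32-bit slot at the current position, then step down one slot.
inline std::int32_t downward_va_arg(DownwardVaList &ap) {
    std::int32_t value;
    std::memcpy(&value, ap.current, sizeof(value));
    ap.current -= sizeof(value);
    return value;
}

inline std::int32_t downward_example() {
    // slots[3] plays the role of the first variadic argument; slots[0] is
    // padding so the pointer never steps below the start of the buffer.
    const std::int32_t slots[4] = {0, 30, 20, 10};
    unsigned char area[sizeof(slots)];
    std::memcpy(area, slots, sizeof(slots));

    DownwardVaList ap{area + 3 * sizeof(std::int32_t)};
    std::int32_t first = downward_va_arg(ap);  // reads 10
    std::int32_t second = downward_va_arg(ap); // reads 20
    std::int32_t third = downward_va_arg(ap);  // reads 30
    return first * 100 + second * 10 + third;
}
```

A custom VAARG lowering for such a layout would emit the same load followed by a subtraction instead of an addition.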

Now, testing with a simple implementation of printf, everything seems to work!

Although I did not end up using the LLVM IR replacement method, I appreciate you explaining it to me and suggesting it as a solution.

Thank you all very much for all the support! :smiley:

Struct { char x; }; probably isn’t passed as i32 and f16 is worth checking too (e.g. does it get passed via f64). Beware alignment as well. There’s a long list of tedious edge cases.

You wouldn’t apply the IR in your backend as such, merely ensure that the existing lowering pass is always called and that it knows your triple behaves like a void*.

Either approach works with sufficient care though. You can probably reuse the IR test cases with llc to find some of the edge cases in your backend if so inclined.