LLVM Backend for a platform with no (normal) stack

Dear Sir or Ma’am;

I have found a wealth of help and information on writing an LLVM backend. And, my platform has no stack.

Can you point me to any information that would specifically address creating a backend for this kind of platform?

In previous wanderings, I thought I ran across a phrase “platforms with no stack such as FPGAs”, but I can’t find that mention, now.

More thanks than I can type,

JD Jones

Software Engineer

This message is intended for the addressee only and may contain Paragon Research Corporation (PRC) confidential or privileged information. Use or distribution of such confidential information is strictly prohibited without the prior written permission of PRC. If you have received this message in error, please contact the sender immediately and delete the message and attachments from your computer.

Do you have a register that you can store a memory address in and an addressing mode that allows you to add (or subtract) a constant to that register and use the address calculated to load//store from memory? Do you have enough registers that you can keep one of them permanently pointed to a particular memory region?

If yes, then you have a stack to exactly the same extent as RISC-V or MIPS do.

Thanks for your response. Please see below.

Having your function prologue call malloc() and epilogue call free() (or similar functions) instead of bumping a stack pointer is not a problem. That stuff is generated explicitly by ISA-specific code anyway.

Thanks, no malloc or free equivalents either (no heap).

So, there are no others (to your knowledge) who have built an LLVM backend for a platform with no “normal” stack? I found a presentation about some FPGA work (using LLVM) but it doesn’t seem to apply to my platform. Perhaps someone else on the mailing list will have come across this rarity?

Thank you again for your time and responses. If I have inadvertently been rude, please forgive someone new to LLVM? And if your forgiveness stretches that far, perhaps you could clue me on just how I was rude so that I can avoid it in the future?

More thanks,

JD Jones

You said you can allocate memory?

Well, since you're having memory, then things are more or less doable
– all you need is to create and maintain stack by yourself. This could
be done on per-function basis or whole program (the technique is
called "compiled stack" or "software stack").

For a machine like an FPGA with no stack, you have to ensure that you have an optimization that rewrites the alloca into either registers (such as PromoteMem2Reg) or that you rewrite the alloca by declaring a static local, and rewriting the code to use that instead of the alloca result.

Mark Mendell

You’ll then have to make sure programs never have recursion.

When it comes down to it, nothing we run programs on actually has a true unbounded stack. We just have a region of memory that we calculate or hope will be big enough. But over a million people (mostly amateurs) happily and regularly run C++ programs (with virtual functions and all) compiled by gcc on machines with only 2 KB of RAM: the ATmega328 in most Arduinos.

Not only do FPGAs not support recursion, we don’t even support calls! All user code must be inlined into one kernel/component, which is then used to create HDL for the FPGA.


That’s an implementation choice. Verilog allows you to use recursion to generate complex structured circuits. Of course, it’s all just inlined and the recursions have to hit bottom during that process.

Do you also insist on no loops, or at least only loops with explicit fixed trip counts? What about AREF?

I’ve worked on a GPU back end for llvm where we inlined all function calls into each shader/kernel. That was convenient, but only ever as a stop-gap and OpenCL 1.2 lets you get away with that.

On very restricted machines it’s often easier to create a simple interpreter using the native features and then target your actual program at the interpreter. It’s slower, of course, but much more flexible. This applies equally as much to FPGAs as it did to things like the 6502 forty years ago where most programming was done in either tokenised BASIC or else Pascal compiled to bytecode (or even SWEET16). Every day I use an FPGA programmed to interpret RISC-V opcodes and it supports not only function calls and stacks and recursion, it’s running a full Debian Linux as quickly as an early Pentium or PowerPC did.

Depending on the performance needs, an interpreter might be a good option for the OP.

We have full support for loops

Yes, thank you, specifically and all. On this platform, we call what I will use a "frame stack" (it's actually not a stack, but that's really not relevant). Special thanks to Mr. Mendell--I promise to make good use of the contents of lib/Transforms/Utils/PromoteMemoryToRegister.cpp.

To All, I'm sorry I wasn't clear in my original posting. In hindsight, it was clearly not clear. More apologies.

I am looking for information on projects that actually implemented an LLVM backend on a platform that has no normal call stack, the project's reasoning, timelines, lessons-learned, that sort of thing, or research/analysis that explains why LLVM was not used for such a backend.

Perhaps the reasons for the posting would help? If not, please read no further, just please accept more, sincere thanks and requests for forgiveness.


My hope was to obtain information about/from projects that _have actually performed_ (or decided _not_ to perform) a task similar to mine. Because...

My superordinates have read that it's quick and easy to create an LLVM backend.

On 11/27, I was tasked to create a "super basic" LLVM backend for a platform that has no stack--something the core LLVM code seems to assume.

My superordinates expect to see that backend by the end of this month.

"Super basic" means a translation from C, "int main() { int a = 1; int b = 2; int c = a+b; return c; }", into the assembly code used by our (likely unique) platform. What I've got generates "hmninh" where I need to work inside the prologue and epilogue, and a few of my platform's assembly instructions in almost the right places. This week, I'll be looking for those places in the DAG processing where I need to add (or prevent the consolidation of) nodes, subtract (or prevent the addition of) nodes, etc.. I'm getting the hang of LLVM enough to cause a little damage.

ADHD moment: praises to those folks who thought to add -debug to llc and those who thoughtfully populate and filter it.

When I sent my first email request for help, I was thinking along the lines of the generously provided, previous suggestions (Mr. Hoult, Mr. Korobeynikov, thank you again!), to just "make a stack" using what I've got, but that's not the way the current compiler (which was written in Haskell) does things. I was hoping for some thoughts about which way to go, and the responses validated and gladdened.

I know _now_ that implementing an equivalent of a normal call stack on this platform is disallowed. I know this because after my first email, a superordinate told me "No stack", spouted Greek at me for a few long minutes, then asked "Got it?" I replied, "No stack. Got it."

So, I basically need to port/re-invent some of the Haskell code (from the current compiler) inside of and using the LLVM infrastructure (otherwise, it won't be an "LLVM backend").

I can do this.

I do _not_ know _how much_ of it I can do by New Year's Eve. My superordinates have shown a disinclination to accept my word as a professional, experienced engineer that I'm doing the right things in a reasonable manner at the optimal speed. They believe their own research over mine unless I thoroughly defend it.

With no other project as a basis of estimate, by New Year's Day, I _may_ have a _defendable_ BOE on how long the task will take, but there is a significant probability that I will not have executable code that generates a correct assembly file.

If someone else has already implemented something similar (or decided not to) and that project has shared lessons-learned, timelines, hiccoughs, anything, then that information could help me better communicate with my superordinates, perhaps manage their expectations, possibly add value to my implementation.

If a similar project took two years, then my superordinates might be happier with what I'm able to accomplish in six weeks. Of course if a similar project took six weeks, I'll have to resign in utter embarrassment--unless I can rapidly adapt their methods to my project's platform :-}

So, as much as I truly appreciate your suggestions on potential implementations, what I really need is information about projects that have _actually implemented_ (or chosen not to implement) an LLVM backend on a platform that has no stack. And I apologize again for not having been clear about that previously. In hind sight, I can immediately see that what I wrote was much too open and vague, not at all correct except when viewed through the filter of my personal assumptions.

Just for situational awareness, I found a PDF of a presentation (sub-)titled "Using LLVM to Generate FPGA Accelerators" by Altera, and I plan to digest it today, although a quick scan provided little hope. Searching the Altera (redirected to Intel) website on "LLVM" got no hits, reducing hope.

More thanks than I can type,

The problem here is that you haven’t told us what you do have – or for that matter whether you need to support all features of LLVM IR (or C), or leave some out, and if so then what?

For example, does your source language and hardware allow any notion of “address”, pointers, pointer arithmetic (AREF), arrays?

What about loops? Functions? Assignment?

We do support the functionality need to implement OpenCL 1.2. This includes pointers, loops, pointer arithmetic, arrays, local/global/constant memory, etc.

It does not include recursion, function pointers, virtual calls, etc.