Partially complete LLVM backend for the VideoCore 4


For a while I've been working on an LLVM backend for Broadcom's
VideoCore 4, the GPU made famous by the Raspberry Pi. This isn't the
QPU, for which Broadcom released docs a little while ago; it's the main
processor, which is a VC4 core.

It's a rather elegant thing with two cores, 32 registers, a built-in DSP
and an extremely nice instruction set; reverse engineered docs (based on
publicly available information) are available here:

Right now the backend is incomplete but generates decent code for the
things it does handle --- 32-bit integer and float, most ALU operations,
some memory operations. It doesn't know about instruction displacements
yet, so the generated code probably won't assemble, and it doesn't know
about 64-bit integers yet, which means it gets on very badly with LLVM's

I was eventually hoping that it would be useful to run programs on the
Raspberry Pi bare metal; I've had success with this using an extremely
crude VC4 code generator for the ACK, but the generated code was
painfully terrible, hence this port.

Unfortunately I've run out of time and probably won't get a chance to
work on this for a while, so if anyone is interested, help yourself:

Congratulations on the release, David; this looks very interesting. I
had been wondering what you were targeting given your series of questions
to the mailing list :) Just a note to say that in VideoCore parlance
this is an LLVM backend for the VPU. An interesting complementary
project would be an LLVM backend targeting the QPUs (which are
actually publicly documented via




Congratulations on the release, David; this looks very interesting. I
had been wondering what you were targeting given your series of questions
to the mailing list :)

Yeah, it is a kind of distinctive architecture. Did I mention the
condition codes? It's got condition codes...

It's a lovely thing to write hand assembly in, by the way. The ARM used
to be my favourite processor; no more. (Have I mentioned the 64x64xbyte
DSP vector storage? On which you can perform SIMD operations on
arbitrary horizontal or vertical slices with full integration with the ALU?)
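
To make the "horizontal or vertical slices" idea a bit more concrete, here is a rough toy model in Python. It is only an illustration of the addressing idea, not VC4 code and not a description of the real hardware semantics: the 16-element slice length, the byte-wise wrapping add and all of the function names are assumptions made up for this sketch.

```python
# Toy model of a 64x64 byte vector store (illustration only).
SIZE = 64        # the store is SIZE x SIZE bytes
SLICE_LEN = 16   # assumed slice length, chosen for this sketch


def make_store():
    """Create a zero-filled SIZE x SIZE byte matrix."""
    return [[0] * SIZE for _ in range(SIZE)]


def read_h(store, row, col, n=SLICE_LEN):
    """Horizontal slice: n bytes along one row, starting at (row, col)."""
    return [store[row][col + i] for i in range(n)]


def read_v(store, row, col, n=SLICE_LEN):
    """Vertical slice: n bytes down one column, starting at (row, col)."""
    return [store[row + i][col] for i in range(n)]


def write_h(store, row, col, values):
    """Write values along a row, truncating each to a byte."""
    for i, v in enumerate(values):
        store[row][col + i] = v & 0xFF


def write_v(store, row, col, values):
    """Write values down a column, truncating each to a byte."""
    for i, v in enumerate(values):
        store[row + i][col] = v & 0xFF


def add_slices(a, b):
    """Element-wise byte add with wraparound (a hypothetical SIMD op)."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


store = make_store()
write_h(store, 0, 0, list(range(SLICE_LEN)))   # fill part of row 0
write_v(store, 0, 0, [10] * SLICE_LEN)         # fill part of column 0
# The same storage is read both ways, and the two slices are combined.
result = add_slices(read_h(store, 0, 0), read_v(store, 0, 0))
write_h(store, 32, 8, result)                  # store the result elsewhere
```

The point is just that the row view and the column view address the same bytes, so a transpose-like access costs nothing extra in this model.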

It's just a shame I didn't get it any more finished; the LLVM learning
curve is really steep, and the documentation is, with all the best will
in the world, deeply inadequate. There are lots of rough edges and
things that you expect should work but don't, and there's a painful
amount of boilerplate. It was really surprising how frequently I had to
resort to manually matching patterns in C++ rather than using TableGen's

This is actually my third attempt at a VC4 compiler; the first was the
ACK, which was straightforward but generates shockingly bad code; and
the second was gcc. Comparing retargeting gcc vs LLVM is interesting.
gcc has better documentation and, I think, a better templating system
(it does more and requires less manual code), but it's got *way* more
magic, and trying to figure out what was wrong when it inevitably did go
wrong was practically impossible. The community wasn't much help either.
(I eventually got it generating code, but the stack frames were
hopelessly mangled and I never did figure out what was wrong. gcc's
frame pointer elimination is... exciting.)


Yes, the QPU is really interesting. Unfortunately I prefer my processors
to have... how do I put it... memory operations!

If you like that, you should take a look at recent Intel CPUs, which also allow diagonal stripes through the register set for SIMD operations...


Uh, fingers autocorrected. Intel *GPUs*.