MOS6502 target

Hey there!

I've started to embark on a path to try and create a backend for a 39-year-old CPU with only an accumulator, two index registers, and a 256-byte stack. It does have a bank of 256 bytes before the stack (the zero page) that is pretty quick to access, though.

Really, if I can get an assembler out of `llc`, that'll be success enough for me. Clang would be better, but I think that might be crazy talk.

I've been doing lots of research so far, but from the experts, how feasible does this sound?

I've also been banging my head against the wall trying to figure out what all the classes for different instruction types do. Is there a nicely documented index? Is it in source somewhere, or should I start one?

Thanks,

Edwin

I’ve considered doing this as well :slight_smile: As an exercise to learn the LLVM back end as much as anything.

It probably makes sense to allocate 8 or 16 pairs of zero page locations as virtual 16 bit registers and make 32 bit operations available only as library routines/intrinsics.
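For example, a 16-bit add on a pair of such virtual registers would come out to something like this (a sketch; the zero-page assignments $10-$13 are arbitrary, chosen for illustration):

```asm
; 16-bit add: r0 += r1, where virtual r0 lives at $10/$11 and r1 at $12/$13
; (hypothetical zero-page assignments)
        clc             ; clear carry before the low-byte add
        lda $10         ; low byte of r0
        adc $12         ; + low byte of r1
        sta $10
        lda $11         ; high byte of r0
        adc $13         ; + high byte of r1, plus carry from the low bytes
        sta $11
```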

What would be really helpful would be if LLVM had a way to detect that certain functions and sets of functions (probably >90% of the program) are NOT recursive and statically allocate fixed zero page and/or high memory locations for their local variables.

If you have the call DAG, you can turn that into a total ordering such that if A transitively calls B then the locations of A’s locals will always be at higher addresses than B’s locals. (or vice versa).

This will probably be a bit bigger than the maximum dynamic depth of a stack implementation, but I think usually not a lot. And it lets you use absolute addressing instead of slow (zp),y, and also let you avoid saving and restoring simulated callee-save registers.
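As a sketch of what that static allocation might look like, assuming a non-recursive call DAG main → foo → bar (all names and addresses here are hypothetical):

```asm
; Locals laid out by the total order: bar's below foo's, foo's below main's
bar_tmp  = $10          ; bar is deepest in the ordering, lowest addresses
foo_i    = $12
foo_ptr  = $14
main_cnt = $16          ; main sits at the top

foo:    lda foo_i       ; plain zero-page load, 3 cycles
        ; ... no (zp),y indirection, no callee-save spill/restore ...
        rts
```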

Hi,

I found the tutorial presented at the last LLVM conference very helpful when starting on my own backend, especially the GitHub repository containing a template backend that was quite easy to rename and extend.

Here are the slides, with a pointer to the repository:

http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf

Cheers,
  Roel

I'd not seen that tutorial, but it looks very nice. One other piece of advice I'd give:

Get the assembler working first. Go through your instruction reference, define each instruction's encoding, add the relevant bits to the AsmParser (typically not much, unless you have some weird encodings), and test the assembler with llvm-mc and its show-encodings feature before you try to connect up code generation. And write lots of tests while you're doing this!
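An encoding test in the usual LLVM lit style might look like this (the mos6502 triple name is an assumption; the encodings are sketched from the standard LDA-immediate and STA-zero-page opcodes):

```asm
; RUN: llvm-mc -triple mos6502 -show-encoding %s | FileCheck %s
        lda #$42        ; CHECK: encoding: [0xa9,0x42]
        sta $10         ; CHECK: encoding: [0x85,0x10]
```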

It makes it easy to see whether you've got sensible patterns: when every instruction is in the .td files, you can easily find the ones that aren't used (is it because there's a more sensible way of expressing what they can do, because they do something that isn't easily expressed in LLVM IR, or because you made a mistake?). It's a lot easier to track down bugs in instruction encodings when you've got a decent set of assembly tests for them than when you're staring at compiler output. Common errors include not realising that immediates are sign- or zero-extended, or shifted.

David

That's the plan! It'll be something useful for all my effort.

It's also where the bulk of the coding effort seems to be (instruction and register definitions). I'm sure more brain effort will go into making up for the lack of registers and the design differences from a modern CPU, but that can come later.

Thanks! I feel the offline documentation could be improved or redone with some of this material. The current backend doc has lots of definitions without explanation. It's fairly frustrating when it says "look at the Sparc target" without defining any of the classes, or explaining what the options are for or the motivation for using them.

Once I've figured this stuff out, I'll likely be contributing to that.

I've read about half-way. Holy crap is this helpful.

This needs to be referenced in the backend docs or the talk needs to be transcribed to become the new backend docs.

> I've considered doing this as well :slight_smile: As an exercise to learn the LLVM back end as much as anything.
>
> It probably makes sense to allocate 8 or 16 pairs of zero page locations as virtual 16 bit registers and make 32 bit operations available only as library routines/intrinsics.

One interesting problem with doing that is determining which zero page locations are free for use on the target machine. The Apple II ROM, for example, reserves many of these locations for its own use. Other machines may reserve different locations within the zero page.

> What would be *really* helpful would be if LLVM had a way to detect that certain functions and sets of functions (probably >90% of the program) are NOT recursive and statically allocate fixed zero page and/or high memory locations for their local variables.

A typical callgraph analysis run within libLTO can be used (along with Tarjan's algorithm) to find SCCs in the callgraph. The default callgraph implementation within LLVM is extremely conservative with function pointers and external code, so for some programs, the results may not be good.

Alternatively, one could use the DSA analysis within the poolalloc project to get a better callgraph. There's even code within SAFECode to find SCCs, although I don't recall off-hand which pass it is and whether it's active in the default compilation.

Regards,

John Criswell (who fondly remembers assembly programming on his Apple //c)

The monitor reserves very few. Mostly just screen cursor position and the
hooks for character in and out, if you want to use those. Or you can run on
bare metal and do your own screen drawing and exit by calling the monitor
init vector (e.g. the CPU reset vector).

DOS uses a few more, but not a lot.

The vast majority of locations marked as "reserved" are used by AppleSoft.
Which you can totally ignore if you're not a subroutine called by & from
BASIC. If you preserve the monitor and DOS locations, then you can exit to the
AppleSoft init vector. You'll recall "3D0G" :slight_smile:

I was doing some thinking and discussing, and another interesting problem is that the MOS6502 really doesn't have much in the way of general purpose registers.

Some add/move/etc. operations can use reg X or Y, but I think the expectation is that they operate on the zero page directly.

Has anyone managed to do that effectively? Instructions that take as input a memory location to read values from?

Ex: "ADC $FF", where "$FF" is the final byte in the zero page.
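For what it's worth, the 6502's add mnemonic is ADC (add with carry), and the accumulator is always the implicit destination; the other operand can be read from a zero-page address directly. A sketch, assuming one value lives at zero-page $40 and the other at $FF (hypothetical locations):

```asm
        clc             ; clear carry before an add
        lda $40         ; A = contents of zero-page $40
        adc $ff         ; A += contents of zero-page $FF, the last zero-page byte
        sta $40         ; write the result back
```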

Well, the stack pointer is a single byte, so pushing things on there doesn’t work terribly well.

Assuming I pass by reference, that’s 128 values absolutely total before it wraps around and silently clobbers itself. It means single byte values will be incredibly inefficient… Tricky stuff.

I’m lucky on the C64 since it’s rare to exit back to the kernel with machine language apps (never did it when I was a kid at least), so if I destroy the Kernel’s stack, no one will ever know! Mwahaha!

With regard to code layout, ideally everything would get inlined since I have gobs of memory compared to everything else. I wouldn’t need to worry as much about the stack as long as real values don’t get stored there.

> Well, the stack pointer is a single byte, so pushing things on there doesn't work terribly well.
>
> Assuming I pass by reference, that's 128 values absolutely total before it wraps around and silently clobbers itself. It means single byte values will be incredibly inefficient... Tricky stuff.

You absolutely don't want anything on the hardware stack except function
return addresses and possibly very temporary storage, e.g. PHA (push A); do
something that destroys A; PLA (pull A). Or you could use a ZP temp for
that. STA ZP; LDA ZP is, I think, a cycle or two faster, but PHA/PLA is two
bytes smaller ... size usually wins.
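Side by side, the two idioms look like this (cycle and byte counts are the standard 6502 timings; `tmp` is an assumed zero-page scratch byte):

```asm
; Option 1: hardware stack - 2 bytes total, 3 + 4 = 7 cycles
        pha             ; push A (1 byte, 3 cycles)
        ; ... code that destroys A ...
        pla             ; pull A (1 byte, 4 cycles)

; Option 2: zero-page temp - 4 bytes total, 3 + 3 = 6 cycles
        sta tmp         ; store A to a zero-page scratch (2 bytes, 3 cycles)
        ; ... code that destroys A ...
        lda tmp         ; reload A (2 bytes, 3 cycles)
```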

The "C" local variables stack absolutely needs to be somewhere else,
bigger, and using a pair of ZP locations as the stack pointer (SP). You
can't index off the hardware stack pointer, for a start.

As mentioned before, if possible you'd want to statically allocate as many
local vars as possible, as LDA $nnnn is a byte smaller and twice as fast (4
cycles vs 8) as LDY #nn; LDA (SP),Y. (You'll sometimes be able to use INY or
DEY instead of the load, or just reuse the last value. But still...)
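Concretely (addresses hypothetical; cycle counts are the standard ones for these addressing modes):

```asm
; Statically allocated local: 3 bytes, 4 cycles
        lda $0320       ; absolute address assigned to this variable

; Software-stack local: 4 bytes, 2 + 5 = 7 cycles (8 if a page is crossed)
        ldy #$03        ; offset of the local within the frame
        lda (sp),y      ; sp = a zero-page pointer pair
```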

> With regard to code layout, ideally everything would get inlined since I have gobs of memory compared to everything else. I wouldn't need to worry as much about the stack as long as real values don't get stored there.

I actually think that the ideal 6502 compiler would output actual 6502
code (mostly) only for leaf functions, and everything else should be
compiled to some kind of byte code. The 6502 was a very very cheap chip to
build hardware wise, but the code is BULKY. Even when operating on 8 bit
values it's worse than, say, Thumb2, due to the lack of registers. On 16 or
32 bit values it's diabolical if everything is done inline.

Wozniak's "Sweet 16" is still not a terrible design for this, but I think a
bit more thought can come up with something better. The Sweet16 interpreter
is pretty small though (under 512 bytes I think?), which is pretty
important.

http://www.6502.org/source/interpreters/sweet16.htm

The criterion for whether to use native code or bytecode for a given function
is pretty similar to the inlining decision. And a decent bytecode design with
a compact, small interpreter could be reused on other 8 bit CPUs.

Some of which are still in active use today, and so even commercially
important e.g. 8051, AVR, and PIC.

Erm .. are we boring the rest of llvmdev yet?

I suppose that once you’ve got a 6502 working, adding support for a 4510 shouldn’t be too difficult…

(http://c65gs.blogspot.com.au/)

What’s the one-paragraph current state of FPGA programming? What does one cost? What support gear do you need (and how much is it)? Is programming one-shot or can you reuse it?

You can buy one for under $50 and reprogram it thousands of times; the dev tools, last I checked, confused me.

That’s the last I’ll say about it until you either start a new thread or justify how it fits in this one.

Not a fan of hijacking.

How does it fit the thread? Previous poster Jeremy Lakeman’s blog says he has implemented a 6502 in an FPGA, running at about 30x the original speed.

Seems relevant to me, especially if anyone is interested in running code on actual hardware rather than a software simulator.

The set of programs that you will be able to compile will be severely limited. There is no OS to speak of, so the only mode you will be able to use is "bare metal". This means no file I/O.

If you ever plan to run the output on real hardware, add an option to reserve memory addresses for memory-mapped registers.

There is a 16-bit derivative of the 6502 (the 65816); maybe you could target that instead?

In any case, good luck. Sounds really cool!

-Krzysztof

Oh yeah, I'm not expecting to port some major app. I have pretty intimate knowledge of what my C64 can and can't do. It barely has a kernel (pretty much just ROM routines for disk, screen, and tape access).

This isn't a practical project (good C compilers exist already), this is an excuse to play with LLVM.