[RFC] RISC-V backend

Hi all,

I am proposing the integration of a backend targeting the RISC-V ISA.

RISC-V is a free and open instruction set architecture that was originally
developed at UC Berkeley. Future development of the ISA specification will be
handled by the 501(c)(6) non-profit RISC-V Foundation and its members
<https://riscv.org/membership/?action=viewlistings>. You can find much more
information at the RISC-V website <https://riscv.org/>, including the current
ISA specification <https://riscv.org/specifications/>. You might note that
RISC-V defines 32-bit and 64-bit variants and also supports a compressed
variant, allowing 16-bit instructions to be freely intermingled with the
standard 32-bit representations. The standard is structured to allow
implementers to choose appropriate subsets to support, for instance a
micro-controller might support 'RV32I' (32-bit RISC-V with the integer
instructions) and an application core running Linux might implement RV64IMAFD
(commonly shortened to RV64G: 64-bit with integer instructions, the multiply
extension, atomics, and single and double precision floating point). A
generous portion of the opcode space is left reserved for implementers or
researchers to add their own instructions.
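
As a rough illustration of the naming scheme and the mixed 16-bit/32-bit encoding described above, here is a minimal Python sketch. The helper names `expand_isa` and `instruction_length` are hypothetical and not part of any RISC-V or LLVM tool. It assumes only the two standard rules: 'G' is shorthand for IMAFD, and an instruction whose lowest two bits are not 11 is a 16-bit compressed instruction. Encodings longer than 32 bits are deliberately not handled.

```python
# Hypothetical helpers for illustration only; not an LLVM or RISC-V API.

# Extension letters that the shorthand "G" stands for.
G_EXPANSION = "IMAFD"


def expand_isa(name: str) -> str:
    """Expand the 'G' shorthand in an ISA string such as 'RV64G'."""
    upper = name.upper()
    if not upper.startswith(("RV32", "RV64")):
        raise ValueError(f"unsupported ISA string: {name!r}")
    base, extensions = upper[:4], upper[4:]
    return base + extensions.replace("G", G_EXPANSION, 1)


def instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes from its first 16 bits.

    Only the 16-bit (compressed) and 32-bit (standard) encodings are handled.
    """
    if first_halfword & 0b11 != 0b11:
        return 2  # lowest two bits are not 11: compressed instruction
    if first_halfword & 0b11100 != 0b11100:
        return 4  # lowest two bits are 11, bits [4:2] are not 111: standard
    raise ValueError("encodings longer than 32 bits are not handled here")


if __name__ == "__main__":
    print(expand_isa("RV64G"))
    print(instruction_length(0x0001))  # low halfword of a compressed nop
    print(instruction_length(0x0013))  # low halfword of addi x0, x0, 0
```

Because the length is decided from the first halfword alone, a decoder can walk a stream that freely mixes both sizes without any mode switch.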

In line with the proposed policy for adding a new target
(https://reviews.llvm.org/D23162), RISC-V has a clear specification, multiple
software models, and multiple FPGA implementations as well as prototype ASICs
from various groups. At lowRISC (http://www.lowrisc.org/), inspired by our
previous experience with the Raspberry Pi project, we are working towards
creating a completely open source RISC-V SoC and producing low-cost
development boards around it. Feel free to contact me off-list to discuss
lowRISC further. LLVM is a key part of our development plan, and with
community approval I would like to act as maintainer for the backend. The vast
majority of my LLVM work over the past 6 years has sadly been out-of-tree, but
I'm far from new to the project.

In the RISC-V community right now, GCC is by some way the more stable compiler
port. We've discussed the best way of moving forward with LLVM at the last couple
of RISC-V Workshops and a number of us concluded a fresh codebase may be the
best way to move forwards. Producing a series of patches that introduce RISC-V
support incrementally in easy-to-review chunks with associated test cases at
every point also allows us to get the maximum benefit from LLVM's code review
procedure. It also provides a good basis for more detailed documentation on
writing an LLVM backend (and making modifications to an existing one, e.g.
making it much easier for a research group wanting to explore RISC-V changes).
This is an area I also hope to contribute to. The approach of small,
incremental patches is somewhat similar to what is being done with the AVR
backend. I'm grateful to David Chisnall who suggested that starting with the
MC layer may be a productive way to go about developing this backend, and so
far this seems to be working well.

The current status is that I have submitted a series of 10 patches
implementing assembler support and an initial set of relocations and fixups.
Help reviewing these would be very welcome; do let me know if you'd like to be
CCed in or added as a reviewer to future patches. I'd ultimately like the
RISC-V backend to be considered a "reference" backend, and as such
I specifically welcome reviews you might worry are pedantic.

Please find the current set of patches for your review here:
* <https://reviews.llvm.org/differential/?authors=asb>

I've obviously spent a lot of time with the MC layer recently, and I'd be
happy to put that to use in helping review MC patches for other archs.

Mini development roadmap:
* Complete MC layer (supporting up to RV32+RV64G at least)
  * There is currently no specification for supported RISC-V assembly syntax,
  mnemonics etc. The ideal solution may not always be "whatever the GCC port
  currently does", so some aspect of this will involve discussions with the
  wider RISC-V software community.
* Codegen
* Compressed instruction set support (RVC)
* Benchmarking and comparison to GCC RISC-V (and potentially other archs)

Finally I'd like to give a prominent mention to Colin Schmidt, the UC Berkeley
student who has been maintaining the current out-of-tree RISC-V LLVM port
<https://github.com/riscv/riscv-llvm>. The RISC-V community owes him a debt of
gratitude.

All comments very welcome,

Alex

I think it's a great idea! RISC-V is a really interesting project,
and I've thought for a while that it's a shame we don't have a backend
in trunk. I'll see if I have any comments on the patches.

Tim.

From: "Alex Bradbury via llvm-dev" <llvm-dev@lists.llvm.org>
To: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Wednesday, August 17, 2016 4:14:38 AM
Subject: [llvm-dev] [RFC] RISC-V backend

In the RISC-V community right now, GCC is by some way the more stable compiler
port. We've discussed the best way of moving forward with LLVM at the last couple
of RISC-V Workshops and a number of us concluded a fresh codebase may be the
best way to move forwards. Producing a series of patches that introduce RISC-V
support incrementally in easy-to-review chunks with associated test cases at
every point also allows us to get the maximum benefit from LLVM's code review
procedure.

Yes, this is exactly the process we should follow when practical. I'm in favor of this effort.

-Hal

+1

Thanks Alex!

In the RISC-V community right now, GCC is by some way the more stable compiler
port. We've discussed the best way of moving forward with LLVM at the last couple
of RISC-V Workshops and a number of us concluded a fresh codebase may be the
best way to move forwards. Producing a series of patches that introduce RISC-V
support incrementally in easy-to-review chunks with associated test cases at
every point also allows us to get the maximum benefit from LLVM's code review
procedure. It also provides a good basis for more detailed documentation on
writing an LLVM backend (and making modifications to an existing one, e.g.
making it much easier for a research group wanting to explore RISC-V changes).
This is an area I also hope to contribute to. The approach of small,
incremental patches is somewhat similar to what is being done with the AVR
backend. I'm grateful to David Chisnall who suggested that starting with the
MC layer may be a productive way to go about developing this backend, and so
far this seems to be working well.

This sounds like a great plan. The only concern I have is that the process
actually finish, and not end up in a limbo state for a year.

I haven't actually been following the story of the AVR backend at all, but
afaik the current status is that there's a partially completed AVR backend
in trunk that's been under construction for a year or so, and a functional
backend in another repository, which people actually use. However that
situation came to pass, it seems a very unfortunate state to be in, and it
would be sad if this rewrite of the RISC-V backend ended up with the
upstream repository having the same mostly-unusable status for RISC-V.

Note, I want to stress I am __not__ disagreeing with your proposed plan! I
just want to mention that concern.

I'd ultimately like the RISC-V backend to be considered a "reference"
backend, and as such I specifically welcome reviews you might worry are
pedantic.

Yes, it would probably make a lot of sense to transition
the WritingAnLLVMBackend document from describing the SPARC backend to the
RISC-V backend, at some future point.

Mini development roadmap:
* Complete MC layer (supporting up to RV32+RV64G at least)
  * There is currently no specification for supported RISC-V assembly syntax,
  mnemonics etc. The ideal solution may not always be "whatever the GCC port
  currently does", so some aspect of this will involve discussions with the
  wider RISC-V software community.

I would suggest starting CodeGen before getting too far into MC. A lot of
things about the instruction definitions matter only for codegen, so it's
easy to make something that works for MC but isn't structured right for
codegen. That is to say, I'd recommend starting codegen after doing MC
through RV32G+RV64G at *most*, rather than at *least*, and it might even
make sense to start on CodeGen support after only RV32I/RV64I.

This sounds like a great plan. The only concern I have is that the process
actually finish, and not end up in a limbo state for a year.

I haven't actually been following the story of the AVR backend at all, but
afaik the current status is that there's a partially completed AVR backend
in trunk that's been under construction for a year or so, and a functional
backend in another repository, which people actually use. However that
situation came to pass, it seems a very unfortunate state to be in, and it
would be sad if this rewrite of the RISC-V backend ended up with the
upstream repository having the same mostly-unusable status for RISC-V.

Note, I want to stress I am __not__ disagreeing with your proposed plan! I
just want to mention that concern.

That's a fair concern. I think the upstreaming of AVR has been slower
than hoped because
1) as I understand it, it's a spare time project for everyone involved
2) Dylan has found it difficult to get code reviewers - it remains to
be seen how problematic that will be for RISC-V, but there's certainly
a lot of RISC-V interest
3) AVR is I think in general a more difficult target

lowRISC CIC (the UK not-for-profit we set up to support the lowRISC
efforts) is now becoming my full time focus, and as part of that the
majority of my time will, for some time, be on RISC-V LLVM. To move
faster, we of course welcome additional support in the form of either
engineering time or sponsorship from any parties interested in open
source hardware, the RISC-V ecosystem, or LLVM+RISC-V more
specifically.

Mini development roadmap:
* Complete MC layer (supporting up to RV32+RV64G at least)
  * There is currently no specification for supported RISC-V assembly syntax,
  mnemonics etc. The ideal solution may not always be "whatever the GCC port
  currently does", so some aspect of this will involve discussions with the
  wider RISC-V software community.

I would suggest starting CodeGen before getting too far into MC. A lot of
things about the instruction definitions matter only for codegen, so it's easy
to make something that works for MC but isn't structured right for codegen.
That is to say, I'd recommend starting codegen after doing MC through
RV32G+RV64G at *most*, rather than at *least*, and it might even make sense
to start on CodeGen support after only RV32I/RV64I.

That's a really good point. I was hoping to have completed some
CodeGen to have fully proven the MC-first approach before submitting,
but getting everything lined up to move to lowRISC full-time took more
time than I anticipated. I think the milestone I really want to hit is
where I can easily cross-validate against gcc - i.e. assembling its .s
output. I think we're actually almost at the point where I can do
that, and I agree there's a lot of value in getting the CodeGen
support well underway. Not least, once there's a reasonable CodeGen
and MC baseline I think it will be somewhat easier for more people to
work in parallel on additional features and optimisations. I'll aim to
move to CodeGen ASAP.

Thanks,

Alex

The problem is nobody is reviewing it. I've reviewed a number of patches, but the ones currently up for review are in MC areas I'm not the best person for.

-Matt

That’s extremely unfortunate. Our review systems make it way too easy for reviews to fall through the cracks and get lost forever.

Aren’t code owners supposed to help find reviewers?
From the developer policy: "The sole responsibility of a code owner is to ensure that a commit to their area of the code is appropriately reviewed, either by themself or by someone else."

(OK it mentions “commit” and not “patch”, but that would seem like a pedantic distinction to me)

Slightly off-topic, but if you want to port the entire toolchain to RISC-V, you may want to add RISC-V support to LLD. I took a quick look at the specification a few months ago and found that it’s a pretty straightforward ELF ABI, so I expect you'd only need a few hundred lines of new code to support RISC-V. I actually tried to do that at the time as a weekend project but gave up because I found that no code had been upstreamed.

Then you fall into the problem of finding the right code owner to
review. A new back-end falls into a multitude of domains, most of them
with different code owners, some of them without any (citation
needed).

Another issue is what I mention in the "new targets policy" as the
maturing process. Reviewing a patch into an existing piece of code is
orders of magnitude simpler than reviewing a whole new target, and
code owners may feel entirely incompetent to review code when they
have no idea how it should work (haven't read the ISA/ABI documents,
don't know the target, etc).

It boils down to nagging from the people who proposed the patch/es in
the first place, as was repeatedly mentioned in the target acceptance
and developer policies discussions of the past.

Many of us have a limited attention span... (squirrel!).

cheers,
--renato

I'd love LLD support - I'll eventually get round to it if nobody else
does but I'd really welcome someone more familiar with LLD internals
to take it on. Rafael has also expressed an interest, so that's two
people who are massively over-qualified in that respect :)

Best,

Alex

I am proposing the integration of a backend targeting the RISC-V ISA.

+1!

In line with the proposed policy for adding a new target
(https://reviews.llvm.org/D23162), RISC-V has a clear specification, multiple
software models, and multiple FPGA implementations as well as prototype ASICs
from various groups. At lowRISC (http://www.lowrisc.org/), inspired by our
previous experience with the Raspberry Pi project, we are working towards
creating a completely open source RISC-V SoC and producing low-cost
development boards around it. Feel free to contact me off-list to discuss
lowRISC further. LLVM is a key part of our development plan, and with
community approval I would like to act as maintainer for the backend. The vast
majority of my LLVM work over the past 6 years has sadly been out-of-tree, but
I'm far from new to the project.

The policy has been updated and accepted by Chris and is now at:

http://llvm.org/docs/DeveloperPolicy.html#new-targets

Basically, in addition to the previous proposal, it requires a code
owner to come forward, which you just did. :)

Code owner: check
Community: check
Compatible code: check
Policies: AFAICS, check
License: check
Docs / Impl: check

The code seems to have been reviewed and largely accepted, and your
responses to code review were quick and good.

From what I can see, the RISC-V target & community checks all the boxes.

It also provides a good basis for more detailed documentation on
writing an LLVM backend (and making modifications to an existing one, e.g.
making it much easier for a research group wanting to explore RISC-V changes).

This would be fantastic!

I've obviously spent a lot of time with the MC layer recently, and I'd be
happy to put that to use in helping review MC patches for other archs.

This also checks the box for "helpful community". :)

Mini development roadmap:
* Complete MC layer (supporting up to RV32+RV64G at least)
  * There is currently no specification for supported RISC-V assembly syntax,
  mnemonics etc. The ideal solution may not always be "whatever the GCC port
  currently does", so some aspect of this will involve discussions with the
  wider RISC-V software community.

Maybe some more documentation is in order, but this can start slow and
converge to a future standard.

One problem that might happen is that GNU asm output will have to be
accepted, so when in doubt, following what they do would be the least
amount of work. But in the long run, you'll want something consistent
and well written (to be considered the go-to back-end), and if the
RISC-V community decides LLVM is the default compiler, following an
agreed spec, then GCC will have no option but to follow the spec.

* Codegen
* Compressed instruction set support (RVC)
* Benchmarking and comparison to GCC RISC-V (and potentially other archs)

What about buildbots?

I'm assuming "check-all" would be enough for now, but you'll have to
have at least one buildbot that builds the back-end (which for now
will be experimental, and will need an additional CMake flag).

But in the long run, you'll want to run the test-suite, even if on a
simulator, and who knows, maybe even self-host Clang in your target!

cheers,
--renato

Good question, I didn't mention buildbots in this RFC as from a quick
look at http://lab.llvm.org:8011/builders it didn't look like
early-stage architecture ports tend to have one, and as you say
check-all should be enough initially. I'm sure that we (i.e.
lowRISC CIC) can support an additional buildbot when appropriate. Is
there any recommendation on minimum specification? At what point do
you think providing an extra buildbot would become a priority? If any
additional value can be provided by doing so I'd definitely like to
have a buildbot before RISC-V becomes an 'official' rather than
'experimental' arch.

Best,

Alex

Good question, I didn't mention buildbots in this RFC as from a quick
look at http://lab.llvm.org:8011/builders it didn't look like
early-stage architecture ports tend to have one, and as you say
check-all should be enough initially.

They normally don't. But your target won't be tested by any other
buildbot unless it's built by default, which only happens when it's
made official.

So, either you have some local validation (buildbot, weekly build +
check-all, doesn't matter), with your target built in, or you won't
know when your tests regress.

I'm sure that we (i.e.
lowRISC CIC) can support an additional buildbot when appropriate. Is
there any recommendation on minimum specification?

If you have a server which can do some LLVM builds (can be any arch),
then you just create a buildslave and add
-DLLVM_TARGETS_TO_BUILD=RISCV to the CMake options, running check-all.

This doesn't need to be public, but you don't want to find test
failures only when we move your target to official, then it breaks
*all* buildbots, etc.
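
To make the configuration above concrete, here is a minimal Python sketch that only assembles the command lines such a bot might run. The `cmake_args` helper, the Ninja generator, and the Release build type are assumptions for illustration and not part of any existing buildbot setup. One detail worth checking against your checkout: while a target is still experimental, LLVM's CMake build enables it through `LLVM_EXPERIMENTAL_TARGETS_TO_BUILD` rather than `LLVM_TARGETS_TO_BUILD`, so the sketch takes a flag for that.

```python
# Hypothetical helper for illustration; it builds argument lists and runs nothing.

def cmake_args(source_dir, targets, experimental=False, build_type="Release"):
    """Return a CMake configure command that enables the given LLVM targets."""
    variable = (
        "LLVM_EXPERIMENTAL_TARGETS_TO_BUILD"
        if experimental
        else "LLVM_TARGETS_TO_BUILD"
    )
    return [
        "cmake",
        "-G", "Ninja",  # assumed generator
        f"-DCMAKE_BUILD_TYPE={build_type}",
        # CMake lists are separated by semicolons.
        f"-D{variable}={';'.join(targets)}",
        source_dir,
    ]


# The test step is the existing check-all target.
CHECK_ALL = ["ninja", "check-all"]

if __name__ == "__main__":
    print(" ".join(cmake_args("../llvm", ["RISCV"], experimental=True)))
    print(" ".join(CHECK_ALL))
```

Passing these lists to `subprocess.run` from a cron job would be enough for the private weekly build mentioned above; a real buildslave would carry the same options in its builder configuration.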

At what point do
you think providing an extra buildbot would become a priority? If any
additional value can be provided by doing so I'd definitely like to
have a buildbot before RISC-V becomes an 'official' rather than
'experimental' arch.

Official arches should have at least some testing. Many official
arches test on other bots (like BPF and Lanai building on x86_64 bots)
and this could be the case for RISC-V.

Of course, more bots / configurations are always welcome, but it will
depend on the target and the community's engagement.

cheers,
--renato

+1!

Good question, I didn't mention buildbots in this RFC as from a quick
look at http://lab.llvm.org:8011/builders it didn't look like
early-stage architecture ports tend to have one, and as you say
check-all should be enough initially.

They normally don't. But your target won't be tested by any other
buildbot unless it's built by default, which only happens when it's
made official.

So, either you have some local validation (buildbot, weekly build +
check-all, doesn't matter), with your target built in, or you won't
know when your tests regress.

Obviously `./bin/llvm-lit -s -i ../test` is one of my most frequently
executed commands, but we definitely want automation to pick up issues
caused by changes elsewhere.

I'm sure that we (i.e.
lowRISC CIC) can support an additional buildbot when appropriate. Is
there any recommendation on minimum specification?

If you have a server which can do some LLVM builds (can be any arch),
then you just create a buildslave and add
-DLLVM_TARGETS_TO_BUILD=RISCV to the CMake options, running check-all.

This doesn't need to be public, but you don't want to find test
failures only when we move your target to official, then it breaks
*all* buildbots, etc.

Thanks, I didn't realise nobody was running a public buildbot already
that built all experimental archs - though of course that makes sense.
In that case I'll prioritise getting something set up.

Best,

Alex

I just went through the previous discussions about noisy buildbots
again. One of the outcomes from that discussion was the setting up of
the "silent staging buildbot" at lab.llvm.org:8014/ which sounds
perfect for this case. It seems to me the ideal approach would be to
add a silent bot that builds RISC-V. Perhaps even better, one that
builds all experimental architectures.

If there's someone out there who thinks this would be useful and has
spare computing power and already has a working setup to clone (i.e.
the marginal cost for you of running one more buildslave is small)
that would be a very welcome contribution. Otherwise, I'll try to get
something set up some time in the next week or two.

Best,

Alex

As the maintainer of your own experimental backend, you’re likely to be the only one looking at your silent bot. Building all the experimental targets means that when (for instance) the AVR backend is broken by an API change, your build will fail until the AVR folks fix it.
You may want to limit the number of false positives among build breaks and build only RISC-V instead.

That'd be the best signal-to-noise ratio, yes.

And yes, the 8014 master is the ideal place for that. Just follow the
production buildbots guidelines, replace 8011 with 8014, and all is
set.

cheers,
--renato