[hi, please do cc me to maintain thread integrity, i am subscribed digest]
Hi Luke and welcome to the LLVM community.
... oh! hi alex! thanks
There is a total of ZERO new RISCV instructions, the entire design is based
around CSRs that implicitly mark the STANDARD registers as "vectorised",
It's worth noting that having zero new RISC-V instruction
encodings doesn't necessarily make it easier to support than a
proposal that introduces new instructions.
appreciate the insight.
LLVM would have to be
taught how to handle this register bank switching / redirection scheme
and how to minimise the number of switches required. This does have
the potential to be somewhat intrusive. It reduces work for the MC
layer (assembler/disassembler), but the code generator would still
need to understand the semantics of these overloaded instructions.
indeed. there's a major difference between SV and RV here, which
stems from the use of the standard register file. SV's SETVL *has* to
guarantee that when the #immediate is set to e.g. 16, that if srcreg
is >= 16, VL *must* be set to 16. this is to ensure that if it is used
for a single hit (i.e. with no looping), for example in a
context-switch or LD/ST MULTI substitute, that the LD/ST or
context-switch can be achieved in two and only two instructions [minus
CSR setup]: SETVL and the LD/ST/other-op.
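a minimal sketch of that SETVL guarantee, in python purely for illustration. the function name and the "srcreg is treated as unsigned" behaviour are assumptions of this sketch, and so is the handling of srcreg *below* the immediate (VL is simply taken from srcreg): only the clamping case is stated above.

```python
def setvl(srcreg_value: int, immediate: int) -> int:
    """Model of the guarantee: VL never exceeds the immediate.

    Hypothetical helper, not taken from any spec text. Assumes
    srcreg_value is an unsigned register value and immediate >= 1.
    """
    if srcreg_value < 0:
        raise ValueError("srcreg_value is modelled as unsigned")
    if immediate < 1:
        raise ValueError("immediate must be at least 1")
    # If srcreg >= immediate, VL *must* be the immediate; otherwise
    # (assumption of this sketch) VL is just srcreg.
    return min(srcreg_value, immediate)


# single-hit use, no loop: ask for "as many as possible", get exactly 16
vl = setvl(1000, 16)
```

with that property a LD/ST MULTI substitute needs no loop: one SETVL, then the one LD/ST.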
as a first pass these kinds of... interesting semantics would
probably be a good idea to skip, and instead break the (now-extended)
register file into two groups: x0-x31 (standard regfile) and x32-x63
which would be utilised by an SV-aware MC layer. the "standard"
x0-x31 regs would be treated as "scalar", the top ones treated as
"RV-like". a bit like SSE, in other words. what do you think?
btw my primary focus here is to do the research into what's the most
practical path for empowering jake to create a Libre 3D GPGPU.
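the x0-x31 / x32-x63 split suggested above can be sketched as a trivial register classifier. this is only an illustration of the first-pass idea: the names (`RegClass`, `classify`) are hypothetical and nothing here is existing LLVM or MC-layer code.

```python
from enum import Enum


class RegClass(Enum):
    SCALAR = "scalar"   # x0-x31: the standard regfile
    VECTOR = "vector"   # x32-x63: handled by an SV-aware layer


def classify(regnum: int) -> RegClass:
    """Classify a register number in the (extended) 64-entry regfile."""
    if not 0 <= regnum <= 63:
        raise ValueError(f"x{regnum} is outside the extended regfile x0-x63")
    return RegClass.SCALAR if regnum < 32 else RegClass.VECTOR
```

the point of the split is that anything touching only x0-x31 can be left to the existing scalar code paths untouched.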
1. I note that the separation between LLVM front and backend looks like
adding SV experimental support would be a simple matter of doing the backend
assembly code translator, with little to no modifications to the front end
needed, would that be about right? Particularly if LLVM-RV already adds a
variable length concept.
As with most compilers you can separate the frontend, middle-end and
backend. Adding SV experimental support would definitely, as you say,
require work in the backend (supporting lowering of IR to machine
instructions) but potentially also middle-end modifications (IR->IR
transformations) to enable the existing vectorisation passes.
can you elaborate on that at all, or point me in the direction of
some docs? or is it something that's... generally understood? if i
can translate what you're saying to concepts that i understand from
prior experience: the people who helped develop pyjs (a
parser (actually... just took parts of lib2to3, wholesale!), then
added in "AST morphers" which would walk the AST looking for patterns
format, and *then* handed it over to the JS outputter.
the biggest of these was, if i recall, the one that massively rewrote
the AST to add proper support for python "yield".
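for anyone unfamiliar with the "AST morpher" idea, here is a tiny generic example using python's standard `ast` module (needs python 3.9+ for `ast.unparse`). it is *not* pyjs code and not how LLVM does it; the class name and the pattern chosen (`x ** 2` rewritten to `x * x`) are made up for illustration. it just shows the shape: walk the tree, match a pattern, return a replacement node.

```python
import ast


class SquareToMultiply(ast.NodeTransformer):
    """Rewrite `name ** 2` into `name * name` (illustrative pattern only)."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # rewrite children first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.left, ast.Name)
                and isinstance(node.right, ast.Constant)
                and type(node.right.value) is int
                and node.right.value == 2):
            replacement = ast.BinOp(
                left=node.left,
                op=ast.Mult(),
                right=ast.Name(id=node.left.id, ctx=ast.Load()),
            )
            return ast.copy_location(replacement, node)
        return node


def morph(source: str) -> str:
    """Parse, apply the morpher, and turn the tree back into source."""
    tree = SquareToMultiply().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

the input and output are both the same tree format, which is the property that lets several such passes be chained before the final outputter runs.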
if i understand correctly, the intermediary morphing (IR->IR) is i
*think* the same thing, does that sound about right? that you have an
IR, but that, for the target ISA which has certain concepts that are
less (or more) efficient, the IR needs rewriting from the
"general-purpose" original down to a more "architecturally-specific"
2. With there being absolutely no new instructions whatsoever (standard
existing AND FUTURE scalar ops are instead made implicitly parallel), and
given the deliberate design similarities it seems to me that SV would be a
good first experimental backend *ahead* of RVV, for which the 240+ opcodes
have not yet been finalised. Would people concur?
I'm not convinced it would actually be an easier starting point and I
anticipate quite a lot of work describing these new instruction
semantics and teaching LLVM how to use them.
For clarity, is this something you're proposing to be done directly in
upstream LLVM, or something you're asking for advice on in an (at
least initially) downstream project?
i don't know yet: i do know that, ultimately, it'll need to be
upstreamed. it's just not possible otherwise to have a goal with the
word "libre" attached to it. if the M-Class SoC takes off, it would
end up in hundreds of millions of ubiquitous devices (primarily and
initially in india) - smartphones, tablets, netbooks and so on - and
if there's even the sniff of a proprietary library or anything that's
*not* fully upstreamed the site hosting it will be deluged with
complaints from libre advocates (due to the proprietary
libraries/applications), and, due to the sheer number of units,
absolutely deluged with downloads.
from a development perspective, being able to coordinate disparate
groups via upstream repositories would be... a lot easier, but not a
showstopper if they weren't. but ultimately everything does have to
3. If there are existing patches, where can they be found?
Robin Kruppe is the main person working on RVV support. I'm not sure
if patches have been made available anywhere at this point.
ah! yes, sorry, hi robin, loved the RFC. do you have anything
available that could be looked over?
4. From Jeff Bush's Nyuzi work it has been noted that certain 3D operations
are just far too expensive to do as SIMD or vectors. Multiple FP ARGB to
24/32 bit direct overlay with transparency into a tile is therefore for
example a high priority candidate for adding a special opcode that must
explicitly be called. Is this relatively easy to do and is there
documentation explaining how?
Adding a new instruction and making it available through inline
assembly or intrinsics is pretty easy.
ok great, good to hear.
I did a mini-tutorial on this
at the RISC-V Workshop in Barcelona and really should tidy up and
publish the extended materials I started on this subject.
no rush, here.
It is worth emphasising that this shall not be a private proprietary hard
fork of llvm, it is an entirely libre effort including the GPGPU (I read
Alex's lowRISC posts on such private forking practices, a hard fork would be
just insane and hugely counterproductive), so in particular regard to (4)
documentation, guidelines and recommendations likely to result in the
upstreaming process going smoothly also greatly appreciated.
One additional thought: I think RISC-V is somewhat unique in LLVM in
that implementers are free to design and implement custom extensions
without need for prior approval. Many such implementers may wish to
see upstream LLVM support for their extensions.
i don't know if you followed the isa-conflict-resolution discussion
(which was itself ironically... full of conflict), i am... well,
there's no easy way to say this, so i'll just say it straight: just as
with gcc, if LLVM accepts such extension support upstream
(which implies that, publicly, the opcode is now permanently and
irrevocably world-wide "taken over" and is *permanently* and
implicitly *exclusively* reserved by that implementor) without them
being able to switch it off, i.e. without something that's exactly, or
is equivalent to, the isa-mux proposal (which is exactly like the 386
"segment offset" concept... except for instructions), LLVM-RISC-V will
get into an absolute world of pain.
the very first time that LLVM has to generate (support) two uses of
the exact same binary instruction encoding with two completely
different meanings, that's it: RISC-V will be treated exactly like
Altivec / SPE for PowerPC, i.e. dead.
so please, please, guys, for goodness sake, please: when it comes to
upstreaming non-standard custom extensions that "take over" opcode
space in ways that *can't be switched off*, please for goodness sake
put your foot down and say "no, sorry, you'll have to maintain this
yourself as an unsupported hard fork".
think about it: you let even *one* team make a public declaration "we
effectively own this custom opcode, now", that's it: nobody else,
anywhere in the world, can ever publicly consider using it. and you
know how few custom opcodes there are [in the 32-bit space]. two.
i can't.... i can't begin to express how absolutely critical this is,
to the entire RISC-V ecosystem.
For any open source
project, it's normal to consider factors such as the following when
considering new contributions:
* Potential value to the project (will there be users?)
* Potential cost to the project (what is the maintenance burden? Is
someone stepping up to maintain the addition?)
* Is it stable? i.e. are the design and external interfaces finalised?
If not, is the level of instability compatible with the project's
release process and sufficiently explained in docs etc.?
appreciated. well, i can say that the people i've encountered are
extremely competent: Jake for example is amazing. and i'm used to
dealing with software libre projects. i'll make sure that the
different groups/contributors minimise impact.
yes, SV is sufficiently straightforward that it's not going to
significantly change. i say that, but i did reconsider the CSR
element width meanings recently so that RV32G could transfer 64-bit
ints to 64-bit floats with a single (meaning-overloaded) instruction.
so whilst i can categorically say that there's not going to be major
redesigns (i don't anticipate any being needed, the RV work is the
base foundation and that's had *literally* years of thought put into
it), small changes as issues are encountered are... kiiinda going to
be inevitable, and can only realistically be worked out as and
when an actual implementation gets underway.
Support for any standard RISC-V Foundation published extensions is
easy to justify.
indeed. for standard extensions, the RV Foundation acts as the
atomic arbiter to ensure and guarantee world-wide exclusive unique
meaning of any given opcode. that's their role, it's the purpose of
the Certification Mark, and they'll do that job very very well.
Also for custom extensions with shipping hardware
that is programmable by end-users.
ngggh... *sigh* ok much as i'd like to keep this topic on-track, the
following is very important to be aware of. the custom opcode space
(and the practice of overriding even standard opcodes) is where the
RISC-V Foundation has unfortunately failed to comprehend the nature of
the problem, and has effectively passed the burden of responsibility
for atomically curating the public opcode space over to the FSF (in
the case of gcc) and to the LLVM team (in the case of LLVM). this um
may be news to you.
we know for a fact, from many many historic examples, with PowerPC
Altivec/SPE being the most publicly well-known one, that dual or
greater *public* binary ISA encoding conflicts (private ones are not a
problem at all) quite literally kill the entire ISA as it completely
violates the expected contract that any binary will have one and only
so if we want RISC-V to not be killed off stone-dead, it's absolutely
critically important that this contract never be violated.
and.. um... the responsibility for guaranteeing that that be the
case... has been punted.... to *you*, the LLVM team. that may.. um...
come as an alarming surprise.
examples which have been *successful* in other architectures - and
not burdensome at all - are those which dynamically set a CSR which
changes big-endian / little-endian meaning of LD/ST instructions.
these have an *exact* orthogonal equivalent system to the
Cases such as experimental
non-standard extensions that haven't (yet) shipped in hardware might
require more examination on a case-by-case basis. [Just sharing my
initial thoughts here rather than official LLVM policy.]
so, a couple of things:
firstly, the isa-mux proposal basically extends standard opcodes from
32-bit to say 33, 34 or however many extra (hidden) bits are needed to
*literally* change the meaning of 32-bit binary encoding(s). want an
encoding completely switched off and to fall back to standard RVbase
only? no problem: set muxid=0. done. transfer a binary to another
architecture and you want to guarantee that you will get an exception
trap that's world-wide unique and so can be software-emulated with
publicly-available libre libraries? no problem: the trap will have
access to the current "mux" setting from userspace so will know
*exactly* which (publicly published) custom extension it should
so the mvendorid-marchid-muxid becomes a globally world-wide unique
tuple for which it is absolutely flat-out impossible to have any kind
of conflict of the Altivec / SPE PowerPC type that killed *public* gcc
/ LLVM PowerPC vector support stone dead. nobody knew if a given
binary would work on their hardware, because they had no idea if the
binary encoding for an opcode was for Altivec or for its competitor...
so nobody bothered with either.
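to make the tuple idea concrete, here is a rough sketch of a lookup keyed on mvendorid-marchid-muxid. everything in it is hypothetical: the type and function names, the vendor/arch ids, the encoding value and the two "meanings" are invented for the example, and a real implementation would live in the decoder or trap handler, not a python dict.

```python
from typing import Dict, NamedTuple, Optional


class MuxKey(NamedTuple):
    mvendorid: int
    marchid: int
    muxid: int


# (tuple) -> {32-bit encoding -> meaning}; contents are made up
MeaningTable = Dict[MuxKey, Dict[int, str]]


def custom_meaning(table: MeaningTable, key: MuxKey,
                   encoding: int) -> Optional[str]:
    """Return the custom meaning of an encoding, or None if it would trap."""
    if key.muxid == 0:
        # muxid=0: custom meanings switched off, standard base only
        return None
    return table.get(key, {}).get(encoding)


ENCODING = 0x0000000B  # one arbitrary 32-bit pattern, shared by both vendors

TABLE: MeaningTable = {
    MuxKey(mvendorid=0x1, marchid=0x1, muxid=1): {ENCODING: "vendor A: pixel blend"},
    MuxKey(mvendorid=0x2, marchid=0x1, muxid=1): {ENCODING: "vendor B: crc step"},
}
```

the same 32 bits resolve to two different meanings without any ambiguity, because the key is the whole tuple and not the encoding alone; a `None` result stands in for the trap, which can then consult the same tuple to pick the right emulation library.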
following the isamux concept, that situation is impossible to
encounter. therefore, i will be (and have been) making it absolutely
clear to people that the 3D and SV custom extensions *will* be
run-time dynamically muxable (i.e. absolutely guaranteed to be
secondly, over the long term, there's going to need to be quite a bit
of sustained coordination, world-wide, between completely different
inter-dependent groups. i've yet to track down someone who can do the
modifications to spike, for example, and they may be someone who
doesn't know (and shouldn't have to know) about LLVM, or even the
Libre 3D GPGPU project. if however i can point them at an upstream
branch / repo for llvm and say to them "run this test code", things
get a heck of a lot easier (i.e. don't go rapidly out-of-date, follow
and keep up-to-date with standard practices)... you get where i'm
going with that, i'm sure.
ok, that's probably enough for now, apologies i am taking on a new
project today, i may be a little delayed in responding. this is
however very important so i will be putting some thought into replies.