porting AMDVLK to the Libre RISC-V 3D GPU: NLNet EUR 50,000 Grant application

hi all,

[please do cc the libre-riscv-dev list for this discussion, thanks]

after speaking to Michiel from NLNet, i am looking to put in an
additional grant application for EUR 50,000, so that charitable
donations to developers are available in order to convert AMD's AMDVLK
driver, replacing its amdgpu LLVM IR backend with one that outputs
RISC-V assembler instead. in stages, that will become vectorised
assembler, and, later, will have special accelerated custom 3D opcodes
(texturisation, OpenCL opcodes such as atan2 etc.) added as well.

would anybody be interested to be the intended recipient(s) of such
charitable donations? (they would be tax-deductible in most
jurisdictions). universities may also be recipients, such that they
may then hire interns (or otherwise do what they wish). please do
note that Corporations may *not* be the recipients of an NLNet
charitable donation, but individuals may.

the deadline is oct 1st: there will be another opportunity (dec 1st)
however i would prefer to meet the oct 1st deadline.

the technical details - starting point and plan - are as follows:

* the code start-point is here: https://github.com/GPUOpen-Drivers/AMDVLK

* this code uses both the (Reference) Khronos SPIR-V to LLVM-IR
compiler - augmented - as well as a forked (being slowly updated to
mainline) version of LLVM

* the AMD-forked version of LLVM contains not only support for AMDGPU
texturisation: from what i gather from an analysis by Jacob Lifshay, it
is also one of the only LLVM IR backends that supports *explicit*
full-function vectorisation intrinsics. this is *not* the same as
sub-vector types (vec4, vec3 etc.)

* the "normal" versions of LLVM IR *lose* the explicit vectorisation
information during the front-end to back-end conversion process, and,
in the case of e.g. Vector backends such as the RISC-V RVV engine,
will "opportunistically" *reinstate* (recover) it as part of [some of]
the IR-to-assembly conversion passes. this is the very vectorisation
information that the AMDGPU SPIR-V to LLVM-IR compiler already
explicitly carries through.

* thus we cannot just start from mainline LLVM, because it simply
does not contain support for the full-function explicit vector-looping
which AMD very deliberately added to their fork of LLVM.

* we need to *drop* the queue / pipe / etc. code within AMDVLK
(contained within AMD's "PAL" library), and replace it with direct and
explicit RISC-V assembler-generation. a good - perfectly acceptable -
starting point for this would be the current mainline RISC-V LLVM JIT
which has just been upstreamed.

this latter is the core of the work, and requires some ancillary explanation.

most GPUs are separated from the main CPU by way of shared memory -
usually but not always over a PCIe Bus. AMD's PAL library is
effectively a fully-functioning RPC subsystem that pushes AMDGPU
assembly code (compiled JIT on the *CPU*) over to the Radeon GPU,
pushes it the data it needs as well, carries out synchronisation and
blah blah blah you get the general idea, all of which is hugely
complicated.

the Libre RISC-V SoC is a *HYBRID* CPU / GPU. the CPU *IS* the GPU.
the GPU *IS* the CPU.

the accelerated texture assembly code instructions will be added *to
the CPU*. the YUV2RGB acceleration assembly code instructions will be
added *to the CPU*. the atan2 and other transcendental assembly code
instructions will be added *to the CPU*. all of these will be done
over time, on an ongoing basis, starting initially from "base" RISC-V
instructions.

thus what we need doing is actually a drastic *simplification* of the
AMDVLK assembly-generation code.

question (which more clearly illustrates where we are going with
this): why do we not just use swiftshader and be done with it?
surely you are doing a "software 3D driver", right?

https://github.com/google/swiftshader

the answer's "no, we are not".

the reason is quite simple: swiftshader was *specifically* designed to
be a software-only 3D GPU renderer, for use on *scalar* processors
that may - or may not - have SIMD instructions [NOT Vector Engines].

as such, as mentioned above, it *does not have explicit full-function
vectorisation support of any kind*. it may have support for
sub-vector types (vec3, vec4), but it *does not* have built-in support
for predication etc. all of which is vital information for a GPU and
was the whole reason why SPIR-V was created as an augmented version of
LLVM-IR in the first place.
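
to make the distinction concrete, here is a rough sketch (python, purely
illustrative: none of these names come from AMDVLK, LLVM or swiftshader).
it runs a toy "shader" with a branch two ways: once per element, and once
as a whole function across all lanes, where the branch turns into a
predicate mask. a sub-vector type such as vec4 on its own gives you none
of this.

```python
# illustrative only: a toy "shader" with a branch, run two ways.
# all names here are made up for the example.

def shade_scalar(x):
    """per-element shader as written: one lane, a real branch."""
    if x < 0.0:
        return -x * 2.0
    return x + 1.0

def shade_vectorised(xs):
    """the same function applied across all lanes at once.

    the branch becomes a predicate mask. this toy version evaluates
    both sides for every lane and then selects per lane; predicated
    hardware would instead skip the masked-off lanes.
    """
    mask = [x < 0.0 for x in xs]          # predicate, one bit per lane
    then_side = [-x * 2.0 for x in xs]    # "if" body, all lanes
    else_side = [x + 1.0 for x in xs]     # "else" body, all lanes
    return [t if m else e for m, t, e in zip(mask, then_side, else_side)]

lanes = [-1.0, 0.5, -3.0, 2.0]
# both forms must agree lane for lane
assert shade_vectorised(lanes) == [shade_scalar(x) for x in lanes]
```

the point being that the mask has to be carried explicitly from the
front-end all the way down to assembly generation, rather than being
rediscovered by a later pass.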

in addition, swiftshader simply does not have support for
texturisation or any other *hardware* accelerated features present in
GPUs. this is reflected right back throughout the entire source tree
and it would be far too much work to try to add it.

so we are *starting* from a software-only 3D driver, and then adding
hardware-accelerated opcodes *to* that driver. note very very
specifically that there will *NOT* be an associated kernel driver
involved here, nor will those instructions require special
"privileges". they will be literally callable just like any other
opcode such as FMUL, FADD, and FDIV and so on.
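
a rough sketch of what "starting from base instructions, then adding
opcodes" might look like at the assembly-generation level (python, toy
lowering table, *not* LLVM's real backend API). "hyp.fatan2.s" is a
made-up mnemonic standing in for a future custom opcode; the fallback
path assumes a libm-style atan2f routine is available, and ignores
register saving and the rest of the calling convention.

```python
# illustrative only: a toy lowering table, not a real compiler backend.

BASE_LOWERING = {
    # ordinary RISC-V single-precision float instructions
    "fmul": ["fmul.s {d}, {a}, {b}"],
    "fadd": ["fadd.s {d}, {a}, {b}"],
    # no base RISC-V atan2 instruction: fall back to a library call
    "atan2": [
        "fmv.s fa0, {a}",
        "fmv.s fa1, {b}",
        "call atan2f",
        "fmv.s {d}, fa0",
    ],
}

# hypothetical custom opcode, emitted inline like any other instruction
CUSTOM_LOWERING = {
    "atan2": ["hyp.fatan2.s {d}, {a}, {b}"],
}

def lower(op, d, a, b, have_custom=False):
    """return the assembly lines for one three-operand operation."""
    table = dict(BASE_LOWERING)
    if have_custom:
        table.update(CUSTOM_LOWERING)   # custom opcodes override fallbacks
    return [line.format(d=d, a=a, b=b) for line in table[op]]

print(lower("atan2", "ft0", "ft1", "ft2"))
print(lower("atan2", "ft0", "ft1", "ft2", have_custom=True))
```

in both cases the output is plain user-mode assembly: no queue, no
kernel driver round-trip, just a longer or shorter instruction sequence.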

it's quite a fascinating and exciting project, that, in its simplest
form, should be relatively straightforward. augmentations and
optimisations may be done incrementally to increase performance and
add experimental opcodes as the project progresses.

also please note: there's no actual contractual obligation, expressed
or implied. this *really is* donations. it's definitely *not*
"work-for-hire" and this is definitely *NOT* a "Job Proposal". if a
milestone is completed, you receive the donation (direct from NLNet,
*not* from me, or our team).

any questions please do ask. one important condition: we need at
least one person with an EU home address, who is also an EU citizen.
that EU citizen does not have to be living *in* the EU at the time of
the application. they can be a UK citizen, because no decision has
been made yet, there.

best,

l.