[RFC][RISCV] Add intrinsic and/or builtin functions by #pragma

Hi all,

# TL;DR:

This is the intrinsic and/or builtin function issue again. In this RFC
we propose using a pragma to import intrinsics and declare intrinsic
wrapper functions, in order to reduce compilation time.

And here is the PoC for this RFC:
https://reviews.llvm.org/D103228

# Background:

The RISC-V vector extension defines 25,386 intrinsics and 2,102
overloaded intrinsic functions in riscv_vector.h, which adds a lot of
compilation time; the header file contains ~60k lines for those
overloaded functions and intrinsic wrapper functions.

Compiling an otherwise empty file that includes riscv_vector.h takes
0.395s with a release build of clang and 8.067s with a debug build, and
this also increases the clang test time.

# Proposal:

Use TableGen to generate a table of the intrinsic wrapper functions,
and then use a pragma to declare the intrinsic wrapper functions.

Syntax:

#pragma riscv intrinsic vector

The pragma imports all builtin functions and intrinsic wrappers into
the symbol table directly, which saves the time otherwise spent parsing
the prototypes of the intrinsic wrapper functions.

This trick is borrowed from the AArch64/SVE implementation in GCC:
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/arm_sve.h#L40

# Experimental Results:
## Size of riscv_vector.h:

IIUC OpenCL faced the same issue, and their solution was pretty clever and generalizable; a similar approach could conceivably improve compile speeds still further, while also minimizing memory usage and making pragmas unnecessary. https://lists.llvm.org/pipermail/cfe-dev/2021-February/067610.html

The basic idea, if I recall (Anastasia cc’d might correct me), is to create the necessary declarations whenever lookup fails. I.e., if lookup of vint32m1_t fails, before giving up clang checks if that is the name of one of your intrinsics; if so it adds the necessary declaration/overloaded declarations (the particulars handled via Tablegen) and returns that.

The effect is to “instantiate” these declarations as needed, as if from a template.

What also seems nice about this approach is that heavy-duty users can alternatively choose to just #include the large header, or use a pre-compiled header, and thereby automatically avoid any costs associated with this last-ditch-lookup solution.
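
The "declare on failed lookup" idea described above can be sketched as a toy model. This is not clang's Sema code: `IntrinsicEntry`, `LazySymbolTable`, and the text-only prototypes are hypothetical names invented for illustration, and the table stands in for what TableGen would generate.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table row; in the real scheme TableGen would emit these.
struct IntrinsicEntry {
  std::string name;       // intrinsic name as written in user code
  std::string prototype;  // one overload's signature, kept as text here
};

// Toy symbol table that declares intrinsics only when a lookup misses.
class LazySymbolTable {
public:
  explicit LazySymbolTable(std::vector<IntrinsicEntry> table)
      : table_(std::move(table)) {}

  // Returns the overload set for `name`, or nullptr if it is unknown.
  const std::vector<std::string> *lookup(const std::string &name) {
    auto found = declared_.find(name);
    if (found != declared_.end())
      return &found->second;  // ordinary lookup succeeded

    // Ordinary lookup failed: consult the intrinsic table before giving up.
    std::vector<std::string> overloads;
    for (const IntrinsicEntry &entry : table_)
      if (entry.name == name)
        overloads.push_back(entry.prototype);
    if (overloads.empty())
      return nullptr;  // genuinely undeclared

    // "Instantiate" the declarations so later lookups hit the fast path.
    auto result = declared_.emplace(name, std::move(overloads));
    assert(result.second && "name was not declared a moment ago");
    ++lazyDeclarations_;
    return &result.first->second;
  }

  std::size_t declaredNames() const { return declared_.size(); }
  std::size_t lazyDeclarations() const { return lazyDeclarations_; }

private:
  std::vector<IntrinsicEntry> table_;
  std::map<std::string, std::vector<std::string>> declared_;
  std::size_t lazyDeclarations_ = 0;
};
```

In this model a translation unit that uses only a handful of intrinsics pays only for those names; nothing is declared until a lookup for it misses.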

Hi David:

Thanks for the info. I have investigated the OpenCL intrinsics over the
last few days, and I saw that OpenCL already uses some #pragma
directives to switch extensions on and off.
So I think the mechanism is pretty similar; the difference is that the
OpenCL approach needs a new .td file to generate those helper
functions.

Our approach extends the existing builtin declaration mechanism: it
adds one field to record the enable condition.
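
As a rough illustration of "one field to record the enable condition", the sketch below filters a builtin table by feature bits when the pragma is seen. All names here (`BuiltinEntry`, `importOnPragma`, the feature constants, and the intrinsic names used with it) are hypothetical and are not taken from the PoC patch; a real implementation would derive the enabled features from -march.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical feature bits; a real compiler would derive them from -march.
constexpr std::uint32_t FeatureV = 1u << 0;
constexpr std::uint32_t FeatureZfh = 1u << 1;
constexpr std::uint32_t AllKnownFeatures = FeatureV | FeatureZfh;

// Builtin table row extended with one extra field: the enable condition.
struct BuiltinEntry {
  std::string name;
  std::uint32_t requiredFeatures;  // all of these bits must be enabled
};

// Models what the pragma handler could do: import only the entries whose
// enable condition is satisfied by the current target features.
std::set<std::string> importOnPragma(const std::vector<BuiltinEntry> &table,
                                     std::uint32_t enabled) {
  assert((enabled & ~AllKnownFeatures) == 0 && "unknown feature bit");
  std::set<std::string> symbols;
  for (const BuiltinEntry &entry : table)
    if ((entry.requiredFeatures & enabled) == entry.requiredFeatures)
      symbols.insert(entry.name);
  return symbols;
}
```

Because the condition is evaluated when the pragma is processed, the same table can serve every -march combination, which is the property a pre-compiled header lacks.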

We considered pre-compiled headers before, but they do not seem to fit
the RISC-V scenario: different -march combinations affect the content
of the header, so a pre-compiled header does not seem workable for the
RISC-V intrinsic headers.

Thanks :)

FYI, in case it helps we have started documentation about the internals of the
approach https://clang.llvm.org/docs/OpenCLSupport.html#opencl-builtins.
Although it is still a bit concise. There is not much OpenCL specific in the
approach we have implemented so it should be easily generalizable with some
renaming and minor refactoring (CC to Sven who might be able to provide more
info if needed). You might need to add a few special types if you use any that
we don’t have in OpenCL yet. Although we have covered a good variety from C99
already.

We have removed the need for the pragmas in the last commits, but that is mainly
because the pragma wasn’t useful in OpenCL in the way it was defined in the spec,
as it was not similar to a header include. The TableGen based header include is very
fast compared to parsing the large header files so I can certainly recommend
this route.

Cheers,
Anastasia

Hi Anastasia:

Thanks for your explanation! My first impression is that the
implementation is fairly OpenCL specific, since lots of OpenCL terms
are used in it, including the option names used in TableGen, but it
seems the mechanism could be generalized.

> We have removed the need for the pragmas in the last commits but it is mainly
> because it wasn't useful in OpenCL in a way it was defined in the spec as it
> was not similar to a header include. The TableGen based header include is very
> fast compared to parsing the large header files so I can certainly recommend
> this route.

Could you explain what the TableGen based header is? Does it mean
OpenCLBuiltins.inc, or some other header?

I guess we still need the pragma for RISC-V, since we don't want to
import those symbols until riscv_vector.h is included, but that should
not conflict with the OpenCL built-in approach :)

Thanks!

> Thanks for your explanation! My first impression is that the
> implementation is fairly OpenCL specific, since lots of OpenCL terms
> are used in it, including the option names used in TableGen, but it
> seems the mechanism could be generalized.

Exactly, the names we used are OpenCL specific but the logic can
apply to C-based languages universally. You might find it easy to decode
if you look at the relevant spec part:

https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#built-in-functions

> Could you explain what the TableGen based header is? Does it mean
> OpenCLBuiltins.inc, or some other header?

Yes, this is a hook to Sema that is generated by the TableGen from
OpenCLBuiltins.td description.

So if, let’s say, we wanted to generalize the logic, we would rename
“OpenCLBuiltinStruct” → “BuiltinStruct”, because this is just a data
structure that describes a function with its overloads.

I imagine the only OpenCL-specific parts are Extensions and “gentype”.
The concept of “gentype” is taken from
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#math-functions
This is just a trick to make TableGen description and data structures much
smaller. I think it can be useful outside of OpenCL but only if there are
overloads with vector types for example.
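
To make the “gentype” trick concrete, here is a small sketch of how one compact description can expand into a family of overloads. It is not the OpenCLBuiltins.td machinery: `GenericBuiltin`, `expandGentype`, and the text representation of signatures are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One compact record that stands for a whole family of overloads.
struct GenericBuiltin {
  std::string name;
  std::string returnType;               // may be the placeholder "gentype"
  std::vector<std::string> paramTypes;  // each may be "gentype"
};

// Expands the record into one textual declaration per concrete type by
// substituting every "gentype" placeholder with that type.
std::vector<std::string> expandGentype(
    const GenericBuiltin &builtin,
    const std::vector<std::string> &concreteTypes) {
  assert(!concreteTypes.empty() && "need at least one concrete type");
  auto substitute = [](const std::string &type, const std::string &concrete) {
    return type == "gentype" ? concrete : type;
  };

  std::vector<std::string> overloads;
  for (const std::string &concrete : concreteTypes) {
    std::string decl =
        substitute(builtin.returnType, concrete) + " " + builtin.name + "(";
    for (std::size_t i = 0; i < builtin.paramTypes.size(); ++i) {
      if (i != 0)
        decl += ", ";
      decl += substitute(builtin.paramTypes[i], concrete);
    }
    decl += ")";
    overloads.push_back(decl);
  }
  return overloads;
}
```

A single `pow` record with placeholder types and the list `float`, `float2`, `float4` yields three declarations, so the stored description grows with the number of functions rather than with functions times types.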

> I guess we still need the pragma for RISC-V, since we don’t want to
> import those symbols until riscv_vector.h is included, but that should
> not conflict with the OpenCL built-in approach :)

Indeed. In the OpenCL language we don’t have a way to activate/load the
builtins, so they are always available. This is why we activate the TableGen
header by a flag.

Cheers,
Anastasia