Best way to integrate special-purpose hardware with LLVM

Dear all,

in a research project, we are developing a special-purpose
floating-point accelerator, together with C++/OpenMP offloading support
with LLVM.

Since our hardware has a limited, special-purpose instruction set, it is
often a challenge for us to decide on how to integrate with LLVM's
general-purpose concepts such as IR and instruction selection. Hence, I
would appreciate a lot if we could get some feedback w.r.t. our current
approach. Let me sketch an example.

Our accelerator has no instruction for loading/writing floating point values from or to memory using a given address. Instead: 1) The data layout of a structured volume of data in memory can be hardware-configured, and then load and store instructions have a special form with offsets around the "center point" in the data volume which is currently processed. 2) We can write values to hardware registers and use them in computations.

Now, let's assume we have C++ code like the following:

float array[5] {1, 2, 3, 4, 5}.

#pragma omp target
   float a = array[0];

In the IR we get, we find fragments like this:

%arrayidx = getelementptr inbounds [5 x float], ..., i32 0
%1 = load float, float* %arrayidx, align 4

If we leave it like this, in the created selection DAG there will be a lot of instructions we do not conceptually have on our hardware, and detecting the necessary changes at instruction selection stage is complex and hard.

Hence, we chose method 2) (use register) from above in this case and do the following:

1) Define a target-specific intrinsic "array_access" which captures
information about the accessed array and the offset.

2) Replace the two instructions (getelementptr, load) from above by a
single "array_access" intrinsic in a special IR pass. This is a kind of "meta-instruction" which is here for conveying necessary information to instruction selection phase.

3) Lower the array_access intrinsic to a pseudo-instruction we also define for our hardware, to get from IR to selection DAG.

4) Replace the pseudo instruction by a hardware register in instruction selection. This creates valid machine code. (At runtime, we configure the register correctly before executing the machine code.)

I would be interested in feedback on several things here:

-Is there an "intended way" of integrating such special-purpose hardware with LLVM?

-Is it a good idea to introduce an intrinsic for the described purpose, or does this violate some architectural ideas of intrinsics or LLVM in general?

-We do not want to lose any relevant information on the IR or selection DAG. For example, in the original form, general LLVM functionality will be aware of what getelementptr and load means, so data dependencies will not be broken. In our "meta-instruction" (intrinsic) form, there might be a risk of either LLVM passes creating invalid code since they do not know what our meta-instructions mean, or losing the ability to employ optimization functionality present in LLVM due to unknown semantics of our meta-instructions. What would be the right way to handle this?

-How to cope in general with the fact that a target architecture has no instructions for certain concepts such as integer computation, address calculations, "standard" memory accesses, and so on. Are there known projects which had to tackle this?

-When modifying IR or selection DAG as above, how to get the necessary "metadata" from IR to instruction selection or to runtime? (In this case the data on the accessed array and offset or register)

To summarize: What would be the best "architectural approach" to integrate support for such a kind of special-purpose hardware in LLVM? Should we e.g. include "pseudo instructions" in and "pseudo registers" in to be able to integrate with the general-purpose LLVM concepts and in the end treat instructions/registers in a special way? Or can we avoid having to do so earlier?

Any feedback on this would be greatly appreciated.


Kai Plociennik