To write a compiler for Microsoft Direct3D shaders targeting our hardware,
I have a program that translates Direct3D shader assembly to LLVM
assembly. I added several intrinsics for this purpose.
The shader assembly is a vector ISA with some special instructions, such as:
* rcp (reciprocal)
* frc (the fractional portion of each input component)
* dp4 (dot product)
* exp (exponential)
* max, min
These operations are very specific to 3D graphics and have no
counterpart among the LLVM instructions, and the vector LLVM extension
is not enough to compile them.
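For reference, here is a minimal sketch of what these operations compute, modelling a register as a tuple of four floats (x, y, z, w). It is a simplification under stated assumptions: every operation is applied per component (in real Direct3D shader models rcp and exp read a single replicated source component), exp is assumed to follow the Direct3D convention of computing 2^x, and a zero input to rcp is assumed to yield +infinity. The names vmax/vmin are made up to avoid shadowing Python's built-ins.

```python
import math
from typing import Tuple

Vec4 = Tuple[float, float, float, float]

def rcp(v: Vec4) -> Vec4:
    """Per-component reciprocal; a zero component yields +inf (assumption)."""
    return tuple(math.inf if c == 0.0 else 1.0 / c for c in v)

def frc(v: Vec4) -> Vec4:
    """Fractional part defined as x - floor(x), so frc(-1.25) == 0.75."""
    return tuple(c - math.floor(c) for c in v)

def dp4(a: Vec4, b: Vec4) -> float:
    """Four-component dot product."""
    return sum(p * q for p, q in zip(a, b))

def exp(v: Vec4) -> Vec4:
    """Base-2 exponential per component (assumed 2**x, as in Direct3D)."""
    return tuple(2.0 ** c for c in v)

def vmax(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise maximum."""
    return tuple(max(p, q) for p, q in zip(a, b))

def vmin(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise minimum."""
    return tuple(min(p, q) for p, q in zip(a, b))
```

None of these has a single-instruction equivalent in plain LLVM, which is why they end up as intrinsics.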
The resulting LLVM assembly is assembled by llvm-as and passed directly
to llc, so the frontend is missing from the picture. The reason is
simple: the transformations/optimizations in the frontend
1) don't understand the intrinsics, and
2) don't deal with packed types (they are not vectorized).
I am considering adding new instructions to LLVM instead of
intrinsics. There are two options.
In the vector LLVM extension, there are dedicated instructions for
manipulating vectors, such as 'extract', 'combine', and 'permute'.
However, DSP and other scientific programs do not permute vectors as
frequently as 3D programs do: almost every 3D instruction requires
permuting its operands. For example:
// Each register is a 4-component vector
// the names of the components are x, y, z, w
add r0.xy, r1.zxyw, r2.yyyy
The components of r1 and r2 are permuted before the addition, but the
permutation result is _not_ written back to r1 and r2. 'zxyw' and
'yyyy' are the permutation patterns (called 'swizzles'). 'xy' is
called the write mask: the result is written only to the x and y
components of r0, and z and w are left untouched.
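The swizzle and write-mask semantics described above can be modelled in a few lines. This is a minimal sketch, assuming every swizzle is spelled out with all four components and registers are plain Python tuples; the helper names are made up for illustration.

```python
COMPONENTS = "xyzw"

def swizzle(reg, pattern):
    """Read reg through a 4-char swizzle: lane i gets reg[pattern[i]]."""
    return tuple(reg[COMPONENTS.index(c)] for c in pattern)

def write_masked(dst, value, mask):
    """Return dst with only the lanes named in mask replaced by value."""
    return tuple(v if c in mask else d for c, d, v in zip(COMPONENTS, dst, value))

def add(dst, mask, a, swz_a, b, swz_b):
    """add dst.mask, a.swz_a, b.swz_b; a and b themselves are never modified."""
    summed = tuple(p + q for p, q in zip(swizzle(a, swz_a), swizzle(b, swz_b)))
    return write_masked(dst, summed, mask)

r0 = (0.0, 0.0, 100.0, 200.0)
r1 = (1.0, 2.0, 3.0, 4.0)
r2 = (10.0, 20.0, 30.0, 40.0)
r0 = add(r0, "xy", r1, "zxyw", r2, "yyyy")  # add r0.xy, r1.zxyw, r2.yyyy
print(r0)  # (23.0, 21.0, 100.0, 200.0)
```

Lanes z and w of r0 keep their old values, and r1 and r2 are unchanged, exactly as the write mask and swizzles require.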
_Almost every_ instruction specifies different write masks and
swizzles, so there will be a lot of extract, combine, and permute LLVA
instructions. That may make it difficult for the transformations to
match a certain pattern in the program's semantic tree. For example, a
transformation may want to match a 'mul' followed by an 'add' and
merge them into a single 'mad' (multiply-and-add) instruction. For
another example, it may want to vectorize several scalar instructions
such as
add r0.xy, r1.xy, r2.xy
add r0.zw, r1.zw, r2.zw
into a single vector instruction:
add r0.xyzw, r1.xyzw, r2.xyzw
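A sketch of that vectorization, on a hypothetical in-memory representation where each instruction carries its write mask and a four-component swizzle per source. To avoid depending on how Direct3D expands short swizzles, every swizzle is spelled out with four components. It assumes the two instructions are adjacent and conservatively refuses to merge when the second one reads the destination register.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

COMPONENTS = "xyzw"

@dataclass(frozen=True)
class Instr:
    """Hypothetical IR node: opcode dst.mask, then sources with 4-char swizzles."""
    op: str
    dst: str
    mask: str
    srcs: Tuple[Tuple[str, str], ...]  # (register, swizzle) pairs

def try_merge(a: Instr, b: Instr) -> Optional[Instr]:
    """Merge two adjacent instructions that write disjoint lanes of one register."""
    if a.op != b.op or a.dst != b.dst or set(a.mask) & set(b.mask):
        return None
    if [r for r, _ in a.srcs] != [r for r, _ in b.srcs]:
        return None
    if any(r == a.dst for r, _ in b.srcs):
        return None  # b might read lanes that a has just written
    mask = "".join(c for c in COMPONENTS if c in a.mask or c in b.mask)
    srcs = []
    for (reg, swz_a), (_, swz_b) in zip(a.srcs, b.srcs):
        # Lane i is taken from whichever instruction writes lane i;
        # lanes written by neither keep the identity component.
        lanes = "".join(
            swz_a[i] if c in a.mask else swz_b[i] if c in b.mask else c
            for i, c in enumerate(COMPONENTS)
        )
        srcs.append((reg, lanes))
    return Instr(a.op, a.dst, mask, tuple(srcs))

add1 = Instr("add", "r0", "xy", (("r1", "xyzw"), ("r2", "xyzw")))
add2 = Instr("add", "r0", "zw", (("r1", "xyzw"), ("r2", "xyzw")))
merged = try_merge(add1, add2)
# merged: add r0.xyzw, r1.xyzw, r2.xyzw
```

Because masks and swizzles live on the instruction, the merge is a per-lane table lookup rather than a search through separate permute and combine instructions.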
If the write mask and swizzles are instead supported by each
instruction per se, the syntax/signature of the LLVM assembly will need
to change from
<result> = add <ty> <var1>, <var2>
to
<result>.<writemask> = add <ty> <var1>.<swizzle>, <var2>.<swizzle>
This could make it easier for the frontend transformations to
recognize the real program semantics, without the additional extract,
combine, and permute instruction sequences.
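As an illustration of that claim, a 'mul' followed by an 'add' can be fused into 'mad' directly when both carry their own swizzles: the add's swizzle on the temporary is simply composed with the mul's source swizzles. A minimal sketch, assuming a hypothetical representation with four-component swizzles on every source, adjacent instructions, and that the mul's destination has no other uses (liveness is left to the caller).

```python
from typing import NamedTuple, Optional, Tuple

COMPONENTS = "xyzw"

class Op(NamedTuple):
    """Hypothetical IR node with a write mask and 4-char swizzles on every source."""
    opcode: str
    dst: str
    mask: str
    srcs: Tuple[Tuple[str, str], ...]

def compose(outer: str, inner: str) -> str:
    """Swizzle of a swizzle: lane i reads inner[index(outer[i])]."""
    return "".join(inner[COMPONENTS.index(c)] for c in outer)

def fuse_mad(mul: Op, add: Op) -> Optional[Op]:
    """Rewrite 'mul t, a, b; add d, t, c' (in either add operand order) as 'mad d, a, b, c'."""
    if mul.opcode != "mul" or add.opcode != "add":
        return None
    for k, (reg, swz) in enumerate(add.srcs):
        if reg != mul.dst:
            continue
        other = add.srcs[1 - k]
        if other[0] == mul.dst:
            return None  # both operands read t; not handled in this sketch
        # Every lane the add writes must read a lane of t that the mul wrote.
        if any(swz[i] not in mul.mask for i, c in enumerate(COMPONENTS) if c in add.mask):
            return None
        (a, sa), (b, sb) = mul.srcs
        return Op("mad", add.dst, add.mask,
                  ((a, compose(swz, sa)), (b, compose(swz, sb)), other))
    return None

mul = Op("mul", "r3", "xy", (("r1", "wzyx"), ("r2", "xxxx")))
add = Op("add", "r0", "xy", (("r3", "yxzz"), ("r4", "xyzw")))
fused = fuse_mad(mul, add)
# fused: mad r0.xy, r1.zwyy, r2.xxxx, r4.xyzw
```

With the dedicated-instruction style, the same match would first have to see through the permute feeding the add and the combine producing the mul's masked result.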
From the point of view of writing frontend vector transformations and
optimizations, which method is better?
1. Follow the vector LLVM extension style, using dedicated instructions
to manipulate the vectors.
2. Support write masks and swizzles (permutations) as part of the instruction syntax.
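To make the comparison concrete, the sketch below expands one swizzled, masked instruction in the style of option 1. The op names 'permute' and 'combine', their operand order, and the value names '%p0', '%t', and '%r0_1' are placeholders for illustration, not the actual syntax of the vector LLVM extension.

```python
def lower_option1(dst, mask, op, srcs):
    """Expand 'op dst.mask, src.swz, ...' into separate hypothetical vector ops."""
    seq = []
    operands = []
    for n, (reg, swz) in enumerate(srcs):
        if swz == "xyzw":
            operands.append(reg)  # identity swizzle: no permute needed
        else:
            tmp = f"%p{n}"
            seq.append(f"{tmp} = permute {reg}, {swz}")
            operands.append(tmp)
    if mask == "xyzw":
        seq.append(f"{dst}_1 = {op} {', '.join(operands)}")  # full write: no combine
    else:
        seq.append(f"%t = {op} {', '.join(operands)}")
        seq.append(f"{dst}_1 = combine {dst}, %t, {mask}")  # merge only masked lanes
    return seq

for line in lower_option1("%r0", "xy", "add", [("%r1", "zxyw"), ("%r2", "yyyy")]):
    print(line)
```

For the earlier `add r0.xy, r1.zxyw, r2.yyyy` this yields four instructions instead of one, which is the growth that pattern-matching passes would have to look through under option 1.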
I have worked on the backend and don't have much experience with the frontend.
Thank you all.