Adding custom operation intrinsic for ASIP architectures.

Hi,

I was talking with aKor in #llvm about how we could implement custom operation support for our ASIP architecture. We concluded that the best way would be to write a new custom operation intrinsic and an optimization pass that raises certain function calls to that intrinsic (similar to raising mallocs).

Basically, our custom operations are like calls, with an operation name and multiple inputs and outputs. For example, the C code
__llvm__custom_op_add(a,b,c) would be raised to customop add(i32 %tmp1, i32 %tmp24, i32 %tmp25). The "__llvm__custom_op_"-prefixed functions will not have a function body; they are pure declarations at the C level.
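To make the raising step concrete, here is a minimal toy sketch in Python. It is not LLVM API: calls are modelled as plain (callee, arguments) tuples, and the function name raise_custom_ops and the tuple layout are assumptions made up for illustration. Only the "__llvm__custom_op_" prefix comes from the description above.

```python
PREFIX = "__llvm__custom_op_"  # naming convention taken from the post


def raise_custom_ops(calls):
    """Rewrite prefixed calls into ("customop", name, args) records.

    `calls` is a list of (callee_name, argument_list) tuples, a toy
    stand-in for the call instructions a real pass would visit.
    Calls without the prefix are passed through as ordinary calls.
    """
    raised = []
    for callee, args in calls:
        # A bare prefix with no operation name is left as a normal call.
        if callee.startswith(PREFIX) and len(callee) > len(PREFIX):
            op_name = callee[len(PREFIX):]
            raised.append(("customop", op_name, list(args)))
        else:
            raised.append(("call", callee, list(args)))
    return raised


calls = [
    ("__llvm__custom_op_add", ["%tmp1", "%tmp24", "%tmp25"]),
    ("printf", ["%fmt"]),
]
result = raise_custom_ops(calls)
```

The operation name is recovered purely from the callee name, which is why the prefixed functions only need declarations on the C side.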

Comments are welcome, especially if anyone else needs this kind of functionality or has already implemented something similar.

Mikael Lepistö

Yes, this is very useful. Can we somehow integrate this into tblgen so that
targets can declare intrinsics they are interested in capturing and map them
to machine instruction sequences? I think a lot of architectures have need
for stuff like this and I would hate for a solution that only addresses those
needs in a piecemeal fashion.

                                                -Dave

Sure, this works. This is exactly the idea of the builtin functions in GCC. For example, in SSE, the __builtin_ia32_movntps function does a nontemporal store.

To answer David's question, we already have direct support for this in tblgen. For example, include/llvm/IntrinsicsX86.td contains:

let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
   def int_x86_sse_movnt_ps : GCCBuiltin<"__builtin_ia32_movntps">,
               Intrinsic<[llvm_void_ty, llvm_ptr_ty,
                          llvm_v4f32_ty], [IntrWriteMem]>;
}

and lib/Target/X86/X86InstrSSE.td contains:

def MOVNTPSmr : PSI<0x2B, MRMDestMem, (outs), (ins i128mem:$dst, VR128:$src),
                     "movntps {$src, $dst|$dst, $src}",
                     [(int_x86_sse_movnt_ps addr:$dst, VR128:$src)]>;

There is corresponding code in llvm-gcc to tell GCC how to handle this builtin. Is this what you're looking for?
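As a rough illustration of how the names in the two .td snippets relate, the sketch below derives the IR-level intrinsic name from the tblgen def name. It assumes the default naming rule (the "int_" prefix becomes "llvm." and the remaining underscores become dots), which matches the "llvm.x86." comment above. The helper intrinsic_name and the lookup table are hypothetical and not part of LLVM or llvm-gcc.

```python
def intrinsic_name(def_name):
    """Map a tblgen def name such as int_x86_sse_movnt_ps to its IR name.

    Assumes the default rule: strip "int_", prepend "llvm.", and turn
    the remaining underscores into dots.
    """
    if not def_name.startswith("int_"):
        raise ValueError("expected a def name starting with 'int_'")
    return "llvm." + def_name[len("int_"):].replace("_", ".")


# Hypothetical lookup table pairing the GCC builtin with the intrinsic,
# mirroring what the GCCBuiltin<...> annotation records.
BUILTIN_TO_INTRINSIC = {
    "__builtin_ia32_movntps": intrinsic_name("int_x86_sse_movnt_ps"),
}
```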

-Chris

Yes, this is more or less what we are looking for, except that we need variable arguments for our intrinsic. I assume that can be achieved by using llvm_vararg_ty as an argument type?

Is there a need to make changes to llvm-gcc to support new GCCBuiltin types that I define, or are all of them automatically converted to intrinsics on the gcc side?

Mikael Lepistö

Sure, variable arguments can be done that way. What sort of operations require variable arguments? They might be better implemented with pattern matching in the code generator than as an intrinsic.

No, unfortunately they are not converted automatically; you also have to use the GCC builtin mechanism to teach gcc about the new builtins.

-Chris

We need variable arguments because we don't know beforehand
which custom operations we support. So in practice we just
give the operation name and one or more parameters, depending
on how many parameters the custom operation has.

e.g. __custom_operation("addsub", a, b, c)

When lowering the intrinsic, we check which types of
parameters the current processor has for the operation
named "addsub" and emit the corresponding native code.
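The lookup-based lowering described above could be sketched roughly as follows. This is a toy model under stated assumptions: the PROCESSOR_OPS table, the (inputs, outputs) split for "addsub", and the printed "native code" syntax are all invented for illustration and do not describe our actual processor description format.

```python
# Hypothetical processor description: operation name -> (inputs, outputs).
# The operand counts for "addsub" are an assumption for this example.
PROCESSOR_OPS = {
    "addsub": (2, 1),
}


def lower_custom_op(name, operands, ops=PROCESSOR_OPS):
    """Check a custom operation against the processor and emit toy code.

    Raises KeyError if the processor has no such operation and
    ValueError if the operand count does not match its description.
    """
    if name not in ops:
        raise KeyError("processor has no custom operation named " + name)
    n_in, n_out = ops[name]
    if len(operands) != n_in + n_out:
        raise ValueError(
            "%s expects %d operands, got %d" % (name, n_in + n_out, len(operands))
        )
    inputs, outputs = operands[:n_in], operands[n_in:]
    # Made-up textual form standing in for the real native encoding.
    return "%s %s -> %s" % (name, ", ".join(inputs), ", ".join(outputs))


lowered = lower_custom_op("addsub", ["a", "b", "c"])
```

Because the operand count is only known once the operation name has been looked up, a mismatch can only be reported at lowering time, not when the call is raised.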

I already wrote an optimizer pass that transforms calls to certain function
names into the intrinsic. For me that seemed a much easier choice than
teaching gcc about new builtins.

Mikael Lepistö

The standard way of handling this is to add a set of separate intrinsics, one for each thing a processor supports. For example, take a look at llvm/include/llvm/IntrinsicsX86.td, which has almost 200 intrinsics.

-Chris