GCCBuiltin and Intrinsic Mapping

I've run into an issue specifying intrinsics for AVX.

Right now one can use GCCBuiltin to get automatic support, in the C
backend (CBE) and elsewhere, for emitting intrinsics as GCC builtins.
It looks like this:

  def int_x86_sse3_hadd_pd : GCCBuiltin<"__builtin_ia32_haddpd">,
              Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty,
                         llvm_v2f64_ty], [IntrNoMem]>;
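For readers who haven't looked at the mechanism: conceptually, each GCCBuiltin<...> annotation contributes one entry to a table from an intrinsic to a GCC builtin name, which a C-emitting backend can consult when it prints a call. The C++ below is a minimal sketch under that assumption. The table layout and the helpers `builtinFor` and `emitCall` are hypothetical and are not the code TableGen actually generates.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a table derived from GCCBuiltin<...> records:
// intrinsic record name -> GCC builtin name. Not LLVM's generated code.
const std::map<std::string, std::string> kIntrinsicToBuiltin = {
    {"int_x86_sse3_hadd_pd", "__builtin_ia32_haddpd"},
};

// Returns the GCC builtin name for `intrinsic`, or std::nullopt when the
// intrinsic has no GCCBuiltin entry in this table.
std::optional<std::string> builtinFor(const std::string &intrinsic) {
  auto it = kIntrinsicToBuiltin.find(intrinsic);
  if (it == kIntrinsicToBuiltin.end())
    return std::nullopt;
  return it->second;
}

// Hypothetical helper: the C text a backend might print for a
// two-operand intrinsic call. Requires a mapping to exist.
std::string emitCall(const std::string &intrinsic, const std::string &lhs,
                     const std::string &rhs) {
  std::optional<std::string> builtin = builtinFor(intrinsic);
  assert(builtin && "intrinsic has no GCCBuiltin mapping");
  return *builtin + "(" + lhs + ", " + rhs + ")";
}
```

For example, `emitCall("int_x86_sse3_hadd_pd", "a", "b")` produces the string `__builtin_ia32_haddpd(a, b)`.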

AVX has 128-bit instructions that work exactly like SSE instructions
except that they have non-destructive operands. GCC defines intrinsics
for 256-bit operations but does not define separate intrinsics for the
128-bit AVX instructions, so one has to use the SSE intrinsics:

  def int_x86_avx_vhadd_pd_xmm : GCCBuiltin<"__builtin_ia32_haddpd">,
              Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty, llvm_v2f64_ty],
                        [IntrNoMem]>;

Unfortunately, this doesn't work:

/ptmp/dag/universal_build/merge/developer/DEFAULT/llvm/tblgen: Intrinsic 'int_x86_sse3_hadd_pd': duplicate GCC builtin name!

Apparently it's not possible to define two different LLVM intrinsics
that map to the same GCCBuiltin. Is this a known limitation? How
complicated would it be to lift this restriction?
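One plausible reading of the error is that the builtin-name side of the mapping is used as a key, so a second record naming the same builtin is rejected. Here is a minimal C++ sketch of such a strict check. It assumes records are visited in name order, which would explain why the SSE3 record (sorting after the AVX one) is the one reported. `IntrinsicDef` and `buildBuiltinTable` are hypothetical names, not TableGen's, and the message is only modeled on the error above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record: a TableGen intrinsic def and its GCCBuiltin name.
struct IntrinsicDef {
  std::string recordName;  // e.g. "int_x86_sse3_hadd_pd"
  std::string gccBuiltin;  // e.g. "__builtin_ia32_haddpd"
};

// Strict check: the table is keyed on the builtin name, so a second record
// naming an already-registered builtin is an error.
std::map<std::string, std::string>
buildBuiltinTable(const std::vector<IntrinsicDef> &defs) {
  std::map<std::string, std::string> builtinToRecord;
  for (const IntrinsicDef &def : defs) {
    bool inserted =
        builtinToRecord.emplace(def.gccBuiltin, def.recordName).second;
    if (!inserted)
      throw std::runtime_error("Intrinsic '" + def.recordName +
                               "': duplicate GCC builtin name!");
  }
  // Every def landed under a distinct builtin name.
  assert(builtinToRecord.size() == defs.size());
  return builtinToRecord;
}
```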

                          -Dave

int_x86_avx_vhadd_pd_xmm doesn't exist on trunk. Why does it exist on
your branch if the semantics are exactly equivalent to
int_x86_sse3_hadd_pd? The register allocator can handle converting to
three-address form if the target provides the appropriate hooks.
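To make the destructive vs. non-destructive distinction concrete: in the two-address SSE form the destination is also the first source, so computing c = hadd(a, b) into a different register needs a copy first, while the VEX three-address form does not. The toy C++ below prints the two sequences in Intel operand order. `lowerHadd` is a hypothetical helper, not an LLVM API, and it does not model the register allocator's actual conversion hooks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy lowering of dst = hadd(src1, src2), printed in Intel operand order.
// threeAddress == false: destructive SSE3 form; "haddpd dst, src2" computes
//   dst = hadd(dst, src2), so src1 is first copied into dst if they differ.
// threeAddress == true: non-destructive VEX form with a separate destination.
std::vector<std::string> lowerHadd(const std::string &dst,
                                   const std::string &src1,
                                   const std::string &src2,
                                   bool threeAddress) {
  if (threeAddress)
    return {"vhaddpd " + dst + ", " + src1 + ", " + src2};
  // Toy limitation: copying src1 into dst would clobber src2 if they alias.
  assert((dst == src1 || dst != src2) && "dst aliases src2");
  std::vector<std::string> seq;
  if (dst != src1)
    seq.push_back("movapd " + dst + ", " + src1);  // tie dst to src1
  seq.push_back("haddpd " + dst + ", " + src2);
  return seq;
}
```

When dst and src1 are already the same register the copy disappears; that is the case where the destructive form costs nothing extra.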

-Eli

Eli Friedman <eli.friedman@gmail.com> writes:

> int_x86_avx_vhadd_pd_xmm doesn't exist on trunk. Why does it exist on
> your branch if the semantics are exactly equivalent to
> int_x86_sse3_hadd_pd? The register allocator can handle converting to
> three-address form if the target provides the appropriate hooks.

Because in some cases users may want to explicitly use non-VEX encoded
instructions. So we need to differentiate.

                            -Dave

Can you give an example of such a scenario?

In answer to your original question, it's probably just a matter of
messing with the relevant generator in TableGen, which should be relatively
straightforward. Your syntax is probably insufficient, though: how
will the table generator decide which intrinsic to use for
__builtin_ia32_haddpd coming from a frontend?
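The question comes down to direction. If two records share one builtin name, the intrinsic-to-builtin direction (what the C backend uses) is still a plain many-to-one function, but the builtin-to-intrinsic direction (what a frontend lookup would use) has more than one answer. A small C++ sketch with hypothetical names (`BuiltinMapping`, `candidatesFor`):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical two-way mapping between intrinsic records and builtin names.
struct BuiltinMapping {
  std::map<std::string, std::string> toBuiltin;         // intrinsic -> builtin
  std::multimap<std::string, std::string> toIntrinsic;  // builtin -> intrinsics

  void add(const std::string &intrinsic, const std::string &builtin) {
    assert(toBuiltin.count(intrinsic) == 0 && "intrinsic registered twice");
    toBuiltin.emplace(intrinsic, builtin);
    toIntrinsic.emplace(builtin, intrinsic);
  }

  // Every intrinsic a frontend could mean by `builtin`, in registration
  // order (std::multimap keeps equal keys in insertion order).
  std::vector<std::string> candidatesFor(const std::string &builtin) const {
    std::vector<std::string> out;
    auto range = toIntrinsic.equal_range(builtin);
    for (auto it = range.first; it != range.second; ++it)
      out.push_back(it->second);
    return out;
  }
};
```

With both hadd records registered, `toBuiltin` still answers every intrinsic lookup uniquely, while `candidatesFor("__builtin_ia32_haddpd")` returns two names and some policy has to pick one.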

-Eli

> I don't see why one would want to emit 256-bit-wide register instructions
> and, at the same time, non-VEX-encoded 128-bit ones. In cases like this, one
> can compile the former separately and then link with regular SSE code, right?

Yep. I don't see any reason either.

-eric

Eli Friedman <eli.friedman@gmail.com> writes:

>> Eli Friedman <eli.friedman@gmail.com> writes:
>>
>>> int_x86_avx_vhadd_pd_xmm doesn't exist on trunk. Why does it exist on
>>> your branch if the semantics are exactly equivalent to
>>> int_x86_sse3_hadd_pd? The register allocator can handle converting to
>>> three-address form if the target provides the appropriate hooks.
>>
>> Because in some cases users may want to explicitly use non-VEX encoded
>> instructions. So we need to differentiate.
>
> Can you give an example of such a scenario?

Simulator validation, for example.

> In answer to your original question, it's probably just a matter of
> messing with the relevant generator in TableGen, which should be relatively
> straightforward. Your syntax is probably insufficient, though: how
> will the table generator decide which intrinsic to use for
> __builtin_ia32_haddpd coming from a frontend?

These are GCC builtins. They are only used by the C backend, and I
use them in debugging scenarios. In those cases I don't really care
which one gets chosen.

For "normal" usage, codegen knows exactly what to emit (the actual
machine instruction).
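Since only the C backend consumes these names in this use case and the choice doesn't matter, one possible relaxation is to keep every intrinsic-to-builtin entry and let the reverse lookup resolve to a single record by a fixed rule, so results stay deterministic. The sketch below assumes a first-registered-wins policy purely for illustration. `RelaxedBuiltinTable` is a hypothetical name, and this differs from the current duplicate check.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical relaxed table: several intrinsics may share one builtin name,
// and the reverse lookup resolves by a first-registered-wins rule.
class RelaxedBuiltinTable {
public:
  void add(const std::string &intrinsic, const std::string &builtin) {
    bool fresh = toBuiltin_.emplace(intrinsic, builtin).second;
    assert(fresh && "intrinsic registered twice");
    (void)fresh;  // avoid unused-variable warnings when NDEBUG is set
    // std::map::emplace leaves the map unchanged if the key already exists,
    // so the first intrinsic registered for a builtin keeps the reverse entry.
    toIntrinsic_.emplace(builtin, intrinsic);
  }

  // Forward direction (what a C backend needs): always unambiguous.
  std::optional<std::string> builtinFor(const std::string &intrinsic) const {
    return lookup(toBuiltin_, intrinsic);
  }

  // Reverse direction (what a frontend needs): resolved by the rule above.
  std::optional<std::string> intrinsicFor(const std::string &builtin) const {
    return lookup(toIntrinsic_, builtin);
  }

private:
  static std::optional<std::string>
  lookup(const std::map<std::string, std::string> &table,
         const std::string &key) {
    auto it = table.find(key);
    if (it == table.end())
      return std::nullopt;
    return it->second;
  }

  std::map<std::string, std::string> toBuiltin_;    // intrinsic -> builtin
  std::map<std::string, std::string> toIntrinsic_;  // builtin -> first intrinsic
};
```

Registering the SSE3 record first means `intrinsicFor("__builtin_ia32_haddpd")` yields `int_x86_sse3_hadd_pd`, while both records still map forward to `__builtin_ia32_haddpd`.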

                          -Dave