SSE examples

Does anyone have any LLVM IR examples implementing things using the
instructions for SSE, like complex arithmetic or 3D vector-matrix stuff?

I'd like to have HLVM use them "under the hood" for some things but I cannot
see all of the operations that I was expecting (e.g. dot product) and am not
sure what works when (e.g. "Not all targets support all types however.").

I don't have any examples...

LLVM is probably implementing actual SSE, and not a wrapped/extended form.
it can be noted that, apart from some very new SSE variants (i.e. very new CPUs), operations like dot product don't exist as single instructions, and so would have to be simulated (for example, by serializing the values to the stack and running them through the FPU, or similar).

so, in general, if one wants specific vectors (such as geometric vectors, quaternions, ...), one usually has to implement them in terms of the existing packed vector operations.

I don't know what the best way to deal with this in LLVM would be; someone else may have a better idea.
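As one possible illustration of building a dot product from existing packed operations, here is a minimal sketch in C using the GCC/Clang vector extensions (which correspond closely to LLVM's <4 x float> type). The names vec4, dot4 and dot4_selfcheck are made up for this example, and it assumes a compiler that supports __attribute__((vector_size)).

```c
#include <assert.h>

/* Four packed floats; GCC and Clang typically keep this in one SSE
   register on x86. The type name is hypothetical. */
typedef float vec4 __attribute__((vector_size(16)));

/* Dot product without a dedicated instruction: one packed multiply,
   then a sum across the four lanes. */
float dot4(vec4 a, vec4 b)
{
    vec4 p = a * b;                          /* lane-wise multiply */
    return (p[0] + p[1]) + (p[2] + p[3]);    /* horizontal sum */
}

/* Usage: a 3D vector is stored with its fourth lane set to zero. */
void dot4_selfcheck(void)
{
    vec4 x = {1.0f, 2.0f, 3.0f, 0.0f};
    vec4 y = {4.0f, 5.0f, 6.0f, 0.0f};
    assert(dot4(x, y) == 32.0f);             /* 1*4 + 2*5 + 3*6 */
}
```

How the horizontal sum is lowered (shuffles plus adds, or scalar extracts) is up to the compiler and the target, so this shows the shape of the computation rather than a tuned implementation.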

as for what targets support which operations, in the case of SSE, go check the Intel and AMD docs.
it can be noted that most processors around now support SSE2, but not as many support the newer extensions (SSE3/SSSE3, SSE4, ...).

note that Intel and AMD have had a split over the issue:
Intel implements SSE3 and SSE4;
AMD implements parts of SSE3 and SSE4, but not other parts;
AMD is implementing SSE5, but it uses instructions which Intel does not use;
...

so, SSE2 is fairly safe at this point, but anything much newer carries some risk...

it would require checking documentation to know which operations are part of which subset.

granted, going too far down this route (especially if LLVM does not emulate ops on targets where they don't exist) is likely to hurt the "generic portability" of code.

(in my case, I only target x86 and x86-64, and at present restrict myself mostly to SSE2).

I was assuming that LLVM's implementations were incomplete. Are they now
complete? So anything that a CPU can do and LLVM has bindings for is
implemented?

I don't know as much about LLVM specifically; someone else can probably provide a better answer...

but, if you mean things like dot product intrinsics, it is worth noting that these are not part of SSE or SSE2.

actually, dot product was added in SSE4.1, so it is only likely to be found in Intel CPUs from the Penryn generation (late 2007 / early 2008) onward...

as such, I would not advise depending on it just yet...
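One way to avoid depending on it outright is to guard the SSE4.1 instruction behind a compile-time check and keep a plain fallback. The sketch below is in C and uses the real _mm_dp_ps intrinsic from <smmintrin.h>; the function names are hypothetical. Note that __SSE4_1__ reflects the flags the code was built with (e.g. -msse4.1), not the CPU the binary eventually runs on, so this is a build-time choice and not runtime detection.

```c
#include <assert.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_dp_ps */
#endif

/* Dot product of two 4-float arrays (hypothetical helper name). */
float dot4_dispatch(const float a[4], const float b[4])
{
#if defined(__SSE4_1__)
    /* Mask 0xF1: multiply all four lanes, write the sum to lane 0. */
    __m128 r = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1);
    return _mm_cvtss_f32(r);
#else
    /* Fallback for builds without SSE4.1 (SSE2-only or non-x86). */
    return (a[0] * b[0] + a[1] * b[1]) + (a[2] * b[2] + a[3] * b[3]);
#endif
}

/* Returns 1 if this build uses the SSE4.1 instruction, 0 otherwise. */
int dot4_uses_sse41(void)
{
#if defined(__SSE4_1__)
    return 1;
#else
    return 0;
#endif
}

/* Usage: both paths are expected to agree on simple inputs. */
void dot4_dispatch_selfcheck(void)
{
    const float x[4] = {1.0f, 2.0f, 3.0f, 0.0f};
    const float y[4] = {4.0f, 5.0f, 6.0f, 0.0f};
    assert(dot4_dispatch(x, y) == 32.0f);    /* 1*4 + 2*5 + 3*6 */
}
```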

The issue is that most vector ISAs don't implement every operation. If SSE (for example) doesn't implement an operation efficiently, you'll get inefficient but correct code.

-Chris
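
For the "3D vector-matrix stuff" asked about at the top, a rough sketch of code written against generic vector types, leaving instruction selection to the compiler, might look like the following. It is C with the GCC/Clang vector extensions; vec4, mat4, splat and mat4_mul_vec4 are names invented for this example, and the column-major layout is an assumption.

```c
#include <assert.h>

/* Generic 4-float vector type (hypothetical name). */
typedef float vec4 __attribute__((vector_size(16)));

/* 4x4 matrix stored as four columns (column-major is assumed here). */
typedef struct { vec4 col[4]; } mat4;

/* Broadcast one scalar into all four lanes. */
static vec4 splat(float s)
{
    vec4 r = {s, s, s, s};
    return r;
}

/* M * v computed as v.x*col0 + v.y*col1 + v.z*col2 + v.w*col3,
   which needs only packed multiplies and adds. */
vec4 mat4_mul_vec4(const mat4 *m, vec4 v)
{
    return m->col[0] * splat(v[0]) + m->col[1] * splat(v[1])
         + m->col[2] * splat(v[2]) + m->col[3] * splat(v[3]);
}

/* Usage: translate the point (1, 2, 3) by (10, 20, 30). */
void mat4_selfcheck(void)
{
    mat4 t = {{ {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {10, 20, 30, 1} }};
    vec4 p = {1.0f, 2.0f, 3.0f, 1.0f};
    vec4 r = mat4_mul_vec4(&t, p);
    assert(r[0] == 11.0f && r[1] == 22.0f && r[2] == 33.0f && r[3] == 1.0f);
}
```

Nothing here names a specific SSE instruction, so the same source should compile for targets with or without suitable vector hardware; how efficient the result is depends on the target.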