Matching addsub

How should I go about matching floating-point addsub-like vector
instructions? My first inclination is to write something which matches
build_vector 1.0, -1.0, and then use that in combination with a match on
fadd, but that does not seem to work. I think this is because
BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a
constant load before instruction selection.

Specifically, in SDValue SelectionDAGLegalize::LegalizeOp(SDValue Op):

case ISD::BUILD_VECTOR:
  // A weird case: legalization for BUILD_VECTOR never legalizes the
  // operands!
  // FIXME: This really sucks... changing it isn't semantically incorrect,
  // but it massively pessimizes the code for floating-point BUILD_VECTORs
  // because ConstantFP operands get legalized into constant pool loads
  // before the BUILD_VECTOR code can see them. It doesn't usually bite,
  // though, because BUILD_VECTORS usually get lowered into other nodes
  // which get legalized properly.
  SimpleFinishLegalizing = false;
  break;

and then:

case ISD::BUILD_VECTOR:
  switch (TLI.getOperationAction(ISD::BUILD_VECTOR,
                                 Node->getValueType(0))) {
  default: assert(0 && "This action is not supported yet!");
  case TargetLowering::Custom:
    Tmp3 = TLI.LowerOperation(Result, DAG);
    if (Tmp3.getNode()) {
      Result = Tmp3;
      break;
    }
    // FALLTHROUGH
  case TargetLowering::Expand:
    Result = ExpandBUILD_VECTOR(Result.getNode());
    break;
  }
  break;
(so there is not even a branch for TargetLowering::Legal). Maybe I'm just
missing something.
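
For concreteness, the match I have in mind would look something like the
following in C++ as a pre-legalization check (a sketch only; the helper
name isAlternatingSignMask is invented, not an existing LLVM API, and it
encodes the subtract-even/add-odd orientation -- swap the constants for
the other form):

#include "llvm/CodeGen/SelectionDAGNodes.h"

using namespace llvm;

// Returns true if N is a BUILD_VECTOR of the alternating constants
// <-1.0, 1.0, -1.0, ...>. This must run before legalization, or the
// ConstantFP operands will already have become a constant-pool load.
static bool isAlternatingSignMask(SDNode *N) {
  BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N);
  if (!BV)
    return false;
  for (unsigned i = 0, e = BV->getNumOperands(); i != e; ++i) {
    ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(BV->getOperand(i));
    if (!C)
      return false;
    if (!C->isExactlyValue(i % 2 == 0 ? -1.0 : 1.0))
      return false;
  }
  return true;
}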

Thanks in advance,
Hal

Trying to keep a vector producer this naive about the target instruction
set is awkward. If the vector producer doesn't know whether the target
has an addsub or subadd instruction, how is it to know the best way to
expand complex multiplication? It's likely to get suboptimal code in many
cases.
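
To make the complex-multiplication point concrete: with interleaved
[re, im] data, (a+bi)*(c+di) = (ac-bd) + (ad+bc)i comes down to two
vector multiplies and one addsub. A scalar sketch of a single complex
product (illustration only):

void cmul(const float *x, const float *y, float *out) {
  float a = x[0], b = x[1]; // x = a + bi
  float c = y[0], d = y[1]; // y = c + di
  float t1[2] = {a * c, a * d}; // splat of a times y
  float t2[2] = {b * d, b * c}; // splat of b times y, lanes swapped
  out[0] = t1[0] - t2[0]; // even lane subtracts: ac - bd
  out[1] = t1[1] + t2[1]; // odd lane adds:       ad + bc
}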

On the other hand, a producer that knows that the target has certain
instructions could pretty easily just use appropriate intrinsics that can
be mapped directly to the desired instructions. This way, there's no need
for it to play pictionary with the backend, drawing out its desired
semantics in terms of primitive operations and expecting codegen to
rediscover what was meant by pattern-matching.

Dan

> How should I go about matching floating-point addsub-like vector
> instructions? My first inclination is to write something which matches
> build_vector 1.0, -1.0, and then use that in combination with a match on
> fadd, but that does not seem to work. I think this is because
> BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a
> constant load before instruction selection.

> Trying to keep a vector producer this naive about the target instruction
> set is awkward. If the vector producer doesn't know whether the target
> has an addsub or subadd instruction, how is it to know the best way to
> expand complex multiplication? It's likely to get suboptimal code in many
> cases.
>
> On the other hand, a producer that knows that the target has certain
> instructions could pretty easily just use appropriate intrinsics that can
> be mapped directly to the desired instructions. This way, there's no need
> for it to play pictionary with the backend, drawing out its desired
> semantics in terms of primitive operations and expecting codegen to
> rediscover what was meant by pattern-matching.

In general, I agree with you. It is always questionable how far to
attempt to go with idiom recognition. However, some kind of combination
add/subtract is a common vector operation, and so I seem to be left with
three choices:

1. Decompose the operation and then attempt to recognize the
decomposition.
2. Add an additional LLVM instruction.
3. Add a number of target-specific special cases into the higher-level
code.

I am not sure which is better, but I'd prefer to stay away from choice
(3) as much as practical in favor of one of the first two options. Would
you support adding some kind of vector_faddsub LLVM instruction?

Also, there is a precedent, in some sense, for choices (1) and (2) in
how fneg %a is serialized (as fsub -0.0, %a). Perhaps we could recognize
fadd (fmul <-1.0, 1.0>, %a), %b and turn it into something else for
instruction selection in a similar way.

-Hal

Hi Hal, you should probably add a target-specific DAG combine that
synthesizes the appropriate target instruction. This is how I
handled x86 horizontal add (see the FHADD X86 opcode). If it turns
out that the same thing is useful for other targets then it can be
generalized later.
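
Concretely, such a combine might look roughly like this, modeled
loosely on the horizontal-add handling; MyISD::FADDSUB stands in for
whatever target node gets added, and isAlternatingSignMask is the
invented mask check sketched earlier in the thread (neither is an
existing LLVM API):

#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Rewrite fadd(x, fmul(y, <-1.0, 1.0, ...>)) into a single target node
// that subtracts even lanes and adds odd lanes. Sketch only.
static SDValue PerformFADDCombine(SDNode *N, SelectionDAG &DAG) {
  // fadd is commutative, so try both operand orders.
  for (unsigned i = 0; i != 2; ++i) {
    SDValue Mul = N->getOperand(i);
    SDValue Other = N->getOperand(1 - i);
    if (Mul.getOpcode() != ISD::FMUL)
      continue;
    // The mask must still be a BUILD_VECTOR, i.e. this combine has to
    // run before legalization replaces it with a constant-pool load.
    if (isAlternatingSignMask(Mul.getOperand(1).getNode()))
      return DAG.getNode(MyISD::FADDSUB, SDLoc(N), N->getValueType(0),
                         Other, Mul.getOperand(0));
  }
  return SDValue();
}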

Ciao, Duncan.

Duncan,

Thanks for the suggestion; I'll look at it.

-Hal

>> How should I go about matching floating-point addsub-like vector
>> instructions? My first inclination is to write something which matches
>> build_vector 1.0, -1.0, and then use that in combination with a match on
>> fadd, but that does not seem to work. I think this is because
>> BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a
>> constant load before instruction selection.
>>
>> Trying to keep a vector producer this naive about the target instruction
>> set is awkward. If the vector producer doesn't know whether the target
>> has an addsub or subadd instruction, how is it to know the best way to
>> expand complex multiplication? It's likely to get suboptimal code in many
>> cases.
>>
>> On the other hand, a producer that knows that the target has certain
>> instructions could pretty easily just use appropriate intrinsics that can
>> be mapped directly to the desired instructions. This way, there's no need
>> for it to play pictionary with the backend, drawing out its desired
>> semantics in terms of primitive operations and expecting codegen to
>> rediscover what was meant by pattern-matching.
>
> In general, I agree with you. It is always questionable how far to
> attempt to go with idiom recognition. However, some kind of combination
> add/subtract is a common vector operation, and so I seem to be left with
> three choices:
>
> 1. Decompose the operation and then attempt to recognize the
> decomposition.
> 2. Add an additional LLVM instruction.
> 3. Add a number of target-specific special cases into the higher-level
> code.
>
> I am not sure which is better, but I'd prefer to stay away from choice
> (3) as much as practical in favor of one of the first two options. Would
> you support adding some kind of vector_faddsub LLVM instruction?

I assume you mean an operator that subtracts even elements and adds odd
elements; correct me if I'm wrong. I agree that this operation is fairly
common; I think a general intrinsic to do this would be useful for people
working on such targets.
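
In scalar terms, that operation would be the following (a reference
sketch; the function name is just for illustration):

// Even lanes are subtracted, odd lanes are added.
void addsub(const float *a, const float *b, float *out, unsigned n) {
  for (unsigned i = 0; i != n; ++i)
    out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}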

> Also, there is a precedent, in some sense, for choices (1) and (2) in
> how fneg %a is serialized (as fsub -0.0, %a). Perhaps we could recognize
> fadd (fmul <-1.0, 1.0>, %a), %b and turn it into something else for
> instruction selection in a similar way.

The problem with "fadd(fmul <-1.0,1.0>, %a), %b" is that it requires a
separate fmul instruction at the LLVM IR level, and you never want to
actually execute an fmul. If optimization obscures the pattern, you end
up with an unwanted fmul.

The fneg case is simpler because fsub with a constant is just one instruction
in LLVM IR. FWIW, the lack of a proper way to do fneg also happens to
be a bug.
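
(A quick scalar illustration of why the serialization uses -0.0 rather
than 0.0: fsub 0.0, %x yields +0.0 when %x is +0.0, whereas a true fneg
must yield -0.0.)

#include <cstdio>

int main() {
  float x = 0.0f;
  // Prints "-0 0": only the -0.0 form flips the sign of +0.0.
  std::printf("%g %g\n", -0.0f - x, 0.0f - x);
  return 0;
}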

Dan

>>>
>>> How should I go about matching floating-point addsub-like vector
>>> instructions? My first inclination is to write something which matches
>>> build_vector 1.0, -1.0, and then use that in combination with a match on
>>> fadd, but that does not seem to work. I think this is because
>>> BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a
>>> constant load before instruction selection.
>>>
>>> Trying to keep a vector producer this naive about the target instruction
>>> set is awkward. If the vector producer doesn't know whether the target
>>> has an addsub or subadd instruction, how is it to know the best way to
>>> expand complex multiplication? It's likely to get suboptimal code in many
>>> cases.
>>>
>>> On the other hand, a producer that knows that the target has certain
>>> instructions could pretty easily just use appropriate intrinsics that can
>>> be mapped directly to the desired instructions. This way, there's no need
>>> for it to play pictionary with the backend, drawing out its desired
>>> semantics in terms of primitive operations and expecting codegen to
>>> rediscover what was meant by pattern-matching.
>>
>> In general, I agree with you. It is always questionable how far to
>> attempt to go with idiom recognition. However, some kind of combination
>> add/subtract is a common vector operation, and so I seem to be left with
>> three choices:
>>
>> 1. Decompose the operation and then attempt to recognize the
>> decomposition.
>> 2. Add an additional LLVM instruction.
>> 3. Add a number of target-specific special cases into the higher-level
>> code.
>>
>> I am not sure which is better, but I'd prefer to stay away from choice
>> (3) as much as practical in favor of one of the first two options. Would
>> you support adding some kind of vector_faddsub LLVM instruction?
>
> I assume you mean an operator that subtracts even elements and adds odd
> elements; correct me if I'm wrong. I agree that this operation is fairly
> common; I think a general intrinsic to do this would be useful for people
> working on such targets.

Yes, that is essentially what I mean.

>>
>> Also, there is a precedent, in some sense, for choices (1) and (2) in
>> how fneg %a is serialized (as fsub -0.0, %a). Perhaps we could recognize
>> fadd (fmul <-1.0, 1.0>, %a), %b and turn it into something else for
>> instruction selection in a similar way.
>
> The problem with "fadd(fmul <-1.0,1.0>, %a), %b" is that it requires a
> separate fmul instruction at the LLVM IR level, and you never want to
> actually execute an fmul. If optimization obscures the pattern, you end
> up with an unwanted fmul.

I certainly agree that this is less than ideal. Adding a new intrinsic
would be better.

I would like to add two: one that adds the odd elements and subtracts
the even elements, maybe @llvm.addsub.*, and another that negates only
the even elements, maybe @llvm.altneg.*? (Actually, only the latter is
really necessary, but would we want both for clarity or some other
reason?)

In combination with permutations, this should allow the encoding of the
full range of asymmetric-fma-type operations (some of which can be
directly represented on some platforms), while remaining fairly generic
and easy to emulate.
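
A scalar sketch of why only the negation form is strictly necessary
(names as proposed above; illustration only): addsub(a, b) is simply
a + altneg(b), elementwise.

// Negate only the even elements.
void altneg(const float *b, float *out, unsigned n) {
  for (unsigned i = 0; i != n; ++i)
    out[i] = (i % 2 == 0) ? -b[i] : b[i];
}
// Then addsub(a, b)[i] == a[i] + altneg(b)[i] for every lane i.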

What do you think?

> The fneg case is simpler because fsub with a constant is just one
> instruction in LLVM IR. FWIW, the lack of a proper way to do fneg also
> happens to be a bug.

Is this something, then, that we would like to fix? Since fneg is
already a selection node, it would seem not to be a huge change to make
it an instruction. Alternatively, we could just alter the parser and
printers to support reading and writing 'fneg'.

-Hal

I'd like to resurrect this old thread. Now that we have the
TargetTransformInfo infrastructure, and other improvements are being
made to our vectorization capabilities, I'd like to tackle this
question. In terms of representing asymmetric vector operations, we
seem to have several options:

1. Introduce one or more intrinsics, such as I proposed, and pattern
match (or expand) as appropriate. TargetTransformInfo will inform the
vectorizer of costs as it will with other intrinsics.
2. Keep only target intrinsics, but extend TargetTransformInfo to
inform the vectorizers regarding how to form the relevant target
intrinsics.
3. Use the DAG combiner to form target-specific ISD nodes for these
instructions (as Duncan pointed out, this is how he handles the FHADD
instruction). TargetTransformInfo will indicate whether this is likely
to happen(?)

Opinions?

Thanks again,
Hal