I’m trying to define a multiply-accumulate instruction for the LEON processor, a Subtarget of the Sparc target.
The documentation for the processor is as follows:
My initial thought was that I could ignore the ASR18 and Y registers, but the SMAC instruction is designed to be used in a loop, and ASR18 (and Y) feed back into the inputs on each iteration.
Does this imply that a hand-coded ISelDAGToDAG.cpp implementation is going to be virtually required?
Any chance of some Pseudo-code? I haven’t had to write any ISelDAGToDAG code up to now and any starter would be appreciated.
Do you only want to define assembler syntax for this, or do you need to be able to emit it automatically from some higher-level construct? I’d expect the former to be entirely adequate, in which case this should be sufficient:
let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
def SMACrr : F3_1<3, 0b111110,
                  (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2),
                  "smac $rs1, $rs2, $rd",
                  []>; // empty pattern list: assembler-only, no selection pattern
If you want the latter, I’m not sure how you’d go about pattern-matching it, because of the unusual 40-bit accumulate input and output, and the 16-bit inputs, which are unusual for SPARC. Hopefully you don’t really need that.
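To make the mismatch with ordinary 32-bit arithmetic concrete, here is a rough software model of a single SMAC step in C. It is only a sketch under assumptions that should be checked against the processor documentation: that the 40-bit accumulator is the low 8 bits of Y (high part) followed by all 32 bits of ASR18 (low part), that the inputs are the signed low 16 bits of each source register, and that rd receives the low 32 bits of the new accumulator. The names MacState and smac_model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the MAC state. Assumption: the 40-bit accumulator
   is Y[7:0] (high byte) concatenated with ASR18[31:0] (low word). */
typedef struct {
    uint32_t y;      /* only the low 8 bits take part in the accumulator */
    uint32_t asr18;  /* low 32 bits of the accumulator */
} MacState;

#define ACC40_MASK ((UINT64_C(1) << 40) - 1)

/* Sign-extend the low 16 bits of a register value. */
static int32_t low16_signed(uint32_t reg)
{
    return (int32_t)((reg & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Model of one SMAC step (assumed semantics, see above):
   accumulator += signed16(rs1) * signed16(rs2), wrapping at 40 bits.
   Returns the value assumed to be written to rd. */
static uint32_t smac_model(MacState *st, uint32_t rs1, uint32_t rs2)
{
    assert(st != NULL);
    int64_t prod = (int64_t)low16_signed(rs1) * low16_signed(rs2);
    uint64_t acc = ((uint64_t)(st->y & 0xFFu) << 32) | st->asr18;
    acc = (acc + (uint64_t)prod) & ACC40_MASK;   /* two's-complement wrap */
    st->y = (st->y & ~UINT32_C(0xFF)) | (uint32_t)(acc >> 32);
    st->asr18 = (uint32_t)acc;
    return st->asr18;
}
```

The point of the sketch is that both Y and ASR18 are read and written by every step, which is what the Defs and Uses lists above express, and that the arithmetic is 16x16 into 40 bits rather than anything a plain 32-bit multiply-add would produce.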
From: "James Y Knight via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Chris.Dewhurst" <Chris.Dewhurst@lero.ie>
Cc: llvm-dev@lists.llvm.org
Sent: Friday, September 18, 2015 10:39:20 AM
Subject: Re: [llvm-dev] multiply-accumulate instruction
Do you only want to define assembler syntax for this, or do you need
to be able to emit it automatically from some higher-level
construct? I'd expect the former to be entirely adequate, in which
case this should be sufficient:
let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
def SMACrr : F3_1<3, 0b111110,
                  (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2),
                  "smac $rs1, $rs2, $rd",
                  []>; // empty pattern list: assembler-only, no selection pattern
If you want the latter, I'm not sure how you'd go about
pattern-matching it, because of the unusual 40-bit accumulate input
and output, and the 16-bit inputs, which are unusual for SPARC.
Hopefully you don't really need that.
To do that, you'll likely need a target-specific IR-level pass that runs in the backend to recognize the desired pattern and transform it into a loop using some target-specific intrinsics.
-Hal
I’ve been looking to see if there’s a way to get the instruction below (SMAC) emitted from a higher-level construct, but I’m starting to think this is unrealistic.
To do so, I’d have to tie in two other instructions: first, clearing the ASR18 and Y registers somewhere near the start of the method, then copying out the value of these registers somewhere near the end of the method, or wherever the value needs to be used.
In addition, it would only make sense to use the construct inside a loop of some form; otherwise, some variation on MUL would be better. That would require either detecting the loop, or optimising further down the line to convert the above construct into a simple MUL.
This now feels to me to be unrealistic and likely to be prone to bugs.
On that basis, I’m going to go with the simple “assembler-only support” recommended below, unless anyone can recommend a simple way of achieving the above (and direct me to a suitable reference). I can’t find anything sufficiently similar in any of the other processors supported by LLVM.
Thanks for the feedback
Chris Dewhurst
University of Limerick.
From: "Chris.Dewhurst via llvm-dev" <llvm-dev@lists.llvm.org>
To: "James Y Knight" <jyknight@google.com>
Cc: llvm-dev@lists.llvm.org
Sent: Monday, September 21, 2015 2:43:30 AM
Subject: Re: [llvm-dev] multiply-accumulate instruction
I've been looking to see if there's a way to get the instruction
below (SMAC) emitted from a higher-level construct, but I'm starting
to think this is unrealistic.
To do so, I'd have to tie-in two other instructions: Firstly,
clearing the ASR18 and Y register somewhere near the start of the
method, then copying out the value of these registers somewhere near
the end of the method, or wherever the value needs to be used.
In addition, it would only make sense to use the construct inside a
loop of some form, otherwise, some variation on MUL would be better.
That would either require detecting the loop, or optimising further
down the line to convert the above construct *into* a simple MUL.
This now feels to me to be unrealistic and likely to be prone to
bugs.
On that basis, I'm going to go with the simple "assembler-only
support" recommended below, unless anyone can recommend a simple way
of achieving the above (and direct me to a suitable reference). I
can't find anything sufficiently similar in any of the other
processors supported by LLVM.
Can you provide an example or two (written in C is fine) showing the kinds of loops or sequences of operations you're trying to pattern-match to use this instruction? I don't know of anything that works exactly like this, but some targets do have IR-level preprocessing passes to use certain kinds of intrinsics (see lib/Target/PowerPC/PPCCTRLoops.cpp for an example involving loops). There may be other ways to do this as well. I'd not give up so easily, but I need to see some concrete examples in order to provide advice.
-Hal
I think the canonical example is likely to be matrix multiplication, but any kind of "sum of products"-type method would, I expect, be improved by using this instruction.
e.g. (and I haven't syntax-checked this, so apologies for any errors):
int SumOfProducts(const int *xs, const int *ys, int count)
{
    int sum = 0; // Use ASR18 & Y for the SMAC / UMAC instructions.
    for (int index = 0; index < count; index++)
    {
        sum += xs[index] * ys[index]; // Could all be done with SMAC in one instruction.
    }
    return sum; // Needs retrieving from ASR18 & Y
}
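For comparison, here is a hedged sketch of the same loop written the way SMAC would actually have to be used: clear the accumulator, accumulate in the loop, then read the result back. The helpers mac_clear, mac_smac and mac_read are hypothetical stand-ins for whatever instructions or intrinsics would clear ASR18 and Y, issue SMAC, and copy the registers out; here they are modelled in plain C. The inputs are int16_t rather than int because, as noted earlier in the thread, SMAC takes 16-bit inputs and accumulates into 40 bits, so this is not equivalent to the int version above for arbitrary data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 40-bit accumulator held in Y and ASR18. */
typedef struct {
    uint64_t acc40;  /* only the low 40 bits are meaningful */
} MacAcc;

#define MAC_MASK ((UINT64_C(1) << 40) - 1)
#define MAC_SIGN (UINT64_C(1) << 39)

/* Stand-in for clearing ASR18 and Y before the loop. */
static void mac_clear(MacAcc *m)
{
    assert(m != NULL);
    m->acc40 = 0;
}

/* Stand-in for one SMAC: signed 16x16 multiply, 40-bit wrapping add. */
static void mac_smac(MacAcc *m, int16_t a, int16_t b)
{
    int64_t prod = (int64_t)a * b;
    m->acc40 = (m->acc40 + (uint64_t)prod) & MAC_MASK;
}

/* Stand-in for copying ASR18 and Y out after the loop,
   sign-extended from 40 bits. */
static int64_t mac_read(const MacAcc *m)
{
    return (int64_t)(m->acc40 ^ MAC_SIGN) - (int64_t)MAC_SIGN;
}

/* Sum of products using the clear / accumulate / read-back sequence. */
int64_t SumOfProducts16(const int16_t *xs, const int16_t *ys, size_t count)
{
    MacAcc m;
    mac_clear(&m);
    for (size_t i = 0; i < count; i++)
        mac_smac(&m, xs[i], ys[i]);
    return mac_read(&m);
}
```

A pass that wanted to emit SMAC automatically would have to prove that a source loop matches this shape, including the input widths, before rewriting it.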
From: "Chris.Dewhurst" <Chris.Dewhurst@lero.ie>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: llvm-dev@lists.llvm.org, "James Y Knight" <jyknight@google.com>
Sent: Monday, September 21, 2015 8:16:10 AM
Subject: RE: [llvm-dev] multiply-accumulate instruction
I think the canonical example is likely to be matrix multiplication,
but any kind of "sum of products"-type method would, I expect, be
improved by using this instruction.
e.g. (and I haven't syntax-checked this, so apologies for any
errors):
int SumOfProducts(const int *xs, const int *ys, int count)
{
    int sum = 0; // Use ASR18 & Y for the SMAC / UMAC instructions.
    for (int index = 0; index < count; index++)
    {
        sum += xs[index] * ys[index]; // Could all be done with SMAC in one instruction.
    }
    return sum; // Needs retrieving from ASR18 & Y
}
Using a late target-specific IR-level pass (such as lib/Target/PowerPC/PPCCTRLoops.cpp) to recognize this pattern and produce some appropriate target intrinsics should definitely work for this. Also, we already have code in the loop vectorizer to recognize these kinds of reductions, which you should find useful.
-Hal