Hello everybody,
I’m reading the .td files that define the Cortex-A57 processor, which is a
subtarget of the AArch64 target, and something in AArch64SchedA57.td
confuses me.
At the top of AArch64SchedA57.td, various processor resources are
defined as follows:
def A57UnitB : ProcResource<1>; // Type B micro-ops
def A57UnitI : ProcResource<2>; // Type I micro-ops
def A57UnitM : ProcResource<1>; // Type M micro-ops
def A57UnitL : ProcResource<1>; // Type L micro-ops
def A57UnitS : ProcResource<1>; // Type S micro-ops
def A57UnitX : ProcResource<1>; // Type X micro-ops
def A57UnitW : ProcResource<1>; // Type W micro-ops
let SchedModel = CortexA57Model in {
def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>; // Type V micro-ops
}
According to the Cortex-A57 software optimization manual, Cortex-A57 has
eight functional units in the back end:
- Branch (B)
- Integer 0 (I0)
- Integer 1 (I1)
- Integer Multi-Cycle (M)
- Load (L)
- Store (S)
- FP/ASIMD 0 (F0)
- FP/ASIMD 1 (F1)
So I think A57UnitW and A57UnitX are the TableGen records defining
pipelines F0 and F1, respectively. Together, A57UnitW and A57UnitX form
a ProcResGroup, A57UnitV, which can execute a 128-bit ASIMD
floating-point operation, such as FMLA (Q-form), in a single clock cycle.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
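To check that this mapping is at least consistent, I tallied the records
in Python. The unit counts are copied from the ProcResource<N> lines
above; the pairing of records to pipelines (especially W -> F0 and
X -> F1) is only my guess, and all names in the script are my own, not
LLVM's.

```python
# Unit counts copied from the ProcResource<N> declarations, paired with the
# pipeline(s) I *guess* each record models. The W -> F0 and X -> F1 pairing
# is my assumption; LLVM does not state it.
A57_RESOURCES = {
    "A57UnitB": (1, ["B"]),
    "A57UnitI": (2, ["I0", "I1"]),
    "A57UnitM": (1, ["M"]),
    "A57UnitL": (1, ["L"]),
    "A57UnitS": (1, ["S"]),
    "A57UnitW": (1, ["F0"]),  # guess
    "A57UnitX": (1, ["F1"]),  # guess
}

# A57UnitV is a ProcResGroup over existing units, so it adds no new ones.
A57_GROUPS = {"A57UnitV": ["A57UnitX", "A57UnitW"]}

# The eight back-end pipelines listed in the optimization manual.
MANUAL_PIPELINES = ["B", "I0", "I1", "M", "L", "S", "F0", "F1"]


def total_units(resources):
    """Sum the N of every ProcResource<N>."""
    return sum(n for n, _ in resources.values())


def covered_pipelines(resources):
    """Collect every pipeline name the guessed mapping claims to cover."""
    return sorted(p for _, pipes in resources.values() for p in pipes)


def group_units(groups, resources, name):
    """Count the units behind a group by summing its members' units."""
    return sum(resources[member][0] for member in groups[name])


if __name__ == "__main__":
    print("total units:", total_units(A57_RESOURCES))
    print("V group units:", group_units(A57_GROUPS, A57_RESOURCES, "A57UnitV"))
    print("covers manual:",
          covered_pipelines(A57_RESOURCES) == sorted(MANUAL_PIPELINES))
```

The seven records add up to eight units, one per pipeline, and A57UnitV
spans two of them.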
But lines 479-483 of AArch64SchedA57.td read:
def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }
def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v2i32|v1i64)")>;
def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32|v2f64|v4i32|v2i64)")>;
In this code, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS Q-form)
requires two A57UnitVs, which I take to mean that two clock cycles are needed.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
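To pin down where my reading might go wrong, I sketched how I *think* the
scheduler turns these records into numbers. These are my assumptions,
please correct them: each listed resource costs one cycle by default,
repeated entries in the list are merged by adding their cycles, a
ProcResGroup has as many units as its members combined, and reciprocal
throughput is the worst resource's cycles divided by its units (my
understanding of what MCSchedModel::getReciprocalThroughput computes).
The helper names are hypothetical.

```python
from collections import Counter

# ProcResource<1> each for X and W.
UNITS = {"A57UnitX": 1, "A57UnitW": 1}
# Assumption: a ProcResGroup gets the sum of its members' units.
UNITS["A57UnitV"] = UNITS["A57UnitX"] + UNITS["A57UnitW"]


def merged_cycles(resource_list):
    """Assumption: each listed resource defaults to 1 cycle and repeated
    entries are merged by adding their cycles, so [V, V] -> {V: 2}."""
    return dict(Counter(resource_list))


def reciprocal_throughput(resource_list, units=UNITS):
    """Cycles per instruction if only these resources limit issue:
    the worst ratio of busy cycles to available units."""
    cycles = merged_cycles(resource_list)
    return max(c / units[r] for r, c in cycles.items())


# Latency is a separate SchedWriteRes field, not derived from the list.
WRITES = {
    "A57WriteFPVMAD": {"resources": ["A57UnitV"], "latency": 9},
    "A57WriteFPVMAQ": {"resources": ["A57UnitV", "A57UnitV"], "latency": 10},
}

if __name__ == "__main__":
    for name, w in WRITES.items():
        print(name, "latency", w["latency"],
              "recip. throughput", reciprocal_throughput(w["resources"]))
```

Under these assumptions the Q-form write comes out at one cycle per
instruction and the D-form at half a cycle, with latency set separately;
I'd like to know whether that is the intended reading.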
There must be something wrong with my understanding. Could anyone help me
figure out the problem? Thanks a lot!