A question about AArch64 Cortex-A57 subtarget definition

Xing_Su1 · May 13, 2016, 12:13am

Hello everybody,

I’m reading the .td files that define the Cortex-A57 processor, which is a subtarget of the AArch64 target, and something in the AArch64SchedA57.td file is confusing me.

At the top of AArch64SchedA57.td, various processor resources are defined as follows:


def A57UnitB : ProcResource<1>; // Type B micro-ops
def A57UnitI : ProcResource<2>; // Type I micro-ops
def A57UnitM : ProcResource<1>; // Type M micro-ops
def A57UnitL : ProcResource<1>; // Type L micro-ops
def A57UnitS : ProcResource<1>; // Type S micro-ops
def A57UnitX : ProcResource<1>; // Type X micro-ops
def A57UnitW : ProcResource<1>; // Type W micro-ops

let SchedModel = CortexA57Model in {
  def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>; // Type V micro-ops
}

According to the Cortex-A57 software optimization manual, the Cortex-A57 has 8 functional units in the backend:

Branch (B)
Integer 0 (I0)
Integer 1 (I1)
Integer Multi-Cycle (M)
Load (L)
Store (S)
FP/ASIMD 0 (F0)
FP/ASIMD 1 (F1)

So I think A57UnitW and A57UnitX should be the TableGen records defining pipelines F0 and F1, respectively. A57UnitW and A57UnitX together compose a ProcResGroup, A57UnitV, which, as I understand it, can execute a 128-bit ASIMD floating-point operation, such as FMLA (Q-form), in a single clock cycle.

But lines 479-483 of AArch64SchedA57.td read as follows:


def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }
def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v2i32|v1i64)")>;
def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32|v2f64|v4i32|v2i64)")>;

In this code, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS Q-form) requires two A57UnitVs, meaning that two clock cycles are needed, not one.

There must be something wrong with my understanding. Could anyone help me figure out the problem? Thanks a lot!

Xing

James_Molloy3 · May 13, 2016, 7:36am

Hi Xing,

Most of what you said was correct, up until the end:

> In this code, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS Q-form) requires two A57UnitVs, meaning that two clock cycles are needed.

The ProcResGroup is an “OR” relationship, not an “AND”. It says that a V op can go to EITHER the W or X pipes, not both. So a 128-bit FP op is modelled as having two V ops, which could either be [W, X] (simultaneously), [W, W] (requiring two cycles), or [X, X] (requiring two cycles).
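The three cases can be enumerated with a small toy model. This is only a sketch in Python, not LLVM's scheduler: the helper names `placements` and `issue_cycles` are made up for illustration, and it assumes each pipe accepts one micro-op per cycle, that nothing else competes for the pipes, and that latency is ignored.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Pipes that make up the A57UnitV group (each is a ProcResource<1> in the .td file).
V_GROUP = ("W", "X")


def placements(num_v_uops):
    """Every way to place the V micro-ops, each on ANY ONE pipe of the group (OR, not AND)."""
    return list(combinations_with_replacement(V_GROUP, num_v_uops))


def issue_cycles(assignment):
    """Cycles needed to issue the micro-ops under the toy assumption that
    each pipe accepts one micro-op per cycle and is otherwise idle."""
    return max(Counter(assignment).values())


# D-form FMLA/FMLS: SchedWriteRes<[A57UnitV]>, i.e. one V micro-op.
d_form = {p: issue_cycles(p) for p in placements(1)}
# Q-form FMLA/FMLS: SchedWriteRes<[A57UnitV, A57UnitV]>, i.e. two V micro-ops.
q_form = {p: issue_cycles(p) for p in placements(2)}

for name, table in (("D-form", d_form), ("Q-form", q_form)):
    for pipes, cycles in table.items():
        print(f"{name}: pipes={list(pipes)} issue cycles={cycles}")
```

Under these assumptions the Q-form occupies the pipes for one cycle only when its two V micro-ops land on different pipes, and for two cycles when both land on the same pipe.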

Cheers,

James

Xing_Su1 · May 13, 2016, 10:16am

OK, got it! Thanks!

