New backend in LLVM

Hi,
I am learning the Cpu0 backend and trying to port it from LLVM 3.9 to LLVM 9.0.
I have three questions:

  1. In include/llvm/BinaryFormat/MachO.h of version 9.0, lines 1410-1423:
    enum CPUType {
      CPU_TYPE_ANY = -1,
      CPU_TYPE_X86 = 7,
      CPU_TYPE_I386 = CPU_TYPE_X86,
      CPU_TYPE_X86_64 = CPU_TYPE_X86 | CPU_ARCH_ABI64,
      /* CPU_TYPE_MIPS = 8, */
      CPU_TYPE_MC98000 = 10, // Old Motorola PowerPC
      CPU_TYPE_ARM = 12,
      CPU_TYPE_ARM64 = CPU_TYPE_ARM | CPU_ARCH_ABI64,
      CPU_TYPE_ARM64_32 = CPU_TYPE_ARM | CPU_ARCH_ABI64_32,
      CPU_TYPE_SPARC = 14,
      CPU_TYPE_POWERPC = 18,
      CPU_TYPE_POWERPC64 = CPU_TYPE_POWERPC | CPU_ARCH_ABI64
    };
    I notice that not all backends are included in CPUType. For example, Sparc is included, but MIPS is not. What is this CPUType used for? How could I decide whether I need to add Cpu0 to it?

  2. About include/llvm/BinaryFormat/ELFRelocs/*.def: how do we decide which entries are needed and which are not? What material would you suggest I read?

  3. In the Cpu0 backend tutorial (LLVM 3.9), there is a file named Cpu0Schedule.td:
    //===----------------------------------------------------------------------===//
    // Functional units across Cpu0 chips sets. Based on GCC/Cpu0 backend files.
    //===----------------------------------------------------------------------===//
    def ALU : FuncUnit;
    def IMULDIV : FuncUnit;

//===----------------------------------------------------------------------===//
// Instruction Itinerary classes used for Cpu0
//===----------------------------------------------------------------------===//
def IIAlu : InstrItinClass;
def II_CLO : InstrItinClass;
def II_CLZ : InstrItinClass;
def IILoad : InstrItinClass;
def IIStore : InstrItinClass;
//#if CH >= CH4_1 1
def IIHiLo : InstrItinClass;
def IIImul : InstrItinClass;
def IIIdiv : InstrItinClass;
//#endif
def IIBranch : InstrItinClass;

def IIPseudo : InstrItinClass;

//===----------------------------------------------------------------------===//
// Cpu0 Generic instruction itineraries.
//===----------------------------------------------------------------------===//
//@ http://llvm.org/docs/doxygen/html/structllvm_1_1InstrStage.html
def Cpu0GenericItineraries : ProcessorItineraries<[ALU, IMULDIV], , [
//@2
InstrItinData<IIAlu , [InstrStage<1, [ALU]>]>,
InstrItinData<II_CLO , [InstrStage<1, [ALU]>]>,
InstrItinData<II_CLZ , [InstrStage<1, [ALU]>]>,
InstrItinData<IILoad , [InstrStage<3, [ALU]>]>,
InstrItinData<IIStore , [InstrStage<1, [ALU]>]>,
//#if CH >= CH4_1 2
InstrItinData<IIHiLo , [InstrStage<1, [IMULDIV]>]>,
InstrItinData<IIImul , [InstrStage<17, [IMULDIV]>]>,
InstrItinData<IIIdiv , [InstrStage<38, [IMULDIV]>]>,
//#endif
InstrItinData<IIBranch , [InstrStage<1, [ALU]>]>
]>;

There are 10 InstrItinClass definitions. When should we add a new InstrItinClass definition for a new CPU or chip?
I read the comment in TargetItinerary.td:

    // Instruction itinerary classes - These values represent 'named' instruction
    // itinerary. Using named itineraries simplifies managing groups of
    // instructions across chip sets. An instruction uses the same itinerary class
    // across all chip sets. Thus a new chip set can be added without modifying
    // instruction information.
    //
    class InstrItinClass;
    def NoItinerary : InstrItinClass;

But I still cannot understand it. Is there any material on this?

Thanks!

Jack

Hi,

1) I notice that not all backends are included in CPUType. For example, Sparc is included, but MIPS is not. What is this CPUType used for? How could I decide whether I need to add Cpu0 to it?

The MachO format is only used by Apple, so you can safely ignore it.
For informational purposes, though: the CPUType identifies the target a
MachO file was built for. It's primarily used by the linker and dynamic
loader (dyld) to determine which version of a file they should read.
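
To make that concrete, here is a minimal C++ sketch of how the combined values in the quoted enum are formed and how a loader-like tool might pick the right version of a file. The constants mirror the names in MachO.h; `Slice` and `pickSlice` are hypothetical names invented for this illustration and are not part of LLVM or dyld.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flag bits that are OR-ed into a base CPU family to form the full type.
constexpr uint32_t CPU_ARCH_MASK = 0xff000000u;
constexpr uint32_t CPU_ARCH_ABI64 = 0x01000000u;

// A few entries from the enum quoted in the question.
constexpr uint32_t CPU_TYPE_X86 = 7;
constexpr uint32_t CPU_TYPE_X86_64 = CPU_TYPE_X86 | CPU_ARCH_ABI64;
constexpr uint32_t CPU_TYPE_ARM = 12;
constexpr uint32_t CPU_TYPE_ARM64 = CPU_TYPE_ARM | CPU_ARCH_ABI64;

// Hypothetical stand-in for one per-architecture version inside a file.
struct Slice {
  uint32_t cputype; // which target this version was built for
  uint64_t offset;  // where it starts in the file
};

// Return the index of the first slice built for hostType, or -1 if none.
int pickSlice(const std::vector<Slice> &slices, uint32_t hostType) {
  assert(!slices.empty() && "nothing to choose from");
  for (std::size_t i = 0; i < slices.size(); ++i)
    if (slices[i].cputype == hostType)
      return static_cast<int>(i);
  return -1;
}
```

A target that never emits MachO files has nothing to look up here, which is why a new ELF-only backend such as Cpu0 does not need an entry.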

2) About include/llvm/BinaryFormat/ELFRelocs/*.def: how do we decide which entries are needed and which are not? What material would you suggest I read?

This is ultimately guided by what you and other compiler writers need
to be able to fix up at link time or later. There are some common
features: almost all targets have a relocation that fills a whole
word with an address, for example. But most relocations exist because
the CPU has some special instruction that needs an address inserted
into particular bits later, possibly PC-relative.
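
As a rough illustration of those two kinds of relocation, here is a small C++ sketch of what a linker does when it applies one. The relocation kinds and the instruction layout (a 16-bit immediate in the low half of a 32-bit instruction) are assumptions made up for this example; they are not the real R_CPU0_* definitions, and `applyReloc` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical relocation kinds, for illustration only.
enum class RelocKind { Abs32, Hi16, Lo16, PCRel16 };

// Patch one 32-bit word. S is the symbol address, A the addend and
// P the address of the word being patched.
uint32_t applyReloc(RelocKind kind, uint32_t insn, uint32_t S, int32_t A,
                    uint32_t P) {
  uint32_t value = S + static_cast<uint32_t>(A);
  switch (kind) {
  case RelocKind::Abs32:
    // The common case: fill a whole word with the address.
    return value;
  case RelocKind::Hi16:
    // Upper half of the address goes into the immediate field. This
    // assumes the matching low half is zero-extended when combined.
    return (insn & 0xffff0000u) | (value >> 16);
  case RelocKind::Lo16:
    // Lower half of the address goes into the immediate field.
    return (insn & 0xffff0000u) | (value & 0xffffu);
  case RelocKind::PCRel16: {
    // Signed distance from the patched word to the target.
    int32_t delta = static_cast<int32_t>(value - P);
    assert(delta >= -32768 && delta <= 32767 && "target out of range");
    return (insn & 0xffff0000u) | (static_cast<uint32_t>(delta) & 0xffffu);
  }
  }
  return insn;
}
```

Each distinct way of placing an address into instruction bits needs its own relocation type, so the list in a target's .def file tends to follow the instruction encodings that can refer to symbols.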

I think you should probably follow the tutorial as a baseline (the
R_CPU0_* entries in the "ELF Support" chapter of "Tutorial: Creating an
LLVM Backend for the Cpu0 Architecture") and observe how each one is
used later on to get a better understanding. You could do it the other
way, of course: implement each one as it becomes needed, while you're
thinking about it.

3) In the Cpu0 backend tutorial (LLVM 3.9), there is a file named Cpu0Schedule.td: [...]

There are 10 InstrItinClass definitions. When should we add a new InstrItinClass definition for a new CPU or chip?
I read the comment in TargetItinerary.td:

    // Instruction itinerary classes - These values represent 'named' instruction
    // itinerary. Using named itineraries simplifies managing groups of
    // instructions across chip sets. An instruction uses the same itinerary class
    // across all chip sets. Thus a new chip set can be added without modifying
    // instruction information.
    //
    class InstrItinClass;
    def NoItinerary : InstrItinClass;

But I still cannot understand it. Is there any material on this?

As far as I know there's no large body of documentation on LLVM
schedulers. It's not really my area, but I think there are two
different ways to define schedules, and the InstrItin* variant is the
obsolete one. The newer method associates instructions with their
timing properties in the scheduler definition; InstRW seems to be the
key class there, and AArch64 has schedulers using it that you could
look at for inspiration.

But to answer your question anyway: going by that description, you'd
add a new InstrItinClass when there is a CPU you care about that
handles an instruction differently enough that you want to schedule it
specially. You can see why that isn't ideal: just one weird CPU
could split up the instruction's class for everyone.
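
If it helps, the idea in that comment can be modelled in a few lines of C++. This is only a sketch of the concept, not how TableGen or the LLVM scheduler is implemented. The cycle counts for the generic chip are taken from the Cpu0GenericItineraries stages in the question; `FastMulChip`, the instruction table and `cyclesFor` are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>

// Named itinerary classes: each instruction is tagged with one, once.
enum ItinClass { IIAlu, IILoad, IIImul, IIIdiv, NumItinClasses };

struct Instr {
  const char *name;
  ItinClass itin;
};

// Illustrative instruction table; it never changes when a chip is added.
constexpr Instr Instrs[] = {
    {"add", IIAlu}, {"sub", IIAlu}, {"ld", IILoad},
    {"mul", IIImul}, {"div", IIIdiv}};

// One table per chip: cycles for each itinerary class, indexed by ItinClass.
using ChipItinerary = std::array<unsigned, NumItinClasses>;

constexpr ChipItinerary GenericChip = {1, 3, 17, 38}; // from the question
constexpr ChipItinerary FastMulChip = {1, 3, 4, 38};  // hypothetical chip

// Look up how long an instruction occupies its unit on a given chip.
unsigned cyclesFor(const Instr &I, const ChipItinerary &chip) {
  assert(I.itin < NumItinClasses && "unknown itinerary class");
  return chip[I.itin];
}
```

Adding `FastMulChip` only needed a new table. The drawback shows up when some chip treats `add` and `sub` differently: they share IIAlu, so the class has to be split and every chip's table has to grow an entry.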

Cheers.

Tim.