Implement VLIW Backend on LLVM (Assembler Related Questions)

cycheng · December 7, 2018, 1:46am

Hello,

I want to implement an LLVM backend for a specific VLIW hardware target. I am working on defining its instruction set and assembly language.

The hardware has two pipelines, int and float. Each pipeline can do 3 operations per cycle, and 3 operations form an instruction.

One of the integer instructions looks like this:
add Ri, Rj, Rk; add Rl, Rm, Rn; add Ro, Rp, Rq

An int instruction and a float instruction form a VLIW instruction (bundle), e.g.

{
add Ri, Rj, Rk; add Rl, Rm, Rn; add Ro, Rp, Rq
fadd Fi, Fj, Fk; fadd Fl, Fm, Fn; fadd Fo, Fp, Fq
}

I want to express the above concept in this way:
// Assembly Language
{
add Ri, Rj, Rk
add Rl, Rm, Rn
add Ro, Rp, Rq
fadd Fi, Fj, Fk
fadd Fl, Fm, Fn
fadd Fo, Fp, Fq
}

Q1.
My first question: the instruction encoding can only be determined after the parser has finished parsing the entire bundle.

e.g. when the parser sees “add Ri, Rj, Rk”, it generates one encoding, but when the parser sees another “add Ri, Rj, Rk”, it will modify the previously generated encoding.

I would like to know whether LLVM’s assembler can support this.
Or should I define my instructions in this way:
add_type1 Ri, Rj, Rk
add_type2 Ri, Rj, Rk, Rl, Rm, Rn
add_type3 Ri, Rj, Rk, Rl, Rm, Rn, Ro, Rp, Rq

Q2.
Some of the instructions need additional configuration to be set up, e.g.
{
scache wa ; Set cache mode: write allocate
ssize 64 ; Set write size = 64 bits
sendian big ; Set big endian writing
store R0, 0x1000000 ; Write “R0” to 0x1000000
}

So, again, the parser has to parse the entire bundle to generate the correct encoding.
Or should I define my instruction in this way:

store R0, 0x1000000, wa, 64, big, … (10 options can be set)

Q3.
The destination register can be omitted, e.g.
add , Rj, Rk

So can I use this form to express an omitted destination, or should I define a new instruction for it?
e.g.
add_no_dest Rj, Rk

Q4.
Can I define instructions that have the same name but a different number of operands, e.g.
fadd Fi, Fj, Fk
fadd Fl, Fm, Fn, rounding_mode

So fadd has two versions
(a) normal rounding
(b) special rounding mode

Or should I define it in this way:
fadd
fadd_round_mode1
fadd_round_mode2

…

fadd_round_mode15

(16 rounding modes)

Thank You,
CY

pogo59 · December 10, 2018, 6:09pm

I believe the assembler parser does not immediately emit the object-file encoding, but produces an internal machine-instruction form that is later encoded and emitted. This should give you an opportunity to make choices about encoding after the parsing is complete.

I don’t know enough about how instruction syntax is specified to answer your other questions.

–paulr

Krzysztof_Parzyszek · December 10, 2018, 7:19pm

In the intermediate language that the assembler works on, an instruction is represented by an MCInst. An MCInst can have other instructions as operands, and this is how the Hexagon backend implements bundles.

A top-level MCInst (i.e. the entire bundle) is encoded all at once from the point of view of the target-independent mechanisms. Those mechanisms use target-specific code that each implementation needs to provide, and in your code you can handle each bundle as you want.

Check MCCodeEmitter and how different targets implement it.
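
To make the "encode the whole bundle at once" idea concrete, here is a minimal Python model. It is not LLVM code: the `Inst` class, the `encode_bundle` function, the opcode numbers, and the bit layout are all invented for illustration. In a real backend this logic would live in the target's MCCodeEmitter implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    """Toy stand-in for an MCInst: an opcode plus operands.

    An operand may itself be an Inst, which is how a bundle is modelled here.
    """
    opcode: str
    operands: list = field(default_factory=list)


# Hypothetical opcode numbers; a real target would define its own.
OPCODES = {"add": 0x1, "fadd": 0x2}


def encode_bundle(bundle):
    """Encode every sub-instruction only after the whole bundle is known."""
    assert bundle.opcode == "BUNDLE"
    subs = bundle.operands
    words = []
    for slot, sub in enumerate(subs):
        # The (made-up) encoding depends on bundle-wide facts such as the
        # slot index and the bundle size, which are only available once
        # the complete bundle has been parsed.
        words.append((len(subs) << 12) | (slot << 8) | OPCODES[sub.opcode])
    return words


bundle = Inst("BUNDLE", [
    Inst("add", ["Ri", "Rj", "Rk"]),
    Inst("add", ["Rl", "Rm", "Rn"]),
    Inst("fadd", ["Fi", "Fj", "Fk"]),
])
encoded = encode_bundle(bundle)
```

Because the encoder sees the complete bundle, nothing has to be "modified" after the fact: the second `add` simply gets a different slot field than the first.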

As for the syntax---the parser needs to be able to determine the bundle boundary. (For example Hexagon uses braces {} to enclose each bundle.) The way the assembler works is that it constructs an instruction and passes it to the associated streamer. The streamer is typically an assembly streamer (i.e. printing the instruction assembly), or an object file streamer (e.g. ELF, etc.)
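
As a rough illustration of finding bundle boundaries, the sketch below groups lines between `{` and `}` into bundles before anything is handed on for encoding. It is a standalone Python toy, not the LLVM assembler parser; `split_bundles` is a hypothetical name, and treating `;` as a comment marker is an assumption taken from the Q2 example in the first post.

```python
def split_bundles(source):
    """Group assembly lines into bundles delimited by '{' and '}'.

    A line outside braces becomes a one-instruction bundle.
    """
    bundles, current = [], None
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # assumption: ';' starts a comment
        if not line:
            continue
        if line == "{":
            if current is not None:
                raise ValueError("nested '{' is not allowed")
            current = []
        elif line == "}":
            if current is None:
                raise ValueError("'}' without matching '{'")
            bundles.append(current)  # the bundle is complete only here
            current = None
        elif current is not None:
            current.append(line)
        else:
            bundles.append([line])
    if current is not None:
        raise ValueError("unterminated bundle")
    return bundles


asm = """
{
add Ri, Rj, Rk
fadd Fi, Fj, Fk
}
add Rl, Rm, Rn
"""
bundles = split_bundles(asm)
```

The point of the sketch is that a bundle is emitted only when the closing brace is seen, so the syntax has to make that boundary unambiguous.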

The answers to all your questions are "yes", or "it's doable", but the degree of complexity may vary between different choices.

The major suggestion that I have is to make sure that the syntax is unambiguous, specifically when it comes to bundle boundaries. Another suggestion is to maintain the "mnemonic op, op, ..." syntax for individual instructions (i.e. mnemonic followed by a list of operands). Hexagon has its own assembly syntax that doesn't follow that, and it makes things a bit more complicated.
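
For Q3 and Q4, a plain "mnemonic op, op, ..." syntax keeps the parsing simple even with an omitted destination or an optional trailing operand. The Python sketch below is only a toy model of that idea, not how LLVM's generated matcher works; `parse_instruction`, `classify_fadd`, and the variant names are hypothetical.

```python
def parse_instruction(line):
    """Split 'mnemonic op, op, ...' into (mnemonic, operands).

    An empty operand, as in 'add , Rj, Rk', is kept as None.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    if not rest.strip():
        return mnemonic, []
    operands = [op.strip() or None for op in rest.split(",")]
    return mnemonic, operands


def classify_fadd(operands):
    """Pick a hypothetical fadd variant from the operand count."""
    if len(operands) == 3:
        return "fadd_default_rounding"
    if len(operands) == 4:
        return "fadd_explicit_rounding"
    raise ValueError("fadd expects 3 or 4 operands")


no_dest = parse_instruction("add , Rj, Rk")
_, fadd_ops = parse_instruction("fadd Fl, Fm, Fn, rounding_mode")
variant = classify_fadd(fadd_ops)
```

Here one mnemonic maps to two internal forms depending on the operand list, so the user-visible syntax does not need sixteen separate `fadd_round_modeN` names.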

-Krzysztof

cycheng · December 11, 2018, 10:36pm

Hi paulr,
Thank you for your response

Hi Krzysztof,
This is really helpful! Thank you for your guidance!!
I would like to trace Hexagon’s LLVM implementation.
I am very interested in how Hexagon implements instruction
pattern matching, instruction scheduling, and register
allocation. Could you give me some suggestions or a reading
list to help me understand Hexagon’s LLVM implementation?
Thank you

CY


Krzysztof_Parzyszek · December 14, 2018, 3:13pm

Hi Cy,

The main difference between Hexagon (i.e. VLIW) and most other targets is obviously the fact that we have instruction bundles. (Btw, AMDGPU is another target that uses bundles, at least in some cases.)
The short answer to how this is handled in our backend is that we create the bundles very late in the code generation, so throughout most of the optimizations, Hexagon code looks as if it was just a typical sequence of individual instructions (no different from any other non-VLIW architecture). For the most part this works quite well, except for instruction scheduling. LLVM has several schedulers (pipeliner, pre- and post-RA scheduler), and all of them run before the bundles are created. Each of the schedulers allows some form of input from individual targets, and we heavily exploit that. In the case of the pre-RA machine scheduler we have our own implementation of the key scheduler components that connect with the scheduler framework. The machine pipeliner was originally written for Hexagon before it became target-independent, but internally it is aware of bundles to some degree. The schedulers rely on instruction latencies, so we make sure that we have them properly specified. Also, we take advantage of DAG mutations, which is the mechanism that schedulers provide to give targets the ability to alter the scheduling graph.
Finally, we create bundles using the DFA packetizer, which is a finite state machine that builds instruction packets based on the resource usage associated with each instruction (that information is specified in the .td files describing each instruction). This happens after register allocation. The register allocation for Hexagon is the same as for any other target; we don't do anything special there.
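
The resource-driven packing step can be pictured with a much simpler greedy model. The Python sketch below is not the DFA packetizer and ignores data dependencies entirely; it only shows how per-instruction resource usage decides where one packet ends and the next begins. The slot limits come from the hardware described in the first post (3 int and 3 float operations per cycle), while `packetize` and the `UNIT_OF` table are hypothetical.

```python
# Per-bundle slot limits from the thread's hardware description.
SLOT_LIMITS = {"int": 3, "float": 3}

# Hypothetical mapping from mnemonic to the pipeline it occupies.
UNIT_OF = {"add": "int", "sub": "int", "fadd": "float", "fmul": "float"}


def packetize(instructions):
    """Greedily pack an already-scheduled instruction list into bundles.

    A new bundle starts when the pipeline needed by the next instruction
    is already full. Data dependencies are ignored in this sketch.
    """
    bundles, current = [], []
    used = {"int": 0, "float": 0}
    for mnemonic in instructions:
        unit = UNIT_OF[mnemonic]
        if used[unit] == SLOT_LIMITS[unit]:
            bundles.append(current)
            current, used = [], {"int": 0, "float": 0}
        current.append(mnemonic)
        used[unit] += 1
    if current:
        bundles.append(current)
    return bundles


packets = packetize(["add", "add", "fadd", "add", "add", "fmul"])
```

In this run the fourth `add` does not fit in the first packet because all three int slots are taken, so it opens a second packet together with the `fmul`.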

-Krzysztof

cycheng · December 16, 2018, 12:50pm

Hi Krzysztof,

Thank you for your great guidance! I saw the key sentence: “… we heavily exploit that”
It looks like the hardest part (of course, the other parts are not easy for me either!) is the scheduler.
I am going to figure out MachinePipeliner and how it works on Hexagon, try to
figure out how Hexagon uses the pre-RA and post-RA schedulers, and then investigate DFAPacketizer and
VLIWPacketizerList. Thank you : )

CY

