Say, I have an instruction whose assembler syntax looks like:
It is encoded as:
|FEDCBA9| 876 | 543 | 210 |
|opcode |src1 |src2 | dst |
There are three operands, two of which are “don’t care registers” - the instruction does not read src2 and does not write dst; it is evaluated for the side effects (flags).
How should I model this instruction in a *.td file? Should I define it as having all three operands, or just src1?
Does it ever matter what dst and src2 are? If not, you could always encode them as r0 or something like that, and define the instruction in the .td file as having a single operand and modifying flags.
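To make the suggestion concrete, here is a minimal C++ sketch of the encoder side, not actual LLVM code. The field positions come from the layout in the question; the opcode value `kFlagOnlyOpcode` and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Field layout from the question: opcode in bits 15..9, src1 in bits 8..6,
// src2 in bits 5..3, dst in bits 2..0.
inline uint16_t encodeWord(unsigned opcode, unsigned src1, unsigned src2,
                           unsigned dst) {
  assert(opcode < 128 && src1 < 8 && src2 < 8 && dst < 8 &&
         "field out of range");
  return static_cast<uint16_t>((opcode << 9) | (src1 << 6) | (src2 << 3) | dst);
}

// Hypothetical opcode value, chosen only for this example.
constexpr unsigned kFlagOnlyOpcode = 0x2A;

// Single-operand form: the two don't-care fields are always emitted as r0.
inline uint16_t encodeFlagOnly(unsigned src1) {
  return encodeWord(kFlagOnlyOpcode, src1, /*src2=*/0, /*dst=*/0);
}
```

With this shape the instruction only needs one register operand in its definition, and the ignored fields become fixed bits of the encoding.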
Thanks Krzysztof. I did as you suggested and it seems to work well. I’m wondering if there can be any issues with disassembling such instructions though.
By the way, the target implements all 16 truth functions of two binary variables, so there are, for example, two nots, one for each operand. There are also two projection functions, which simply return one of the operands. Not sure what to do with them…
The ignored registers are the degree of variability here, which I guess is your concern. For the not instruction, you can define its encoding as having the ignored register fields (src2 and dst in your example) both be 0, and define an extra not_x instruction with the same opcode but register encodings covering all other registers. Together, the encodings of these two instructions would cover all possibilities for this opcode. This extra instruction would never show up during code generation; its only use would be for the decoder to understand different variants of not (with different registers). After successfully decoding not_x, you could replace it with the regular not in the disassembler, and so it would still show up as the regular not.
Edit: as for the other combinations—is there a hardware manual that lists mnemonics for these instructions? Maybe you can just adopt the forms listed there?
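The not / not_x split described above could be sketched roughly like this in plain C++ (not the actual LLVM decoder API). The opcode value `kNotOpcode`, the choice of src2 and dst as the ignored fields, and the printed syntax are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Fields { unsigned opcode, src1, src2, dst; };

// Split a 16-bit word using the layout from the question.
inline Fields splitFields(uint16_t word) {
  return {(word >> 9) & 0x7Fu, (word >> 6) & 7u, (word >> 3) & 7u, word & 7u};
}

// Hypothetical opcode value for "not".
constexpr unsigned kNotOpcode = 0x15;

enum class Decoded { Not, NotX, Unknown };

// "Not" matches only the canonical encoding (ignored fields are 0);
// "NotX" covers every other register combination for the same opcode.
inline Decoded classify(uint16_t word) {
  Fields f = splitFields(word);
  if (f.opcode != kNotOpcode)
    return Decoded::Unknown;
  return (f.src2 == 0 && f.dst == 0) ? Decoded::Not : Decoded::NotX;
}

inline std::string regName(unsigned n) {
  assert(n < 8 && "only r0..r7 exist in a 3-bit field");
  return "r" + std::to_string(n);
}

// The disassembler folds NotX back into the regular "not".
inline std::string disassembleNot(uint16_t word) {
  if (classify(word) == Decoded::Unknown)
    return "<unknown>";
  return "not " + regName(splitFields(word).src1);
}
```

Code generation would only ever produce the canonical form; the second variant exists purely so that every bit pattern for the opcode decodes to something.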
Thanks again. I tried a few approaches but the one you suggested didn’t come to my mind. I’ll try it, too.
Unfortunately, the mnemonics do not always disambiguate which encoding to use; in these cases it is up to the assembler. One example is the not instruction, which can be encoded two ways (depending on which of the two operands is the source); another is a “move reg to reg”, which in the general case can be encoded three or four different ways. I guess I should start trying and see what issues come up, if any.
Wait, so the manual only says what the instruction does, but not which encoding to use? Isn’t there a list of all available encodings? What I meant is that if each encoding has an official mnemonic, then you could simply implement those, and then come up with assembler aliases for the not instruction and others.
Taking “move” as an example, you could just pick or dst, src, src as the “real” instruction, and make move dst, src be an alias of that. Then the move instruction would either not be present in the codegen, or could be implemented as a pseudo-instruction. [move isn’t the best example, since it corresponds to COPY, which is already defined in the compiler and is somewhat special, but it illustrates the idea.]
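As a rough illustration of the alias idea (in LLVM this would normally be expressed in the .td files rather than hand-written), the rewrite from move to or amounts to something like the following. `AsmInst` and `expandAlias` are hypothetical names used only here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A parsed assembly instruction: mnemonic plus operand strings.
struct AsmInst {
  std::string mnemonic;
  std::vector<std::string> operands;
};

// Rewrite the user-facing alias "move dst, src" into the real instruction
// "or dst, src, src". Anything else is passed through unchanged.
inline AsmInst expandAlias(const AsmInst &in) {
  assert(!in.mnemonic.empty() && "instruction must have a mnemonic");
  if (in.mnemonic == "move" && in.operands.size() == 2)
    return {"or", {in.operands[0], in.operands[1], in.operands[1]}};
  return in;
}
```

Only or has an encoding in this scheme, so the choice between the several possible encodings of a move is made once, in the alias.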
That’s right. There are two manuals. The first one lists the instruction formats only, but no assembly syntax. The other describes assembly syntax, but it does not always say which encoding to use.
Also, the assembly syntax does not cover all possible encodings. For example, “branch never” and “call never” instructions exist in hardware, but not in assembly syntax.
I could share the link but I’m afraid it will be of no use to you because it is not in English and I don’t want to take too much of your time.
That would be the fifth
I got the idea, thank you
I think the question here is, if someone hand-assembled some code with a not instruction written in some non-specific way, what should the disassembler print? Would you expect the disassembler to recognize it as one of the many variants of not?
If I was doing this, I think I would just invent mnemonics for the encodings that don’t have official names, define them in the .td files, and then use aliases/pseudo-instructions for the user-facing operations. Then the disassembler would decode the instruction, recognize it as one of the “invented” ones, look up the replacement in some table and use that.
That is the behavior I would expect. Inventing mnemonics for all possible encodings is possible, but what’s the point, except for completeness? Am I missing something?
Not having an assembly syntax for every possible encoding is probably unusual, but it is the least unusual thing in the ISA.
By mnemonics here I specifically mean mnemonics in LLVM codegen, and those would be there to make disassembly easier. When the decoder cannot identify some bytes as an instruction, it kind of leaves it up to you to deal with it by hand. At least you should know how many bytes to skip to try disassembling the next instruction. If every encoding has an instruction, the decoder would say that it found opcode_02fc r4, r1, r7, where opcode_02fc is something you invented. Then you can check that it’s, for example, an instruction that negates the last operand, so in the disassembler you can replace it with not r7. If you didn’t have the mnemonic, then you’d have to deal with the raw bytes by hand.
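A small sketch of that flow, under stated assumptions: every word decodes to an invented opcode_XX name, and a table says which of those should be printed as a user-facing mnemonic instead. The opcode values, the operand print order (src1, src2, dst), and treating the last printed operand as the one being negated are all guesses made for illustration, not taken from the real ISA.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

struct RawInst { unsigned opcode, src1, src2, dst; };

// Every 16-bit word decodes to something, so the instruction length is known.
inline RawInst decodeRaw(uint16_t w) {
  return {(w >> 9) & 0x7Fu, (w >> 6) & 7u, (w >> 3) & 7u, w & 7u};
}

inline std::string reg(unsigned n) {
  assert(n < 8 && "3-bit register field");
  return "r" + std::to_string(n);
}

// Fallback name for encodings without a user-facing mnemonic.
inline std::string rawText(const RawInst &r) {
  char name[16];
  std::snprintf(name, sizeof name, "opcode_%02x", r.opcode);
  return std::string(name) + " " + reg(r.src1) + ", " + reg(r.src2) + ", " +
         reg(r.dst);
}

// Hypothetical table: opcodes that are printed as "<mnemonic> <last operand>".
inline const std::map<unsigned, std::string> &lastOperandForms() {
  static const std::map<unsigned, std::string> table = {{0x15, "not"}};
  return table;
}

inline std::string prettyPrint(uint16_t w) {
  RawInst r = decodeRaw(w);
  auto it = lastOperandForms().find(r.opcode);
  if (it == lastOperandForms().end())
    return rawText(r); // no replacement known, print the invented name
  return it->second + " " + reg(r.dst);
}
```

The point of the fallback is that an unrecognized variant still prints as a well-formed instruction rather than leaving raw bytes to handle by hand.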
I see now, thanks. I was already doing that intuitively, that’s why I didn’t think you’re talking about that same thing.