In most architectures, assembly operands are comma-separated.
I would like to parse assembly code whose operands are space-separated, and I am having a bit of a problem.
In the ParseInstruction function, I don’t know the easiest way to figure out how many operands a mnemonic expects.
In comma-separated assembly code, the parser just consumes commas (while (getLexer().is(AsmToken::Comma))) and adds operands, but that does not work for spaces…
I have a dirty hack: I manually provide this information (the number of operands) in a function called, for example, getMnemonicAcceptInfo, and parse the operands with a for loop!
What would you suggest for parsing space-separated assembly code when it comes to figuring out whether a mnemonic has one operand or two?
In practice I cannot use a function such as getMnemonicAcceptInfo (mnemonic as input, number of operands as output), because there are mnemonics that accept different numbers of operands! :-/
Any help is highly appreciated.
From what I understand instruction parsing is divided into two parts:
- Parsing an operand list (XXXAsmParser::ParseInstruction)
- Turning the operand list into an actual instruction (XXXAsmParser::MatchAndEmitInstruction)
The second part does the validation (e.g. how many operands, what kind, etc.), while the first part only does the parsing. That’s why I think the first part has to handle all possible operand combinations (i.e. parse the first operand, and keep parsing operands as long as you see spaces). LLVM will reject instructions with too many operands (as defined in the .td files).
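To illustrate that split, here is a minimal, self-contained sketch. It does not use the LLVM API: ParsedInst, parseInstruction, matchInstruction and the mnemonics "add" and "neg" are all made up for the example, and the hand-written table only stands in for what LLVM derives from the .td files. The point it tries to show is that the first phase never needs to know the operand count, so a mnemonic with several valid counts is not a problem there.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed form of one statement; not an LLVM type.
struct ParsedInst {
  std::string Mnemonic;
  std::vector<std::string> Operands;
};

// Phase 1 (the role played by ParseInstruction): read the mnemonic, then
// collect every remaining whitespace-separated token as an operand.
// No operand counting happens here.
ParsedInst parseInstruction(const std::string &Line) {
  std::istringstream In(Line);
  ParsedInst Inst;
  In >> Inst.Mnemonic;
  for (std::string Tok; In >> Tok;)
    Inst.Operands.push_back(Tok);
  return Inst;
}

// Phase 2 (the role played by MatchAndEmitInstruction): validate the operand
// list. The table below is an assumption for illustration only.
bool matchInstruction(const ParsedInst &Inst) {
  static const std::map<std::string, std::set<std::size_t>> AllowedCounts = {
      {"add", {2, 3}}, // made-up mnemonic accepting two or three operands
      {"neg", {1}},    // made-up mnemonic accepting exactly one operand
  };
  auto It = AllowedCounts.find(Inst.Mnemonic);
  return It != AllowedCounts.end() &&
         It->second.count(Inst.Operands.size()) > 0;
}
```

With this sketch, both "add r1 r2" and "add r1 r2 r3" parse and match, while "neg r1 r2" parses fine but is rejected in the second phase.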
Is this something that would work with your assembly syntax?
Thanks for your prompt reply.
What I mean is located at line 4192 of MipsAsmParser.cpp (llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp in the Woboq Code Browser).
It first has to parse the instruction, and based on the number of operands it uses a pattern in MatchAndEmit.
My problem is finding a suitable substitute when the operands in the assembly code are space-separated instead of comma-separated. (As you know, spaces are removed automatically, so I cannot simply switch AsmToken::Comma to AsmToken::Space.)
Thanks a lot.
Would getLexer().isNot(AsmToken::EndOfStatement) in that condition do the trick? The lexer is already splitting the input at spaces.
Ah, I see. I didn’t think about spaces being ignored.
I just checked, and MCAsmLexer has a setSkipSpace function that could be used to not ignore whitespace when parsing. I haven’t tried it out though.
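As a rough picture of what a skip-space switch changes, here is a toy lexer. It is not MCAsmLexer: TokKind, Token and lex are hypothetical names, and the token rules are deliberately simplistic. With SkipSpace set to true the parser only ever sees operands and the end of the statement; with false it also sees explicit Space tokens that a parse loop could use as separators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for the toy lexer; not LLVM's AsmToken kinds.
enum class TokKind { Identifier, Integer, Space, EndOfStatement };

struct Token {
  TokKind Kind;
  std::string Text;
};

static bool isSpaceChar(char C) {
  return std::isspace(static_cast<unsigned char>(C)) != 0;
}
static bool isDigitChar(char C) {
  return std::isdigit(static_cast<unsigned char>(C)) != 0;
}

// Split one line into tokens. Runs of whitespace become a single Space token
// only when SkipSpace is false; otherwise they are dropped silently.
std::vector<Token> lex(const std::string &Line, bool SkipSpace) {
  std::vector<Token> Toks;
  std::size_t I = 0;
  while (I < Line.size()) {
    const std::size_t Start = I;
    if (isSpaceChar(Line[I])) {
      while (I < Line.size() && isSpaceChar(Line[I])) ++I;
      if (!SkipSpace)
        Toks.push_back({TokKind::Space, Line.substr(Start, I - Start)});
    } else if (isDigitChar(Line[I])) {
      while (I < Line.size() && isDigitChar(Line[I])) ++I;
      Toks.push_back({TokKind::Integer, Line.substr(Start, I - Start)});
    } else {
      // Anything else up to the next whitespace is treated as an identifier.
      while (I < Line.size() && !isSpaceChar(Line[I])) ++I;
      Toks.push_back({TokKind::Identifier, Line.substr(Start, I - Start)});
    }
  }
  Toks.push_back({TokKind::EndOfStatement, ""});
  return Toks;
}
```

In this toy, lex("add r1 42", true) yields the three operand-like tokens followed by EndOfStatement, whereas lex("add r1 42", false) additionally yields a Space token between each pair.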
For now I just changed the while loop to look like this:
while (getLexer().getKind() == AsmToken::Identifier || getLexer().getKind() == AsmToken::Integer)
and at the moment it seems to be working, at least for now!
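For anyone reading along, the shape of that loop can be sketched outside LLVM as follows. Everything here is a stand-in: TokKind, Token and parseOperands are hypothetical, and the Other kind simply represents any token that is neither an identifier nor an integer. The sketch also shows one consequence of keying the loop on token kind: it stops at the first token of any other kind, so the caller may want to check that the statement has actually ended.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; "Other" stands for anything that is neither an
// identifier nor an integer (punctuation, for instance).
enum class TokKind { Identifier, Integer, Other, EndOfStatement };

struct Token {
  TokKind Kind;
  std::string Text;
};

// Collect operands starting at Pos, which is expected to point just past the
// mnemonic. The loop continues only while the current token is an Identifier
// or an Integer, and Pos is left on the first token that was not consumed.
std::vector<std::string> parseOperands(const std::vector<Token> &Toks,
                                       std::size_t &Pos) {
  std::vector<std::string> Ops;
  while (Pos < Toks.size() && (Toks[Pos].Kind == TokKind::Identifier ||
                               Toks[Pos].Kind == TokKind::Integer)) {
    Ops.push_back(Toks[Pos].Text);
    ++Pos;
  }
  return Ops;
}

// Helper for callers: true if parsing stopped exactly at the end of the
// statement rather than at some unexpected token.
bool atEndOfStatement(const std::vector<Token> &Toks, std::size_t Pos) {
  return Pos < Toks.size() && Toks[Pos].Kind == TokKind::EndOfStatement;
}
```

If an operand can contain other token kinds, the loop in this sketch ends early and atEndOfStatement returns false, which is a cheap way to notice that the condition needs to be widened.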