I am having problems parsing some C code with inline assembly that has C-style comments preceeding some instruction operands. See the code code below, where f1 builds correctly but f2 fails.
Did some trial and error, finding out that one can not have more than one C-style comment per line.
The backend I am using is RISC-V, but I do not think this is relevant. This seems an issue with the lexer, but I am not familiar enough with that part of the project for being very effective debugging the problem.
Any help is appreciated.
// RUN: %clang -fPIC --target=riscv64-unknown-elf -mabi=lp64f -O3 -o - -S %s | FileCheck %s
unsigned long int f1() {
unsigned long int dst;
__asm__ __volatile__("li %[dst], 0x1234 /* this is fine */ \n"
"add zero, %[dst], %[dst]\n"
: [ dst ] "=r"(dst)
:
:);
return dst;
}
#if 1
unsigned long int f2() {
unsigned long int dst;
__asm__ __volatile__("li /* this is fine */ %[dst], /* this is NOT fine */0x1234/\n"
"add zero, %[dst], %[dst]\n"
: [ dst ] "=r"(dst)
:
:);
return dst;
}
#endif
I notice that clang uses a #
for the final asm comment so I wondered what would happen if there were multiple /* */
and maybe that’s the issue.
But apparently not:
__asm__ __volatile__("li /* this is fine */ /* this is fine */ %[dst], 0x1234 \n"
lui a0, 1 # this is fine # this is fine
Which is odd but not invalid.
Might be to do with the position of the comment more than the number of comments overall.
"li %[dst], /* this is fine */ 0x1234 \n"
Will also error (Compiler Explorer).
Quick guess, you’d want to look at llvm/lib/MC/MCParser/AsmParser.cpp::&AsmParser::Lex()
.
Thanks for the pointer, it is very helpful to have a place to start looking at.
You are right that the exact pattern of the issue is not just having more than one comment, as it is position-dependent.
For example, a comment between the opcode and the first operand is fine. But if the comment is before the second operand, then that is a no-no. I just did not want to start ellaborating about combinations that work and combinations that do not work so can theorize how does it get wrong.
These inputs I am trying are actually accepted in gcc and are not well supported in clang. I got compiler users who had code working for gcc that now is failing with clang. Trying to fix the issue rather than working around the problem on their side…
It seems actually an issue with this function not handling comments:
bool RISCVAsmParser::ParseInstruction
That explains why this issue is position-dependent. If the comment is placed before the first operand it gets gracefully handled by llvm/lib/MC/MCParser/AsmParser.cpp.
If at the end, it is also handled by the MCParser.
If after the first operand and before the end of the line, then the handling is delegated on the target’s implementation of ParseInstruction.
I think I can fix this… do no help me yet! 
Great! I’d be interested to know if it already works for targets other than RISCV.
I have the impression that it fails for Arm 
On Wed, 14 Jun 2023 at 16:18, David Spickett via LLVM Discussion Forums <notifications@llvm.discoursemail.com> wrote:
Great! I’d be interested to know if it already works for targets other than RISCV.
Visit Topic or reply to this email to respond.
To unsubscribe from these emails, click here.
Sorry, just wanted to save your time.
There is a difference between AsmParser::Lex()
and AsmLexer::Lex()
. The latter is low-level function that should rarely be called in target’s AsmParser, if at all.
The issue will go away if you replace getLexer().Lex()
with just Lex()
or getParser().Lex()
.
There are also two typos in your example:
/* this is NOT fine * /0x1234/
→
/* this is NOT fine */0x1234
In many cases, getParser().Lex()
and getLexer().Lex()
are anti-patterns. If you do
if (!f(...))
return ...;
getLexer().Lex();
Consider moving Lex
into the body of f
:
if (!f(...))
return ...;
parseToken
or parseOptionalToken
are useful. There is some complexity if one wants to track precise SMLoc
, though.
I have landed some commits to remove a lot of getLexer().Lex()
(XXX Use parseOptionalToken. NFC
).
Many targets don’t appear to test the error code paths (rg '\[\[#@LINE\+1' test/MC
for some examples). I hope that the owners can improve the situation.