A possible bug in the assembly parser for ARM

Dear developers,

As the following code snippet copied from lib/MC/MCParser/AsmParser.cpp shows, when parsing a label, AsmParser::parseStatement calls the onLabelParsed method of a target parser after emitting the label:

    // Emit the label.
    if (!getTargetParser().isParsingInlineAsm())
      Out.EmitLabel(Sym, IDLoc);

    // If we are generating dwarf for assembly source files then gather the
    // info to make a dwarf label entry for this label if needed.
    if (getContext().getGenDwarfForAssembly())
      MCGenDwarfLabelEntry::Make(Sym, &getStreamer(), getSourceManager(),
                                 IDLoc);

    getTargetParser().onLabelParsed(Sym);

For ARM, calling onLabelParsed after emitting the label seems to be a bug.

If I understand it correctly, ARMAsmParser::onLabelParsed (defined in lib/Target/ARM/AsmParser/ARMAsmParser.cpp) performs two tasks:

  1. Complete the current implicit IT block (if one is still open for new conditional instructions) BEFORE the label, so that a branch to the label cannot enter the middle of the IT block.

  2. Emit a .thumb_func directive, if the label is the first label following a previously parsed .thumb_func directive without an optional symbol.

Considering the tasks above, calling onLabelParsed after the label is emitted leads to two types of errors in the generated code:

  1. Instructions of an IT block BEFORE the label may be incorrectly emitted AFTER the label.

  2. .thumb_func directives, which should be emitted BEFORE the corresponding function symbols, are emitted AFTER the function symbols.
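
The ordering problem can be illustrated with a small standalone C++ sketch. It is a toy model under stated assumptions, not LLVM code: ToyParser, onLabelHook, emitLabel and replay are hypothetical names, and the model handles only a single PL condition. It replays the ".thumb_func / MOVPL r0, #0 / f2:" portion of the test case with the hook run either after or before the label is emitted. Running the hook first is only one possible reordering, not a confirmed fix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: every name below is hypothetical and none of it is LLVM API.
struct ToyParser {
  std::vector<std::string> out;        // lines emitted so far, in order
  std::vector<std::string> pendingIT;  // buffered conditional instructions
  bool pendingThumbFunc = false;       // saw ".thumb_func" without a symbol

  // Performs the two tasks described above: close the open implicit IT
  // block, then emit a deferred .thumb_func directive.
  void onLabelHook() {
    if (!pendingIT.empty()) {
      assert(pendingIT.size() <= 4 && "an IT block covers at most 4 instructions");
      // The toy assumes every buffered instruction uses the PL condition.
      out.push_back("i" + std::string(pendingIT.size(), 't') + " pl");
      for (const std::string &inst : pendingIT)
        out.push_back(inst);
      pendingIT.clear();
    }
    if (pendingThumbFunc) {
      out.push_back(".thumb_func");
      pendingThumbFunc = false;
    }
  }

  void emitLabel(const std::string &name) { out.push_back(name + ":"); }
};

// Replays ".thumb_func / MOVPL r0, #0 / f2:" from the test case.
std::vector<std::string> replay(bool hookBeforeLabel) {
  ToyParser p;
  p.pendingThumbFunc = true;              // .thumb_func
  p.pendingIT.push_back("movpl r0, #0");  // MOVPL r0, #0 (implicit IT block)
  if (hookBeforeLabel) {
    p.onLabelHook();  // flush pending state, then emit the label
    p.emitLabel("f2");
  } else {
    p.emitLabel("f2");  // order used in the snippet quoted above
    p.onLabelHook();
  }
  return p.out;
}
```

In this model, replay(false) yields "f2:", "it pl", "movpl r0, #0", ".thumb_func", which is the same order as in the generated assembly shown further down, while replay(true) places all three lines before "f2:".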

I tested llvm-mc with the following assembly code:

.text
.syntax unified
.p2align 1
.code 16
.globl f1
.globl f2
.thumb_func
f1:
CMP r0, #10

.thumb_func

MOVPL r0, #0

f2:
MOVS r1, #0
.Ltmp:
CMP r0, #0
ITTT PL
ADDPL r1, r1, r0
SUBPL r0, r0, #1
BPL .Ltmp
MOV r0, r1
BX lr

.end

The generated assembly code was as follows:

.text
.p2align 1
.code 16
.globl f1
.globl f2
f1:
.thumb_func
cmp r0, #10

f2:
it pl
movpl r0, #0
.thumb_func
movs r1, #0
.Ltmp:
cmp r0, #0
ittt pl
addpl r1, r1, r0
subpl r0, r0, #1
bpl .Ltmp
mov r0, r1
bx lr

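Comparing the generated assembly with the original shows both types of errors: the implicit IT block for MOVPL (it pl / movpl r0, #0) is emitted after the label f2 although MOVPL precedes f2 in the source, and both .thumb_func directives are emitted after f1 and f2 rather than before them.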

I tested llvm-mc with the following command line:

llvm-mc -arch=thumb -filetype=asm -mattr=+soft-float,-neon,-crypto,+strict-align -mcpu=cortex-m3 -n -triple=armv7m-none-none-eabi -o=it-block-roundtrip.s it-block.S -arm-implicit-it=always

where it-block.S is the original assembly file and it-block-roundtrip.s is the generated assembly file.

Ming Zhang

I agree that this is producing incorrect output. Are you able to
file a bug at https://bugs.llvm.org/ ?

When producing an object file from llvm-mc, the .thumb_func appears to
give the right symbol type to f1 and f2, but when reassembling the
it-block.S file, f1 is not given type STT_FUNC. The incorrect position
of the label f2 appears in the generated llvm-mc object, which could
cause a problem if the label were a branch target.

Peter