[LLVM] Forward temp label references on ARM in LDR with .ltorg in inline assembly are broken in trunk

Hi Gordon,

I'm not entirely sure what caused this, but the following code, which used
to behave as expected, is now broken:

---- lolwut.c ----------------------------

void lolwut(void) {
  __asm __volatile (
    "ldr r1, =1f \n"
    ".ltorg \n"
    "1: \n\t"
    : : : "r0", "r1" );
}

-------------------------------------------

~/clang -target armv7-none-eabi -O0 -c -emit-llvm lolwut.c -o lolwut.bc
~/llc -O0 lolwut.bc -o lolwut.s

---- lolwut.s ----------------------------

  .file "lolwut.bc"
  .text
  .globl lolwut
  .align 2
  .type lolwut,%function
lolwut: @ @lolwut
  .fnstart
@ BB#0: @ %entry
  @APP
  ldr r1, .Ltmp0
  .align 2
.Ltmp0:
  .long ".L11"

".L11":

  @NO_APP
  bx lr
.Ltmp1:
  .size lolwut, .Ltmp1-lolwut

------------------------------------------

Somehow, the forward referenced label at 1: in the original assembly is
getting mangled when its constant pool entry is created (the bad character
is a 0x02 hex). In previous versions, the inline assembly was unchanged
in the output. Does anyone know what's going on here? I found the
checkin that changed how ldr rx, = was handled but haven't had a chance to
revert and try a prior revision to see if this still happens.

I think the bit that changed is that we now always parse inlineasm, but
before we would not parse it when outputting a .s file. See commit r201333
for details: http://llvm-reviews.chandlerc.com/D2686. I think you can pass
the -no-integrated-as flag to disable parsing inlineasm with the integrated
assembler. There is some discussion on adding a clang/llvm flag to disable
parsing of inlineasm:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140217/205513.html

The mangling you see is the assembler generating a private label. It
includes the unprintable character 0x2 so that it cannot conflict with any
user-created label.

I'll file a bug on this after I track down my bugzilla password but I
wanted to ask here first because I'm willing to fix it if someone can
point me in the right direction.

It looks like this behavior is not particular to .ltorg, but shows up
whenever the private labels are printed.

$ cat tt.s
.syntax unified
foo:
  b 1f
1:
  add r0, r0, r0
$ llvm-mc < tt.s
        .text
foo:
        b ".L11"
".L11":
        add r0, r0, r0
$ llvm-mc < tt.s | hexdump -C
00000000  09 2e 74 65 78 74 0a 0a  0a 66 6f 6f 3a 0a 09 62  |..text...foo:..b|
00000010  09 22 2e 4c 31 02 31 22  0a 22 2e 4c 31 02 31 22  |.".L1.1".".L1.1"|
00000020  3a 0a 09 61 64 64 09 72  30 2c 20 72 30 2c 20 72  |:..add.r0, r0, r|
00000030  30 0a                                             |0.|
00000032

I'm not sure if this is a bug or the intended behavior for printing these
labels.
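
For readers following along: the bytes 2e 4c 31 02 31 in the hexdump suggest that the expanded name is the private prefix ".L", the label number, a 0x02 byte, and an instance counter. Below is a minimal C sketch of that layout, assuming this reading of the dump is right. The names format_private_label and has_unprintable are hypothetical helpers for illustration, not LLVM functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not LLVM's API): builds the expanded name for one
   definition of a numbered label, using the layout seen in the hexdump:
   ".L" + label number + byte 0x02 + instance number. */
static int format_private_label(char *buf, size_t size,
                                unsigned label, unsigned instance) {
    assert(buf != NULL && size > 0);
    /* 0x02 is passed through %c so it cannot merge with a following digit. */
    return snprintf(buf, size, ".L%u%c%u", label, 0x02, instance);
}

/* Returns 1 if the name contains a byte that is not printable ASCII,
   which is what makes the emitted .s file unreadable for gas. */
static int has_unprintable(const char *name) {
    for (; *name != '\0'; name++)
        if (!isprint((unsigned char)*name))
            return 1;
    return 0;
}
```

Calling format_private_label(buf, sizeof buf, 1, 1) fills buf with the same five bytes as the dump, and has_unprintable reports the embedded 0x02, while an ordinary temp label such as .Ltmp0 passes the check.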

As David says, r201333 is most likely the change responsible. This example uses valid assembly in the inline assembly block so while the example in the other thread is somewhat debatable (it's not valid assembly), there is definitely a bug of some kind if this doesn't assemble in both the integrated assembler and gas.

At the moment, I think this is an unintentional side effect of parsing the inline assembly for assembly output. Assuming we continue to parse inline assembly when emitting assembly (which is under debate at the moment), the simplest solution seems to be to stop using a non-printable character to make the labels unique. The labels will still be expanded (as gas does internally), but this won't matter as long as the unique labels are parseable by both assemblers. This raises the question "how do we make them unique?", to which I don't have a good answer at the moment. A more difficult solution would be to convert the IR back to using local labels before emission.

The frontend -no-integrated-as isn't linked up to the backend -no-integrated-as yet. You might need to use '-mllvm -no-integrated-as' as the workaround. The patch being discussed in the other thread should end up connecting the two.

I am pretty sure this is a bug. It was found by the recent change to
parse assembly, but it is an independent bug.

Would you mind reporting it with your simple "b 1f" example in llvm.org/bugs?

Thanks,
Rafael

Would you mind reporting it with your simple "b 1f" example in
llvm.org/bugs?

Filed a bug report here: http://llvm.org/bugs/show_bug.cgi?id=18928

Gas isn't able to handle quotes in label names, and chokes on 0x02 regardless of whether the label is quoted or not:

Unquoted:
lolwut.s: Assembler messages:
lolwut.s:23: Error: junk at end of line, first unrecognized character valued 0x2
lolwut.s:25: Error: unknown pseudo-op: `.l1'

Quoted:
lolwut.s: Assembler messages:
lolwut.s:23: Error: bad expression
lolwut.s:23: Error: junk at end of line, first unrecognized character is `.'
lolwut.s:25: Error: junk at end of line, first unrecognized character is `"'

Additionally, I'm not sure this should be done with the numbered local labels at all. I know this has to be done at some point, but the semantics of these labels allow duplicates within a single function / .s file because each reference is resolved relative to the instruction that makes it. For example, there is nothing wrong with the following as far as the assembler is concerned (aside from being pointless and unreadable :) ):

lolwut:
    b 1f
1:
    b 1b
    b 1f
1:

LLVM expands them into different temp labels properly, aside from the 0x02, but since this form of numbered label is designed so that it doesn't need to be uniqued, why bother? For those labels, this solves a problem that doesn't exist.

Cheers,
-Gordon Keiser
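
To illustrate the point about directional references, here is a rough C sketch of how "1f" and "1b" can be resolved purely by position, so duplicate definitions of "1:" never clash. It is a toy model with assumed names (resolve_local_label and the defs array are hypothetical), not how gas or LLVM actually store labels.

```c
#include <assert.h>
#include <stddef.h>

/* defs[i] holds the numbered label defined on line i, or -1 if line i
   defines none. Returns the line that "Nf" (forward != 0) or "Nb"
   (forward == 0) refers to when used on line `from`, or -1 if there is
   no definition of N in that direction. */
static int resolve_local_label(const int *defs, int count,
                               int from, int label, int forward) {
    assert(defs != NULL && from >= 0 && from < count);
    if (forward) {
        /* "Nf": nearest definition of N after the referencing line. */
        for (int i = from + 1; i < count; i++)
            if (defs[i] == label)
                return i;
    } else {
        /* "Nb": nearest definition of N before the referencing line. */
        for (int i = from - 1; i >= 0; i--)
            if (defs[i] == label)
                return i;
    }
    return -1;
}
```

With the six-line example above encoded as defs = { -1, -1, 1, -1, -1, 1 }, the first "b 1f" resolves to line 2, "b 1b" to the same line 2, and the second "b 1f" to line 5. The two "1:" definitions therefore coexist without any renaming.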
