PR36144: X86 Intel syntax and masm flavor

Hi,

We have a significant regression since llvm 5.0.0 in the x86 assembler.

The following snippets fail:

1)

.intel_syntax
0:
  jmp 0b

2)

.intel_syntax
  and edi, 0b010101

when running through `llvm-mc -arch x86-64`.

This regression was introduced in r301390, which was driven by PR27884.

I think https://llvm.org/PR36144 describes this very well, and I think we should
get this fixed, since it's a pretty basic thing to support in the assembler.

Here are a few solutions to this:

1) Introduce a new asm dialect/flavor/style to assemble masm files.

2) Only set the flags based on the target triple. Also suggested in PR27884.

3) Only set the flags based on a new command line flag.

Let me know if any other solution comes to mind.

While we get this issue fixed, is it reasonable to revert r301390?

Thanks,

I think we should revert r301390 just on principle from looking at the code. If I understand correctly, it flips the bit for “is parsing inline asm” to true when encountering a plain .intel_syntax directive. That’s just wrong.

Sorry, I spoke too soon. This only happens for intel style inline assembly in LLVM IR. I don’t have a good suggestion.

Sorry, I spoke too soon. This only happens for intel style inline assembly in LLVM IR. I don’t have a good suggestion.

Why is it relevant that the issue is contained? The 0b support in MS asm shouldn’t break the general intel assembly syntax.

After looking more closely at the patch in question, I again think it should be reverted.

The patch sets a boolean to indicate that inline asm is being parsed in three places now:

  1. true when parsing intel inline asm
  2. true when .intel_syntax directives are encountered (BUG!)
  3. false when .att_syntax is encountered

We should only set this to true when parsing inline asm, clearly. I sent my second email after I saw place 1 when re-reading the patch and thought, oh, everything is working as intended. I’ll go ahead and do it, since it is clearly wrong. .intel_syntax should not imply IsParsingInlineAsm.

After looking more closely at the patch in question, I again think it should be reverted.

The patch sets a boolean to indicate that inline asm is being parsed in three places now:

  1. true when parsing intel inline asm
  2. true when .intel_syntax directives are encountered (BUG!)
  3. false when .att_syntax is encountered

We should only set this to true when parsing inline asm, clearly. I sent my second email after I saw place 1 when re-reading the patch and thought, oh, everything is working as intended. I’ll go ahead and do it, since it is clearly wrong. .intel_syntax should not imply IsParsingInlineAsm.

Thanks, Reid!

So, unfortunately it is not so easy to revert that patch. Previously, here is what would happen:

  1. clang frontend parses __asm { … } blob with the help of MC. IsParsingMSInlineAsm is true in the AsmLexer. 0b111 is recognized as a binary integer literal.
  2. clang rewrites the user asm, including many integer literals, to gnu/intel syntax, something that llvm-mc could parse in standalone mode. This rewrites 0b111 to 7, for example.
  3. Backend builds a string literal to parse from the inline asm blob. String literal says something like “and al, 7” instead of “and al, 0b111”.
  4. Backend parses and re-emits asm in AT&T syntax, and you get “andb $7, %al”.

r311639 removed the infrastructure that rewrote immediate expressions in step 2 to plain decimal integers, so now this doesn’t work. We do step 1 correctly, step 2 never happens, and step 3 can’t parse 0b111, since it thinks it’s a label reference.

I think we need to fix forward at this point. I think the best way forward would be some global option to control how 0b is interpreted, rather than a distinct masm vs intel asm dialect.

Nevermind, I came up with a solution and sent it out here: https://reviews.llvm.org/D53535
That should at least get things back to normal.