Getting rid of tabs in LLVM's assembly output?

Scenario: sometimes when creating tests for MC, I run llc, take its
assembly (.s) output and copy-paste parts of it into a test.

Problem: I then get tabs in my tests, which are discouraged by LLVM's
own code standards, because assembly output uses tabs extensively.

Proposal: get rid of tabs by just replacing them with two spaces everywhere.

I had an informal chat about this with Jim on the IRC channel and he
tends to agree, but I just wanted to make sure it makes sense for the
community before starting this. The change mostly involves lib/MC and
the target-specific TableGen files defining instructions.

Eli

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Eli Bendersky
Subject: [LLVMdev] Getting rid of tabs in LLVM's assembly output?

> Problem: I then get tabs in my tests, which are discouraged by LLVM's
> own code standards, because assembly output uses tabs extensively.
>
> Proposal: get rid of tabs by just replacing them with two spaces everywhere.

That will cause some really ugly output; can a somewhat more intelligent formatter be used to keep columns aligned?

- Chuck

I don't mind getting rid of tabs as a general thing, but it is a rather large undertaking. I concur with Chuck that we'd want to replace them with some intelligent column-aware formatting rather than a straight "two spaces per tab."

FWIW, partly for this exact reason, there's effectively a local exception that tabs are OK in .s file assembler test cases. If that's the main motivation, I wouldn't worry about it.

-Jim

> I don't mind getting rid of tabs as a general thing, but it is a rather large undertaking. I concur with Chuck that we'd want to replace them with some intelligent column-aware formatting rather than a straight "two spaces per tab."

include/llvm/Support/FormattedStream.h supports PadToColumn, which is perfect for this.

> FWIW, partly for this exact reason, there's effectively a local exception that tabs are OK in .s file assembler test cases. If that's the main motivation, I wouldn't worry about it.

I agree, tabs in testcases are fine IMO.

-Chris
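
For illustration, the pad-to-column idea can be sketched without any LLVM dependency. This is only a minimal standalone sketch, not the FormattedStream implementation: the helper names padToColumn and formatInst, the two-space indent, and the operand column of 10 are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not LLVM's API): append spaces until `line` is at
// least `column` characters long. At least one space is always added, so
// the mnemonic and its operands never run together.
std::string padToColumn(std::string line, std::size_t column) {
  do {
    line += ' ';
  } while (line.size() < column);
  return line;
}

// Format one instruction: two-space indent, mnemonic, then operands
// starting at a fixed column (assumed to be 10 here).
std::string formatInst(const std::string &mnemonic,
                       const std::string &operands,
                       std::size_t operandColumn = 10) {
  assert(!mnemonic.empty() && "mnemonic must not be empty");
  std::string line = "  " + mnemonic;
  if (operands.empty())
    return line; // nothing to align, so no trailing padding
  return padToColumn(line, operandColumn) + operands;
}
```

With these assumptions, formatInst("je", ".LBB2_9") and formatInst("movsbl", "-81(%rbp), %eax") both start their operands at the same column, regardless of the reader's tab stop.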

Here's how some random output currently looks in a 2-space-per-tab editor:

  cmpl $0, -28(%rbp)
  je .LBB2_9
  movsbl -81(%rbp), %eax
  movq -16(%rbp), %rcx
  movb 56(%rcx), %dl
  andb $1, %dl
  movzbl %dl, %esi

So if you're worried about ugliness, it's already there :-)

Eli

But it's pretty easy to change the tab stop within the editor to make it readable.

True, in this case... The output is not trying to be intelligent in
the general case, just spitting out tabs. I agree that to replace
this, however, it would be best to look at some smart column-padded
formatting rather than use a constant tab -> N spaces replacement.
I'll see if this is something I can get to.

Eli

Maybe it's naive, but I would expect it to be easy for each backend to
expose a constant N which is the length of the longest mnemonic, and then
for the printer to pad to N+1 or N+2....
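
A rough sketch of that idea, assuming the backend can hand over a list of its mnemonics. The names longestMnemonic and padMnemonic are hypothetical and exist only for this example; nothing here is an existing LLVM interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-backend constant N: the length of the
// longest mnemonic in the given list.
std::size_t longestMnemonic(const std::vector<std::string> &mnemonics) {
  std::size_t n = 0;
  for (const std::string &m : mnemonics)
    n = std::max(n, m.size());
  return n;
}

// Pad the mnemonic with spaces to width n + gap (gap = 1 or 2, as
// suggested above), then append the operands.
std::string padMnemonic(const std::string &mnemonic,
                        const std::string &operands,
                        std::size_t n, std::size_t gap = 1) {
  assert(mnemonic.size() <= n && "mnemonic longer than the declared maximum");
  std::string out = mnemonic;
  out.append(n + gap - mnemonic.size(), ' ');
  return out + operands;
}
```

The assert marks the weak spot of the approach: it only holds if N really is an upper bound for everything printed before the operands.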

That would probably work for X86, but other targets (ARM in particular) often have operands which are printed/parsed as suffixes on the mnemonic itself. Because of these, the backend does not statically know the longest potential string-of-stuff-before-the-tab.

–Owen

Are you thinking of something beyond the ".F32.I16" suffixes (for
example)? If not, the result may not be TableGeneratable, but is
probably conservatively known as "8 + natural mnemonic length" for
these purposes.

Tim.

(N.b. I have been looking almost exclusively at the 64-bit
architecture for the last year, I could well be massively wrong about
the 32-bit world).

The great thing is, if it's close enough, it doesn't matter if there exist
corner cases that are formatted less well.

It affects pretty much all ARM instructions, which can have a predicate suffix. Some can also have an S suffix. Those are not expressed to tblgen as part of the mnemonic, so it can’t know about them when computing a maximum mnemonic length.

–Owen

> It affects pretty much all ARM instructions, which can have a predicate

That suggests to me that a Target callback may be the way to go rather than relying on TableGen getting the right answer. Realistically ‘\t’ doesn’t get it right every time at the moment. Even ignoring long instructions (which exist), InstAliases only work with a single space.

Make it a well-documented pure-virtual method in the appropriate class and implementors will be even more likely to see it.

Tim.
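
As a sketch of what such a callback might look like: the class MnemonicWidthInfo, the method getMaxMnemonicWidth, and the widths used below are all invented for illustration and do not correspond to any existing LLVM class or to real ARM figures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface (not an existing LLVM class): each target reports
// the widest mnemonic it can print, including any suffixes.
class MnemonicWidthInfo {
public:
  virtual ~MnemonicWidthInfo() = default;
  // Pure virtual, so every target implementor has to provide an answer.
  virtual std::size_t getMaxMnemonicWidth() const = 0;
};

// Illustrative numbers only: an assumed base width of 6, plus room for an
// "s" suffix (1 character) and a predicate suffix such as "eq" (2).
class ExampleARMWidthInfo : public MnemonicWidthInfo {
public:
  std::size_t getMaxMnemonicWidth() const override { return 6 + 1 + 2; }
};

// Pad the mnemonic to the target's reported width, then always emit one
// separating space, so an over-long mnemonic degrades to single-space
// separation instead of running into its operands.
std::string printInst(const MnemonicWidthInfo &info,
                      const std::string &mnemonic,
                      const std::string &operands) {
  assert(!mnemonic.empty() && "mnemonic must not be empty");
  std::string out = mnemonic;
  std::size_t width = info.getMaxMnemonicWidth();
  if (out.size() < width)
    out.append(width - out.size(), ' ');
  return out + ' ' + operands;
}
```

Falling back to a single space for mnemonics longer than the reported width matches the "close enough" attitude above: corner cases lose alignment but stay parseable.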

Even if in some freak cases the formatting isn't perfect, it can still
be a massive improvement if the pad-to-column formatter is used in
tandem with Chandler's suggestion. Don't forget that the current
situation is to just emit a single tab after each instruction, so even
relatively simple cases get formatted badly.

Eli