Extending AsmPrinter

I'm looking at extending AsmPrinter to pretty-print comments after
instructions (I'm adding the necessary fields to MachineInstr to do this).

I'm trying to grok AsmWriterEmitter and having a tough go of it. I look at
X86GenAsmWriter1.inc (the Intel syntax writer) and understand that
there's a case block for printing operands under several switch statements,
one per "fragment."

I need to recode AsmWriterEmitter to call a function right before it emits
the final newline for an instruction to do any instruction post-processing
needed (e.g. printing comments).

AsmWriterInst ::AsmWriterInst seems to be the place to do this but my
reading says that it always appends a newline to whatever string it's
creating. So this can't be used for fragments that don't end in a newline.
Is this routine only used to generate the switch cases for fragments that
end an instruction? Is it the right place to insert the necessary code?

I also have a few questions on the general design of AsmPrinter. Why is
runOnMachineFunction implemented for each target AsmPrinter? I would
have expected this to live in the base AsmPrinter class with calls out to
specialized helper functions (Template Method pattern). If things were
designed this way, it would be trivial to do what I want:

AsmPrinter::runOnMachineFunction(...) {
[...] // Call helpers to print out constant pools, check calling conventions,
       / /etc.

  // Print out code for the function.
  for (MachineFunction::const_iterator I = MF.begin(), E = MF.end();
       I != E; ++I) {
    // Print a label for the basic block.
    if (I != MF.begin()) {
      printBasicBlockLabel(I, true); // Overridden for each target
      O << '\n';
    }
    for (MachineBasicBlock::const_iterator II = I->begin(), E = I->end();
         II != E; ++II) {
      // Print the assembly for the instruction.
      O << "\t";
      printInstruction(II); // Overridden for each target
      postInstructionAction(II); // New API for comment printing, etc.
      O << "\n";
      ++EmittedInsts;
    }
  }
}

Then TableGen wouldn't have to worry about printing newlines either.
The best thing is that parts of postInstructionAction are independent
of the target. Comment printing is a good example. The only thing that
matters on any machine llvm targets is the comment delimeter, which is
already handled by TargetAsmInfo.

Along the same lines, AsmWriterEmitter always dumps this exact
code to each target in the generated printInstruction function:

  if (MI->getOpcode() == TargetInstrInfo::INLINEASM) {
    printInlineAsm(MI);
    return true;
  } else if (MI->getOpcode() == TargetInstrInfo::LABEL) {
    printLabel(MI);
    return true;
  }

This could just as well go in the AsmWriter base class and TableGen wouldn't
have to worry about it.

Thoughts?

                                         -Dave

I'm looking at extending AsmPrinter to pretty-print comments after
instructions (I'm adding the necessary fields to MachineInstr to do this).

ok

I also have a few questions on the general design of AsmPrinter. Why is
runOnMachineFunction implemented for each target AsmPrinter? I would

Historical reasons that aren't very good. Over the years, I've taken several stabs at merging the asmprinters from various targets together. They used to be completely separate with no common base class for example. Any progress to merging them further would be great. :slight_smile:

have expected this to live in the base AsmPrinter class with calls out to
specialized helper functions (Template Method pattern). If things were
designed this way, it would be trivial to do what I want:

Yes, this is one advantage, another is that we wouldn't have to fix the same bug in multiple places :slight_smile:

-Chris

> I also have a few questions on the general design of AsmPrinter. Why is
> runOnMachineFunction implemented for each target AsmPrinter? I would

Historical reasons that aren't very good. Over the years, I've taken
several stabs at merging the asmprinters from various targets together.
They used to be completely separate with no common base class for example.
Any progress to merging them further would be great. :slight_smile:

Ok, I'll see what I can do. :slight_smile:

> have expected this to live in the base AsmPrinter class with calls out to
> specialized helper functions (Template Method pattern). If things were
> designed this way, it would be trivial to do what I want:

Yes, this is one advantage, another is that we wouldn't have to fix the
same bug in multiple places :slight_smile:

Right. I've figured out where I need to make the changes in AsmWriterEmitter
given the current design.

I'm trying out a few interesting things here based on custom streambufs.
It requires some surgery to AsmPrinters as a std::ostream won't work
anymore due to the enhanced functionality of the custom streambufs.
Is this still interesting to the larger LLVM community? One thing I have
to do is figure out how to handle asm dumps to cerr.

                                                   -Dave

Yes, this is one advantage, another is that we wouldn't have to fix the
same bug in multiple places :slight_smile:

Right. I've figured out where I need to make the changes in AsmWriterEmitter
given the current design.

Ok, cool.

I'm trying out a few interesting things here based on custom streambufs.
It requires some surgery to AsmPrinters as a std::ostream won't work
anymore due to the enhanced functionality of the custom streambufs.
Is this still interesting to the larger LLVM community? One thing I have
to do is figure out how to handle asm dumps to cerr.

It really depends on what you mean. I've found that iostreams are extremely slow, much slower than C stdio streams. Eventually I'd like to move them to stdio, not to more complex streambufs :slight_smile:

-Chris

Basically, what I've done is augment basic_streambuf to keep track of
columns and provide manipulators that use this ability to set the output
at a specific column (by padding with spaces if necessary). This makes
it easy to line up comments, for example.

This would be quite a bit uglier with C stdio. For example:

// Custom streambuf (pseudo-llvm APIs)

instr->print(O);
O << comment("Place after instruction");

// C stdio

int current_column = instr->print(outfile);
printComment(current_column, "Place after instruction");

This means that every ::print/::dump/etc API that's used to print asm has to
track columns by parsing output strings and looking for newlines and tabs.
With a custom streambuf it's automatic.

There's value to this whole newfangled encapsulation thing. :slight_smile:

As for performance, I guess I haven't seen a huge issue. The compilation
time is spent in the optimization, not in the asm output.

Right now the custom streambuf is only used to print asm. I don't see a
reason to use it elsewhere. Of course, other kinds of custom streambuf
can be useful. I once wrote a circular-buffering streambuf that was great
for dumping debug output. At program termination the streambuf dumped
that last N lines of output sent to it. So you didn't get gobs of debug
output when all you really cared about was what happened toward the end.

                                        -Dave

It really depends on what you mean. I've found that iostreams are
extremely slow, much slower than C stdio streams. Eventually I'd like to
move them to stdio, not to more complex streambufs :slight_smile:

Basically, what I've done is augment basic_streambuf to keep track of
columns and provide manipulators that use this ability to set the output
at a specific column (by padding with spaces if necessary). This makes
it easy to line up comments, for example.

Nifty.

This would be quite a bit uglier with C stdio. For example:

The syntax used doesn't depend on the underlying implementation methodology. It would be straight-forward to define a new "formated_ostream" (or whatever) that uses operator<< etc, but that isn't actually implemented in terms of ostreams :slight_smile:

This means that every ::print/::dump/etc API that's used to print asm has to
track columns by parsing output strings and looking for newlines and tabs.
With a custom streambuf it's automatic.

Yep.

There's value to this whole newfangled encapsulation thing. :slight_smile:

Hey, I've heard of that... do people really use it? :wink:

As for performance, I guess I haven't seen a huge issue. The compilation
time is spent in the optimization, not in the asm output.

The time this matters is during -O0 compilation, when you do no optimization. Compile time is incredibly important because -O0 compiles directly impact the edit/compile/debug turn-around cycle.

Right now the custom streambuf is only used to print asm. I don't see a
reason to use it elsewhere.

It sounds very handy to me for a number of things. :slight_smile: Even tablegen and the CBE could use it to produce nicer generated code. We could use is in the .ll printer to line up the #uses comments in the .ll file, etc. For it to really be widely usable though, it would have to be very efficient.

-Chris

>> Yes, this is one advantage, another is that we wouldn't have to fix the
>> same bug in multiple places :slight_smile:
>
> Right. I've figured out where I need to make the changes in AsmWriterEmitter
> given the current design.

Ok, cool.

> I'm trying out a few interesting things here based on custom streambufs.
> It requires some surgery to AsmPrinters as a std::ostream won't work
> anymore due to the enhanced functionality of the custom streambufs.
> Is this still interesting to the larger LLVM community? One thing I have
> to do is figure out how to handle asm dumps to cerr.

It really depends on what you mean. I've found that iostreams are
extremely slow, much slower than C stdio streams. Eventually I'd like to
move them to stdio, not to more complex streambufs :slight_smile:

Eventually I'd like to move them to native syscalls adapted by
lib/System and suited to the purposes of compilers :slight_smile:

Reid.

I was just about to say that. While debugging my custom streambuf I had the
"opportunity" of investigating how libstdc++-v3 defines cout and friends.
It's not pretty. Basically, there's a stdio_sync_filebuf used as a proxy to
C's FILE *. Because stdio_sync_filebuf does no buffering itself (delegating
that to FILE), basic_streambuf::overflow is called a lot. Some of this is
mitigated with xsputn for multiple characters, but it's a lot of virtual
function calls.

To get something faster probably means a custom buffering solution (which
my custom streambuf has, but is not particularly intelligent) and native
syscalls. The downside of my implementation is that it is POSIX-specific
and thus won't be real useful for Windows.

                                           -Dave

I was just about to say that. While debugging my custom streambuf I had the
"opportunity" of investigating how libstdc++-v3 defines cout and friends.
It's not pretty. Basically, there's a stdio_sync_filebuf used as a proxy to
C's FILE *. Because stdio_sync_filebuf does no buffering itself (delegating
that to FILE), basic_streambuf::overflow is called a lot. Some of this is
mitigated with xsputn for multiple characters, but it's a lot of virtual
function calls.

Yes, it is extremely inefficient.

To get something faster probably means a custom buffering solution (which
my custom streambuf has, but is not particularly intelligent) and native
syscalls. The downside of my implementation is that it is POSIX-specific
and thus won't be real useful for Windows.

Why not just use the unlocked stdio calls?

-Chris

Because I'm lazy. :slight_smile:

No, seriously, because I just haven't had time to do it. And because I'm
lazy. If/when I submit this to LLVM I'll look at going that route. As for
now, I just have to get something working for our needs here.

                                          -Dave

and thus won't be real useful for Windows.

Why not just use the unlocked stdio calls?

Because I'm lazy. :slight_smile:

It seems much easier to use a robust and well tested system than invent a new crazy scheme :wink:

No, seriously, because I just haven't had time to do it. And because I'm
lazy. If/when I submit this to LLVM I'll look at going that route. As for
now, I just have to get something working for our needs here.

ok!

-Chris

I'm wondering if there are issues with static initialization _a_la_ the
contortions iostreams goes through to initialize cout and friends. Right
now everything I have is initialized in constructors.

I haven't looked at this enough to know for sure yet.

                                               -Dave

Actually, there's another reason not to use unlocked calls. They require
POSIX compliance.

To get portability and most of the performance I plan to look at unbuffered
stdio.

                                              -Dave

Actually, there's another reason not to use unlocked calls. They require
POSIX compliance.

Posix is pretty available, what system doesn't have them?

To get portability and most of the performance I plan to look at unbuffered
stdio.

Buffering is goodness, no?

-Chris

> Actually, there's another reason not to use unlocked calls. They require
> POSIX compliance.

Posix is pretty available, what system doesn't have them?

Windows, for one. If POSIX is ok, it's better in my mind to just directly
use open, write and friends, which is what I do now. Going the cstdio
route should only be done for portability reasons to support non-POSIX
systems.

> To get portability and most of the performance I plan to look at
> unbuffered stdio.

Buffering is goodness, no?

Buffering is goodness. The problem with using buffered cstdio under
iostreams is that you have to go through overflow() and/or xsputn() to send
data to cstdio. These are virtual calls and they will be more frequent
because there's no buffering at the iostreams level. It seems wasteful
to do buffering at two different levels so I'd say it's better to do it before
the virtual calls happen, for performance reasons.

                                              -Dave

Posix is pretty available, what system doesn't have them?

Windows, for one. If POSIX is ok, it's better in my mind to just directly

Windows has POSIX calls.

use open, write and friends, which is what I do now. Going the cstdio
route should only be done for portability reasons to support non-POSIX
systems.

I don't think open/write etc are POSIX. Windows has them but you have to use different headers etc to get access to them. I am no expert in this area though.

To get portability and most of the performance I plan to look at
unbuffered stdio.

Buffering is goodness, no?

Buffering is goodness. The problem with using buffered cstdio under
iostreams is that you have to go through overflow() and/or xsputn() to send
data to cstdio.

I'm not suggesting using iostreams. I'm suggesting using FILE*'s directly.

These are virtual calls and they will be more frequent because there's no buffering at the iostreams level. It seems wasteful to do buffering at two different levels so I'd say it's better to do it before the virtual calls happen, for performance reasons.

I agree.

-Chris

>> Posix is pretty available, what system doesn't have them?
>
> Windows, for one. If POSIX is ok, it's better in my mind to just
> directly

Windows has POSIX calls.

Ok, I'll admit I don't know a lot about what's available there. I'm going off
previous experience where POSIX was a pain. If that's changed, great!

I don't think open/write etc are POSIX. Windows has them but you have to
use different headers etc to get access to them. I am no expert in this
area though.

% man open
NAME
       open, creat - open and possibly create a file or device
[...]
CONFORMING TO
       SVr4, SVID, POSIX, X/OPEN, BSD 4.3. [...]

I'm not suggesting using iostreams. I'm suggesting using FILE*'s
directly.

Ah. Basically operator<< for FILE*. I don't see a lot of benefit to
redefining operator<< for every type just to avoid ostreams. The
problem isn't basic_ostream _per_se_. There are two issues we're
worried about:

1. The static constructor contortions in place to handle the extremely
   unlikely case of a static object constructor writing to one of the
   standard streams.

2. Slowness of std::cout and friends due to their reliance on FILE * buffering
   and resulting excessive virtual function calls.

I just implemented "standard" AsmStreams (the name for my custom streams) to
handle the cases where we want to dump Asm to standard error. I just point
the AsmStreambuf to STDERR_FILENO. I simply avoid the constructor issue by
declaring that no user shall write to an AsmStream in a static object
constructor. So we don't have worries about multiple constructor calls at
program startup. There is no ios_base::Init equivalent for AsmStream.

The second concern is addressed as described earlier. I do the buffering
at the AsmStream level to minimize virtual calls.

This will still be slightly slower than raw FILE * or POSIX calls because
there will be some virtual function call overhead. But it should be much
better than standard streams and we get all of the nice benefits of iostreams.

Of course, all to be evaluated through benchmarking.

                                                  -Dave

Ah. Basically operator<< for FILE*. I don't see a lot of benefit to
redefining operator<< for every type just to avoid ostreams. The
problem isn't basic_ostream _per_se_. There are two issues we're
worried about:

The problem is basic_ostream and the design of the whole iostreams library. Use of virtual base classes makes every virtual method particularly expensive, for example.

1. The static constructor contortions in place to handle the extremely
  unlikely case of a static object constructor writing to one of the
  standard streams.

I also dislike this strongly, but it isn't really related.

2. Slowness of std::cout and friends due to their reliance on FILE * buffering and resulting excessive virtual function calls.

Use of FILE* is not the issue, virtual functions are.

I just implemented "standard" AsmStreams (the name for my custom streams) to
handle the cases where we want to dump Asm to standard error. I just point
the AsmStreambuf to STDERR_FILENO. I simply avoid the constructor issue by
declaring that no user shall write to an AsmStream in a static object
constructor. So we don't have worries about multiple constructor calls at
program startup. There is no ios_base::Init equivalent for AsmStream.

This gives you the virtual overhead of streams without the buffering overhead of FILE*. Instead of the buffering overhead of stdio you have your own different but exactly equivalent buffering overhead.

The second concern is addressed as described earlier. I do the buffering
at the AsmStream level to minimize virtual calls.

How is this different than buffering done by stdio?

This will still be slightly slower than raw FILE * or POSIX calls because
there will be some virtual function call overhead. But it should be much
better than standard streams and we get all of the nice benefits of iostreams.

The only problem you seem to have solved is avoiding having to define a few operator<<'s, this doesn't seem like a very good tradeoff to me.

-Chris

> Ah. Basically operator<< for FILE*. I don't see a lot of benefit to
> redefining operator<< for every type just to avoid ostreams. The
> problem isn't basic_ostream _per_se_. There are two issues we're
> worried about:

The problem is basic_ostream and the design of the whole iostreams
library. Use of virtual base classes makes every virtual method
particularly expensive, for example.

I don't know about "particularly." The performance loss of virtual functions
is mostly due to lack of inlining. Once that's gone, a pointer adjustment
here or these isn't going to matter much.

> 1. The static constructor contortions in place to handle the extremely
> unlikely case of a static object constructor writing to one of the
> standard streams.

I also dislike this strongly, but it isn't really related.

Isn't this the motivation behind "#include <iostreams> is hereby FORBIDDEN?"

> 2. Slowness of std::cout and friends due to their reliance on FILE *
> buffering and resulting excessive virtual function calls.

Use of FILE* is not the issue, virtual functions are.

To some degree. I'd say the bigger issue is that libstdc++-v3 has put the
buffering behind the virtual calls, causing many more virtual calls than
necessary. This is a direct consequence of using FILE *.

This gives you the virtual overhead of streams without the buffering
overhead of FILE*. Instead of the buffering overhead of stdio you have
your own different but exactly equivalent buffering overhead.

What "overhead" are you talking about, specifically? I'm not sure what you're
getting at here. All buffering necessarily has some overhead.

> The second concern is addressed as described earlier. I do the buffering
> at the AsmStream level to minimize virtual calls.

How is this different than buffering done by stdio?

It's in front of the virtual calls. So you only invoke virtual calls when the
buffer is full. At that point, you're writing to the file cache at best or to
disk at worst (or is that to the network?). A virtual call seems a small
price to pay for the flexibility it provides.

> This will still be slightly slower than raw FILE * or POSIX calls because
> there will be some virtual function call overhead. But it should be much
> better than standard streams and we get all of the nice benefits of
> iostreams.

The only problem you seem to have solved is avoiding having to define a
few operator<<'s, this doesn't seem like a very good tradeoff to me.

It's more than a few. Do a "grep 'operator<<'" in your C++ standard include
directory. Arguably we don't need all of those, but we'll need a lot of them.

It also means developing new manipulators and possibly locale support if
we want to use it for error messages and the like.

What you're talking about is developing a new C++ I/O library, which is
certainly a worthy goal, but one I'm not being paid to achieve. :slight_smile:

If someone wants to take on this larger project, there would be many in the
C++ community who would be interested in helping. The Boost folks, for
example, have been talking about I/O designs for a couple of months now.

                                               -Dave

1. The static constructor contortions in place to handle the extremely
  unlikely case of a static object constructor writing to one of the
  standard streams.

I also dislike this strongly, but it isn't really related.

Isn't this the motivation behind "#include <iostreams> is hereby FORBIDDEN?"

Yes it is. As I said, I don't like this aspect of the standard streams, but it isn't very related to this problem.

2. Slowness of std::cout and friends due to their reliance on FILE *
   buffering and resulting excessive virtual function calls.

Use of FILE* is not the issue, virtual functions are.

To some degree. I'd say the bigger issue is that libstdc++-v3 has put the
buffering behind the virtual calls, causing many more virtual calls than
necessary. This is a direct consequence of using FILE *.

How so? It's entirely possible to use FILE*'s without virtual methods: just don't use iostreams.

This gives you the virtual overhead of streams without the buffering
overhead of FILE*. Instead of the buffering overhead of stdio you have
your own different but exactly equivalent buffering overhead.

What "overhead" are you talking about, specifically? I'm not sure what you're
getting at here. All buffering necessarily has some overhead.

Buffering has overhead from copying data around. It doesn't matter whether you do in by using stdio buffering or by building your own buffers to avoid virtual method overhead.

The second concern is addressed as described earlier. I do the buffering
at the AsmStream level to minimize virtual calls.

How is this different than buffering done by stdio?

It's in front of the virtual calls. So you only invoke virtual calls when the
buffer is full. At that point, you're writing to the file cache at best or to
disk at worst (or is that to the network?). A virtual call seems a small
price to pay for the flexibility it provides.

How is that different than stdio?

This will still be slightly slower than raw FILE * or POSIX calls because
there will be some virtual function call overhead. But it should be much
better than standard streams and we get all of the nice benefits of
iostreams.

The only problem you seem to have solved is avoiding having to define a
few operator<<'s, this doesn't seem like a very good tradeoff to me.

It's more than a few. Do a "grep 'operator<<'" in your C++ standard include
directory. Arguably we don't need all of those, but we'll need a lot of them.

You only need to define the ones you need, and can do it over time.

It also means developing new manipulators and possibly locale support if
we want to use it for error messages and the like.

Manipulators and locales are two other really ugly things of iostreams :slight_smile:

What you're talking about is developing a new C++ I/O library, which is
certainly a worthy goal, but one I'm not being paid to achieve. :slight_smile:

Fair enough.

-Chris