[PATCH] Circular Buffered Debug Stream

This is the patch to provide dbgs(), an optionally circular-buffering
debug stream. The idea is to buffer debug output and print the last
N bytes of output, where N is specified via a command-line switch
-debug-buffer-size. This is useful when compiling very large codes
where one is only interested in the last few thousand lines (or so) of
debug output, just before some failure.

This patch is for discussion purposes. I will phase in the patch in
small chunks. In fact this patch doesn't even show all of the places
that need to change, just the places we changed in LLVM 2.5. This posting
gives the general idea of what things will look like when the conversion
process is complete.

Please review and let me know what needs to change before I start
merging stuff in.

Thanks!

                                -Dave

dbgs.patch (60.8 KB)

I left in quite a few unnecessary headers from previous incarnations
(streambuf, etc.). I will fix those. Please comment on the rest.

                                 -Dave

Hi David,

Please send complete and minimal patches to avoid wasting reviewer's time,

-Chris

What do you want, complete or minimal? I don't want to pre-send 50
or so patches (one per source file) that simply change errs() to dbgs().
Thus I thought I would send a patch representative of what will happen.

What, specifically, do you want reviewed before I start checking in?
I'll be happy to prepare a patch that shows just that.

                           -Dave

I want a patch that does one thing (e.g. implements dbgs(), optionally switches clients to use it) that is complete enough that I can apply it to the tree and test it if I so choose. It should not have unrelated stuff in it (like other unrelated work).

-Chris

Ok, let's start with that. Here's the patch to add the circular raw_ostream.

Do you want another patch that modifies Debug.h and then one more patch that
shows client use?

                                -Dave

dbgs.patch (6.07 KB)

What, specifically, do you want reviewed before I start checking in?

I’ll be happy to prepare a patch that shows just that.

I want a patch that does one thing (e.g. implements dbgs(),

Ok, let’s start with that. Here’s the patch to add the circular raw_ostream.

This idea of the patch makes sense to me, but here are a few questions:

  • int BufferSize;

  • std::vector BufferArray;

  • bool DelayOutput;

  • std::vector::iterator Cur;

Please doxygenify these.

Instead of using a std::vector for BufferArray, please just new[] an array since it is fixed size.

Why is BufferSize needed with the vector? Isn’t it always the size of BufferArray? Also, why is it signed?

What does DelayOutput do? Even reading the code, I don’t really understand what it is doing.

  • circular_raw_ostream(raw_ostream &Stream, int BufferSize = -1,
  • bool Delete = false)
  • : raw_ostream(/unbuffered/true), TheStream(0), DeleteStream(false),
  • BufferArray(BufferSize < 0 ? 8192 : BufferSize),

Please make BufferSize an ‘unsigned’ and default it to 8192. Please use PRESERVE_STREAM instead of ‘false’.

Do you want another patch that modifies Debug.h and then one more patch that
shows client use?

Sure, you don’t need to convert everything over, but once the general approach looks fine you can do the rest without review.

-Chris

+ int BufferSize;
+ std::vector<char> BufferArray;
+ bool DelayOutput;
+ std::vector<char>::iterator Cur;

Please doxygenify these.

Ok.

Instead of using a std::vector for BufferArray, please just new[] an array
since it is fixed size.

Ok.

Why is BufferSize needed with the vector? Isn't it always the size of
BufferArray? Also, why is it signed?

Well, you need it with a new[]'d array. :slight_smile:

It's signed because running loops with signed induction variables is better
than with unsiged induction variables. In this case it's moot because
::write_impl is passed a size_t anyway.

It's just learned habit by me to make anything having to do with induction
variables signed.

I'll make it unsigned.

What does DelayOutput do? Even reading the code, I don't really understand
what it is doing.

Originally it told the stream whether to buffer. It's superfluous now so I'll
remove it.

Please make BufferSize an 'unsigned' and default it to 8192. Please use
PRESERVE_STREAM instead of 'false'.

Ok.

Do you want another patch that modifies Debug.h and then one more patch
that shows client use?

Sure, you don't need to convert everything over, but once the general
approach looks fine you can do the rest without review.

All right.

                              -Dave

Here's an updated version of the circular buffer. Ok to check in?

                              -Dave
Index: include/llvm/Support/circular_raw_ostream.h

Please make BufferSize an 'unsigned' and default it to 8192. Please use
PRESERVE_STREAM instead of 'false'.

Here's an updated version of the circular buffer. Ok to check in?

This is looking a lot better, here are some more comments:

+ /// current_pos - Return the current position within the stream,
+ /// not counting the bytes currently in the buffer.
+ virtual uint64_t current_pos() {

I didn't notice this earlier, but is there any reason for current_pos to be non-const in raw_ostream?

+ /// circular_raw_ostream - Open the specified file for
+ /// writing. If an error occurs, information about the error is
+ /// put into ErrorInfo, and the stream should be immediately
+ /// destroyed; the string will be empty if no error occurred.

This comment is out of date, there is no ErrorInfo.

+ ///
+ /// As a side effect, the given Stream is set to be Unbuffered.
+ /// This is because circular_raw_ostream does its own buffering,
+ /// so it doesn't want another layer of buffering to be happening
+ /// underneath it.
+ ///
+ circular_raw_ostream(raw_ostream &Stream, unsigned BuffSize = 8192,
+ bool Delete = false)
+ : raw_ostream(/*unbuffered*/true),
+ TheStream(0),
+ DeleteStream(PRESERVE_STREAM),
+ BufferSize(BuffSize),
+ BufferArray(0) {
+ if (BufferSize > 0)
+ BufferArray = new char[BufferSize];

How about just asserting that BufferSize is not zero? This stream won't work well and doesn't make sense with a buffer of zero.

+ void setStream(raw_ostream &Stream, bool Delete = false) {
+ releaseStream();
+
+ TheStream = &Stream;
+ DeleteStream = Delete;
+
+ if (BufferSize > 0) {

The only time where it makes sense for BufferSize to be zero here is if the circular stream was default constructed. In this case, it seems reasonable to allocate a buffer here instead of trying to operate without one.

It seems reasonable to say (and add to the doxygen comment for the default ctor) that the only operations valid on a default constructed stream are destruction and setStream.

+ void releaseStream() {
+ // Delete the stream if needed. Otherwise, transfer the buffer
+ // settings from this raw_ostream back to the underlying stream.
+ if (!TheStream)
+ return;
+ if (DeleteStream)
+ delete TheStream;
+ else if (BufferSize > 0)
+ TheStream->SetBufferSize(BufferSize);
+ else
+ TheStream->SetUnbuffered();

Again, I don't think an unbuffered circular stream makes a lot of sense here. Another issue is that this is transfering the circular stream's buffer to the underlying stream. Would it make more sense and would it be worth it to save the underlying streams buffer size and restore it in releaseStream?

..

+void circular_raw_ostream::write_impl(const char *Ptr, size_t Size) {
+ if (BufferSize > 0) {

This check can go away.

+ // Write into the buffer, wrapping if necessary.
+ while (Size > 0) {
+ while (Size > 0 && Cur != BufferArray + BufferSize) {
+ --Size;
+ *Cur++ = *Ptr++;
+ }

This should use memcpy.

+void circular_raw_ostream::dumpLog(void) {
+ if (BufferSize > 0) {
+ // Write out the buffer
+ const char *msg = "*** Log Output ***\n";
+ int num = std::strlen(msg);
+
+ TheStream->write(msg, num);
+ printLog();
+ }
+}

Is this to debug the implementation of the stream? Should this go into the tree?

-Chris

This is looking a lot better, here are some more comments:
> + /// current_pos - Return the current position within the stream,
> + /// not counting the bytes currently in the buffer.
> + virtual uint64_t current_pos() {

I didn't notice this earlier, but is there any reason for current_pos to be
non-const in raw_ostream?

Probably not. I'm going to let someone else fix that, though.

> + /// circular_raw_ostream - Open the specified file for
> + /// writing. If an error occurs, information about the error is
> + /// put into ErrorInfo, and the stream should be immediately
> + /// destroyed; the string will be empty if no error occurred.

This comment is out of date, there is no ErrorInfo.

Will remove.

> + ///
> + /// As a side effect, the given Stream is set to be Unbuffered.
> + /// This is because circular_raw_ostream does its own buffering,
> + /// so it doesn't want another layer of buffering to be happening
> + /// underneath it.
> + ///
> + circular_raw_ostream(raw_ostream &Stream, unsigned BuffSize = 8192,
> + bool Delete = false)
> + : raw_ostream(/*unbuffered*/true),
> + TheStream(0),
> + DeleteStream(PRESERVE_STREAM),
> + BufferSize(BuffSize),
> + BufferArray(0) {
> + if (BufferSize > 0)
> + BufferArray = new char[BufferSize];

How about just asserting that BufferSize is not zero? This stream won't
work well and doesn't make sense with a buffer of zero.

Sure it does. In fact it's crucial for making dbgs() work like errs() in
the default case.

> + void setStream(raw_ostream &Stream, bool Delete = false) {
> + releaseStream();
> +
> + TheStream = &Stream;
> + DeleteStream = Delete;
> +
> + if (BufferSize > 0) {

The only time where it makes sense for BufferSize to be zero here is if the
circular stream was default constructed. In this case, it seems reasonable
to allocate a buffer here instead of trying to operate without one.

See above.

> + void releaseStream() {
> + // Delete the stream if needed. Otherwise, transfer the buffer
> + // settings from this raw_ostream back to the underlying stream.
> + if (!TheStream)
> + return;
> + if (DeleteStream)
> + delete TheStream;
> + else if (BufferSize > 0)
> + TheStream->SetBufferSize(BufferSize);
> + else
> + TheStream->SetUnbuffered();

Again, I don't think an unbuffered circular stream makes a lot of sense
here.

See above. :slight_smile:

Another issue is that this is transfering the circular stream's
buffer to the underlying stream. Would it make more sense and would it be
worth it to save the underlying streams buffer size and restore it in
releaseStream?

Oh, I think this code is just wrong. I'll rework it.

> +void circular_raw_ostream::write_impl(const char *Ptr, size_t Size) {
> + if (BufferSize > 0) {

This check can go away.

Nope. We don't want to pring the "Log Output" banner if we're not buffering
(and delaying output until termination). Again, we want dbgs() to work like
errs() in the default case and output immediately.

> + // Write into the buffer, wrapping if necessary.
> + while (Size > 0) {
> + while (Size > 0 && Cur != BufferArray + BufferSize) {
> + --Size;
> + *Cur++ = *Ptr++;
> + }

This should use memcpy.

Ok.

> +void circular_raw_ostream::dumpLog(void) {
> + if (BufferSize > 0) {
> + // Write out the buffer
> + const char *msg = "*** Log Output ***\n";
> + int num = std::strlen(msg);
> +
> + TheStream->write(msg, num);
> + printLog();
> + }
> +}

Is this to debug the implementation of the stream? Should this go into the
tree?

No, this is what gets dumped at program termination. Remember, this stream
buffers everything you send it, effectively dropping everything except the
last N characters, when N == BufferSize. Then at program termination it
dumps the buffer.

logs() might be a better name than dbgs() for this. What do you think?

                              -Dave

No, it's right. It's not transferring the buffer at all. It's just setting
the buffer size of the held stream when we release it. This code is identical
to the code in formatted_raw_ostream.

                                -Dave

Here's the updated patch.

                                    -Dave

+ /// DELETE_STREAM - Tell the destructor to delete the held stream.
+ ///
+ static const bool DELETE_STREAM = true;

Well, that didn't go through right. Here it is again.

                                  -Dave

Index: include/llvm/Support/circular_raw_ostream.h

Argh! Stupid bug fixed. :slight_smile:

Index: include/llvm/Support/circular_raw_ostream.h

This is looking a lot better, here are some more comments:

+ /// current_pos - Return the current position within the stream,
+ /// not counting the bytes currently in the buffer.
+ virtual uint64_t current_pos() {

I didn't notice this earlier, but is there any reason for current_pos to be
non-const in raw_ostream?

Probably not. I'm going to let someone else fix that, though.

Alright, I'll do it.

How about just asserting that BufferSize is not zero? This stream won't
work well and doesn't make sense with a buffer of zero.

Sure it does. In fact it's crucial for making dbgs() work like errs() in
the default case.

Ah, I forgot the usage model.

Is this to debug the implementation of the stream? Should this go into the
tree?

No, this is what gets dumped at program termination. Remember, this stream
buffers everything you send it, effectively dropping everything except the
last N characters, when N == BufferSize. Then at program termination it
dumps the buffer.

Aha, ok. So the usage model is that it normally is unbuffered and pass through like errs(), but you can pass a flag that causes all the -debug output to be circular buffered?

logs() might be a better name than dbgs() for this. What do you think?

If it writes out to stderr, I think dbgs() is a better name, logs() implies that it writes to a log file (at least to me).

-Chris

> No, this is what gets dumped at program termination. Remember, this
> stream
> buffers everything you send it, effectively dropping everything
> except the
> last N characters, when N == BufferSize. Then at program
> termination it
> dumps the buffer.

Aha, ok. So the usage model is that it normally is unbuffered and
pass through like errs(), but you can pass a flag that causes all the -
debug output to be circular buffered?

Right!

> logs() might be a better name than dbgs() for this. What do you
> think?

If it writes out to stderr, I think dbgs() is a better name, logs()
implies that it writes to a log file (at least to me).

Fair enough.

                              -Dave

+ /// circular_raw_ostream - A raw_ostream that saves its output in a
+ /// circular buffer.

A better description would be "which *can* save its data to a circular buffer, or can pass it through directly to an underlying stream if specified with a buffer of zero."

When it is buffering, what causes it to flush? Your current model requires "printLog" which is not an obvious name for this, would it be better to do it on destruction? How about using something like flush_to_underlying_stream() or something like that?

+ /// DELETE_STREAM - Tell the destructor to delete the held stream.
+ ///
+ static const bool DELETE_STREAM = true;
+
+ /// PRESERVE_STREAM - Tell the destructor to not delete the held
+ /// stream.
+ ///
+ static const bool PRESERVE_STREAM = false;

I don't find these to be very clear names. How about giving them a name involving 'ownership' like TAKES_OWNERSHIP? "preservation" isn't really the same as "not deleting when the stream is destroyed" to me.

+ /// TheStream - The real stream we output to. We set it to be
+ /// unbuffered, since we're already doing our own buffering.
+ ///
+ raw_ostream *TheStream;

Now that I understand the model :slight_smile: I don't see why you need to fiddle with the underlying buffering of TheStream. In the dbgs() use case, the underlying stream will be errs() which is already unbuffered.

+ /// printLog - Dump the contents of the buffer to Stream.
+ ///
+ void printLog(void) {
+ TheStream->write(BufferArray, BufferSize);
+ Cur = BufferArray;
+ }

If the buffer has circled around, won't this print it out-of-order? If the current pointer is 5 and the length is 10, it seems like this should do a write([5..9]) then write([0..4])?

+ /// current_pos - Return the current position within the stream,
+ /// not counting the bytes currently in the buffer.
+ virtual uint64_t current_pos() {

This is now const on mainline.

+ public:
+ /// circular_raw_ostream - Open the specified file for
+ /// writing.

This is not a file, please read the comments you're copying :slight_smile:

+ ///
+ /// As a side effect, if BuffSize is nonzero, the given Stream is
+ /// set to be Unbuffered. This is because circular_raw_ostream
+ /// does its own buffering, so it doesn't want another layer of
+ /// buffering to be happening underneath it.
+ ///

Again, I don't think this should happen anymore. You should add some comment about what Delete does, and change it to not use 'false' as I asked for before.

+ circular_raw_ostream(raw_ostream &Stream, size_t BuffSize = 8192,
+ bool Delete = false)
+ : raw_ostream(/*unbuffered*/true),
+ TheStream(0),
+ DeleteStream(PRESERVE_STREAM),
+ BufferSize(BuffSize),
+ BufferArray(0) {
+ if (BufferSize > 0)

Please change all these predicates to BufferSize != 0 instead of using >.

+
+#include "llvm/Support/circular_raw_ostream.h"
+
+#include <algorithm>
+
+using namespace llvm;
+
+void circular_raw_ostream::write_impl(const char *Ptr, size_t Size) {
+ if (BufferSize > 0) {

Please use:

if (BufferSize == 0) {
   .. write ..
   return;
}

to reduce nesting.

+void circular_raw_ostream::dumpLog(void) {
+ if (BufferSize > 0) {
+ // Write out the buffer
+ const char *msg = "*** Log Output ***\n";
+ int num = std::strlen(msg);
+
+ TheStream->write(msg, num);
+ printLog();
+ }
+}

I'm still unclear about this and printLog. The "Log Output" prefix is confusing to me, because this isn't a log file. Maybe just remove the prefix?

-Chris

> + /// circular_raw_ostream - A raw_ostream that saves its output in a
> + /// circular buffer.

A better description would be "which *can* save its data to a circular
buffer, or can pass it through directly to an underlying stream if
specified with a buffer of zero."

Make sense.

When it is buffering, what causes it to flush? Your current model
requires "printLog" which is not an obvious name for this, would it be
better to do it on destruction? How about using something like
flush_to_underlying_stream() or something like that?

Currently, it uses a signal handler. It's important that one gets output
on a SIGTERM, SIGSGEV and other such things and I don't know that object
desctruction is guaranteed in those cases.

Eventually we'll want to dump on SIGUSER1 as well but there's no generic
libSystem hook for that yet.

> + /// DELETE_STREAM - Tell the destructor to delete the held
> stream.
> + ///
> + static const bool DELETE_STREAM = true;
> +
> + /// PRESERVE_STREAM - Tell the destructor to not delete the held
> + /// stream.
> + ///
> + static const bool PRESERVE_STREAM = false;

I don't find these to be very clear names. How about giving them a
name involving 'ownership' like TAKES_OWNERSHIP? "preservation" isn't
really the same as "not deleting when the stream is destroyed" to me.

That's a good idea.

I just took these from what Dan (I think) did in formatted_raw_ostream.
I'll change them here and rename DeleteStream to OwnsStream as well.

> + /// TheStream - The real stream we output to. We set it to be
> + /// unbuffered, since we're already doing our own buffering.
> + ///
> + raw_ostream *TheStream;

Now that I understand the model :slight_smile: I don't see why you need to fiddle
with the underlying buffering of TheStream. In the dbgs() use case,
the underlying stream will be errs() which is already unbuffered.

Might circular_raw_ostream be used with other streams at some point?

> + /// printLog - Dump the contents of the buffer to Stream.
> + ///
> + void printLog(void) {
> + TheStream->write(BufferArray, BufferSize);
> + Cur = BufferArray;
> + }

If the buffer has circled around, won't this print it out-of-order?
If the current pointer is 5 and the length is 10, it seems like this
should do a write([5..9]) then write([0..4])?

Aha! Good catch. This worked differently in our implementation here. I'll
fix it.

> + /// current_pos - Return the current position within the stream,
> + /// not counting the bytes currently in the buffer.
> + virtual uint64_t current_pos() {

This is now const on mainline.

Ok.

> + public:
> + /// circular_raw_ostream - Open the specified file for
> + /// writing.

This is not a file, please read the comments you're copying :slight_smile:

:slight_smile:

> + ///
> + /// As a side effect, if BuffSize is nonzero, the given Stream is
> + /// set to be Unbuffered. This is because circular_raw_ostream
> + /// does its own buffering, so it doesn't want another layer of
> + /// buffering to be happening underneath it.
> + ///

Again, I don't think this should happen anymore. You should add some
comment about what Delete does, and change it to not use 'false' as I
asked for before.

I've cleaned this up, but see my question about usage with other streams.

> + circular_raw_ostream(raw_ostream &Stream, size_t BuffSize = 8192,
> + bool Delete = false)
> + : raw_ostream(/*unbuffered*/true),
> + TheStream(0),
> + DeleteStream(PRESERVE_STREAM),
> + BufferSize(BuffSize),
> + BufferArray(0) {
> + if (BufferSize > 0)

Please change all these predicates to BufferSize != 0 instead of using

Any particular reason? I'll do it but I'm just curious. It seems that if
these things get changed to signed someday it could cause trouble.

> +#include "llvm/Support/circular_raw_ostream.h"
> +
> +#include <algorithm>
> +
> +using namespace llvm;
> +
> +void circular_raw_ostream::write_impl(const char *Ptr, size_t Size) {
> + if (BufferSize > 0) {

Please use:

if (BufferSize == 0) {
   .. write ..
   return;
}

to reduce nesting.

Ok.

> +void circular_raw_ostream::dumpLog(void) {
> + if (BufferSize > 0) {
> + // Write out the buffer
> + const char *msg = "*** Log Output ***\n";
> + int num = std::strlen(msg);
> +
> + TheStream->write(msg, num);
> + printLog();
> + }
> +}

I'm still unclear about this and printLog. The "Log Output" prefix is
confusing to me, because this isn't a log file. Maybe just remove the
prefix?

I wanted something to signify in the output where the debug dump starts.
That's for that case where some stuff might still get output through errs().
I anticipate that while I'll change most uses of errs() to dbgs(), we
probably don't want to change all of them.

I agree the naming here is poor. I'll fix that.

                                   -Dave

Here's the next iteration.

                                       -Dave
Index: include/llvm/Support/circular_raw_ostream.h