logging in lldb

Sorry, I have been derailed today by trying to make format member
detection work more reasonably. I plan to come up with the next
iteration of the proposal soon, but I am not sure if it will be today,
as it is quite late here already. I do think this needs to be done in
incremental steps, but I'd like to make the iterations slightly bigger
to avoid code churn.

pl

Hello again,

I have updated the example code to reflect some of the results from
this discussion. I think we are generally converging XXX, I am going
to try to summarize it here, to see if we have any issues left to
discuss.

- everyone seemed to prefer the formatv-based api, so that should be
the main API going forward. Printf will need to be maintained until
all users are migrated. I have updated by example to use that.

- there were concerns about performance characteristics of this in the
case logging is disabled. I'd like to reiterate that this is one of my
design goals, and the new proposed API has the same overhead (checking
for nullness of a local variable) as the current one.

- Related to the previous item, Jim was worried about the usage of a
macro will hide the fact that macro will hide the fact that the
performance impact of the logging is small. I replied that anyone
caring about performance that much will be familiar with this already.
I also think this is no different than what a lot of other logging
systems do already -- llvm's DEBUG(), assert(), boost::log all
evaluate their arguments only if they are "enabled" (whatever that
meant exactly), even though the call presents itself as a simple
macro.
I am not sure whether this topic is closed. Does anyone want to add
anything to this?

- there was a discussion about automatically adding log source
information. My original proposal added it unconditionally. Greg
suggested we make that optional (similar to how we optionally prepend
timestamp, thread id, ...). Jim was worried that adding the extra data
will make hamper readability of the logs. I disagreed.
Having the source information be optional sounds like a good idea (i'd
propose to make it on by default, as a log of log lines don't make
sense without it: "__FUNCTION__ called with signal %d" and similar). I
am not sure whether this answers Jim's concerns though. Jim, what do
you say to that? If this would be enough (disabling source information
on a global level), then great. If not, then we can come up with a way
to suppress this information on a per-call-site basis. Something like:
LLGB_LOG_SUPP(log, LogInfo::Source, ...)
where the second argument would be the list of things to suppress for
this log line.

BTW, I have done some statistics on a semi-randomly chosen file
ThreadPlanCallFunction.cpp: This file contains 19 logging statements.
Of those, 10 print the class name, 8 print the class and function name
(of those, 7 print the *wrong* function name), and there is only two
that do not print any source information. One of these prints a table
of registers, so it could so it could suffer from readability
problems. However, it does this by printing all registers to a
temporary stream, and then printing that stream. Since the prepending
would only add the source before the first line of the log, this could
be another way to sidestep this problem (just output "\n" before your
multi-line string). All the other log statements in this file would
benefit from standardizing on the source printing IMO, as that would
mean we can actually align things correctly, and visually separate the
repetitive part (which is already present most of the time), and
variable part.
What do you think of that?

- Chris suggested merging lldb and llvm logging infrastructure. I have
given this some thought, and I think it is possible, although it will
make the implementation more tricky. I will write a separate email
about that, but I want to give it more thought (and also, the proposal
might depend on the exact set of requirements we converge on here).

- anything else I missed?

Cheers,
pl

(of those, 7 print the wrong function name)

I think this is the best argument for automating the source information as much as possible. Refactoring moves stretches of code into new functions. Log lines get copy and pasted. Like comments, the fixed text in the log lines easily gets out of date, which can make reading a log even harder.

If people want the ability to opt-out of source info on certain log lines, I could live with that. But I favor default-on and automatically generated source info.

There were additional comments on the lldb-commits thread. I am moving
them here and responding:

A couple of thoughts / two cents.
I don't mind the current if (log) { ... } style of logging, even with the PRIx64's and having to do filepath.GetPath().c_str() and all that. I like being able to do extra work in a if (log) block of code -- create a SymbolContext for something or whatever, and as I'm debugging that section of code I like to be able to put a breakpoint there and look more closely. If this is all in a preprocessor macro, a breakpoint is going to stop me on the if() and then I'm trying to step through a macro? And if my computation is down in a lambda I need to figure out how to put a breakpoint in that or step into that somehow to look at the values being computed. How do I create a SymbolContext my_symctx(...); and then include that in what I log if I have to do all of this in the formatv arguments?

I am not proposing we completely eliminate the ability to do have
blocks of code that run only when logging is enabled. I recognize that
there are situations where that is necessary. However, 99% of our
current logging code does not do that, and I am trying to reduce the
footprint of that. So, I don't think you have to worry about this use
case.

I'm not thrilled with the formatv reinvention of format specification. The printf formatters are a bizarre little invention, but it's a bizarre invention that we've all worked with for decades and everyone who works in C knows well. The formatv format specification may be superior, but I don't want to see innovation here, I want clarity with the linga franca that every C programmer knows. I think the barrier to adopting something non-standard is very high.
The formatv format specification string seems to explicitly mention the ordering of the arguments, like llvm::formatv("{0} {1} {2}", first, second, third). I'm guessing this allows you to print the arguments in different order than they appear? Like llvm::formatv("{0} {2} {2} {1}", first, second, third)? What's the benefit of this vrs. the uncertainty of which arguments are used in different parts of the string (short of counting them out by hand)? If I have a formatv format specification like "{0} {1} {2} {3} {4}" and I want to insert an argument between 2 and 3, I need to renumber 3 and 4 now? Or do I do "{0} {1} {2} {5} {3} {4}" and throw my argument on to the end?
Simply *ability* to put the arguments in any order is my concern. Of course I'm never going to do it, but I'm going to have to work on code where other people have done it.

I think being able to refer to arguments directly i'a useful feature
to have in some circumstances (printf also allows it, although with a
syntax I always have to look up when I need it), but I agree it should
be used sparingly. I don't think anyone will put the arguments out of
order just for the fun of it. We can make that an official policy
somewhere, if you're worried about it. I guess one of the reasons for
the simplicity of the design was to only support the direct-reference
versions (I admit I originally also though I could just write {}, and
get the arguments in order).

FWIW, if I had a printf statement with more than five arguments, I'd
also have to count them by hand to make sure I got them matched right,
especially if PRIx64's get involved.

I don't personally want to see the file / function / line numbers in my logging. It's fine if that's an option that other people can enable if they want it. There's going to be a growth in the size of our text for the constant strings for every log point having its filename and function, but it will probably not be significant (maybe the strings can be coalesced, I wouldn't bet one way or the other.)

I am not sure what what be a good way to try this out, but I wouldn't
be surprised if the overall size effect of this was a reduction in
binary size -- the thing is, most of our logging statements contain
this information already, but it's a part of the format string, so it
definitely cannot be coalesced. If they were referenced as __FILE__,
any sane compiler/linker will merge those references into a single
string constant.

I am not sure which part of the code base you are working with mostly,
but in my experience most code adds file/function information already
(RegisterContextLLDB seems to be the exception), and you have no way
of turning it off. Currently we are going toward making that optional,
so this would be an improvement in that regard as well.

If formatv could take something a LOT closer to printf, I'd be better with it. Something like the python string formatters would be a real improvement - just %d for base 10 numbers, not having to worry about the size of the argument in the format string, for instance, and would still be readable & understandable by any C programmer out there.

Funnily enough, the inspiration for the syntax actually did come from
python, although not the % operator, but format() function (which is
also the "new" way to do formatting in python). It was chosen because
it is more powerful. I personally wouldn't go too overboard like some
of the proposals floated around here, but I think in the common cases,
the syntax is simple enough -- instead of %d you write {0}. If you
happen to want hex, you write {0:x}. And the ability to format a
StringRef without jumping through massive hoops is a killer feature
for me.

If we're going to start using this alternate logging scheme, eventually there will be a push to use it everywhere, or someone will go through all the other classes in the code base and switch it out without consultation. We need to all sign on to this idea before it goes into the code base at all.

This is why we are discussing it here. Does my response alleviate your concerns?

Personally I'm not really a fan of source/line information at all. There is only a very small probability that anyone using a released LLDB (e.g. through Xcode) is going to have their log lines match up with ToT, because we're changing files all the time. You change one thing on line 10, and now every other log line that comes after it, potentially tens of thousands of lines in the same file, are going to have their line information out of sync.

This is the reason I propose to not embed (and assume the reason that
the current log lines don't do it either) *line* information into the
logs. However, file+function are changing much less frequently, and
they are helpful, as often you just want to say "I've reached this
place, with these variable values". This way, you don't have to write
the first part out explicitly. Even if the function name changed
during a refactor, it's still likely you will be able to find it under
a similar name.

cheers,
pl

I’m not thrilled with the formatv reinvention of format specification. The printf formatters are a bizarre little invention, but it’s a bizarre invention that we’ve all worked with for decades and everyone who works in C knows well. The formatv format specification may be superior, but I don’t want to see innovation here, I want clarity with the linga franca that every C programmer knows. I think the barrier to adopting something non-standard is very high.

While everyone knows it well, somehow it’s still complicated enough that its usage is riddled with errors. I can’t even count the number of bugs I’ve fixed related to incorrect usage of format specifiers.

The formatv format specification string seems to explicitly mention the ordering of the arguments, like llvm::formatv(“{0} {1} {2}”, first, second, third). I’m guessing this allows you to print the arguments in different order than they appear? Like llvm::formatv(“{0} {2} {2} {1}”, first, second, third)? What’s the benefit of this vrs. the uncertainty of which arguments are used in different parts of the string (short of counting them out by hand)? If I have a formatv format specification like “{0} {1} {2} {3} {4}” and I want to insert an argument between 2 and 3, I need to renumber 3 and 4 now? Or do I do “{0} {1} {2} {5} {3} {4}” and throw my argument on to the end?

Either / or would be fine. BTW, this is not the primary advantage of having the arguments be numbered. The primary advantage is that it allows you to print the same argument multiple times without computing it. formatv(“{0} {0} {0}”, 7); will print “7 7 7”.

If formatv could take something a LOT closer to printf, I’d be better with it. Something like the python string formatters would be a real improvement - just %d for base 10 numbers, not having to worry about the size of the argument in the format string, for instance, and would still be readable & understandable by any C programmer out there.

I don’t think familiarity with C is a compelling argument. We’re using C++ here, and the more C-isms we can get rid of the better, as long as there is a compelling advantage.

BTW, you mentioned Python, but incidentally the %d %s style of string formatting in Python is not recommended (being considered the “old” way), and is replaced with a “new” way which, it turns out, is almost identical to the syntax of formatv (along with C#, Python is where I took inspiration from in designing it)

Aside from the points mentioned above, formatv has some other nice advantages.

  1. Instead of silently compiling your code and then segfaulting when you pass a type that printf doesn’t know how to format, formatv will report a helpful compiler error.

  2. It allows you to write formatters for user-defined types, so that instead of re-inventing this every time you write a format line, and potentially passing many arguments to printf, you can pass one argument to formatv, and have consistency in every single log line where you ever try to print this.

Aside from not being familiar to C programmers, I believe it to be better in every possible way.

I guess I should elaborate. The real primary advantage is simply not specifying the format specifier at all. When you have to match the format specifier to the type of the argument you’ve already lost the battle, because you’ve introduced a source of programmer error. And not just a hypothetical source either, considering the number of bugs I’ve fixed in incorrect usage of format specifiers.

Chiming in over here to say that I put up a revision which adds formatv support to various parts of llvm. Along with this, I’ve provided a formatter for FileSpec so that we can now print FileSpecs as easily as we would print integers, with various options for printing directory, filename, etc.

I think it helps to see some concrete instances of how this can help us, rather than just focusing on the unfamiliar syntax. Greg LGTM’ed it, but since I know there was at least one concern over on this side, I’ll wait a few days to see if the objections still remain after seeing it in action. I’ll be checking here and the code review thread for comments.

Hello again,

I'd to get back to the question of unifying llvm's and lldb's logging
mechanisms that Chris asked about. In the way these two are
implemented now, they have a number of similarities, but also a number
of differences. Among the differences, there is one that I think will
be most painful to resolve, so I'll start with that one:

I am talking about how to disable logging at compile-time. Currently,
llvm's logging mechanism can be controlled both at runtime and
compile-time. lldb's can be only controlled at runtime. While we may
not want/need the compile-time knob, it is a very hard requirement for
llvm, which tries to squeeze every ounce of performance from the
hardware. So, if we are going to have a common logging API, we will
need to support being compiled without it.

This has impact on the kind of syntax we are able to use. I see two
problems here.

1. The first one is that our log patterns are split into independent
parts. Currently the pattern is:
Log *log = GetLogIf(Flag);
...
if (log) log->Printf(...);

The API we (mostly?) converged to above is:
Log *log = GetLogIf(Flag);
...
LLDB_LOG(log, ...);

If we want to compile the logging away, getting rid of the second part
is easy, as it is already a macro. However, for a completely clean
compile, we would need to remove the first part as well. Since
wrapping it in #ifdef would be too ugly, I think the easiest solution
would be to just make it go away completely.

The way I understand it, the reason we do it in two steps now is to
make the code fast if logging is off. My proposal here would be to
make the code very fast even without the additional local variable. If
we could use the macro like:
LLDB_LOG(Flag, ...)
where the macro would expand to something like:
if (LLVM_UNLIKELY(Flag & lldb_private::enabled_channels)) log_stuff(...)
where `enabled_channels` is just a global integral variable then the
overhead of a disabled log statement would be three instructions
(load, and, branch), some of which could be reused if we had more
logging statements in a function. Plus the macro could hint the cpu
and compiler to optimize for the "false" case. This is still an
increase over the status quo, where the overhead of a log statement is
one or two instructions, but I am not sure if this is significant.

2. The second, and probably bigger, issue is the one mentioned by
Jason earlier in this thread -- the ability to insert random code into
if(log) blocks. Right writing the following is easy:
if (log) {
  do_random_stuff();
  log->Printf(...);
}

In the first version of the macro, this is still easy to write, as we
don't have to worry about compile-time. But if we need this to go
away, we will need to resort to the same macro approach as llvm:
LLDB_DEBUG( { do_random_stuff(); LLDB_LOG(...); });
Which has all the disadvantages Jason mentioned. Although, I don't
think this has to be that bad, as we probably will not be doing this
very often, and the issues can be mitigated by putting the actual code
in a function, and only putting the function calls inside the macro.

So, my question to you is: Do you think these two drawbacks are worth
sacrificing for the sake of having a unified llvm-wide logging
infrastructure? I am willing to drive this, and implement the llvm
side of things, but I don't want to force this onto everyone, if it is
not welcome. If you do not think this is a worthwhile investment then
I'd rather proceed with the previous lldb-only solution we discussed
above, as that is something I am more passionate above, will already
be a big improvement, and a good stepping stone towards implementing
an llvm-wide solution in the future.

Of course, if you have alternative proposals on how to implement
llvm-wide logging, I'd love to hear about it.

Let me know what you think,
pavel

Just to let you know, I will be on vacation until the end of the year,
so probably will not respond to any comments until January. If you
have any concerns, do let me know, as I'd like to get started when I
get back.

pl

Hi Pavel, sorry for not keeping up with the thread, I've been super busy all this week. I'm not going to object to where this proposal has ended up. I personally have a preference for the old system but not based on any justifiable reasons.

Yeah, I’m with Jason. I don’t find the current state of things hard to read or work with.

The proposed solutions seem a little uglier to me. I try to avoid macros, to me it makes it look like the code is shouting at you, giving the log lines more prominence then they deserve in the overall flow of the code. And all upper case is harder to read. If accumulating the log involves substantial code and the code is all in macros it won’t be debuggable, which is a minor shame. But it’s not a huge deal since you can move the the code out of the log statement temporarily if you need to debug it.

I don’t see the need to work on this, but if it bugs you and you have the time to do it, the moieties weigh evenly enough that I can’t see a reason to object.

Jim

Happy new year everyone. :slight_smile:

I have refreshed the implementation at
<Login; with something more polished. I
consider this almost-ready, I just need feedback on a couple of
details:

- log->Warning/Debug/Error/Verbose: Currently we have a large number
of printf-style logging functions, which are very rarely used (and all
they do is prefix the log line with some string), and only one
formatv-style. It would be easy to add LLDB_WARN, LLDB_ERR, etc.
macros, so that these usages can be ported to the formatv-style.
However, I am not sure whether that's really necessary. One of the
goals of this process could be to standardize on one function. I would
prefer to keep just one log function, but I could be easily convinced
otherwise. For reference, these are the current approximate usages of
the logging functions (as determined by grep)
log->Error: 38 (mostly in ProcessWindows)
log->Warning: 25
log->Debug: 10 (mostly ClangExpressionParser)
log->Verbose: 6 (only ProcessWindows)
log->Printf: 2884

- ProcessWindowsLog: I am not going to go on a reformatting spree to
change all logging statements, but I would still like to change remove
the logging macros that windows log introduced, since they are very
custom, and I'd like to keep the number of logging syntaxes below 3
:). Zachary, Adrian, is that OK with you?

Yeah, I’m with Jason. I don’t find the current state of things hard to read or work with.

The proposed solutions seem a little uglier to me. I try to avoid macros, to me it makes it look like the code is shouting at you, giving the log lines more prominence then they deserve in the overall flow of the code. And all upper case is harder to read. If accumulating the log involves substantial code and the code is all in macros it won’t be debuggable, which is a minor shame. But it’s not a huge deal since you can move the the code out of the log statement temporarily if you need to debug it.

As it stands now, you'll still be able to write if(log) blocks. I
would also recommend this style for consideration:
LLDB_LOG(log, "current complex state is: {0}", GetCurrentComplexState());
- it does not interfere with the flow of the containing function, as
it is succinct, and you can still step through the complex function.

I don’t see the need to work on this, but if it bugs you and you have the time to do it, the moieties weigh evenly enough that I can’t see a reason to object.

I must confess have a bit of an ulterior motive for doing this. :slight_smile: My
goal is to start working on the modularization of the codebase. But I
don't want it to be just mindless moving of code around. I want to
spend some time considering the design of each piece, which should
slow things down long enough for the moves to not be annoying, and
hopefully produce better components. I'll write more on that soon - I
don't want to derail this discussion with it.

And yeah, the current logging state has been bugging me for a long time. :wink:

cheers,
pl

Happy new year everyone. :slight_smile:

I have refreshed the implementation at
<https://reviews.llvm.org/D27459> with something more polished. I
consider this almost-ready, I just need feedback on a couple of
details:

  • log->Warning/Debug/Error/Verbose: Currently we have a large number
    of printf-style logging functions, which are very rarely used (and all
    they do is prefix the log line with some string), and only one
    formatv-style. It would be easy to add LLDB_WARN, LLDB_ERR, etc.
    macros, so that these usages can be ported to the formatv-style.
    However, I am not sure whether that’s really necessary. One of the
    goals of this process could be to standardize on one function. I would
    prefer to keep just one log function, but I could be easily convinced
    otherwise. For reference, these are the current approximate usages of
    the logging functions (as determined by grep)
    log->Error: 38 (mostly in ProcessWindows)
    log->Warning: 25
    log->Debug: 10 (mostly ClangExpressionParser)
    log->Verbose: 6 (only ProcessWindows)
    log->Printf: 2884

Yea, these are unnecessary and it would be great to get rid of them.

  • ProcessWindowsLog: I am not going to go on a reformatting spree to
    change all logging statements, but I would still like to change remove
    the logging macros that windows log introduced, since they are very
    custom, and I’d like to keep the number of logging syntaxes below 3
    :). Zachary, Adrian, is that OK with you?

Fine with me, they were only introduced as an alternative to the verbose if (log) syntaxes. If there’s an alternative that also solves the problem and we can adopt everywhere, then I’m all for it.

I must confess have a bit of an ulterior motive for doing this. :slight_smile: My
goal is to start working on the modularization of the codebase. But I
don’t want it to be just mindless moving of code around. I want to
spend some time considering the design of each piece, which should
slow things down long enough for the moves to not be annoying, and
hopefully produce better components. I’ll write more on that soon - I
don’t want to derail this discussion with it.

I’d start with one support library which we document and write in stone (perhaps even enforce in CMake) that it can have no other dependencies on LLDB libraries. Then start moving various common data structures and classes over there. Like you said, we should save the details for another thread, but in any case i look forward to seeing some proposals :slight_smile:

Happy new year everyone. :slight_smile:

I have refreshed the implementation at
<Login; with something more polished. I
consider this almost-ready, I just need feedback on a couple of
details:

- log->Warning/Debug/Error/Verbose: Currently we have a large number
of printf-style logging functions, which are very rarely used (and all
they do is prefix the log line with some string), and only one
formatv-style. It would be easy to add LLDB_WARN, LLDB_ERR, etc.
macros, so that these usages can be ported to the formatv-style.
However, I am not sure whether that's really necessary. One of the
goals of this process could be to standardize on one function. I would
prefer to keep just one log function, but I could be easily convinced
otherwise. For reference, these are the current approximate usages of
the logging functions (as determined by grep)
log->Error: 38 (mostly in ProcessWindows)
log->Warning: 25
log->Debug: 10 (mostly ClangExpressionParser)
log->Verbose: 6 (only ProcessWindows)
log->Printf: 2884

Yea, these are unnecessary and it would be great to get rid of them.

- ProcessWindowsLog: I am not going to go on a reformatting spree to
change all logging statements, but I would still like to change remove
the logging macros that windows log introduced, since they are very
custom, and I'd like to keep the number of logging syntaxes below 3
:). Zachary, Adrian, is that OK with you?

Fine with me, they were only introduced as an alternative to the verbose
if (log) syntaxes. If there's an alternative that also solves the problem
and we can adopt everywhere, then I'm all for it.

Fine with me as well.