LLDB Coding Style

I brought this up in a thread on lldb-commits, but since it is of more general interest, I want to make a thread here as well.

Can we have clear direction on LLDB coding style? Ideally in the form of an update to lldb.llvm.org, but as that might require a little more effort, even some details in a response to this thread would be a help. Some things I’ve deduced from looking at the code, and other things I’m not so sure about, because of inconsistencies in the code or just no clear rule.

Indentation width: 4
Column limit: 140 (does this apply to comments too? Most function-declaration comments seem to wrap at 80)
Brace style: Allman
if (foo)
{
// code here
}

Break after function return type: Always, only on declarations, only on definitions, only in headers, or never?

Space before function parentheses: When?

Indent case labels inside switch: A or B below?
switch (foo)
{
case A:
case B:
}

Indent braces inside of a case: A or B below?
switch (foo)
{
case A:
{
}
case B:
{
}
}

Any other rules I should be cognizant of?

This is my take on a couple of these - most likely the combination of how the convention was explained to me when I started work on LLDB, and then a few years of me adjusting to it, with varied success.
To be taken with a grain of salt, or two.

I brought this up in a thread on lldb-commits, but since it is of more general interest, I want to make a thread here as well.

Can we have clear direction on LLDB coding style? Ideally in the form of an update to lldb.llvm.org, but as that might require a little more effort, even some details in a response to this thread would be a help. Some things I’ve deduced from looking at the code, and other things I’m not so sure about, because of inconsistencies in the code or just no clear rule.

Indentation width: 4
Column limit: 140 (does this apply to comments too? Most function-declaration comments seem to wrap at 80)

I don’t think there is an explicit column limit. That’s not to be taken as 300 columns being OK, but we are also not as strict as the rest of LLVM on a specific number.
I tend to wrap my function declarations when they have a lot of arguments, not so much for column limits, but for readability.
I also have a penchant for grouping options in option classes, but that’s just my personal taste, not a guideline.

Brace style: Allman
if (foo)
{
// code here
}

I sometimes get away with
if (foo)
oneLineHere();

but yes, braces on separate line - and I believe there actually is a preference for erring on the side of having braces even in the one statement case.

Break after function return type: Always, only on declarations, only on definitions, only in headers, or never?

Space before function parentheses: When?

I do both always. Or rather, the first I do always.
The second, I tend to do it at declarations/definitions - a little less religiously at call sites
As a post-hoc rationalization, that makes it easier to look for function definitions vs usages, i.e. look for ‘Foo (‘ if you want to see where Foo is defined, for ‘Foo(‘ to see where Foo is used
In practice, I think there is enough variety in this regard that this will fail more often than it works

Indent case labels inside switch: A or B below?
switch (foo)
{
case A:
case B:
}

Indent braces inside of a case: A or B below?
switch (foo)
{
case A:
{
}
case B:
{
}
}

Any other rules I should be cognizant of?

Well, the obvious; member variables start with an m_, globals with g_, class and function names start with uppercase letters, and then go SomewhatLikeThisYouSee, member variables go instead m_likeThisForVariables.


lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev

Thanks,
- Enrico
:envelope_with_arrow: egranata@.com :phone: 27683

AFAIK we don’t have a column limit. I certainly don’t enforce one on myself, though I break lines when they don’t “look nice” anymore.
I try to stick with a separator between the function name and the open parenthesis when declaring a function/method but not when calling one. I try to add spaces after “for”, “if”, “while”, “switch,” etc.

In general I prefer to wrap even single line for/else blocks in braces, but in code where pre-existing logic doesn’t have braces, I’ll follow suit. (This is particularly prevalent when using logs: if(log) log->Printf(“…”):wink:

As far as case statements:

switch (foo) {
case A:
{
}
}

is what I’d prefer. It really bugs me when there isn’t a clean staircase of close-braces.

For instance variables of a class, I try to align them on a 4-space tab, and the * in a pointer or & in a reference sticks with the name and extends to the left of the alignment tab, e.g.:

A m_a
B *m_b

Enrico’s point about naming applies here, too: m_ should mean an instance variable and g_ should mean a global (or class-level static).
In general, if I use a std::vector or std::map in a class, I’ll typedef it so that making iterators etc. doesn’t look horrid. Unless the expression on the right hand side has its type explicitly named (typically in a cast, but I’d accept a constructor too) I do not use the “auto” keyword.

Sean

This is my take on a couple of these - most likely the combination of how the convention was explained to me when I started work on LLDB, and then a few years of me adjusting to it, with varied success.
To be taken with a grain of salt, or two.

I brought this up in a thread on lldb-commits, but since it is of more general interest, I want to make a thread here as well.

Can we have clear direction on LLDB coding style? Ideally in the form of an update to lldb.llvm.org, but as that might require a little more effort, even some details in a response to this thread would be a help. Some things I've deduced from looking at the code, and other things I'm not so sure about, because of inconsistencies in the code or just no clear rule.

Indentation width: 4
Column limit: 140 (does this apply to comments too? Most function-declaration comments seem to wrap at 80)

I don’t think there is an explicit column limit. That’s not to be taken as 300 columns being OK, but we are also not as strict as the rest of LLVM on a specific number.
I tend to wrap my function declarations when they have a lot of arguments, not so much for column limits, but for readability.
I also have a penchant for grouping options in option classes, but that’s just my personal taste, not a guideline.

Brace style: Allman
    if (foo)
    {
        // code here
    }

I sometimes get away with
if (foo)
  oneLineHere();

but yes, braces on separate line - and I believe there actually is a preference for erring on the side of having braces even in the one statement case.

Mail probably put in extra spaces in Enrico's reply, should be 4 spaces. If you are going to use curly's, they are "Allman". If you want to do a two liner like this, that's fine, but generally put a blank line after for readability. I don't think we have a prejudice against this, but I try not to do:

if ()
{
    // Something
}
else
   // Something

Because the groupings get harder to parse.

Break after function return type: Always, only on declarations, only on definitions, only in headers, or never?

Always in definitions. There are some header files that don't do this for the declaration, for instance the Thread Plan subclasses where you have to override a bunch of methods to implement the class but the methods you are implementing are boiler plate so taking up a lot of visual space for them isn't useful. But we feel the need for a hard and fast rule I would go with always.

Space before function parentheses: When?

I do both always. Or rather, the first I do always.
The second, I tend to do it at declarations/definitions - a little less religiously at call sites
As a post-hoc rationalization, that makes it easier to look for function definitions vs usages, i.e. look for ‘Foo (‘ if you want to see where Foo is defined, for ‘Foo(‘ to see where Foo is used
In practice, I think there is enough variety in this regard that this will fail more often than it works

I generally do space after unless you are chaining function calls, so:

    x = foo (arguments);

because it helps my eyes parse the function name more easily. But

    x = foo->GetSomething()->GetSomethingElse().CallIt(arguments)

because the spaces in that case break up the flow.

Indent case labels inside switch: A or B below?
    switch (foo)
    {
        case A:
    case B:
    }

Indent braces inside of a case: A or B below?
    switch (foo)
    {
        case A:
        {
        }
        case B:
            {
            }
    }

We don't seem to be very consistent about this. When we started off we were going to always line up the "case" with the curly for the switch, but our usage seems to vary.

Any other rules I should be cognizant of?

Well, the obvious; member variables start with an m_, globals with g_, class and function names start with uppercase letters, and then go SomewhatLikeThisYouSee, member variables go instead m_likeThisForVariables.

We also indicate the the "owning" variables - shared pointer, unique pointer and the like with _sp, _up and though we don't use them much anymore _ap.

And locals should be foo_bar_baz, always lower case.

You will see some places where Sean has to plug into the LLVM code more directly where he'll start coding like LLVM. I think that's a little weird, but it makes sense to him so I'm okay with that.

Jim

One point about this discussion, by the way.

While I support adherence to a consistent style for new/changed code, this should in no way be taken as support for going through and fixing indentation/style on old code.
We have internal branches that become hell to merge when e.g. spacing has been altered subtly, or brace depth is changed…
If we all do our part to clean the parts we’re touching, then I think that will be enough to keep LLDB clean.

Sean

Definitely agree with you on this. I have no plans to go fixup old code, and I don’t think others should either. And if we see a change go through that does attempt to fix up old code, we should block it.

That said, I’m working on a clang-format file that will automate this formatting for us (or at least, for me, should nobody else choose to use it), and I plan to run it against all my changelists before submitting them. Note that this only reformats lines that have already been touched by the CL, so there’s no risk of formatting-only changes making it in.

On the same note, if you have a type Foo and you are frequently going to trade in std::shared_ptr across the codebase, it is a convention to typedef std::shared_ptr FooSP; and then always trade in FooSPs
It is shorter to type, to parse at a glance, and if C++21 ever decided to make a new awesomer smart pointer type, you just redefine the typedef to std::awesome_shared_ptr and you’re good to go!

Hey Zach,

It’d be great if you posted the command line you run to do that. I’m sure there are others (myself included) that would love to see how that works.

Thanks!

-Todd

Please do forward that clang-format file when you have it put together. I would love to have access to that.

Sean

It’s already checked in. It’s not perfect yet, but it’s close. The way I run it against only changes in a CL involves having a chromium checkout on my disk (yea, wtf, i know), and I don’t know how to do it otherwise. Not saying there’s not a way, but I would ask over on cfe-dev or llvm-dev for more info.

Note, we tend to be much more concerned about the naming of things than the aligning of them.

Among other things it should be clear orthographically whether something is an ivar/local/global. Also except for types that come from the system (and a few analogous ones like lldb::addr_t), types & variables should be distinct, which we do by having variables start with lower case letters and types upper case.

And give descriptive names for things. It really helps if I can look at a line of code and figure out from the variable names on the line what the line is doing. A corollary of this is unless the use is very obvious, resist the temptation to use one-letter variable names. We seem to use "s" for streams, but other than that we're pretty good about this.

Another important rule is always call the same thing the same name. That way if I want to look for how something is used, I can search the code for that name & I'll come up with a bunch of useful hits. If you choose a good name for the thing, then reusing it will be obvious. If you call things "a" or "b" or "variable" then you won't be able to do this.

Jim

+djasper

Ok, just for the record, here’s what I do (for context, this discussion is about how to reformat the contents of an existing CL, without touching lines that were not modified by the CL).

  1. Checkout chrome (wtf right? I’m sure there’s a better workflow, I just don’t know what it is. djasper might though, which is why I’ve added him)
  2. set the CHROMIUM_BUILDTOOLS_PATH environment variable to point to d:\src\chromium\src\buildtools (update this for your situation)
  3. Update PATH environment variable to point to the version of git that is in chrome’s depot_tools
  4. Commit some changes to my local git repo, but don’t dcommit them yet.
  5. git cl format

This will look through your commit log and collect all the changes that are not upstreamed yet, and reformat those changes, leaving the result in your working copy. For me, this is usually just a single commit that hasn’t been upstreamed yet, so I just git commit --amend to add the formatting to it.

I’m not sure how to do this without chrome’s “git cl format” command. Daniel?

Use clang-format-diff.py checked into cfe/tools/clang-format.

That appears to only work against a diff though. Which means I would have to generate a diff, format it, reset my HEAD back to before I generated the diff, then re-apply it. Is there nothing that just operates directly on the repo?

No. The diff is the input, but it uses it to determine file ranges and then formats the actual files.

Just to toss out a controversial opinion here, I consider it a bug that LLDB doesn’t follow the documented LLVM coding standard. This only drives a wedge between LLDB and the rest of the LLVM community.

-Chris

I want to strongly, emphatically agree.

Coding standards are most valuable when consistent across a broad, shared
body of code. We need more contributors in LLDB, and one place to get them
is from the existing large pool of LLVM developers. We should be lowering
the barriers there, especially for easy things like coding standards.

Also, as we are actively developing convention, standard, and formatting
tools, the cost of changing this is going down and the value to the
*existing* developers of using the common coding standard is going up.

My biggest issues with the LLVM coding style are:
- no easy way to identify class instance variables. I would prefer to see "m_" or at least "_" as a prefix for all instance variables.

An example that stops us from being able to enable the warning for hidden local variables comes from iterator_range.h:

  iterator_range(IteratorT begin_iterator, IteratorT end_iterator)
      : begin_iterator(std::move(begin_iterator)),
        end_iterator(std::move(end_iterator)) {}

And another:

    DiagStatePoint(DiagState *State, FullSourceLoc Loc)
      : State(State), Loc(Loc) { }

The name of the ivars match the argument names.

- I would prefer type names formatted in a specific way (LLDB uses camel case) and variables in another way (LLDB uses lower case separated by '_'). Without this types, variables, and instance variables all look the same.

> I brought this up in a thread on lldb-commits, but since it is of more general interest, I want to make a thread here as well.
>
> Can we have clear direction on LLDB coding style?

Just to toss out a controversial opinion here, I consider it a bug that LLDB doesn’t follow the documented LLVM coding standard. This only drives a wedge between LLDB and the rest of the LLVM community.

I want to strongly, emphatically agree.

Coding standards are most valuable when consistent across a broad, shared body of code. We need more contributors in LLDB, and one place to get them is from the existing large pool of LLVM developers. We should be lowering the barriers there, especially for easy things like coding standards.

I've been poking my head a bit in llvm/clang recently, and although I much prefer the way we write code in lldb, and think some aspects of the way llvm is coded make it harder to understand than necessary, the fact that it had a different coding convention from lldb's was not a major barrier. If you are going to be around in this business for any length of time you'll have to play in lots of other people's code, and follow the conventions you find there. I guess I've never found that to be a really big deal.

I do agree that the lack of a clear coding standard for lldb does make it harder to submit patches confidently, and we should rectify that. We actually talked about it ourselves when we started lldb, but we didn't write anything up because for a very long time we had no external contributors.

Also, as we are actively developing convention, standard, and formatting tools, the cost of changing this is going down and the value to the *existing* developers of using the common coding standard is going up.

Given that good formatting tools will be amenable to a variety of styles, this doesn't seem a very persuasive argument. Provided there is a style file for any given coding convention, the presence of the tools is style neutral.

Jim

I’ve been largely observing the thread, but just so not all the conversations happen internally, I should point out that I’m encouraging the team to go ahead and formalize coding standards so we can publish them to the community. I’ll strongly discourage ambiguity and guidelines that involve picking one style or another based on “what looks better.” We should make it clear where we expect conformance from submissions. Where our existing code base is inconsistent I strongly encourage hewing to established LLVM convention. We shouldn’t be arbitrarily different, and where we do differ we should be able to describe the specific rationale (as I’m sure Greg will gladly do when it comes to line length and naming conventions.)

I should probably also take a moment to introduce myself! I manage Apple’s LLDB efforts for Chris as well as the performance tools in Xcode. Since taking on the team I’ve been largely focused on our Swift efforts, but I do endeavor to keep up with what’s going on in the broader LLDB community.

Kate Stone k8stone@apple.com
 Xcode Runtime Analysis Tools