Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

Hi everyone,
I would like to start a discussion about enforcing use of clang-tidy (or better clang-tidy-diff) on patches before sending it to review.
I like how clang-format simplified sending patches and reviews, e.g. “Use clang-format” instead of giving long list of lines that should be formatted by reviewer.
I believe that clang-tidy can be also be very helpful here.
Note that by enforcing I mean the same way as we enforce clang-format - It is not about “every changed line need to be formatted by clang-format” because thee might be some special formatting that looks better
formatted by hand, it is about saving time for 99.9% of the lines. The same applies to clang-tidy, where there might be some bugs or false positives, but I would like to save my time as reviewer to not mention
things like “use auto”, “use range loop”, “pass by value” over and over.

But before enforcing clang-tidy, we have to get to consensus on adding new rules to coding style.
I will list clang-tidy checks that I think should be used in LLVM. To make it short I will not justify why it is better to use this style because the documentation of checks is doing good job there.

List of checks that I would like to add to .clang-tidy config
modernize-*

http://clang.llvm.org/extra/clang-tidy/checks/modernize-deprecated-headers.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-loop-convert.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-make-shared.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-pass-by-value.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-redundant-void-arg.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-replace-auto-ptr.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-shrink-to-fit.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-auto.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-bool-literals.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-default-member-init.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-equals-default.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-equals-delete.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nullptr.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-override.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-emplace.html

http://clang.llvm.org/extra/clang-tidy/checks/modernize-use-using.html

I skipped some checks because they work only in C++14.
Coding style right now only mention using auto.
http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

Loops are mentioned here:
http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop - but thi probably should be changed to “use for range loop if possible. Try to not evaluate end() multiple times if for range loop is not possible to use”

I don’t know which of the listed changes sounds controversial to you, so if you would like to start discussion about one of them, then please also mention that you agree/disagree with the others.
One of the things that I would like to propose, is using push_back whenever the inserted type is the same or it implicitly casts itself.

std::vector<Value*> v;
Instr *I;
Value *V;
IntrinsicInstr *II;
v.push_back(I);
v.push_back(V);
v.push_back(II);

Use emplace_back only if we insert temporary object like:

std::vector v2;
v2.emplace_back(a, b, c); // instead of v.push_back(SomeType(a, b, c));

The first reason is make code simple and more readable. I belive that code ‘v.push_back(I)’ is more readable than ‘v.emplace_back(I)’ because I don’t have to think about if the element type has special constructor taking Instr, or it is just vector of Instr.
From yesterday discussion, Daniel Berlin proposed using emplace_back everywhere to make code easier to maintain. I think it is valid argument, but I believe it would reduce readability.
There is also other reason - emplace_back can’t take arguments of some types like bitfields, NULL, static member. Using emplace can also lead to memory leak in case of smart ptrs
"

auto *ptr = new int(1);
v.push_back(std::unique_ptr<int>(ptr));

This is because replacing it with emplace_back could cause a leak of this pointer if emplace_back would throw exception before emplacement (e.g. not enough memory to add new element).".

In the case of push_back/emplace_back for the same type, there should be no performance changes, however emplace_back might generate more template instantiations slowing down compilation.

There is also topic of using insert/emplace for maps showing that map.emplace can be slower than map.insert. I would not want to distinguish between emplace for maps and emplace for vectors,

so I believe using emplace for constructed temporaries, even if there would be some slightly performance regression, looks simpler and more readable.

I am happy to change to write changes to LLVM coding standards document, but I firstly would like to see what community thinks about it.

NOTE that I don’t propose running clang-tidy and modifying whole codebase. It might happen some day, but we firstly need to agree that these changes are better.

There are also other checks from performance and misc that I would like to enable. Some of them are already enabled (all misc), but because they are more about finding bugs, I would like to narrow discussion only to modernize.

Piotr

From yesterday discussion, Daniel Berlin proposed using emplace_back
everywhere to make code easier to maintain. I think it is valid argument,
but I believe it would reduce readability.

Just to be clear; I proposed not trying to switch uses back and forth
without real data, and to come to some agreement about what should be
written in the first place, preferably based on real data, and then,
switching in some organized fashion, not just randomly :slight_smile:

IE Either have a clear coding standard and enforce it, or leave uses alone
one way or the other without some demonstration that it actually matters.

I would *personally* prefer to just use emplace_back everywhere, and not
try to switch between the two, without demonstration of real benefit.

(i'd also accept the other way, use push_back everywhere, and not try to
switch between the two, without demonstration of real benefit).

This preference is based on a simple truth: People suck at deciding when to
use one or the other. Yes, i know how to use it and when to use it. Lots
of our types are cheaply movable, etc, so you probably won't see any
performance difference even when using them "right". People have enough
style/etc issues without having to try to think hard about this.

So that isn't what I would necessarily propose for LLVM.

For LLVM, my answer would be "either make tool do what we want, force use
of tool, or be consistent and obvious with what we want and then fix it if
it leads to performance issue"

There is also other reason - emplace_back can't take arguments of some
types like bitfields, NULL, static member. Using emplace can also lead to
memory leak in case of smart ptrs
"

auto *ptr = new int(1);v.push_back(std::unique_ptr<int>(ptr));

This is because replacing it with emplace_back could cause a leak of this
pointer if emplace_back would throw exception before emplacement (e.g.
not enough memory to add new element).".

This seems, IMHO, not a useful argument for LLVM since it does not try to
use exception based error handling to recover.

From yesterday discussion, Daniel Berlin proposed using emplace_back
everywhere to make code easier to maintain. I think it is valid argument,
but I believe it would reduce readability.

Just to be clear; I proposed not trying to switch uses back and forth
without real data, and to come to some agreement about what should be
written in the first place, preferably based on real data, and then,
switching in some organized fashion, not just randomly :slight_smile:

IE Either have a clear coding standard and enforce it, or leave uses alone
one way or the other without some demonstration that it actually matters.

I would *personally* prefer to just use emplace_back everywhere, and not
try to switch between the two, without demonstration of real benefit.

(i'd also accept the other way, use push_back everywhere, and not try to
switch between the two, without demonstration of real benefit).

This preference is based on a simple truth: People suck at deciding when
to use one or the other. Yes, i know how to use it and when to use it.
Lots of our types are cheaply movable, etc, so you probably won't see any
performance difference even when using them "right". People have enough
style/etc issues without having to try to think hard about this.

So that isn't what I would necessarily propose for LLVM.

For LLVM, my answer would be "either make tool do what we want, force use
of tool, or be consistent and obvious with what we want and then fix it if
it leads to performance issue"

Sure, that's why we have clang-tidy. I propose using push_back everywhere

except when temporary is passed, where using emplace_back would simplify by
not having to write type name.
modernize-use-emplace can find places like push_back(Type(a, b, c)) and
transform it into emplace_back(a, b, c);
I hand someone a patches for LLVM and clang that
1) modifies every push_back to emplace_back
2) modifies every emplace_back to push_back (if it is valid)
3) modifies every push_back(Type(a, b)) to emplace_back(a, b);

I actually wanted to do this last time at google because you have some
framework to test performance of clang, but I didn't have enough time.
I only have compared time of running tests vs running tests after 3), and
the results were almost the same.
That's why I think that performance is not concern here.
So if someone with such framework would like to help me, then we could
check if there are any performance wins.

There is also other reason - emplace_back can't take arguments of some
types like bitfields, NULL, static member. Using emplace can also lead to
memory leak in case of smart ptrs
"

auto *ptr = new int(1);v.push_back(std::unique_ptr<int>(ptr));

This is because replacing it with emplace_back could cause a leak of
this pointer if emplace_back would throw exception before emplacement
(e.g. not enough memory to add new element).".

This seems, IMHO, not a useful argument for LLVM since it does not try to
use exception based error handling to recover.

But there is still problem with initializer lists, bit-fields, static

members and nulls. When I was testing this check it actually found many of
these cases in LLVM (specially bit fields and initializer lists).

I’m a bit confused by this whole discussion.

clang-format is neither mandated (by documentation) nor enforced (by any infrastructure/automation) for use in the LLVM project that I know of.

It’s convenient, and in review people may reasonably ask authors to consider running it, etc - but we have no system that requires or checks for that. Might be nice, might not be.

It sounds like even if clang-format were mandated - we mostly accept that such a mandate is forward-looking, and that we don’t want/intend to reformat the entire existing codebase. So I imagine the same would apply to clang-tidy (we wouldn’t expect to tidy the entire codebase - we’d just use clang-tidy as advisory for any new changes as we do with clang-format) - so I don’t think it’d run up against Danny’s protest about choosing a philosophy for wide scale changes, because there would be no wide scale changes.

All that said, I think like clang-format we should probably have a project-wide config that specifies which checks we care about - but the real impediment to adoption of those checks is a streamlined process for running them on a patch. clang-format has nice integration with my editor (vim) and source control system (git-clang-format). I have yet to discover an equally convenient place to put clang-tidy in my workflow. Without that I doubt it’ll see very much adoption within the LLVM community.

  • Dave

I agree that “people suck at deciding”, me in the first place in the line, and that’s why I like clear and good guideline and stick to it unless "demonstration of real benefit”.
I also think we can have clear guideline that are different from “always use emplace_back” or “always use push_back”, like Piotr is suggesting.

That said, I’m not convinced the relative “ugliness" of emplace_back (I’m not sure what’s ugly there, I only see it as a habit to take) vs push_back is enough to justify a set of rules to decide between the two, I’d be fine with “always using emplace_back” in the name of “simple rule is better when possible".

+1
Ultimately whatever we do is not practical if it can’t be done by a tool.

This, and also we should not write code like that (naked pointers, calling new), using make_unique<> for instance would prevent any such situation. Passing a raw pointer to a container of unique_ptr can be flagged by a tool.

I'm a bit confused by this whole discussion.

clang-format is neither mandated (by documentation) nor enforced (by any
infrastructure/automation) for use in the LLVM project that I know of.

It's convenient, and in review people may reasonably ask authors to
consider running it, etc - but we have no system that requires or checks
for that. Might be nice, might not be.

It sounds like even if clang-format were mandated - we mostly accept that
such a mandate is forward-looking, and that we don't want/intend to
reformat the entire existing codebase. So I imagine the same would apply to
clang-tidy (we wouldn't expect to tidy the entire codebase - we'd just use
clang-tidy as advisory for any new changes as we do with clang-format) - so
I don't think it'd run up against Danny's protest about choosing a
philosophy for wide scale changes, because there would be no wide scale
changes.

All that said, I think like clang-format we should probably have a
project-wide config that specifies which checks we care about - but the
real impediment to adoption of those checks is a streamlined process for
running them on a patch. clang-format has nice integration with my editor
(vim) and source control system (git-clang-format). I have yet to discover
an equally convenient place to put clang-tidy in my workflow. Without that
I doubt it'll see very much adoption within the LLVM community.

+1

I don't remember there being a discussion analogous to the "enforcement"
aspect of thread for clang-format. The uptake of clang-format was organic.

Piotr, what is different about the situation with clang-tidy (vs
clang-format) that we need to have any discussion of "enforcement" in this
thread? I think it's totally reasonable to have a discussion of which
checks we want to run by default for the "LLVM style" (to use
clang-format's terminology), and this thread should probably focus on that.

wrt adding rules to the coding standards, I don't think we want to clutter
the coding standards document with a specific rule for every clang-tidy
check we want to have on by default for the "LLVM style", the same way that
we don't have a rule in the coding standards document for every
clang-format config value which happens to be set in the "LLVM style" (i.e.
all the stuff you see when you do -dump-config). Just the main ones is fine.

-- Sean Silva

I agree that “people suck at deciding”, me in the first place in the line, and that’s why I like clear and good guideline and stick to it unless "demonstration of real benefit”.
I also think we can have clear guideline that are different from “always use emplace_back” or “always use push_back”, like Piotr is suggesting.

I don’t think Piotr is suggesting “always use push_back” but “always use push_back when it’s valid” & I’d second this.

To show a simpler example:

std::unique_ptr u(v);

std::unique_ptr u = v;

I’d generally advocate for using copy init (the second form - it doesn’t copy in this case, it moves) over direct init (the first form) because it’s more legible - copy init can only execute implicit ctor operations, whereas direct init can invoke implicit and explicit operations. So from a readability perspective seeing the latter is easier than seing the former - I know the operation is further constrained to simpler/less interesting operations (eg: ‘v’ can’t be a raw pointer in the second form, it might be (& then I have to worry about whether that’s an ownership transfer that’s intended), etc)

& push_back V emplace_back is the same choice - push_back means only implicit ops are used and so it’s more readable, emplace_back requires more careful attention.

I think this is a reasonably good stylistic rule - but I’m happy leaving an open to judgment for sure - there might be readability reasons that an emplace_back may work better in some cases.

I’d think of this like “else after return” - we don’t have any enforcement, sometimes we make judgment about it being better (for consistency - sometimes there’s a pattern such that else makes the code more readable), we haven’t gone back to do any cleanup of the codebase, but it’s pretty consistently applied/desired in code review.

All that said - sure, I’d like to have tools to help me get this right (else after return and “prefer copy init style over direct init”) & make that really easy to do.

  • Dave

That said, I’m not convinced the relative “ugliness" of emplace_back (I’m not sure what’s ugly there, I only see it as a habit to take) vs push_back is enough to justify a set of rules to decide between the two, I’d be fine with “always using emplace_back” in the name of “simple rule is better when possible".

+1
Ultimately whatever we do is not practical if it can’t be done by a tool.

Not sure I follow this - we have lots of things we don’t currently do with a tool that’s still part of our style & stuff we try to catch with code review, etc. (else after return is my best example)

This, and also we should not write code like that (naked pointers, calling new), using make_unique<> for instance would prevent any such situation. Passing a raw pointer to a container of unique_ptr can be flagged by a tool.

Sure - but it generalizes further than that, it’s just a nice simple example.

std::vector v(i);

‘i’ could be an int & this would make a vector of that many ints, or it could be another vector & this is making a copy, etc.

std::vector v = i;

I know taht’s not creating a vector of ‘i’ many ints.

Yeah, not a great example either - but the general idea applies, the latter form is less powerful & so is easier to read because you don’t have to worry/think about it as much.

I think we mostly do this sort of thing out of habit anyway - but codifying it (& if possible, tools to help) could be nice.

I agree that “people suck at deciding”, me in the first place in the line, and that’s why I like clear and good guideline and stick to it unless "demonstration of real benefit”.
I also think we can have clear guideline that are different from “always use emplace_back” or “always use push_back”, like Piotr is suggesting.

I don’t think Piotr is suggesting “always use push_back” but “always use push_back when it’s valid” & I’d second this.

Define “valid”? Just that it will compile?

To show a simpler example:

std::unique_ptr u(v);

std::unique_ptr u = v;

I’d generally advocate for using copy init (the second form - it doesn’t copy in this case, it moves) over direct init (the first form) because it’s more legible - copy init can only execute implicit ctor operations, whereas direct init can invoke implicit and explicit operations. So from a readability perspective seeing the latter is easier than seing the former - I know the operation is further constrained to simpler/less interesting operations (eg: ‘v’ can’t be a raw pointer in the second form, it might be (& then I have to worry about whether that’s an ownership transfer that’s intended), etc)

& push_back V emplace_back is the same choice - push_back means only implicit ops are used and so it’s more readable

I don’t see what’s more readable about “only implicit ctor”. Emplace is very explicit that we intended to construct an object, I don’t understand what hurts readability here?

, emplace_back requires more careful attention.

I think this is a reasonably good stylistic rule - but I’m happy leaving an open to judgment for sure - there might be readability reasons that an emplace_back may work better in some cases.

I’d think of this like “else after return” - we don’t have any enforcement, sometimes we make judgment about it being better (for consistency - sometimes there’s a pattern such that else makes the code more readable), we haven’t gone back to do any cleanup of the codebase, but it’s pretty consistently applied/desired in code review.

I have a different impression: we are actively cleaning the codebase with tidy-checks. For instance look for git log --author=Zelenko.
Another example is that a few months ago the entire LLDB codebase has been formatted with clang-format, after marking the specific places where it was not desirable (tables for instance) with comment to disable clang-format.

I agree that “people suck at deciding”, me in the first place in the line, and that’s why I like clear and good guideline and stick to it unless "demonstration of real benefit”.
I also think we can have clear guideline that are different from “always use emplace_back” or “always use push_back”, like Piotr is suggesting.

I don’t think Piotr is suggesting “always use push_back” but “always use push_back when it’s valid” & I’d second this.

Define “valid”? Just that it will compile?

Yep

To show a simpler example:

std::unique_ptr u(v);

std::unique_ptr u = v;

I’d generally advocate for using copy init (the second form - it doesn’t copy in this case, it moves) over direct init (the first form) because it’s more legible - copy init can only execute implicit ctor operations, whereas direct init can invoke implicit and explicit operations. So from a readability perspective seeing the latter is easier than seing the former - I know the operation is further constrained to simpler/less interesting operations (eg: ‘v’ can’t be a raw pointer in the second form, it might be (& then I have to worry about whether that’s an ownership transfer that’s intended), etc)

& push_back V emplace_back is the same choice - push_back means only implicit ops are used and so it’s more readable

I don’t see what’s more readable about “only implicit ctor”.

It limits the set of operations that can be performed by the code. So when I read it there’s less I have to think about/consider. (& specifically, the implicit operations tend to be simpler/safer/more obvious - copy/move or operations that are similar - not complex/interesting things like “taking ownership from a raw pointer” or “creating a vector of N elements”)

Emplace is very explicit that we intended to construct an object, I don’t understand what hurts readability here?

Going back to the example above, given the following two lines:

std::unique_ptr u(foo());
std::unique_ptr u = foo();

(& the equivalent: emplace_back(foo()) V push_back(foo()) for a vector of unique_ptr)

the copy init/push_back are simpler to read because they aren’t as powerful - I don’t have to wonder if something is taking ownership there (or if I’m creating a vector of N ints, etc). I know it’s a simple/obvious operation, generally (because others shouldn’t be implicit).

, emplace_back requires more careful attention.

I think this is a reasonably good stylistic rule - but I’m happy leaving an open to judgment for sure - there might be readability reasons that an emplace_back may work better in some cases.

I’d think of this like “else after return” - we don’t have any enforcement, sometimes we make judgment about it being better (for consistency - sometimes there’s a pattern such that else makes the code more readable), we haven’t gone back to do any cleanup of the codebase, but it’s pretty consistently applied/desired in code review.

I have a different impression: we are actively cleaning the codebase with tidy-checks.

I think that’s where Danny is pushing back - I’m on the fence about that. So I’d push back on the automated cleanup (& have done so on several occasions as these sort of patches have been sent) & encourage efforts to provide the tools (like clang-tidy integration for patches and editors) to avoid mistakes going forward in an advisory (rather than mandatory) way.

For instance look for git log --author=Zelenko.
Another example is that a few months ago the entire LLDB codebase has been formatted with clang-format, after marking the specific places where it was not desirable (tables for instance) with comment to disable clang-format.

nod Only because it was so far out of LLVM’s style (by design originally). LLVM’s generally “close enough” that the cleanup isn’t desired/intended.

I’d say the same for this particular instance - implicit V explicit ctor calls. Mostly we write them using copy init, not direct init. I suspect push_back/emplace_back is similar (if only due to history - and perhaps due to my/other code review feedback encouraging push_back when possible in reviews that might otherwise be going to more emplace_back) so I’m not sure I care too much about the cleanup.

I think the issue Danny was getting at - which I agree - is unless we have good tools in place to not make these mistakes again going forward (I think clang-format has sort of reached that point - it’s got easy integration in editors and source control systems) then there’s limited merit in doing the cleanup. +1 to that.

If we have good integration for clang-tidy, then I’d be more OK with doing cleanup.

Either way (good or not integration) - if we have some people who have ways of using clang-tidy in their development process, I’d say it’s worth having a project-wide clang-tidy config. I’d vote for putting “use implicit-only ops (like copy init and push_back) over explicit ops (like direct init and emplace_back) where both compile” in that list.

  • Dave

Somewhat unfortunately, we have two discussions here:

  • Clang-tidy has checks that might improve code – should we deploy them? If so which?

I’ll address this in a separate email, and focus this one on:

  • Should we have coding standards around ‘push_back’ and ‘emplace_back’, and if so, what should they say?

I think the amount of confusion makes a coding standard useful.

As for what it should say, I tend to agree with Dave here. In particular:

I agree that “people suck at deciding”, me in the first place in the line, and that’s why I like clear and good guideline and stick to it unless "demonstration of real benefit”.
I also think we can have clear guideline that are different from “always use emplace_back” or “always use push_back”, like Piotr is suggesting.

I don’t think Piotr is suggesting “always use push_back” but “always use push_back when it’s valid” & I’d second this.

Define “valid”? Just that it will compile?

Yep

To show a simpler example:

std::unique_ptr u(v);

std::unique_ptr u = v;

I’d generally advocate for using copy init (the second form - it doesn’t copy in this case, it moves) over direct init (the first form) because it’s more legible - copy init can only execute implicit ctor operations, whereas direct init can invoke implicit and explicit operations. So from a readability perspective seeing the latter is easier than seing the former - I know the operation is further constrained to simpler/less interesting operations (eg: ‘v’ can’t be a raw pointer in the second form, it might be (& then I have to worry about whether that’s an ownership transfer that’s intended), etc)

& push_back V emplace_back is the same choice - push_back means only implicit ops are used and so it’s more readable

I don’t see what’s more readable about “only implicit ctor”.

It limits the set of operations that can be performed by the code. So when I read it there’s less I have to think about/consider. (& specifically, the implicit operations tend to be simpler/safer/more obvious - copy/move or operations that are similar - not complex/interesting things like “taking ownership from a raw pointer” or “creating a vector of N elements”)

Emplace is very explicit that we intended to construct an object, I don’t understand what hurts readability here?

Going back to the example above, given the following two lines:

std::unique_ptr u(foo());
std::unique_ptr u = foo();

(& the equivalent: emplace_back(foo()) V push_back(foo()) for a vector of unique_ptr)

the copy init/push_back are simpler to read because they aren’t as powerful - I don’t have to wonder if something is taking ownership there (or if I’m creating a vector of N ints, etc). I know it’s a simple/obvious operation, generally (because others shouldn’t be implicit).

This is exactly where I come down as well.

Another useful way to think about it is “what do I need to understand to understand the semantics of this operation”.

In order to understand push_back, I need only read its documentation. The type going in will have to have value semantics (potentially with moves). Otherwise it will be an error. And push_back’s documentation says what it does.

In order to understand a given emplace_back call I have to both read its documentation and the documentation for all of the constructors on the type.

Still another way to see the consequence of this is to look at the nature of compiler errors when a programmer makes a mistake.

With emplace_back, if you fail to call the constructor correctly, all of the error messages come with a layer (or many layers) of template instantiation. With push_back(T(…)), the constructor call is direct and the error messages are as well.

With emplace_back, if you are converting from one type to another and the conversion fails, you again get template backtraces. With push_back, all the conversions happen before the push_back method and so the error is local to your code.

Dave pointed out that I didn’t complete one aspect of my argument on the push_back vs. emplace_back:

Thanks for very accurate responses.

  • I totally agree what Dave and Chandler said about explicit and implicit operations, this is what I meant my first email.
    I believe there are places like
    v.emplace_back(A, B);
    istead of
    v.push_back(make_pair(A, B));b
    That can make code simpler. I think in cases like this we can leave it for judgement of contributor.
    Having said that I think the case Chandler and Dave mentioned should be in LLVM style guide.
    @Daniel does this argument seem good enough to prefer push_back instead of emplace_back?

  • About clang-tidy config (Dave):
    We already have clang-tidy config in LLVM and clang - check “.clang-tidy” file. I feel tho that people don’t use clang-tidy as often as clang-format.
    This might be because clang-tidy didn’t have enough features that they would like. I believe that clang-tidy is doing much better and could really save a lot of time for reviewers and contributors.
    But to make it happen we either have to change LLVM style guide, so we won’t need to discuss it over and over in reviews, or gain clang-format trust - it is so good people don’t question it.

  • About enforcing (Sean)
    Sorry, I used wrong word here. I meant to use it the same way as clang-format, to trust people they send cleaned code by clang-tidy the same way we trust them that they used clang-format.
    Reviewer can then ask contributor to run clang-tidy instead of listing things that contributor should change. We are a little far from having mandatory running of clang-format and clang-tidy on every patch automatically.

  • About workflow (Dave)
    If I remember correctly there is some integration for clang-tidy for vim (YCM?). There is also clang-tidy-diff, but maybe we should have git-clang-tidy to make it even simpler.

Thanks for very accurate responses.

  • I totally agree what Dave and Chandler said about explicit and implicit operations, this is what I meant my first email.
    I believe there are places like
    v.emplace_back(A, B);
    istead of
    v.push_back(make_pair(A, B));b
    That can make code simpler.

Do you have examples? The only ones i can come up with are the ones where the push_back variant literally can’t compile because the type isn’t movable.

Perhaps it would be useful to break down categories of can happen here…

Case 1: there is one object already – this is a conversion of a type.

  • If the author of the conversion made it implicit, then ‘v.push_back(x)’ just works.
  • If the author of the conversion made it explicit I would like to see the name of the type explicitly: ‘v.push_back(T(x))’.

Case 2a: There is a collection of objects that are being composed into an aggregate. We don’t have any interesting logic in the constructor, it takes an initializer list.

  • This should work with ‘v.push_back({a, b, c})’
  • If it doesn’t today, we can fix the type’s constructors so that it does.
  • Using ‘emplace_back’ doesn’t help much – you still need {}s to form the std::initializer_list in many cases. Pair and tuple are somewhat unusual in not requiring them.

Case 2b: A specific constructor needs to be called with an argument list. These arguments are not merely being aggregated but are inputs to a constructor that contains logic.

  • This is analogous to a case called out w.r.t. ‘{…}’ syntax in the coding standards1
  • Similar to that rule, I would like to see a call to the constructor rather than hiding it behind ‘emplace_back’ as this is a function with interesting logic.
  • That means i would write T(a, b, c) anyways, and ‘v.push_back(T(a, b, c))’ works.

Case 3: Passing objects of type ‘T’ through ‘push_back’ fails to compile because they cannot be copied or moved.

  • You must use ‘emplace_back’ here. No argument (obviously).

My experience with LLVM code and other codebases is that case 3 should be extremely rare. The intersection of “types that cannot be moved or copied” and “types that you put into containers” is typically small.

Anyways, I don’t disagree with this point with a tiny fix:

I think in cases like this we can leave it for judgement of contributor.

or reviewer. ;]

I continue to think exceptions can be made in rare cases when folks have good reasons. But I expect this to be quite rare. =]

Thanks for very accurate responses.
- I totally agree what Dave and Chandler said about explicit and implicit
operations, this is what I meant my first email.
  I believe there are places like
    v.emplace_back(A, B);
  istead of
    v.push_back(make_pair(A, B));b
  That can make code simpler.

Do you have examples? The only ones i can come up with are the ones where
the push_back variant literally can't compile because the type isn't
movable.

Perhaps it would be useful to break down categories of can happen here...

Case 1: there is one object already -- this is a *conversion* of a type.
- If the author of the conversion made it *implicit*, then
'v.push_back(x)' just works.
- If the author of the conversion made it *explicit* I would like to see
the name of the type explicitly: 'v.push_back(T(x))'.

Case 2a: There is a collection of objects that are being composed into an
aggregate. We don't have any interesting logic in the constructor, it takes
an initializer list.
- This should work with 'v.push_back({a, b, c})'
- If it doesn't today, we can fix the type's constructors so that it does.
- Using 'emplace_back' doesn't help much -- you still need {}s to form the
std::initializer_list in many cases. Pair and tuple are somewhat unusual in
not requiring them.

This sounds extremely reasonable.

Case 2b: A specific constructor needs to be called with an argument list.
These arguments are not merely being aggregated but are inputs to a
constructor that contains logic.
- This is analogous to a case called out w.r.t. '{...}' syntax in the
coding standards[1]
- Similar to that rule, I would like to see a *call to the constructor*
rather than hiding it behind 'emplace_back' as this is a function with
interesting logic.
- That means i would write T(a, b, c) anyways, and 'v.push_back(T(a, b,
c))' works.

Calling emplace_back with 0 or multiple arguments is a clear way of saying

"this constructor takes multiple arguments".
We can do it with initializer list with easy way like:
v.emplace_back() == v.push_back({})
v.emplace_back(a, b ,c) == v.push_back({a, b, c})

I personally never liked the initializer syntax because of tricky casees
like:

vector<string> v{{"abc", "def"}};
Which is equivalent of
vector<string> v = {std::string("abc", "def")};
That will call std::string ctor with 2 iterators likely crashing, and
putting same string might gives us empty string.

In this case programmer probably meant
std::vector<std:string> v({"abc", "def"});
or
std::vector<std::string> v = {"abc", "def"};

But this case is not possible to mess up with push_back (in the case of
vector<vector<string>> or something). At least I hope it is not.
So avoiding braces is my personal preference. It is fine for me if we would
choose to prefer 'v.push_back({a, b, c})' instead of 'v.emplace_back(a, b,
c)', ofc as long as most of the community would prefer first form to the
second :slight_smile:

[1]: http://llvm.org/docs/CodingStandards.html#do-not-

use-braced-initializer-lists-to-call-a-constructor

Case 3: Passing objects of type 'T' through 'push_back' fails to compile
because they cannot be copied or moved.
- You *must* use 'emplace_back' here. No argument (obviously).

My experience with LLVM code and other codebases is that case 3 should be
extremely rare. The intersection of "types that cannot be moved or copied"
and "types that you put into containers" is typically small.

Anyways, I don't disagree with this point with a tiny fix:

I think in cases like this we can leave it for judgement of contributor.

*or reviewer*. ;]

I continue to think exceptions can be made in rare cases when folks have
good reasons. But I expect this to be quite rare. =]

Piotr

Are there any other comments about changing style guide?
I would like to add points like

  • prefer "using’ instead of “typedef”
  • use default member initialization
    struct A {
    void *ptr = nullptr;
    };

(instead of doing it in constructor)

  • use default, override, delete
  • skip “virtual” with override

The last point is to get to consensus with

push_back({first, second})
or
emplace_back(first ,second);

- prefer "using' instead of "typedef"
- use default member initialization
- use default, override, delete
- skip "virtual" with override

I thought we had all of those already...

The last point is to get to consensus with

push_back({first, second})
or
emplace_back(first ,second);

I second Chandler's opinion on this.

cheers,
--renato

> - prefer "using' instead of "typedef"
> - use default member initialization
> - use default, override, delete
> - skip "virtual" with override

I thought we had all of those already...

Nope, some people use it, but I still see a lot of new code with typedefs.

I would like to have it written in style guide so it will be easier to
convince to change in review.

> The last point is to get to consensus with
>
> push_back({first, second})
> or
> emplace_back(first ,second);

I second Chandler's opinion on this.

2:1 for push_back. From commits I also see that most of the people prefer
push_back, but I will wait a little bit with the verdict.

Are there any other comments about changing style guide?
I would like to add points like

  • prefer "using’ instead of “typedef”
  • use default member initialization
    struct A {
    void *ptr = nullptr;
    };

(instead of doing it in constructor)

  • use default, override, delete
  • skip “virtual” with override

The last point is to get to consensus with

push_back({first, second})
or
emplace_back(first ,second);

It might be a bit noisy, but I’d be inclined to start a separate thread for each of these on llvm-dev with a clear subject line relevant to each one. So they don’t get lost and some of them don’t get drowned out by the discussion of others, etc.

Are there any other comments about changing style guide?
I would like to add points like

- prefer "using' instead of "typedef"
- use default member initialization
struct A {
  void *ptr = nullptr;
};

(instead of doing it in constructor)

- use default, override, delete
- skip "virtual" with override

The last point is to get to consensus with

push_back({first, second})
or
emplace_back(first ,second);

It might be a bit noisy, but I'd be inclined to start a separate thread
for each of these on llvm-dev with a clear subject line relevant to each
one. So they don't get lost and some of them don't get drowned out by the
discussion of others, etc.

Sure, I can do it, but at least now I don't see much of attention of people
to any of the subject, so I would not like to start 5 empty threads.
I guess as long as the thread is not getting noisy I will keep only one.
Does it sound ok?

Perhaps - if no one else pipes up shrug

If that’s the case, I’d at least suggest submitting the changes to the style guide separately with clear subject/titles so people reading commits might see/notice/discuss there.

FWIW: I’m also in favor of push_back where valid. Though I’m not sure it’s so much a matter of votes, but justification, etc. As for the others - sure, they all sound good to me.

Also - once these are in the style guide there’s still a separate discussion to be had about whether automated cleanup is worthwhile & how best to go about that sort of thing.