LLVM maintainers, code reviews

Hi Chris and Daniel,

Hi Jan,

These are good points! I'm cc'ing llvmdev, because this is pertinent to the larger community.

I was happy to be able to talk with both of you during the Dev Meeting. Today I
started to think a bit more about code reviews, LLVM and the future. From my
perspective the major thing that sets LLVM apart from other compiler
infrastructures is the attention put on the design to make things work the right
way, both within a module/library and between them. For example, I don't see
projects like LLDB being possible unless the modularity and code quality is a
major focus.
Given that, there are very few people that actually understand the underlying
design principles and ideas behind the code. Why is something done a certain
way, why were other alternatives not as good or not viable? With the popularity
of LLVM increasing all the time and new developers wanting to contribute more
and more code it seems that code reviews will become more of a bottleneck. In
GCC this is not so much of a problem, because there are a lot of maintainers that
do code reviews. The reviews basically enforce the coding standard, but the
design is very rarely considered. This is perhaps why GCC is a "Big Ball of
Mud" (I'm sure you have read it, but if not, here's the link
http://www.laputan.org/mud/). LLVM will not go down the same path as GCC in this
regard, but the code only exposes the "final" decision on some design, not the
ideas behind it or how it may be used in the future. It seems to me that
it is a lot more difficult for someone from the outside to do a review that
truly reflects the ideas behind the code, because they do not have all this
implicit information. In some cases it is not that difficult, if the task is
simply to follow an existing pattern, e.g. if a new optimization pass is added,
but often new code makes some subtle change to the design that may or may not be
desirable.

Yeah, I completely agree. Getting people to write documentation in general is difficult, documenting "bad ideas that were rejected" is even harder. I'm hoping that the blog will be a good way to document design decisions and other stuff like this though. We do have one decent example so far:
http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html

I'd like to do more similar ones in the future, and encourage other contributors to also write other blog entries in general.

I was wondering what the plan is for new "code reviewers" or "maintainers"? Do
you think they are needed soon, or can it wait some time? It may be that LLVM is
moving forward fast enough today, but eventually new people will have to start
doing code reviews.

Technically, we have something like 150 code reviewers, because anyone with commit access is allowed to review a patch :) The trick is to get people to step up more. This is happening a lot more lately than it did even 2-3 years ago, but it certainly could always be better :)

Do you think it is viable to just let people work on LLVM
long enough and they will pick up on the designs and ideas and be able to do
reviews, or do you think it is necessary to have a deeper kind of knowledge
transfer for people to do it correctly? My fear is that bad changes that
may be fairly small can easily start to pollute the code base, and if people
don't realize it, it may require big re-factoring jobs to clean it up. At least
a lot of re-factoring happens today, which is good, but as LLVM grows, if
there is not a consistent view between the people that do the reviews it is more
likely that design bugs/incompatibilities will get into the code. Have you had
any thoughts or concerns about this?

The big issue historically has been abandonware code: the Itanium backend, for example, never had reviewers because no one was actively maintaining it. It was easy enough to let people just slam in any patches they wanted, because it couldn't break anything important :)

The JIT is another interesting case. It is basically abandonware, suffers from poor design (because it was a very early component that didn't have the newfangled infrastructure to build on) and is very difficult to test (particularly for things like JIT debug or unwind info). Because of that, I personally have stayed away from reviewing patches to it. Your work on MC'izing the JIT is a light at the end of this tunnel, which will hopefully allow us to shoot the old JIT in the head when the new one is available :)

I personally think that there are several key aspects of our development process that contribute to ensuring that LLVM maintains its high quality over time:

1) peer review (review after commit is enough if people are willing to revert and if people actually do read the patches and speak up).
2) automated testing
3) components that are not on-by-default can be held to lower standards while they are being implemented. If they work out and it seems like a good idea to turn them on by default, they should be given a really hard once-over.
4) continuous refactoring and improvement of the codebase
5) being willing to throw away old stuff that no one is using or maintaining, or replace things that are broken and terrible.

Of course it remains to be seen how this works out over the years, but we've managed about a decade so far without anything going too horribly wrong yet :)

-Chris

I'd like to do more similar ones in the future, and encourage other contributors to also write other blog entries in general.

Hi Chris,

After discussing these topics in the dev meeting, I also have some input.

First, blog posts are great for communicating changes and maybe
outlining design decisions, but these decisions change over time, and
the posts become obsolete. On the other hand, HTML documents, while
good for the release process, are hard to change and keep up-to-date.

As I said before, I'm looking for a (or writing a new) script to
convert from the Wiki to LLVM-HTML style, and I think we should use
the Wiki more often to keep not only the documents up-to-date, but
also the design decisions, libraries and how-tos.

I personally think that there are several key aspects of our development process that contribute to ensuring that LLVM maintains its high quality over time:

1) peer review (review after commit is enough if people are willing to revert and if people actually do read the patches and speak up).

This process can take months to apply and generally impacts the
productivity of the team that sent the patch in the first place. We
had a few cases where a patch was never looked at until someone
hit the same problem and happened to find it in the llvm-commits list.

Also, without a clear design roadmap for the various parts of LLVM, it
falls too much in the hands of the reviewers to check that everything
is correct, interact with the developer, explain the design decisions
(or not, and complicate things even more).

Peer review is fundamental, but on its own it cannot deal with a higher
volume of patches, probably not even with the current volume, and
increasing the number of committers can have other unexpected side
effects in code quality (they themselves may not know all design
decisions).

3) components that are not on-by-default can be held to lower standards while they are being implemented. If they work out and it seems like a good idea to turn them on by default, they should be given a really hard once-over.
4) continuous refactoring and improvement of the code base
5) being willing to throw away old stuff that no one is using or maintaining, or replace things that are broken and terrible.

IMHO, these are the three key points that are maintaining the high
quality of LLVM and should be kept forever. As much as I disliked
having the union type removed, it was a good design decision.

The same should go for back-ends. It's good to have a long list of
supported back-ends, but if they start hampering the evolution of the
rest, they should be turned off by default or even moved to a separate
patch/project outside the main distribution.

In a nutshell, the design decisions should be communicated more
effectively, and a Wiki is a great place to start. Peer reviewers
should communicate via the Wiki, so patchers could learn and plan
before the next iteration and reduce the cost for everybody.

cheers,
--renato

You may consider using a review tool rather than (or in addition to)
llvm-commit. For instance:

   http://www.reviewboard.org/

With this tool you can assign review responsibility of certain parts of
the code to different reviewers, so if somebody submits a patch of some
source code files, the responsible reviewers will automatically get a
review request in their inbox.
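To make the idea concrete, here is a small sketch (in Python, with invented reviewer names and path prefixes — this is not ReviewBoard's actual implementation) of the kind of path-based routing such a tool performs: map source-tree prefixes to responsible reviewers, then collect everyone whose area a patch touches.

```python
# Hypothetical sketch of path-based review routing. The prefixes and
# reviewer names below are made up for illustration only.

REVIEWERS = {
    "lib/Target/ARM/": ["arm-reviewer"],
    "lib/CodeGen/":    ["codegen-reviewer"],
    "docs/":           ["doc-reviewer"],
}

def reviewers_for(changed_files):
    """Return the set of reviewers whose area a patch touches."""
    found = set()
    for path in changed_files:
        for prefix, names in REVIEWERS.items():
            if path.startswith(prefix):
                found.update(names)
    return found

# A patch touching CodeGen and the docs notifies both responsible reviewers:
print(sorted(reviewers_for(["lib/CodeGen/Passes.cpp",
                            "docs/CodeGenerator.html"])))
# → ['codegen-reviewer', 'doc-reviewer']
```

The appeal is that no patch falls through the cracks waiting for "someone" to notice it: the mapping decides up front whose inbox it lands in.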

I'd like to do more similar ones in the future, and encourage other contributors to also write other blog entries in general.

Hi Chris,

After discussing these topics in the dev meeting, I also have some input.

First, blog posts are great for communicating changes and maybe
outlining design decisions, but these decisions change over time, and
the posts become obsolete.

Right. Jan was specifically asking about the "why things were decided and what the tradeoffs were", which is inherently a "moment in time" sort of thing. IMO, this is perfect for a blog post. Blog posts have the advantage that they are explicitly dated.

On the other hand, HTML documents, while
good for the release process, are hard to change and keep up-to-date.

It's actually not any harder to change and keep up to date than anything else. HTML pages have the advantage of showing up in a recursive grep of the sourcebase, and showing up in llvm-commits so they fall into the peer review process.

As I said before, I'm looking for a (or writing a new) script to
convert from the Wiki to LLVM-HTML style, and I think we should use
the Wiki more often to keep not only the documents up-to-date, but
also the design decisions, libraries and how-tos.

I personally really dislike the wiki. It is more difficult to keep up to date, isn't organized well, and has no inherent power over HTML. If you're liking the Wiki because you don't have commit access to the LLVM repo (making it harder to update the docs there) then we should grant you commit access.

I see the wiki as mainly useful for people collaboratively sketching out very early stuff. I personally never look at the wiki and have no idea if anything on there is up to date or useful. Looking now, I see a lot of stuff that should be folded into (for example) CodeGenerator.html after getting reviewed.

I personally think that there are several key aspects of our development process that contribute to ensuring that LLVM maintains its high quality over time:

1) peer review (review after commit is enough if people are willing to revert and if people actually do read the patches and speak up).

This process can take months to apply and generally impacts the
productivity of the team that sent the patch in the first place. We
had a few cases where a patch was never looked at until someone
hit the same problem and happened to find it in the llvm-commits list.

If no one responds after a week, please ping the patch. One major issue with a distributed group of reviewers is that everyone assumes that someone else will look at it. Pinging helps with this.

3) components that are not on-by-default can be held to lower standards while they are being implemented. If they work out and it seems like a good idea to turn them on by default, they should be given a really hard once-over.
4) continuous refactoring and improvement of the code base
5) being willing to throw away old stuff that no one is using or maintaining, or replace things that are broken and terrible.

IMHO, these are the three key points that are maintaining the high
quality of LLVM and should be kept forever. As much as I disliked
having the union type removed, it was a good design decision.

The same should go for back-ends. It's good to have a long list of
supported back-ends, but if they start hampering the evolution of the
rest, they should be turned off by default or even moved to a separate
patch/project outside the main distribution.

Yes, I completely agree. The Itanium and MSIL backends got axed not too long ago :)

In a nutshell, the design decisions should be communicated more
effectively, and a Wiki is a great place to start. Peer reviewers
should communicate via the Wiki, so patchers could learn and plan
before the next iteration and reduce the cost for everybody.

I agree, I just don't think the wiki is the right forum for it :)

-Chris

FYI, several developers already use Rietveld for code reviews: http://codereview.appspot.com/

I’d rather focus on one that we already have a base of familiarity with if we’re going to use something more than email.

I'm a big fan of putting such things *in the code* as comments. It
does logically go with that bit of code, after all.

For some things, yes. But describing design decisions in comments gets
either incomplete and scattered or too verbose and scattered. ;)

I once had to do maintenance on a Python script that had 3x more
comments than code. I never read them...

It's actually not any harder to change and keep up to date than anything else. HTML pages have the advantage of showing up in a recursive grep of the sourcebase, and showing up in llvm-commits so they fall into the peer review process.

Hi Chris,

I understand that this is more of a personal preference than a
technical discussion...

Diff-ing HTML is not easy, and enforcing style even less so. Most wikis
have revision control (albeit different from your main svn repo),
good diffing and collaboration tools.

It's true that both HTML and wiki pages tend to become out-dated if no
one maintains them, and a blog post is outdated by definition, but
explicitly so (it is dated).

I agree that blog posts are good to convey decisions made, but it
would be good to have them immortalized somewhere (HTML or Wiki).

If you're liking the Wiki because you don't have commit access to the LLVM repo (making it harder to update the docs there) then we should grant you commit access.

That's not the point. The story goes like this...

Devang, Talin and I were discussing the document I'd written about
debug information. Both sent me lots of corrections, typos and
suggestions in the body of the email, so I had to change the HTML
doc and re-send it to them. Every iteration they would find more
things and suggest others, until Talin said: we should put this in the Wiki!

While this is pretty much what you said (draft writing on wiki), there
are lots of things that will remain draft quality for a long time,
like how to produce debug information.

So, as you said, people should write drafts in the Wiki and more
finalized documents in HTML (I concur): wikis are good places to start
documents, and HTML is good for immortalizing them. So, encouraging
people to start docs in the wiki and having a way to convert them to
HTML (I found some Python scripts on Google Code) would be a good thing.

Later additions, when the HTML doc is already stable, could be
managed via SVN diff.

If no one responds after a week, please ping the patch. One major issue with a distributed group of reviewers is that everyone assumes that someone else will look at it. Pinging helps with this.

Ok, will do.

>>> I'd like to do more similar ones in the future, and encourage other
>>> contributors to also write other blog entries in general.
>>
>> Hi Chris,
>>
>> After discussing these topics in the dev meeting, I also have some input.
>>
>> First, blog posts are great for communicating changes and maybe
>> outlining design decisions, but these decisions change over time, and
>> the posts become obsolete.
>
> Right. Jan was specifically asking about the "why things were decided and
> what the tradeoffs were", which is inherently a "moment in time" sort of
> thing. IMO, this is perfect for a blog post. Blog posts have the advantage
> that they are explicitly dated.

I'm a big fan of putting such things *in the code* as comments. It
does logically go with that bit of code, after all.

I agree that a blog is probably one of the better ways of documenting these
things. A lot of discussions happen on the mailing lists and irc, and the person
who is supposed to do the work (and probably initiated the discussions) could
gather all the information and post it so that everyone can get the same
information. With the proper tagging it is easy to find posts related to a
specific module/library. Having a collective journal for the projects makes it
possible to see if the code has diverged from the original design. It is useful
to have a record of how people were thinking about the code and not only what
they did to it. A blog post that is dated is more likely to be treated with
some skepticism if it is very old, compared to a Wiki and comments in code, which
give the illusion they are up-to-date and accurate. That does not mean I am
against comments in the code, but I like comments to explain more detailed
things about the implementation. The only thing accurate is of course the code
itself. We don't want to write blogs or comments unless it is needed, and
finding a good balance is difficult.

- Jan

It's actually not any harder to change and keep up to date than anything else. HTML pages have the advantage of showing up in a recursive grep of the sourcebase, and showing up in llvm-commits so they fall into the peer review process.

Hi Chris,

I understand that this is more of a personal preference than a
technical discussion...

Diff-ing HTML is not easy, and enforcing style even less so. Most wikis
have revision control (albeit different from your main svn repo),
good diffing and collaboration tools.

It's true that both HTML and wiki pages tend to become out-dated if no
one maintains them, and a blog post is outdated by definition, but
explicitly so (it is dated).

I agree that blog posts are good to convey decisions made, but it
would be good to have them immortalized somewhere (HTML or Wiki).

It's true that this is partially about personal preference, but it goes beyond that. The wiki doesn't fit into the social infrastructure we have for the project: changes to it aren't reviewed on a -commits list, people without commit access can modify it, etc. Depending on your perspective, this is a good thing or bad thing. One datapoint is that UIUC refuses to host wiki.llvm.org specifically because they have policies against hosting web sites with no control over the content.

Beyond the social aspects, the wiki doesn't fit into our release infrastructure either. llvm/docs is snapshotted with each release and archived. This helps with changing APIs and other stuff, which means that you can read about LLVM 1.0's alias analysis stuff at http://llvm.org/releases/1.0/docs/AliasAnalysis.html if you really want to :) Because the wiki is not in SVN, it can't realistically do this.

If you're liking the Wiki because you don't have commit access to the LLVM repo (making it harder to update the docs there) then we should grant you commit access.

That's not the point. The story goes like this...

Devang, Talin and I were discussing the document I'd written about
debug information. Both sent me lots of corrections, typos and
suggestions in the body of the email, so I had to change the HTML
doc and re-send it to them. Every iteration they would find more
things and suggest others, until Talin said: we should put this in the Wiki!

While this is pretty much what you said (draft writing on wiki), there
are lots of things that will remain draft quality for a long time,
like how to produce debug information.

As I mentioned, I don't have a problem with using the wiki as a collaborative development tool, so long as useful docs migrate back to the official html docs.

So, as you said, people should write drafts in the Wiki and more
finalized documents in HTML (I concur): wikis are good places to start
documents, and HTML is good for immortalizing them. So, encouraging
people to start docs in the wiki and having a way to convert them to
HTML (I found some Python scripts on Google Code) would be a good thing.

Later additions, when the HTML doc is already stable, could be
managed via SVN diff.

Right. My next objection is that the wiki isn't being used just for collaborative development. For example, the "common backend tasks", "faq", "using llvm" and other stuff should be merged into the official docs. I'm fine keeping wishlist and external project lists on the wiki.

-Chris

Ok, I think we both converged to the same point. ;)

I'll test the three scripts I found to convert wiki to HTML and try to
fit LLVM's CSS.

If it's any good, we can find the right places to put them in SVN.
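To give a feel for what such a conversion script does, here is a minimal sketch, assuming MediaWiki-style markup; a real script would need to handle many more constructs and emit the llvm.org CSS classes, and this is not one of the scripts mentioned above.

```python
import re

def wiki_to_html(text):
    """Convert a few MediaWiki constructs to HTML (illustrative only)."""
    # == Heading ==  ->  <h2>Heading</h2>
    text = re.sub(r"^==\s*(.+?)\s*==$", r"<h2>\1</h2>", text, flags=re.M)
    # '''bold'''  ->  <b>bold</b>
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    # [http://url label]  ->  <a href="http://url">label</a>
    text = re.sub(r"\[(\S+)\s+([^\]]+)\]", r'<a href="\1">\2</a>', text)
    return text

print(wiki_to_html("== Debug Info ==\nSee [http://llvm.org docs] for '''more'''."))
# → <h2>Debug Info</h2>
#   See <a href="http://llvm.org">docs</a> for <b>more</b>.
```

The hard part in practice is not these simple substitutions but nested lists, tables and templates, which is why trying existing scripts first makes sense.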

cheers,
--renato

Great, thanks a lot Renato!

-Chris