Love this! It reinforces my earlier point of asking to be a code owner. Socialisation is hard, and there isn’t necessarily an intersection of code and people skills for people who would be good code owners.
Also, I don’t think we should enforce or require all code owners to have all skills. That’s the whole point of having multiple code owners.
The more requirements we add to the list, the less people will find that they have the union of all items. But if all they need is a sub-set, the whole is the union, it becomes a lot easier (and more productive) to accept being a code owner.
I’m reminded again of a lightning talk from @kbeylsCan reviews become less of a bottleneck?
At that time, the average number of reviews per patch was about 2.5; however, 85% of patch authors did fewer than the average of 2.5 reviews per patch. So, yes, we say anyone can review anything, but perhaps we need to be a little more welcoming to new reviewers. I mean, for a brand-new contributor, sure, they’re still learning the ropes; but for people who are past that point, we should be somewhere between encouraging and expecting reviews.
+1. Every time someone asks me about upstream I always find a way to wedge in this key phrase: “The more you review people’s patches, the more they’ll review yours”.
I’m not sure how effective that is community-wide, but on the new back-ends (m68k, CSKY, LoongArch, SPIRV) where I beat that drum a lot, I think there was enough cross-reviews that made me happy.
+1 that this is an issue. Quite often, somebody makes a proposal, there’s good discussion, and then it just sort of… stops. And it’s very difficult for folks to tell whether that means the proposal was accepted or not.
In general, what I would expect primarily from code owners as the guiding value is ensuring responsiveness.
Responsiveness can take many forms. For example, for code reviews, it can mean personally being responsive to code reviews in their area. It can also mean watching code reviews from others and just making sure that there aren’t any patches that “fall through the cracks”. For RFC discussions, it means helping drive towards a clear “Yes” or “No” (or a “Yes/No, if…”).
I see a common recurring pattern of what people want from code owners, but there’s a fundamental mismatch between some of them.
On one hand, we want to have multiple owners per area, so that we can cover each other when not available. On the other hand, we want owners to be more authoritative in decision making. What if two or more owners don’t agree with each other, but the majority of the community backs one and not the other? How can one call their support base numbers a winning argument in a way that is different to the problem that we have today (re. silent consensus)? What if one is on holidays and decisions are taken without their input?
On one hand, we want owners to have more authority, but on the other hand, we don’t want to create a hierarchy. If code owners’ opinions have more weight than another experienced developer (who isn’t a code owner), how is that not a hierarchy based on status rather than experience?
On one hand, we want owners to have more responsibility, ranging from reviewing a lot of code, making hard decisions and driving diversity into the project. On the other hand, we still have to write code to our parent companies, universities or even other pet projects (for those who do it pro-bono). It’s hard to convince companies to allow their engineers to spend this much time on upstream discussions if their authority isn’t publicly recognised and actionable (like other OSS projects).
Many of the arguments I see in this thread are converging to a model similar to GNU projects, where we have recognised leaders and sub-leaders, owners of particular areas that receive code from owners of their sub-areas and so on, being a full time job. If this is the direction that the community wants to go, then there will be a lot more important things to do than just to find some people to help with code review.
Our community will radically change and that will take a toll on all of our works, upstream and downstream. I’m not saying this will be worse, or better. I’m just saying it will change and we need to understand what people are asking for.
For the record, let me try to answer my own questions with LLVM’s history in mind, trying to describe what I see as the status quo, not necessarily my opinions. But, given that I work with LLVM for a long time, my opinions are quite close (and biased towards) the status quo.
My understanding is that a code owner is responsible for a “piece of code” by performing any combination of:
Guiding developers towards quality in their areas, following the multiple documented guidelines.
Being the last line of defence, if no one has reviewed a patch or responded to an RFC, they:
Add people they think will be able to help
Reply with their own reviews when all else fails
In that order, to encourage community participation
Participate in substantial code review in their areas (in addition to above)
Adding substantial code to their areas without affecting other developers’ contributions
This is particularly important for fast-paced sub-projects
Less so for localised parts, like back-end targets, tools
Being on the lookout for changes in other components and how that’ll affect their parts
Being on the lookout for people that are having trouble contributing and guiding them through
If we just add more parts and more people per part, with the role definition above, we can continue to scale our works with more redundancy and less conflicts. If we want to add more roles to ownership, then we’ll have to think hard about how that’s going to affect this growth.
To me, this is a trivial converging exercise. We split, and see what works, then move around, reassess, until we find the right combination. New areas will show up, but once stable, it should be mostly adding things.
The existing idea is to create a code owner for new targets in LLVM, but other areas are more fuzzy. Other sub-projects don’t even have an OWNERS file. I think here, each sub-community should splice their own ownership, like you did with Clang, with LLVM being a bit of everyone’s job.
New targets, projects, dialects, sanitizers, libraries are easy and already encoded in policy: the people proposing the new code must agree to be the code owners.
Existing areas could have an influx of proposals. I wouldn’t add people based on their contributions, without their explicit consent, but I’d encourage high contributors to apply for ownership (by some RFC or other). Having been asked to do it by the community reduce the fear of being rejected, and is much easier to do. Existing code owners should take their time to ask people around to participate in their areas.
I wouldn’t have more than 2-4 owners per area. Some areas will have more people willing and able to be owner than others, so some areas will have more core contributors than owners, others will probably only be contributed to by owners. No problems there.
When there are more active people for a demonstrable amount of time. Ownership isn’t about making decisions (so far), so losing ownership doesn’t mean you’re less important, just means other people are more active than you are.
If you don’t code but still review patches and participates in discussions, I wouldn’t remove your ownership. But if your last interaction (commit, email, review) was years ago, then I’d move you to the CREDITS part and allow new people to help.
We write a policy and we encourage code owners to ask those people to join.
If we have a clear policy on being a code owner, then people will more likely be encouraged to ask. They could ask first the existing code owners, who would give an idea of how many people work on the project, how many of them could be owners and how hard it would be for people to accept a new owners, etc.
If we had owners actively seeking to step down, look for others, and understand that this isn’t a life title, but it’s based on contributions, then we’d see more shuffles and more people contributing to LLVM at that level.
One final point I think I should add to the discussion is the idea of moving around.
I have worked in very different parts of LLVM (back-ends, code-gen, front-end driver, vectorizer, mlir) but my code ownership is “Arm Linux”. That does not reflect what I do in LLVM for quite a few years, but it’s stuck there.
I think there should also be some thought in any policy we write about moving across ownership boundaries, to recognise people’s work for project wide contributions. I’m not sure how to state this, so I never proposed anything, but it would be interesting to know what other people think of that and how would they make that work.
A lot of these situations feel pretty hypothetical to me, but then again, we’ve never had multiple code owners for the same area before either. Personally, I think of these situations as ones where the default answer is “no changes are made because there’s not clear agreement yet” and the community needs to work together to reach a medium we can all live with. They’re less about code ownership and more about effective collaboration, in some ways.
Hmmm. I think a hierarchy is inevitable because we’re naming some people by name and giving them specific responsibilities. Even with the best of efforts to not create a hierarchy, one is already implied by that.
I think there’s some nuance here worth chatting about. I don’t mean to pick on @beanz or put words into his mouth, but I think the recent incoming support for HLSL is a pretty good case study.
I think it would make perfect sense to add @beanz as the code owner for HLSL given that he’s the driving force behind the project and he has significant experience in providing quality reviews in the project. However, one of the big reasons for upstreaming HLSL work is so that the quality is better than what would be produced internally (at least to some extent) and so I think having a non-@beanz code owner would provide significantly more value to Microsoft as the company backing that work. But we still need @beanz to own the decisions around what is and isn’t a correct HLSL implementation (at least to some extent, because there’s not really a language standard for it).
Personally, I would like to see a requirement that these sort of larger scale projects have two code owners: one from the proposing party and one from the community. Having a code owner from the proposing party who understands the responsibilities and expectations of code ownership (whatever we decide those to be) signals that there’s an expectation of effort beyond “dumping” code onto the community, and having a code owner from the community signals that the community is willing to put in the effort to support the endeavor. If we can’t find code owners who make sense from either of those parties, it’s a signal that the community might not be able to take on that effort at this time.
@AaronBallman +1! (and always feel free to pick on me )
I would be happy to be a code owner for HLSL and take ownership over things that are exclusive to HLSL and relate to how the language is implemented in clang. I’m not really a definitive expert on HLSL, but I do work with a few so I have good channels to get answers when I need them .
To echo Aaron’s point, it is incredibly valuable to our project (and I suspect to the community as a whole) to have feedback from owners of the architectural parts of clang when it comes down to design and implementation details, adequate test coverage, and more. I certainly don’t think a project of this scope can be successful without the support of code owners across clang, and I’ve been greatly appreciative of the support we’ve received.
I think this points out a need for recognizing the responsibilities of overlapping ownership. While it might be appropriate for me to be a code owner for HLSL, it wouldn’t be appropriate for me to blanket approve changes to the AST, even if they are HLSL-specific.
THIS. THIS. THIS. I cannot agree with this more. Yes please.
Until reading this sentence I hadn’t drawn a connection between becoming a code owner and general career growth, but I think this is a problem our industry has as a whole, where the path from college student to “senior whatever” is very nebulous.
What if we make a path? Define roles and progressions. Maybe we have one or a few “Code owners” for an area, but two or three times as many “code curators”, and “curators” could be a clearly step toward being an “owner”.
Personally I learn a lot through following along and being in the room for a conversation. Having a formalized group of people that are maybe not code owners, but in the path to becoming owners in an area might be a good way of getting those people “in the room”.
What do you think about making part of the responsibility of a code owner nominating other people to be “curators” and helping them grow? We shouldn’t require that someone be a code owner to nominate a curator. Anyone should be able to nominate a “curator” and anyone should be able to be a “curator”. Maybe the only requirement for being a “curator” is a GitHub/Phabricator account.
Me too. Do you think having other defined roles that lead to code ownership would help or be a hurdle here? I strongly agree with your point that we need to assume people won’t nominate themselves. The leaders in the community need to be proactive about nominating and generally about recognizing the contributions of others.
I recently jokingly asked for a review from a code owner (and friend) who hasn’t contributed code to LLVM since 2015… I think acknowledging removing ownership isn’t an insult is a big first step.
I think that we should recognize that to make room for new people in leadership roles sometimes other leaders need to be willing to step aside. I really know what the right solution here is. Maybe an “owner emeritus” designation, but I think defining successful code ownership as passing the torch should be part of our perspective.
I remember a few occasions in the past where a code owner has held parts of the codebase hostage, so I very much feel this concern. I wonder if this is something we could take direction from the proposal process that @clattner brought in a few years back. That process hasn’t really been widely used, but I feel like it was a good way to provide a structured way to address disagreements.
I wasn’t removing anything technical but adding to it. It’s listed as the first thing. “Ensuring reviews/code quality”. I want to spell out the other parts of the job that are directly related to being a leader in our community. If we just have code owner’s focus only on the technical, then we won’t have adequate replacements when they need to move on. While some code owners are already doing this, I think it would be good to really go into details on how to achieve the other parts of the job.
I like this idea.
I believe this is happening more than we know. Right now, individuals have no where to turn to for help if it does happen and they just give up. This effectively drives people away from our community.
This is a very interesting idea. I like the idea of recognizing people who are putting in hard work of doing reviews but are not official code owners. This would also help define the process of how code owners provide mentorship.
I believe this could help, but also think we need to be encouraging folks to step into these roles and not rely on self nomination. I also think sometimes its hard to nominate outside your normal network for whatever reason. It would be good to design a process to make sure we are looking at all available candidates at each step. Perhaps having an intermediate position would be helpful for that too.
Maybe it’s as simple as explaining what to do in these situations and follow a similar procedure.
I think you are mis-understanding what I am saying. I’m asking for more transparency and people who are overseeing the process with the goal of transparency and fairness. I’m not talking about making a group of “king-makers” (or queen-makers). We already have a situation that is “who you know” in order to be a code owner or nominating mostly within their organizations or networks. I want LLVM be balanced when it comes to all interests involved. So I’m suggesting we think about the process a bit and brainstorm how we can achieve that end goal.
FWIW, I love the discussion on this thread - we clearly need to evolve here. To +1 a few random points:
I don’t think it is a bad thing to have a hierarchy of reviewers. This provides a natural escalation path if two co-owners have a disagreement and want a new opinion/tiebreaker etc. The codebase is hierarchical already, and overall owners gives coverage to things that otherwise fall through the cracks.
I hope that massive disagreements will be rare, but if they come up, the LLVM decision making process is the established thing for resolving technical discussions in a visible and inclusive way. I consider it a feature that it doesn’t get used very often, but we have it when/if we need it.
I love the idea of formalizing more of our process and intent; codifying the status quo we like and defining new things in a written way. This is far better for newcomers who haven’t been part of the community for a long time, and I really agree it would reduce some of the intimidation factor that could lead to folks not raising their hand.
I really really love to idea of giving more visibility and recognition to contributors to LLVM of all sorts. llvm/CREDITS.TXT only has 115 people listed in it which is laughable. Maybe there are other ways we can give kudos to contributors, e.g. in the llvm release notes, at developer meetings, etc.
@AaronBallman a slight aside on this ownership conversation here are some things I struggle with for us to all think about… (re Code Reviews)
As you know I help out on clang-format, And whilst I appreciate the recognition of being considered an owner, my bigger concern is:
defect backlog (which has grown significantly since the move to github issues), we improved our defect logging speed/capability but did nothing to increase our fixing speed or increased the number of volunteers significantly enough to keep up (although we did pick up on or two more)
We already had 4 key owners in my view (@owenpan , @HazardyKnusperkeks , @mkurdej and myself @MyDeveloperDay), we work well (in my view) as an informal leadership team. I have a lot of respect for these contributors (+ a few key contributors who jump in from time to time). But to be honest we get a lot of contributions from people outside this team which is great but people drive by with a patch, hang around for the first couple of rounds of review then I believe lose the will to live! very few reviews make it past a couple of rounds. Our core team wastes significant amount of their time on these wasted review (to the extent that I find it difficult to find the time to also fix bugs or add new features myself)
This leave us with a massive review backlog (in Needs revision), for which I never see the work ever finished, we often give significant amounts of time to reviews which end up going silent. - example D54628 (last updated 2019)
When ultimately we review back with either rejection or significant changes the contributor walks away leaving the review is a floating “Needs Review/Needs Revision/Changes Planned” state.
But dead reviews often the author is radio silent (won’t respond to ping), we can’t get them to Abandon.
Abandoning a review requires “taking ownership” which makes you look like the author of the review, I don’t want that…I just want to mark them as “dead”, this also feels a bit rude…
We need a much more proactive “Abandonment” policy/mechanism to flush these items out of the system or the reviewers are constantly having to look at these dead items in their lists of what needs reviewing. (can’t tell you how many times I’ve relooked at D33029 - 5 years old)
I’d like to see an automated “Grimreaper” that kills inactive reviews after N months of inactivity. (its not deletion the owner can reopen), I want it to be a “bot” so we depersonalise the “Abandonment”, i.e. it wasn’t the owner that Abandoned it but the author because they didn’t do anything with it.
Any thoughts on setting a “What to expect will happen to your review” policy…
This doesn’t particularly add to the topic of ownership or reviews, but I’d like to just say as someone who only just made a patch, specifically one in clang-format
I did so because it is probably the “lowest stress” subcomponent of the LLVM monolith, and perhaps with the lowest barrier of entry, not requiring prior extensive study of compilers/codegen/optimizations.
I’m sure you know this, which is why you get lots of outsiders (i.e. those with no commit access, such as myself)
Nevertheless, I had a very enjoyable experience and greatly admire the team of 4 aforementioned reviewers (this is really a thank you letter). So I can’t comment on why people “lose the will to live”
I’d also like to mention that when I first read in the contribution guide the daunting task to “find reviewers”, the code owners file wasn’t much use (glad it’s been improved!), but the method I used which isn’t mentioned in the guide was to look for prior, recent accepted reviews in the same area and just copy the reviewers listed there. (“If other people are doing it, surely it must be okay”)
Perhaps this doesn’t apply to all parts of the codebase because I’ve really only looked at one, but it seems to me that using git blame could result in finding people that are no longer active, etc.
Do you have any insight on why you think folks seem to walk away after a few rounds? Like, is it a matter of folks not realizing how much effort is involved and once they find out, they feel like they’re in over their heads and walk away? Or is there something else going on that we should think about addressing so that we retain these folks?
This is an interesting idea, especially if we switch to GitHub (where I believe bots are easier to support than Phab). In general, I think it makes sense to automate this process – the trick will be in finding a good timeframe in which to run it. e.g., reviewers or patch authors may go on vacation and we don’t want the bot to come by and claim something is abandoned when it actually isn’t. But if we were okay with something like six months of no activity, I have a hard time imagining the bot causing hard feelings.