[Code Review] Reminder about links in code & commit messages

AaronBallman · July 5, 2023, 8:52pm

As a reminder to folks doing code reviews (as well as folks authoring patches), please be sure to check links in code, tests, and commit messages to ensure they’re appropriate. We shouldn’t be adding dead links (and we should update any dead links we find along the way) nor should we be adding links to private resources like internal bug trackers. Links should be to resources that are publicly available for anyone contributing unless there’s a good justification for using a private link in addition to the comment itself (e.g., linking to a private WG21 meeting wiki to provide more context for a comment).

Note, I plan to remove approx 2k rdar:// links in the near future to help clean the code base up a bit. That’s the only systemic use of private links that I’m aware of – these links were less problematic 10+ years ago when Clang was coming mostly out of Apple, but they’re not helpful information for the community at large any longer.

blangmuir · July 5, 2023, 11:50pm

Hi @AaronBallman, is there an existing policy that would cover links to internal bug trackers? I wasn’t able to find anything skimming the developer policy. I am particularly interested in commit messages, since Apple folks are actively adding more commits with rdar:// links all the time. I think changing that could be disruptive, and needs a discussion.

Also, should this go to an LLVM category? This isn’t really clang-specific.

AaronBallman · July 6, 2023, 11:02am

I don’t think we’ve written anything down – common practice is: dead links are bad and should be removed/fixed when discovered. Internal links are indistinguishable from dead links to most of our contributors.

I’m not certain we really need to write anything down (but I’m not opposed should someone want to clarify this for the project as a whole).

What is the justification for needing to put internal links into public source code/tests/commit messages? (At best, I can see a discussion delaying removal of existing rdar links from source/test files because we don’t want to be disruptive, but I don’t think we want to allow every corporate contributor to put their internal references into the public repository and I don’t see why Apple should be the only company allowed to put internal links into the repo. This information is better kept in the downstream repo where it’s far more relevant.)

It certainly could, though there’s no way to move posts between categories. I added some more tags to this post so we can keep discussion in one place.

arsenm · July 6, 2023, 11:21am

I don’t really see the plus to removing this metadata. In the past I’ve found someone who could reference the rdar link to help with issues

AaronBallman · July 6, 2023, 11:54am

<rdar://123456> links are not well known outside of Apple and people who have interfaced with Apple, and provide no relevant context to most folks. That’s a burden for new contributors and creates an in-group and an out-group. It also doesn’t scale at all – we have dozens of corporate contributors with various private bug trackers.

You can find someone who can help with issues using git blame, which is actually more useful than the rdar link to most folks.

(Note, I’m not opposed to links in general – open trackers that aren’t our own, such as GCC’s bugzilla, are great. I’m only opposed to links that nobody can access except a small percentage of the community. That link should never be what’s providing sufficient context to the reader for what it’s commenting on.)

Endill · July 6, 2023, 12:10pm

I just moved it to LLVM Project. Is that what you wanted?

AaronBallman · July 6, 2023, 12:17pm

I couldn’t see the way to do that, but I see now that editing the post has a drop-down you can use for this. Thank you!

asb · July 6, 2023, 12:24pm

Maybe this deserves it own topic, but another piece of metadata I feel is generally a poor match for an open source project (though I get why it can make sense for small teams within that project) is the TODO(assignee_username) format in comments. It’s information that can easily become outdated (and produces unnecessary churn if trying to keep it up to date). It is also liable to put off new contributors looking for easy tasks to pick up. If it’s something worth tracking with an assignment, I’d rather see an issue for it.

AaronBallman · July 6, 2023, 12:34pm

Strong +1, though we do have something written down about that here: LLVM Developer Policy — LLVM 18.0.0git documentation

“Overall, please do not add contributor names to the source code.”

pogo59 · July 6, 2023, 2:55pm

I think it’s more that Apple is the only company brazen enough to do this.

At Sony we have a policy to flag downstream patches with comments citing the internal ticket, because it makes “why did we do this?” a whole lot easier to figure out that way. We don’t put those in upstream patches, or even upstream bug reports, mainly because it’s useless clutter as far as everyone else is concerned. Also, once it’s upstream, there’s a whole lot less need for that kind of back-tracking.

I’d be fully behind a policy not to cite internal tickets in source/test/docs. I’m neutral on citing them in issues or commit messages.

blangmuir · July 6, 2023, 4:43pm

What is the justification for needing to put internal links into public source code/tests/commit messages?

I think the benefit of internal bug identifiers in commit messages is that it allows contributors and tools who have access to that internal bug tracker to connect commits with tickets. They are small bits of metadata that are unobtrusive to readers of the commit log. If the concern is that they should not be the sole source of context for a change, then I fully agree: the commit message needs to stand alone.

These links have been added to commit messages actively and for many years. I think the onus is to show there is sufficient harm to having these links in commits before we create a policy to ban them.

I don’t think we want to allow every corporate contributor to put their internal references into the public repository and I don’t see why Apple should be the only company allowed to put internal links into the repo

I don’t see why any other bug id that is equally unobtrusive should cause a problem. I would certainly not argue that any company should have special status.

mehdi_amini · July 6, 2023, 4:53pm

I would think that while connecting to internal bug trackers is useful, it might be healthier for the community to always link to a LLVM bugs, and allow for third-party to maybe link to their internal tracker from there. That would ensure that the community always has enough context and a place to discuss an issue publicly.

arsenm · July 6, 2023, 5:56pm

Most of the bug reports I receive are through internal JIRAs and the users were never going to file upstream bugs in the first place (although I would often prefer they did). The vast majority of bugs are also simple, and have no worthwhile discussion. I’m not going to put in the manual labor of extracting pertinent details into a public ticket to link to, especially for something I’m just going to fix myself. That would be more effort than the patch itself in most cases.

We recently started adding Fixed: footers to commit messages such that they’re automatically tracked as the commits are merged through various branches for select issues. The point wasn’t any kind of human benefit, it was for tooling so I don’t need to manually manage it.

JDevlieghere · July 6, 2023, 6:04pm

+1. I think we all agree that such a link can never be the sole source of information, but especially in a commit message, a tiny amount of metadata seems harmless.

AaronBallman · July 6, 2023, 6:38pm

A few folks have mentioned this as just a bit of metadata and so what’s the harm? The harm is that this metadata is not useful to the community as a whole but everyone is exposed to it. I’ve had several people mention rdar links specifically as something that has always confused them – a lot of folks don’t even know it’s a bug tracker, let alone what to do when they run into a problem with a test containing such a link. So while it’s a small amount of metadata, I’m more worried about signal-to-noise ratio because the whole community is impacted by these.

FWIW, in many cases, these links are the only source of information. e.g., this is extremely common in Clang: https://github.com/llvm/llvm-project/blob/c1e283851772ba494113311405d48cfb883751d1/clang/test/CodeGen/2009-06-18-StaticInitTailPadPack.c#L4

If we all agree that there needs to be sufficient context for a reader to understand what’s going on, then the rdar link adds zero value. And if there’s not sufficient context, the rdar link is pure frustration. So I am still not seeing what the benefit is for the community.

I disagree that there’s a community onus to explain why dead links should not be allowed in the tree. This is not a change in policy.

I agree that no company should have special status, that’s why I’m removing rdar links in the first place. Apple is the only company that routinely adds internal bug tracker links to the repo. If Apple’s bug tracker was open to the public or upon request by community members, then I’d have zero problems with the links as it’d be no different than links to Microsoft’s public Connect database, or GCC’s bugzilla, etc.

I disagree (vehemently) that we should allow arbitrary internal links to the repo. This creates for a poor user experience, especially for folks new to the community. Inaccessible resources are detrimental, not helpful.

tahonermann · July 6, 2023, 6:45pm

I’m ok with continuing to allow private links in code review comments, in github issues, and in commit messages (when accompanied with appropriate commentary so that the linked content is in no way needed to understand the change made). I don’t think they are appropriate in source code, tests or in documentation.

AaronBallman · July 6, 2023, 7:00pm

It sounds to me like most of the contention is around using these links in commit messages, and nobody is arguing to keep the links in source or test files themselves. If that’s accurate, I think I can hold my nose with commit messages because of the tooling argument. We already allow Fixes <link> and Differential Revision: <link> in commit messages (because of tooling and because of community benefit to linking information together), so allowing other accoutrements to help tooling seems reasonable enough so long as it doesn’t start to overwhelm the useful information in the commit message.

tahonermann · July 6, 2023, 7:06pm

And so long as commit authors add sufficient commentary such that the link is superfluous. If we see commit authors failing to do so (and code reviewers failing to ensure that commit authors do so), then we should reconsider allowances for links in commit messages again.

blangmuir · July 6, 2023, 7:08pm

This addresses my concerns

bcardosolopes · July 6, 2023, 10:57pm

+1

Topic		Replies	Views
[RFC] Specify a community policy for use of private links LLVM Project clang , llvm , policy	41	1861	August 25, 2023
rdar URLs Clang Frontend	2	88	January 7, 2009
[llvm-commits] [llvm] r168244 - /llvm/trunk/CODE_OWNERS.TXT Clang Frontend	5	84	November 27, 2012
Broken link in "http://clang.llvm.org/docs/InternalsManual.html" Clang Frontend	1	74	May 1, 2010
Clang is a resource hog, the installers for Windows miss quite some files, and are defect! LLVM Dev List Archives	14	111	August 23, 2020

[Code Review] Reminder about links in code & commit messages

Related Topics