At the same time, I do see the benefit of having everything accessible to the community, and I wonder if having a corresponding LLVM-wide policy might not have the long-term overall benefit of pushing developers at traditionally more closed companies towards using the public bug trackers more (including us…).
My experience is the only value from the ticket itself is the original testcase. The comments are mostly noise about trying to extract a reproducer. All the contextual discussion about the actual change happens in the public patch review
Thank you all for the discussion on this! To summarize where we landed:
Correct or remove dead links when you find them.
In general, don’t put links to internal or private resources in source code or test files; we strongly prefer publicly accessible resources whenever possible.
It’s fine to put links to internal or private resources in commit messages, code review comments, issue comments, etc. However, the surrounding prose should give enough context that the information behind the link isn’t required for the community at large to understand the situation.
I’ve removed most of the rdar links from the Clang code base in d618f1c3b12effd0c2bdb7d02108d3551f389d3d. I looked around for other private links and could not find others and I’ve not yet tried to find dead links across the code base.
I get the objections to radar links, but could we please not remove the links from existing code? I can understand not allowing them in new commits, but making a huge change like that one runs the risk of creating lots of merge conflicts in forks and makes it harder to interpret diffs that cross the revision where you made the change.
I say this as someone who has made these kinds of wide ranging changes to a large code base in the past and, honestly, I think it was a mistake, even though at the time I felt justified in doing so.
The commit is already in, but even if it wasn’t, I’d still argue that it’s reasonable to remove these links. It predominately impacts test files, which should limit the blast radius for downstream merge conflicts and we want to keep the community code base as clean as we can.
I’m sorry if this causes consternation for downstreams; we could consider reverting select files back to their original state if a downstream needs more time to react to some specific changes.
Honestly, I think we are chasing an unrealistic ideal here. Yes it would be great if all internal tickets were mirrored by external tickets, but the reality is that they are not (especially for older issues since I think the culture might have changed). IMO, removing these links simply means that some maintainer who does have access to the internal tracker will never find some information that could be extremely useful or even critical to fix/improve something.
This has happened to me in the past while doing archeology in libc++ and frankly being able to access Howard’s early libc++ tickets completely unblocked me on some stuff I would have wasted weeks on otherwise. The reality is just that our code base has evolved in ways that are often motivated by internal improvements/issues and it’s not always going to be clearly mirrored upstream. Removing that information seems like a good “idealistic” idea but in practice I suspect it will do more harm than good. I would support however banning these from the code going forward (but not from commit messages).
What I do in libc++ is that I always ensure to have enough context publicly available for things to make sense to anyone, but I still include a link to the internal tracker if there is one because that’s just free additional information for those that have access to it. The thing that’s actively harmful is if people get lazy and only include the internal bug tracker ID, then stuff stops making sense for everyone outside the company.
What about git blame / git log? Has always been my go-to way of finding why something has been added (at which point you get back the link to your private tracker, either in the commit message or in the original patch).
For the years I was code owner of Clang, the general policy was that links into internal bug trackers were acceptable in test cases and commit messages, but should be avoided elsewhere. The reasoning was as follows: with test cases, the test case itself is usually self-explanatory, but having the link in there is useful for tracking the impact the bug had for whatever vendor found and fixed it. For commit messages, it’s helpful metadata for tooling. It is not reasonable to expect that these bugs be mirrored in LLVM’s bug tracker, because any information beyond the reduced test case will often involve confidential information, from the vendor itself or one of its customers. There’s no merit in pursuing that as a policy.
Having links to internal bug trackers in non-test code is problematic because it is too easy to use it as an excuse to not properly document the code itself. And even if you have also documented the code properly, a reader will always be left to wonder—is there something more behind that mysterious link? So our convention was not to allow them in code, because code needs to be fully explained to everyone.
I still believe that the unofficial policy we followed while I was code owner is the most reasonable one, and that this newer, stricter policy against any links to internal bug trackers is unhelpful. It doesn’t add clarity, but it does suppress useful information.
More importantly, I think this was managed extremely poorly. The original post in this thread stated a policy that differs from historical norms, was barely discussed, and is not written down anywhere. It received immediate pushback. Then within two days, the policy was retroactively enforced on a code base with a 15-year history (removing a nontrivial amount of information), attempting to change the conversation to this:
The commit is already in […] we could consider reverting select files back to their original state
It is actively hostile to land such a sweeping change amid existing objections. If you want to push for a policy change here, bring it up as a discussion, but do not commit the change and then pretend that the onus is now on others to argue for individual bits to be restored. If you truly believe that this is the right policy, then it should be an LLVM-wide policy rather than a Clang-specific one. The LLVM community appreciates its code owners and gives them broad license to make the right technical decisions for the project, but it also expects them to listen to the community, take their feedback seriously, and discuss significant changes before they are made. That did not happen here.
with test cases, the test case itself is usually self-explanatory, but having the link in there is useful for tracking the impact the bug had for whatever vendor found and fixed it.
Recently I broke some tests which happen to use a feature whose behavior I (intentionally) changed.
I don’t believe the changed feature is what the broken tests cared about.
Yet one such test had a radr link, no other comment. What that test actual user code that got reduced and put here? Was i breaking something someone actually relied on?
Well, there is no way for me to tell, is there?
So no, these links are not helpful to people maintaining clang but they are also confusing. They hint at the existence of context and information which is presumably of interest, without providing that information.
If these links end up in commit messages and issues there will be more context so that’s fine, but even in tests some context matter. Putting a radr link next to a test is no different that, for example puting nothing else than “fix radr/1234” in a commit message would be. It’s…not helping the community.
It is not reasonable to expect that these bugs be mirrored in LLVM’s bug tracker
I think it is reasonable for a description of the bug to exist somewhere, the context of which products the bug occurs in is never relevant
I still believe that the unofficial policy we followed while I was code owner is the most reasonable one, and that this newer, stricter policy against any links to internal bug trackers is unhelpful. It doesn’t add clarity, but it does suppress useful information.
If it is useful it should be accessible to the community, if it’s not useful it should not be there.
I understand the desire to have a way to maintain a link between community and internal issues but commit messages and bug trackers seem sufficient for that.
I’ve also been wondering about including alive2 and other godbolt links. I don’t know how forever those unique URLs will be. They’re useful in the review but feel like they may disappear someday
While triaging old Clang bugs, I’ve encountered couple of godbolt links in decade-old issues. They were still working, save for the fact that Clang (trunk) has made progress, and doesn’t showcase the intention of the author anymore. But it’s not Compiler Explorer’s fault at any rate.
Old Wandbox links sometimes work, when you’re not hitting internal server error. What’s good about Wandbox is that it saves original output from compiler and/or program.
I can’t say the same about the links to file shares and private cloud drives (Dropbox, Google Drive) though. These are dead more often than not.
I’m against putting any internal links into an open-source repository. They help nobody, except people from the company that owns the internal material. If the code change is simple, there is no need to add any links, if it needs explanation, then the explanation should be included either in the code, or in a public ticket (which doesn’t have to contain any confidential material).
The code changes and information we put in the upstream repo should serve everyone, not just a selected group.
I agree with @DougGregor here, we’ve long had an approach of allowing internal bug tracker links in test cases and commit messages (but not .cpp files that are part of the core product) - this is because bug reports in open source code come from many originating places, and it is useful to folks who are in those organizations to be able to track things, and does not hurt other people.
One thing I don’t understand about the OP: what harm do these cause? We don’t remove things because they aren’t useful to everyone - it is enough to be useful to some people if they don’t cause harm. Was there a motivation I missed?
I don’t think they should live in tests or source, but are totally reasonable for code review, issues, commit messages, etc. I had a similar thought about wg21.link links, but those seem more reasonable because the paper number is in the link itself so there’s usable information even if the service goes down. e.g., https://wg21.link/p1235r0 is the same as https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1235r0.pdf
Can either you or @DougGregor point to some documentation to back up the assertion that this is existing policy? I’ve found no mention of this in our developer policy and when I search through old emails I can find no discussion of this topic that suggests the community made an active decision here, so if there’s details here, I’d be interested in reading the discussion. I also think the community has grown significantly and its needs have changed since then; would you agree?
The only links I could find to an internal resource were from one originating place. If this is useful for tracking purposes and we have dozens of companies contributing, I would expect other links to have shown up organically sometime in the past decade.
I disagree that it does not hurt other people (see below); the harm caused is precisely why I spent effort removing these links.
They’re harmful in several circumstances:
New contributors run into them and have no idea what they are. Think about the experience they have – volunteer in a new community to do some work, run into a rdar link, ask publicly what it is because they’ve never encountered it before (not something less experienced contributors enjoy doing), figure out that the information is in some private bug tracker, go track down an Apple employee (despite us having no listing associating people to companies they work for and also not something new contributors are comfortable doing), and ask for more details. On top of that, they’re unlikely to get those details because oftentimes the folks within the company don’t feel they have the ability to make the call to publicize internal bugs, which puts them in an uncomfortable position and adds legal risk of leaking private IP into the public. This is not a good experience for anyone involved.
It’s been (mildly, unintentionally) abused. In the NFC commit I made, rdar links were removed from source files, though nowhere near as many as in test files. In the test files, there were plenty of instances where the only context was a rdar link.
Because the links all originate from one company, it gives the appearance of giving that one company special treatment. While that might not be true in fact, the appearance alone is damaging to the community.
Given the problems it causes, the argument that “this is useful to those organizations to be able to track things” is putting the burden on the community to support that ability without giving the community much benefit. Given the friction it causes, why is that reasonable?
I do not believe this was mishandled. I don’t think there is a policy that these links are acceptable and we really shouldn’t need a written policy that says “dead links should be updated or removed”, though I’d be perfectly fine writing that down if folks think it’s needed. I asked on this thread and when people pointed out the value of these links within commit messages I changed directions to accommodate that use. That appeared to have resolved the concerns of folks who had commented. But if the community believes this was inappropriate, the commit to revert is https://github.com/llvm/llvm-project/commit/d618f1c3b12effd0c2bdb7d02108d3551f389d3d.
Speaking only for Sony, when we began participating in the open-source community, we consciously chose not to put internal references into issues, reviews, etc., on the grounds that (a) they were noise to everyone else, (b) the back links were unlikely to be all that useful anyway. And (c) at the time, both we and open-source were using (obviously different instances of) bugzilla as the bug trackers, so citing an internal bugzilla in open-source would be ambiguous (unless we used full URLs, and there was some concern about security if we did that). As long as we had proper forward links from our internal tracker to open-source, it would be possible to search the internal tracker to find the relevant information.
After a dozen years of working with LLVM, I personally can’t remember a case where a back link would have been useful to me. It certainly has never been enough of a problem to want to rethink our internal policy decision. Maybe people at Apple find these rdar links are constantly useful? I’d be curious to hear about the actual value in practice, as opposed to the theoretical value.
In most cases IMO: a policy comes codifying an existing practice, and in the absence of documentation the closest thing we have is the existing practice. So my take is that the evidence from the codebase (thousands of radar links?) shows that the admitted practice from the community was exactly as Chris described.
Being the one wanting to change or challenge the admitted practice, I would have thoughts that this proposal would take more time for people to comment and object and ensure some convergence before landing the change.
And to be clear: I am otherwise completely supporting you on the goal: the rdar link in the codebase (source and tests) always bothered me, even when I was at Apple (commit message and issue tracker should have enough breadcrumbs).
I think you bring up fair points, thank you! Unless the answer is a clear no, it’s sometimes a challenge to determine what consensus is or how long to wait on any given discussion. Before landing, I thought I had addressed the concerns that were possible to address. In hindsight, waiting longer would have been a good idea even if it ended with the same results.
@cor3ntin, I think I agree with you in terms of outcome. I would reword the statement though: If it is useful for internal purposes, it should be managed internally.
To me, the problem with allowing such links is that is it unclear what “respect” such links carry. The presence of the link can be perceived as “marking territory”. If the link wasn’t there, the normal community process proceeds. With the link there, contributors are subject to additional uncertainty. Internal issue links should be banned/removed wherever they can be read as “you are not sufficiently informed unless you read X”.