[analyzer] Bugzilla Database Cleanup Policy

Hi all, I looked through the Bugzilla database for the static analysis component. I was wondering what, if any, cleanup policy exists for long standing bugs. I found 620 bugs today. While I did not systematically look at each and every one of those :slight_smile: I noticed in passing many were in one of the following states:

  1. A duplicate
  2. An issue that had already been solved
  3. An issue that’s not concrete, or lacks enough information to start with.
  4. An issue whose originator cannot be contacted for further clarification (some, possibly many, are like this).

Most of these are Assigned to Ted (especially the ones filed before 2018).

Artem and/or Devin: Is there a policy we’re following if we want to just start going through these issues, triage and clean up the easier ones?

May I suggest the following?

  1. Maybe for the older ones, we can prove they are fixed and close them, documenting how they were proven to be fixed in the bug, leaving an audit trail?

  2. For ones that are not concrete, vague or lack a reproducer, start a discussion on the mailing list and attempt to contact the originator? And after an appropriate time, close the bug as not reproducible?

  3. Mark duplicates in favor of a more complete description of the issue?

Please let me know if you have strong preferences to initiate a cleanup, and I’m happy to follow those. I’m also willing to lead and contribute to a cleanup effort.

Best

I tried to clean up bugzilla bugs about a year ago. 620 doesn’t sound like a lot but i gave up after about 20 or so.

A lot of the early bugs are Objective-C-related because that’s where it all began - the retain count checker. We basically had one checker and people called it “The Checker”. There was also no interprocedural analysis at all.

I don’t think there’s an existing policy so let’s try to come up with something.

It’s pretty unlikely that you’ll get replies on 10-year-old bugs. You can try to ping the bug (all CCd people including the author will receive an email notification) but if it ends up having insufficient information there’s not much we can do.

Generally, i think it’s much better to start with new bugs and work backwards. Fresh bugs are more likely to be relevant, the author is more likely to be available for discussion, and addressing them quickly will make them happy.

Having a reproducer is a must for a good bug report. It doesn’t have to be small, especially given that false positives can’t be automatically reduced. We also shouldn’t ask people to reduce by hand as long as they’re allowed to provide a full preprocessed file, because not only do we have enough tools to debug an unreduced bug, but it’s also still very easy to accidentally remove essential bits of the puzzle when you’re reducing by hand.

If your best effort to reproduce fails and the author is not responding, closing an old bug as “works for me” is always a valid option. I don’t think there’s much value in building an ancient clang to reproduce the issue and bisecting to find the exact commit that fixed it.

Once a reproducer is obtained, the next step is to debug the bug. This step is not absolutely necessary as whoever finds the bug report will be able to do that anyway but it can often be done much faster than fixing the bug and also that’s the only way to properly categorize the bug report (find duplicates, assign to umbrella bugs, etc.). It’s usually very hard to guess the root cause just by looking at the report but exploded graph debugging usually yields the exact answer. So i usually try to do that. Especially when the report is about something that i thought was working perfectly.

As for categorization, i’m making “umbrella” bugs for large issues that affect many users and get reported often. I tag these bugs as [Umbrella] and for now there’s three of them (you’ve already seen two). The individual instances are duped to them and the dupe count is supposed to indicate how big of a problem it is (i don’t think it’s actually working though).

Finally, please cc me if you find something interesting ^.^
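
The triage order described above (newest bugs first, reproduce, then debug and categorize, otherwise close as “works for me”) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Bug` fields and the `triage` and `triage_order` functions are hypothetical names, not part of Bugzilla or any LLVM tooling, and the returned strings are just labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bug:
    # Hypothetical record; none of these fields mirror a real Bugzilla schema.
    bug_id: int
    year_filed: int
    has_reproducer: bool
    reproduces: Optional[bool] = None   # None means nobody has tried yet
    author_responds: bool = False
    umbrella_id: Optional[int] = None   # set once debugging finds a known root cause


def triage_order(bugs: List[Bug]) -> List[Bug]:
    """Start with new bugs and work backwards."""
    return sorted(bugs, key=lambda b: b.year_filed, reverse=True)


def triage(bug: Bug) -> str:
    """Return a suggested next action for a single bug."""
    reproduced = bug.has_reproducer and bug.reproduces is True
    if reproduced:
        # Debugging is what allows proper categorization.
        if bug.umbrella_id is not None:
            return f"mark as duplicate of umbrella bug {bug.umbrella_id}"
        return "debug and categorize"
    if bug.author_responds:
        return "ask the author for a reproducer or more information"
    # Best effort failed and the author is unreachable.
    return "close as WORKSFORME"
```

For example, a 2009 report with no reproducer and no responsive author would come last in `triage_order` and get `"close as WORKSFORME"` from `triage`.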

There was a recent push on reducing the number of open issues by @Endill and many others. Read more about that here and here.

In some cases, we have issues for “feature requests”. What should we do if I don’t think we would implement that feature (likely ever), should we close those tickets? What’s the value of having them?

Also, I don’t have any idea what to do with ObjC issues. I don’t really know of people who actually care about ObjC much, except for tripleCC, but recently I haven’t heard from him either.

Thank you for bringing this to your subproject!

An opinion from sidelines, since I’m not involved with static analyzer:

If it’s a matter of resources and reviewer bandwidth, we usually keep those around, at least that’s what I saw. They are also useful to mark new feature requests as duplicates, because older ones (usually) have a discussion with implementation challenges.

It feels like the Objective-C community is not around anymore for the static analyzer (migrated to Swift, I guess), which could be the reason to consider phasing out support for it. This obviously requires an RFC. On the bright side, if that RFC doesn’t reach consensus, it means that you found someone willing to support Objective-C.

CC @AaronBallman @rjmccall

The way we typically handle it in Clang is to leave the feature request open unless there’s a reason to close it. e.g., close the issue if there’s an existing feature that does effectively the same thing (so we’re unlikely to re-implement in a second form); if the feature request is impossible to fulfill; the author retracts the suggestion, etc.

ObjC is still around and supported in Clang (I see folks from Apple working on fixes to it), so I don’t think we should remove ObjC support for things that are working and still potentially in use by users or downstreams.

However, there may be opportunities to clean up some older ObjC offerings that are no longer needed (at least in-tree). For example, I’m not certain how much we need the ARC Migration Tool any longer, so perhaps it’d make sense to remove that and related static analysis checks? There is an old ObjC runtime and a new ObjC runtime; it’s not clear that we need to support the old one any longer, so perhaps it should be removed? @rjmccall would likely have a lot of good insights on this topic.

Sure, I’ll ask around and see if we can get you an authoritative answer about what is and is not needed.

Thank you, I appreciate it!

Any news on this?

Should we move this discussion to outside of a three-year-old thread about the Clang Static Analyzer’s bugzilla database?

Oh definitely! However, there’s not much point to the wider discussion until we hear back from folks at Apple, so I was waiting on suggesting an RFC until we knew it wouldn’t be quickly shot down. If it turns out there may be stale functionality we can remove, we should definitely ask a wider audience and not hide the decision here. :slight_smile:

For your immediate question about the bug database, Apple doesn’t consider those bugs defunct as a whole and does still watch the set of new inbound bugs. We can review the existing bugs to see if there are any that we should just close, but we would not support a policy of closing them en masse, and I think you should just migrate all existing bugs over to Github.

All the current Apple-authored standalone ObjC components, such as the ARC Migration Tool, are still supported features in Xcode. There are a few ObjC features in lib/CodeGen that we believe we no longer need to support and could remove, such as our “fragile” ObjC runtime. Apple also does not support ObjC GC, but I believe it was implemented in the GNU runtime at some point, and so to remove frontend support completely we would need agreement from the GNU ObjC runtime maintainers that it’s no longer supported.

Thank you for the details, John!

@davidchisnall What do you think of dropping support of ObjC GC from GNU runtime? If you stepped down from that project, please let me know who we can talk to about this.

As far as I know, there are no users of Apple-style GC with the GNU runtime (the GCC runtime had some older Boehm GC integration that had at least one user, who I think has now moved to ARC). The abstract model was deeply problematic: there was nothing in the type system to differentiate between memory that it was safe to store GC’d pointers into and memory that wasn’t, and so no one that I’m aware of managed to write non-trivial code without memory safety bugs in the GC mode (GC code was more likely to contain memory-management bugs than manual retain-release, which was quite an achievement).

I removed the GC-mode code in the runtime 7 months ago. I would be very happy for it to be gone from the compiler too.
