[RFC] review notes bot

I had an idea: a way to specify file-level comments that we always want called out in code review, and a bot that posts them.

For instance, in sanitizer_common_interceptors.inc we could add a comment like

// REVIEWNOTE: Note to reviewer: make sure there is a test in compiler-rt/test/sanitizer_common/TestCases

and a bot would parrot it on every CL that touches the file (preferably it would leave the REVIEWNOTEs for all touched files in one comment, to avoid spam).
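A minimal sketch of the scanning side, assuming the `// REVIEWNOTE:` syntax above (the function name is hypothetical; in a real bot the file contents would come from the PR's changed-files list and a checkout):

```python
import re

# Matches the proposed "// REVIEWNOTE: ..." marker and captures the note text.
REVIEWNOTE_RE = re.compile(r"//\s*REVIEWNOTE:\s*(.+)")

def collect_review_notes(files):
    """Map each touched file to the REVIEWNOTE lines found in it.

    `files` is {path: file_text}. Files without notes are omitted, so the
    bot can stay silent when nothing is annotated.
    """
    notes = {}
    for path, text in files.items():
        found = [m.group(1).strip() for m in REVIEWNOTE_RE.finditer(text)]
        if found:
            notes[path] = found
    return notes
```

Collecting per-file lists like this is what makes it easy to emit one combined comment per PR instead of one comment per file.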

The risk I see is that if those notes proliferate, at some point people will ignore the comments.

What do people think?

I think this would be better served by an IFTTT (if this then that) lint tool. You would have some comment syntax that says: if block of code x is touched, then y needs to be touched (a file in a folder, a different segment of code, etc.). It ensures what you want actually happens, but also eliminates noise when people are doing what was requested. I know @mtrofin was also interested in similar functionality for ensuring struct definitions matched between compiler-rt and LLVM, but that ended up being solved with preprocessor macros and a common header file.

It shouldn’t be very difficult to implement (after a reasonable design). It’s mostly a matter of ensuring there is a user base, and of someone actually implementing it as a GitHub Action (probably with an associated Python script).

I think it’s a nice idea, but as GitHub doesn’t really give us control over how such comments are displayed vs human review comments, I do share your fear about them becoming too noisy to be useful. The undef bot is a good example of this: it can be a bit annoying that it generates feedback even for trivial edits to tests that already use undef.

I think review notes for all touched files in a single comment would be OK (and in general, the more we can combine any other bot feedback into a single comment the better IMHO). You can also use <details> so it tells you there are N automated notes to reviewers and you can click to expand.
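As a sketch of that combined comment (helper name hypothetical), the notes could be rendered behind a `<details>` element so GitHub collapses them by default and the summary line shows the count:

```python
def format_notes_comment(notes):
    """Render {path: [note, ...]} as one Markdown comment body,
    collapsed behind <details> with a note count in the summary."""
    count = sum(len(v) for v in notes.values())
    lines = [f"<details><summary>{count} automated note(s) to reviewers</summary>", ""]
    for path, file_notes in sorted(notes.items()):
        lines.append(f"**{path}**")
        lines.extend(f"- {note}" for note in file_notes)
        lines.append("")  # blank line between files, keeps the Markdown readable
    lines.append("</details>")
    return "\n".join(lines)
```

One comment per PR, however many files are annotated, keeps the review page readable even if the notes proliferate.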

Combining feedback and editing instead of posting new comments is possible.

The formatting check already finds its existing comment using an HTML comment embedded in the Markdown of the first GitHub comment it posts. We could segment that one comment into sections with further HTML comments if needed, as long as everything that adds to these notes runs sequentially, to avoid a race to edit.
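A sketch of that marker trick (the marker string and function name are hypothetical, and the actual GitHub REST calls to fetch/edit/post comments are omitted):

```python
# HTML comments are invisible in rendered Markdown, so this tags the bot's
# comment without the reader seeing it.
MARKER = "<!-- review-notes-bot -->"

def upsert_comment(existing_comments, body):
    """Decide whether to edit an existing bot comment or post a new one.

    `existing_comments` is a list of (comment_id, comment_body) pairs
    fetched from the PR. Returns (comment_id, new_body); a comment_id of
    None means "post new", otherwise edit that comment in place.
    """
    new_body = MARKER + "\n" + body
    for comment_id, comment_body in existing_comments:
        if MARKER in comment_body:
            return comment_id, new_body
    return None, new_body
```

Editing in place is what keeps repeated pushes to a PR from piling up duplicate bot comments.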

(the undef checker is actually one of the “formatters”, because that gives it this edit/update feature for free)

The nice part of the notes being in the file is that anyone can change them without finding the automation script. So if someone moves the file, for instance, the annotation isn’t lost.

If we did this in Python, people would have to find the automation script, and we may end up with checks that don’t fire because the file was moved at some point. However, Python gives you all the logic for “free” if you want to do more complicated things.

Could of course do both. If 99% of the use cases are “any edit to this file triggers this comment” then put that in the file, the 1% that are more complex (like the undef checker) can be in Python.

That could be a way to mitigate this part. It’s not just that a common reviewer note might get ignored itself; a valid one might be ignored because it’s surrounded by spammy notes.

So perhaps we try to set a standard like:

  • Any edit produces a note - can be done in the file (and if it’s too noisy, removing it is very easy)
  • Anything more complex - must be done in an automation using Python (and we make it easy to add one without being an expert in the whole thing)

I think maintainers would be able to make that decision.

It does depend on what the split is. What sorts of things would we want to check, in addition to the one suggestion we have so far? It would be good to gather more.

And if what you’re suggesting would be valuable for reviews right now, we don’t have to wait for some framework to be made, we could add it using Python in the short term.

I later thought perhaps this would add a way to spam using the llvmbot account.

I think a workflow that runs from the PR’s branch is given a token that cannot write to the repository it is PR-ing to, which would solve that part: even if you modified the workflow itself, you couldn’t write new comments.

If you change the annotations in a file, then you could, but it would be limited to that PR only. Alternatively, we could only look at the upstream branch’s code (get the list of files from the PR, get their contents from main).
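A sketch of that last approach (function name hypothetical), assuming the bot runs in a checkout with the upstream branch fetched: take the file list from the PR, but read each file’s contents from the upstream ref, so the PR itself can’t inject annotations:

```python
import subprocess

def file_at_ref(repo_dir, ref, path):
    """Read a file's contents at `ref` (e.g. "origin/main") rather than
    the PR head, so annotations can't be injected by the PR itself."""
    return subprocess.run(
        ["git", "-C", repo_dir, "show", f"{ref}:{path}"],
        check=True, capture_output=True, text=True,
    ).stdout
```

`git show ref:path` fails for files that only exist in the PR (newly added files), which a real bot would want to catch and simply skip.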

A combination of reviewers denying anything that’s clearly spam and setting up the workflow correctly could address the potential issue.

(someone could directly push garbage annotations, but that goes for a lot of things and is a different problem entirely)