"Hidden emails" on GitHub: should we do something about it?

mehdi_amini · October 18, 2023, 9:11pm

GitHub has a setting to “keep your email private”, which impacts operations done through the Web UI, including merging a pull-request.

Here is the impact in the git log:

$  git log  | grep users.noreply.github.com | sort -u 
    406f1e8211f8f5017f44f46af750dec061e707a2 by Zhang <5205699+Naville@users.noreply.github.com>:
Author: 5chmidti <44101708+5chmidti@users.noreply.github.com>
Author: Aaron Jarmusch <61516473+ajarmusch@users.noreply.github.com>
Author: Aart Bik <39774503+aartbik@users.noreply.github.com>
Author: Abhina Sree <69635948+abhina-sree@users.noreply.github.com>
Author: Abhinav271828 <71174780+Abhinav271828@users.noreply.github.com>
Author: Adam Straw <26910851+adstraw@users.noreply.github.com>
Author: AdityaK <1894981+hiraditya@users.noreply.github.com>
...
Author: Zain Jaffal <110604482+zjaffal@users.noreply.github.com>
Author: Zhang <5205699+Naville@users.noreply.github.com>
Author: zyn0217 <117258288+zyn0217@users.noreply.github.com>
$  git log  | grep users.noreply.github.com | sort -u  | wc  -l
238

The same users pushing commits directly to main wouldn’t see the email hidden.
When merging a PR for someone else, if I check it out locally and push the commit, it would likely also see the original authorship preserved.

So: is this OK or should we do something about it?
An easy step could be to have one of the action bot add a warning comment in the PR when the author has this setting enabled.

jrtc27 · October 19, 2023, 4:04am

In my opinion (and seems to be shared by a significant number on Discord), it’s not, and we should. One notable point brought up was that it makes it hard to find contact addresses authors for re-licensing purposes, as has been done in the past (hopefully not again, but…).

kbeyls · October 19, 2023, 7:28am

I’ve been trying to think this through. So far, I think that:

It is important to be able to reach authors sometimes years later. Is a github user name good enough compared to an email address? With a github user name, we’re relying on github being around years from now. With an email address, we’re relying on the email address still being in use by the person years in the future. I think both are not perfect; but I wouldn’t say one is clearly better than the other.
I do think that it is important in some cases to be able to send a private/confidential message to an author in some circumstances. Email addresses obviously enable that. I’m not sure if it’s possible to send a confidential message to someone if you only know their github user name?

Overall, my feel is that anonymous email addresses might be OK if there is a possibility to send a confidential message to someone when you only know their github user name. If that is not possible, I think it would be best to enforce authors sharing a working email address in the commit log.

jh7370 · October 19, 2023, 8:08am

One of the downsides with hiding email addresses is that it is no longer obvious if a contribution is owned by a corporation rather than an individual acting in their own time. For example, it would no longer be obvious that my contributions are owned by Sony if my email address was hidden. This information was important for the relicensing efforts, but I can imagine there might be other cases where it is important too.

jcranmer · October 19, 2023, 6:38pm

Another issue with this approach is that the committers won’t be notified by the buildbots if a build fails since, as far as I can tell, these emails are not deliverable.

mehdi_amini · October 19, 2023, 6:52pm

For this last issue, there was a discussion about failing buildbots publishing a comment in the PR (which has the advantage that reviewers would also be notified)

j2kun · October 19, 2023, 7:37pm

I have a lot of random things going on via GitHub, and so I prefer to not have my primary GitHub account email associated with everything I do by default. I am OK having my corporate email associated with my MLIR commits, but it would be lame to leave my job and lose the association of that work with GitHub account, which I would expect to last longer than my employment.

I thought it was sufficient to add a repository-local git config to associate a per-repo email address, so long as the locally set email is associated with your GH account in the GH settings. The official GH docs explain this at Setting your commit email address - GitHub Docs

However, in [mlir] add debug messages explaining failure reasons of isValidIntOrFloat by j2kun · Pull Request #69476 · llvm/llvm-project · GitHub the UI seems to claim that the authorship would have been overwritten when squash+merging. Indeed, I tested this on a different repository and the authorship info is rewritten. However, it does NOT rewrite author information when the merge is a rebase+merge. If the convention is to have one commit per PR, then a rebase+merge strategy might help.

mehdi_amini · October 19, 2023, 7:39pm

FYI you don’t have to remove a work email from GitHub when you leave, for example this commit from me 6 years ago with my @apple.com email is still associated with my GitHub user there (and that was done on SVN before the LLVM project moved to GitHub )

Also: my work email is not my “default” in my profile. But I don’t know if it’ll impact what would be used when I merge one of your PR? GitHub may be smart enough to use your commit email…

Seems like lot of requests for work/personal customization:

Allow setting default account email per organization · Issue #32 · todogroup/gh-issues · GitHub (how when you merge the PR yourself you can select which email to use in the UI, but it does not work when I merge it for you: it’ll use the default email of the account)
Specify an email address to use for commits to an organization's repositories · community · Discussion #4168 · GitHub

j2kun · October 19, 2023, 7:44pm

I also posed this question to the GH devs: Is it possible to "expose your email" on a per-repository basis? · community · Discussion #70582 · GitHub

5chmidti · October 19, 2023, 8:06pm

There is also Emails should have an individual "private" option · community · Discussion #58127 · GitHub for selecting which emails can be exposed if you have multiple.

PiotrZSL · November 15, 2023, 11:06pm

I’m bumping this topic, this becomes more problematic.
Problems that I seen:

Scenario A: (Example: #71974)
Pull request contain commits from an real email and users.noreply.github.com email. During merge only users.noreply.github.com email is used. Information about original email is lost.

Maybe CI could detect this somehow and provide some warning to user.

Scenario B
Commits in pull requests are from email “ABC”, but user want to merge this as “XYZ”. In such case merge need to be done manually & information about state “merged” is lost in pull requests, such request is set as “closed”.

Scenario C
Email provided by user is invalid. Should such PR’s be somehow blocked ?

This issue could be problematic when some contributions supposed to be done from a corporate email, but instead merges are done as users.noreply.github.com.

Maybe CI should provide some warnings, and maybe users should be able to self-merge already approved PRs, so they could see an email and be able to select correct one, otherwise person that do squash and merge should be able to adjust author.

tstellar · November 15, 2023, 11:13pm

I know this was discussed a little bit on Discord a few weeks ago, but do we actually have a technical solution to prevent commits with these emails or do we need to implement something?

mehdi_amini · November 16, 2023, 12:36am

We would need to write a github action detecting this I think?

tstellar · November 16, 2023, 12:38am

What do we need to look for? A PR author with a hidden email address?

akorobeynikov · November 16, 2023, 12:39am

I think the “hide email” feature of GitHub rewrites the email address when the PR is merged…

mehdi_amini · November 16, 2023, 6:29am

Isn’t there an API to query information about a PR author? Wouldn’t we get the anonymous email in this case?

tstellar · November 16, 2023, 6:50am

This should do it:

import github
import sys

token = sys.argv[1]
pr_number = int(sys.argv[2])
gh = github.Github(login_or_token=token)
if not gh.get_organization('llvm').get_repo("llvm-project").get_pull(pr_number).user.email:
    print("Private Email")

Once we’ve detected this, what action should we take? I don’t think there is any way to block the merge. Should we just add a comment?

mehdi_amini · November 16, 2023, 6:58am

If we want that people stop using this feature, I would ask them to in a comment. Can were make it similar to clang-format failing (that is a “check fail” mark)?

akorobeynikov · November 16, 2023, 7:06am

One possible way to handle the issue is to require DCO (Developer Certificate of Origin). And there are even bots that can check and enforce this, e.g. https://probot.github.io/apps/dco/

mehdi_amini · November 16, 2023, 7:10am

They won’t enforce the commit author I believe, but instead some comment in the commit description saying “signed-off-by”. That may be an alternative to offer to folks who want their email in the description instead of the Git metadata, but I’m not sure we should just mandate that to everyone for solving this thread (if we want this for another reason, that’s another discussion of course)

Topic		Replies	Views
Proposal: Adding a toplevel .mailmap file LLVM Dev List Archives	4	105	May 29, 2021
Convention for merging outside contributions with GitHub PRs? Code Review	3	251	September 8, 2023
Questions about migration to GitHub. LLVM Dev List Archives	0	116	August 11, 2019
Mailing list changes this week OpenMP	17	156	October 17, 2019
Forbidding @username in commits GitHub	11	603	June 26, 2025

"Hidden emails" on GitHub: should we do something about it?

Related topics