Automating the releases a bit better.

Hello,

Tom and I were talking yesterday about a way to automate and reduce
the manual work that goes into getting the release testers' builds onto
GitHub.

Currently the release testers build the distribution, upload it to
the SFTP server, and then send an email with the SHA-256 hash to Tom.
Tom then verifies the files against the hash, signs them with PGP, and
uploads them to GitHub.

This is a pretty labor-intensive procedure, and it means that release
artifacts can lag quite a bit, depending on how much time Tom has
available.

I have two ideas on how to make this less annoying:

* We could have the release testers upload a .sha256 file together
with the distribution, containing a single line with the expected
hash. We could then write a script that reads the hash, compares it
against the uploaded file and, if it matches, signs the file with the
release key and uploads it to GitHub (a rough sketch follows after
this list). This could either run automatically on a cron schedule or
be something that Tom runs manually on his machine. The downside of
this method is that we lose the separate channel for transmitting the
SHA-256. So if someone wanted to upload a malicious build, they would
"only" need to gain access to the SFTP server. I am not that worried
about that at the moment, but it is something to consider.

* The other, more secure option is for the release testers to
actually sign the binaries with their own keys. These key identities
could then be sent to Tom out of band, and the script would check the
signature against the list of known testers. This would solve any
point-of-origin problems, but it would require a bit more work on the
release testers' side. For my part, I think it might be worth doing;
we could even write a script to automate this on the testers' side as
well.
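
To make the first option a bit more concrete, here is a minimal sketch
of what such a verify-sign-upload script could look like, assuming the
tester puts a matching .sha256 file next to the tarball and that we use
the GitHub CLI ("gh") for the upload; the file names, tag, and
repository arguments are placeholders. For the second option, the hash
check would essentially be swapped for a "gpg --verify" against a
keyring of known tester keys, as noted in the comment.

#!/usr/bin/env python3
# Rough sketch only: assumes the tester uploaded <tarball> and a
# matching <tarball>.sha256, that gpg and the GitHub CLI ("gh") are
# installed, and that the release key is in the local keyring.
import hashlib
import subprocess
import sys
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_sign_upload(tarball: Path, tag: str) -> None:
    expected = Path(str(tarball) + ".sha256").read_text().split()[0].lower()
    if sha256_of(tarball) != expected:
        sys.exit(f"SHA-256 mismatch for {tarball}, refusing to sign")

    # For the second option, the hash check above would instead be a
    # signature check, roughly:
    #   gpg --no-default-keyring --keyring testers.gpg \
    #       --verify <tarball>.sig <tarball>

    # Sign with the release key (detached, ASCII-armored -> <tarball>.asc).
    subprocess.run(["gpg", "--batch", "--yes", "--armor", "--detach-sign",
                    str(tarball)], check=True)

    # Attach the tarball and the signature to the GitHub release for the tag.
    subprocess.run(["gh", "release", "upload", tag, str(tarball),
                    str(tarball) + ".asc", "--repo", "llvm/llvm-project"],
                   check=True)

if __name__ == "__main__":
    verify_sign_upload(Path(sys.argv[1]), sys.argv[2])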

I direct this question to the testers and the community as a whole:
what do you think about the extra work and the security tradeoffs
here?

Thanks,
Tobias

I have the same question I had when the last discussion of pointer signing came up: what is the threat model?

The first doesn't seem to gain any benefit at all from the hash. This could easily be computed on the server because anyone with the ability to tamper with the distribution can also tamper with the hash.

The second still doesn't really answer the question about what the signature is for. A cryptographic signature is an attestation of some identity, coupled with a set of claims. I think that the *most* that we're able to claim with the current and proposed infrastructure is that the LLVM project is able to identify the person[1] who tampered with the builds, if someone later identifies that the builds do not come from the source tree that they claim. As a user, that doesn't seem like it's particularly valuable.

Given that LLVM is intrinsically a cross-compiler and can self-host for all of the platforms that we care about, if we want to improve this process *and* be able to make some useful claims, I'd propose that we move away from individuals building things on their own hardware and towards individuals (or groups) maintaining sysroots (or, ideally, scripts for fetching everything that goes into a sysroot from upstream) and we do all of the builds on a pristine VM. We can then automate the signing process as the next step in a pipeline that consumes the build artefacts. We may need to do Windows and macOS builds on those systems, but *BSD, Solaris, Linux, and so on, for any architecture, should all be able to build on any platform because all of the headers and libraries are available and free to redistribute.
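
To illustrate the kind of pipeline step this implies (purely as a
sketch: the sysroot path, the FreeBSD triple, and the project list are
made up, and I am ignoring details such as building the native TableGen
tools first), a pristine builder could cross-configure against a
tester-maintained sysroot roughly like this:

#!/usr/bin/env python3
# Illustrative only: paths, triple, and project list are placeholders.
import subprocess

SYSROOT = "/srv/sysroots/x86_64-unknown-freebsd13"  # maintained by a tester
TRIPLE = "x86_64-unknown-freebsd13.0"

def configure_and_build(src_dir: str, build_dir: str) -> None:
    subprocess.run([
        "cmake", "-G", "Ninja", "-S", f"{src_dir}/llvm", "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=Release",
        "-DLLVM_ENABLE_PROJECTS=clang;lld",
        # Cross-compile on whatever host the VM runs, against the sysroot.
        "-DCMAKE_SYSTEM_NAME=FreeBSD",
        f"-DCMAKE_SYSROOT={SYSROOT}",
        "-DCMAKE_C_COMPILER=clang",
        "-DCMAKE_CXX_COMPILER=clang++",
        f"-DCMAKE_C_COMPILER_TARGET={TRIPLE}",
        f"-DCMAKE_CXX_COMPILER_TARGET={TRIPLE}",
        f"-DLLVM_DEFAULT_TARGET_TRIPLE={TRIPLE}",
    ], check=True)
    subprocess.run(["ninja", "-C", build_dir], check=True)

configure_and_build("llvm-project", "build-freebsd")

A signing-and-upload job could then consume the artefacts this produces
as the next stage of the pipeline.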

David

[1] Using the term 'person' in the loosest possible sense of the word. We don't check ID or anything and on the Internet no one knows that you're a dog. I have a reasonable amount of confidence that I know who Dim is when he produces the FreeBSD builds, but I don't think the project as a whole - and Tom in particular - have any evidence that they could present about whether he is really who he claims to be. The most that we can really claim is that we can map from a binary to an email address and that's such a weak claim that I'm not really convinced that it merits the effort in cryptographic signing.

Hello David,

> I have the same question I had when the last discussion of pointer
> signing came up: what is the threat model?
>
> The first doesn't seem to gain any benefit at all from the hash. This
> could easily be computed on the server because anyone with the ability
> to tamper with the distribution can also tamper with the hash.

The benefit here is not so much security; it's more about confirming
that the binary was uploaded correctly from the tester's side. It's
slightly worse than what we have today in terms of security, but a
massive step forward in convenience.

> The second still doesn't really answer the question about what the
> signature is for. A cryptographic signature is an attestation of some
> identity, coupled with a set of claims. I think that the *most* that
> we're able to claim with the current and proposed infrastructure is that
> the LLVM project is able to identify the person[1] who tampered with the
> builds, if someone later identifies that the builds do not come from the
> source tree that they claim. As a user, that doesn't seem like it's
> particularly valuable.

I agree with all of these points. We can't currently be 100% sure who
is behind a given identity. Once we can meet in person again, a
key-signing party could help with that, but it would potentially
exclude people who cannot travel, etc.

Now, I wrote this email to try to solve the problem of having to
upload releases manually, not so much to solve the security issue. I
am hoping we can find a solution that addresses both problems.

> Given that LLVM is intrinsically a cross-compiler and can self-host for
> all of the platforms that we care about, if we want to improve this
> process *and* be able to make some useful claims, I'd propose that we
> move away from individuals building things on their own hardware and
> towards individuals (or groups) maintaining sysroots (or, ideally,
> scripts for fetching everything that goes into a sysroot from upstream)
> and we do all of the builds on a pristine VM. We can then automate the
> signing process as the next step in a pipeline that consumes the build
> artefacts. We may need to do Windows and macOS builds on those systems,
> but *BSD, Solaris, Linux, and so on, for any architecture, should all be
> able to build on any platform because all of the headers and libraries
> are available and free to redistribute.

Yeah, this would be a good solution. The downside I see is that in the
current model, where testers upload their own builds, they hopefully
have better insight into the failing tests, and the binaries that get
uploaded are the same ones they actually tested. Having a CI do all of
that would require a pretty big change to the workflow: the release
manager would trigger the builds, and the testers would then download
them and give them the good old testing after that.

I think that solution would be ideal, but the question is how much
work it would be (quite a bit), and whether this is a workflow we want
to switch to instead of just making the current workflow a bit better
and easier to automate.

Very valid questions, though.

Thanks,
Tobias

No problem here.

Neil Nelson

> * We could have the release testers upload a .sha256 file together
> with the distribution, containing a single line with the expected
> hash. We could then write a script that reads the hash, compares it
> against the uploaded file and, if it matches, signs the file with the
> release key and uploads it to GitHub. This could either run
> automatically on a cron schedule or be something that Tom runs
> manually on his machine. The downside of this method is that we lose
> the separate channel for transmitting the SHA-256. So if someone
> wanted to upload a malicious build, they would "only" need to gain
> access to the SFTP server. I am not that worried about that at the
> moment, but it is something to consider.
>
> * The other, more secure option is for the release testers to
> actually sign the binaries with their own keys. These key identities
> could then be sent to Tom out of band, and the script would check the
> signature against the list of known testers. This would solve any
> point-of-origin problems, but it would require a bit more work on the
> release testers' side. For my part, I think it might be worth doing;
> we could even write a script to automate this on the testers' side as
> well.
>
> I direct this question to the testers and the community as a whole:
> what do you think about the extra work and the security tradeoffs
> here?

> I have the same question I had when the last discussion of pointer signing came up: what is the threat model?
>
> The first doesn't seem to gain any benefit at all from the hash. This could easily be computed on the server because anyone with the ability to tamper with the distribution can also tamper with the hash.
>
> The second still doesn't really answer the question about what the signature is for. A cryptographic signature is an attestation of some identity, coupled with a set of claims. I think that the *most* that we're able to claim with the current and proposed infrastructure is that the LLVM project is able to identify the person[1] who tampered with the builds, if someone later identifies that the builds do not come from the source tree that they claim. As a user, that doesn't seem like it's particularly valuable.

The easiest option would be to have testers upload binaries directly to the
GitHub release page. Is this really any worse from a security perspective
than what we are doing now?

The main difference is that anyone with commit access can upload releases
to GitHub, whereas with the current SFTP uploads we have to explicitly
grant people access.

-Tom

Can we put stronger permissions on the specific repository? (Or create
a separate repository with those stronger permissions, if the current
one lumps releases in with other things we don't want the same
permissions on.)

> Can we put stronger permissions on the specific repository? (Or create
> a separate repository with those stronger permissions, if the current
> one lumps releases in with other things we don't want the same
> permissions on.)

We can't put stronger permissions on the llvm/llvm-project repository,
so we would have to create a new one in order to limit access to specific
people.

One thing we could do in the llvm/llvm-project repository is to have a GitHub
Action that runs after each upload, verifies that the uploader is an
'approved uploader', and deletes any uploads from unapproved people.
However, anyone with commit access would be able to make changes to the
Action, so I'm not sure if we gain anything from this.
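
As a rough sketch of what that check could look like (the allow-list
and token handling are placeholders; the asset listing and deletion
calls are the standard GitHub REST API endpoints), a job could walk the
assets of a release and delete anything uploaded by an account that is
not on the approved list:

#!/usr/bin/env python3
# Sketch only: prune release assets uploaded by unapproved accounts.
# The allow-list and token source are hypothetical.
import os
import requests

REPO = "llvm/llvm-project"
APPROVED_UPLOADERS = {"tester-one", "tester-two"}  # placeholder logins
API = f"https://api.github.com/repos/{REPO}"
HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def prune_unapproved_assets(tag: str) -> None:
    release = requests.get(f"{API}/releases/tags/{tag}", headers=HEADERS)
    release.raise_for_status()
    for asset in release.json()["assets"]:
        uploader = asset["uploader"]["login"]
        if uploader not in APPROVED_UPLOADERS:
            print(f"deleting {asset['name']} uploaded by {uploader}")
            requests.delete(f"{API}/releases/assets/{asset['id']}",
                            headers=HEADERS).raise_for_status()

prune_unapproved_assets("llvmorg-12.0.0")

This could run on a schedule or from a workflow triggered on release
events, but as noted above it is only as trustworthy as the workflow
file itself.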

-Tom

Releases go to the llvm-project/monorepo repository? Wouldn't that
make GitHub checkouts for development rather large/expensive? I'd have
thought they would be going into some other top-level repository like
www-releases? (Though I guess not there, because you don't want them
to go live until you, the release manager, have signed off on them.)

Maybe I'm misunderstanding something about what we're talking about
uploading...

> Releases go to the llvm-project/monorepo repository? Wouldn't that
> make GitHub checkouts for development rather large/expensive? I'd have
> thought they would be going into some other top-level repository like
> www-releases? (Though I guess not there, because you don't want them
> to go live until you, the release manager, have signed off on them.)
>
> Maybe I'm misunderstanding something about what we're talking about
> uploading...

When you create a tag, GitHub gives you the option of turning the
tag into a release and then gives you a space to upload release
assets associated with that tag. So the uploads don't go into
the actual git repository; they just go somewhere in the cloud, e.g.

https://github.com/llvm/llvm-project/releases/tag/llvmorg-12.0.0

I think the confusing part is that upload permissions are the same
as commit permissions for the repository.

-Tom

> Releases go to the llvm-project/monorepo repository? Wouldn't that
> make GitHub checkouts for development rather large/expensive? I'd have
> thought they would be going into some other top-level repository like
> www-releases? (Though I guess not there, because you don't want them
> to go live until you, the release manager, have signed off on them.)
>
> Maybe I'm misunderstanding something about what we're talking about
> uploading...
>

> When you create a tag, GitHub gives you the option of turning the
> tag into a release and then gives you a space to upload release
> assets associated with that tag. So the uploads don't go into
> the actual git repository; they just go somewhere in the cloud, e.g.
>
> https://github.com/llvm/llvm-project/releases/tag/llvmorg-12.0.0
>
> I think the confusing part is that upload permissions are the same
> as commit permissions for the repository.

Ooooh. OK - sorry for derailing the thread then. Thanks for explaining!

Yeah, I don't have any super great ideas then. If you don't mind
pruning the uploads later maybe that is the easiest option.

- Dave

Hello Tom,

I didn't really consider this option, since it would mean the releases
are not signed by you / LLVM.org and more people would have access to
upload binaries there. But it is of course an option, and it is pretty
easy for everyone involved.

-- Tobias

Hello,

Going to ping this again. To me there seems to be a short-term fix
(reducing the overhead for the release manager) and a longer-term fix
where a CI builds the releases.

For the short term, the easiest solution seems to be to switch from
uploading to SFTP and just upload to GitHub releases directly.

The trade-offs against the current solution are:
* No signatures from a single person.
* All committers can upload to and overwrite a release. Note: this is
already possible today, since anyone with commit access can overwrite
Tom's uploads.

Are we OK with these trade-offs? If so, I think we should use this
approach for the LLVM 13 release.

I am also interested in whether we want to have "official" builds from
a CI (GitHub Actions?), where the testers would instead help maintain
the sysroots, as David suggested in his email above. Is this something
we should pursue?

Thanks,
Tobias

I’ve got no particular stake in this myself, as I’m not involved in the LLVM release process, and we make our own releases for our downstream users. One thought I did have, though, is that we should guard against malicious actors swapping out the official binaries for some other executable (e.g. a virus or, worse, a modified LLVM that inserts viruses/flaws into people’s code…). I’m not familiar enough with how the setup works, etc., so I can’t comment on whether this is a real possibility or whether it can easily be prevented one way or another.

From a security perspective, I don't see that using GitHub Releases is worse than the current process. I believe GitHub records which account has uploaded the release binaries, so there is an audit trail in case of tampering, which is the most that we can claim with the current process.

David

Hello,

Restarting this discussion.

> Going to ping this again. To me there seems to be a short-term fix
> (reducing the overhead for the release manager) and a longer-term fix
> where a CI builds the releases.
>
> For the short term, the easiest solution seems to be to switch from
> uploading to SFTP and just upload to GitHub releases directly.

I would like to propose a variation of this idea:

I think we should have testers upload directly to GitHub, but keep the
rest of the process the same as now. So testers will still email me a
SHA-512 hash of the binaries they upload, and I'll still sign the binaries.

This means there will be a period of time where we have unsigned binaries on
the release page, but I think this is OK. People who care about the signatures
can just wait until I sign the packages, and people who don't care about the
signatures will get their builds faster.
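
Concretely, the signing pass over an already-uploaded asset could look
roughly like this (a sketch: the tag, asset name, and repository
arguments are placeholders, and the SHA-512 comes from the tester's
email):

#!/usr/bin/env python3
# Sketch only: sign an asset that a tester already uploaded to the
# GitHub release page, after checking it against the emailed SHA-512.
import hashlib
import subprocess
import sys

def sign_uploaded_asset(tag: str, asset: str, emailed_sha512: str) -> None:
    # Fetch the tester's upload from the GitHub release.
    subprocess.run(["gh", "release", "download", tag, "--pattern", asset,
                    "--repo", "llvm/llvm-project"], check=True)
    with open(asset, "rb") as f:
        actual = hashlib.sha512(f.read()).hexdigest()
    if actual != emailed_sha512.lower():
        sys.exit(f"SHA-512 mismatch for {asset}, not signing")
    # Detached, ASCII-armored signature with the release key -> <asset>.asc
    subprocess.run(["gpg", "--batch", "--yes", "--armor", "--detach-sign",
                    asset], check=True)
    subprocess.run(["gh", "release", "upload", tag, asset + ".asc",
                    "--repo", "llvm/llvm-project"], check=True)

# e.g.: sign_uploaded_asset("llvmorg-13.0.0", "clang+llvm-....tar.xz",
#                           "<sha512 from the tester's email>")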

For some added validation, we can create a release-testers team and automatically
delete any uploads from anyone not on that team.
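
A sketch of that membership check (the "release-testers" team is the
one proposed here and does not exist yet; the token handling is a
placeholder), using the GitHub REST API's team-membership endpoint:

#!/usr/bin/env python3
# Sketch only: check whether an uploader belongs to a (yet to be
# created) "release-testers" team in the llvm org.
import os
import requests

HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def is_release_tester(login: str) -> bool:
    # 200 with state "active" means the user is on the team; 404 means not.
    r = requests.get(
        "https://api.github.com/orgs/llvm/teams/release-testers"
        f"/memberships/{login}",
        headers=HEADERS)
    return r.status_code == 200 and r.json().get("state") == "active"

Anything uploaded by an account for which this returns False would then
be deleted, along the lines of the pruning sketch earlier in the thread.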

- Tom