Hello to everyone in the LLVM community!
I’m Derek, founder of ostif.org, and we are interested in providing meaningful security support to the LLVM project.
We’d like to start some discussion here about how we can best use our resources to help without being burdensome on the maintainers or reporting non-issues. It is extremely important to us that any time and resources we spend on LLVM are relevant to you and actually improve your project.
I’ve tried to learn about LLVM, the community, and the project’s general security posture through as much documentation and community discussion as I can find. I do have some early observations that I’d like to confirm:
- The project is running Coverity scans regularly and addressing concerns as they are raised.
- The project is on Google OSS-Fuzz but has a backlog of unaddressed issues, and most of the scans appear to be a few years old. (We see this often. It is usually because the fuzzers are mostly spamming non-issues that fall outside the project’s security model.)
- I’ve heard (anecdotally) that because LLVM is mostly run locally on trusted source, or remotely on hermetic VMs that are spun up for every run, reliability issues such as generic crashes and assertion failures are not considered security issues. This makes sense to me.
However, this introduces a problem: triggering crashes is how fuzzing finds issues, so easily triggered crashes act as blockers that keep the fuzzers from reaching more serious problems underneath.
We’ve talked extensively with our advisory council about how to best help LLVM and how to address this problem without being a burden on the community. What we’ve come up with is building and documenting a threat model for LLVM: one that looks at risks for the various functionality that the LLVM core provides, and that specifically spells out classes of bugs that might be excluded from the threat model because they aren’t relevant to LLVM. The goal is to identify what a security problem in LLVM actually looks like, and also to identify what a non-issue is, so we can eliminate reports that fall outside the scope of your threat model.
If the community has any supporting documentation related to “what is a security bug in LLVM,” or even a previous threat model, it would really help us throughout this process. If not, we can build something from scratch; we have a lot of experience working from zero.
Once we have a threat model established, we’d like to work on the OSS-Fuzz implementation to achieve multiple goals. We want to expand code coverage to the riskiest areas of “LLVM core” if they are not currently covered, and build a stronger set of fuzzers that don’t report as many non-issues. This would have to be coupled with our team submitting crash fixes as merge requests, to clear out the fuzz blockers created by simple reliability issues like assert triggers, or low/informational issues discovered by the fuzzers. As coverage expands, if we find any serious issues, we would report them through the regular LLVM security channels for addressing.
This seems to be the least burdensome solution we can come up with so far without your feedback:
1. Build a threat model.
2. Work on fuzzers, prioritizing the identified high-risk functions.
3. Tighten up existing fuzzers to reduce reports that are non-issues.
4. Submit MRs where necessary to fix minor issues (following your contributing guidelines and regular processes, of course).
4a. Any major security findings get reported responsibly through regular channels.
5. If this work is deemed successful and valuable by all parties involved, look into scheduling follow-up work to continue improving things.
I do have a couple of questions for the community. I’m perfectly aware that there are a lot of opinions here and there may be some disagreement on some of the finer points; these questions are just intended to help guide us toward doing meaningful work.
What do we consider to be “LLVM core”? Can you point us to the repos that everyone MUST install to do anything with LLVM? Going one step further, which repos would be installed in a “typical” LLVM setup? We can also work this out on our own by following basic setup guides and looking at what people use LLVM for the most, but we don’t want to miss any common use cases or omit a repo that is important to most users.
Does this approach sound like it will help? If the idea introduces complications, or if we’ve missed something that would create an undue burden on the maintainers, let us know so that we can revise the plan and make it more useful.
Has a threat model for LLVM ever been created before? Or are there docs identifying what the security model for LLVM looks like? This is crucial, because it will help us eliminate the non-issues and spam reports you could otherwise get from us: we can screen our findings carefully and only report things that matter. This is important both for the fuzzer design and for our manual reporting of more serious issues.
Do you have any questions for me? You can post them here, or reach out to me directly via email at firstname @ our org’s URL. We are open to all discussion and collaboration here, and anything we can do to help is on the table.
Thank you all for your tireless work on LLVM. It is one of the best projects out there, and I’m looking forward to helping if we can!