[RFC] Adding a new GH issue tag for fuzzer-generated issues

I believe I’m not the first to notice this issue, but I still want to bring it up.

We have a significant number of crash-related issues (not limited to the Clang frontend; they also cover LLVM IR and the backend), with new ones appearing almost every day. While this is not surprising for a project as large as LLVM, I’ve realized that only some of these issues come from real projects, while others do not - they appear to be generated by fuzzers for academic research purposes.

A friend of mine privately shared some data with me today, which he had collected partly out of curiosity and partly through some social-engineering techniques. We were surprised to find that a considerable number of issues had been filed for the purpose of publishing papers; some of those papers have already been published, and others might be under submission.

Although we are not opposed to using fuzzers to find bugs, and these issues are not necessarily harmful to us, they do waste the bandwidth and effort of us maintainers to some extent: they get triaged as crash-on-invalid, crash-on-valid, or even just crash, mixing with and sometimes overwhelming issues from real users. AFAIK, we currently don’t have a handy way on GitHub to tell apart issues that are generated by fuzzers from those that are not.

Since this data involves GitHub accounts and real identities/academic paper information, I’m only sharing some statistics here related to issues generated using fuzzers.

Therefore, we propose adding a new issue label, e.g. “generated-by-fuzzers”, to distinguish such issues from those filed by real users. Of course, this requires us to ask the OP about the source of the code - while not 100% reliable, it at least gives maintainers a way to filter out some low-priority issues.
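For instance (taking the label name above and the crash-triage labels from this thread; treat the exact queries as illustrative), this would let maintainers exclude fuzzer-generated reports during triage with a standard GitHub issue search:

  is:issue is:open label:crash-on-invalid -label:generated-by-fuzzers

or, conversely, pull up only the fuzzer-generated ones:

  is:issue is:open label:generated-by-fuzzers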

Any feedback is appreciated.

7 Likes

I’m in favor of adding a label. I think there are three kinds of reports that are relevant in this context:

  • Miscompilation
  • Crash
  • Missed optimization

On the LLVM side, for miscompilations or crashes, we’ll usually not care a lot whether something is fuzzer-generated or not and fix it anyway, though there are exceptions. However, for missed optimizations, fixing fuzzer-generated reports is often not only a waste of time, but may be actively harmful to the project.

4 Likes

On the LLVM side, for miscompilations or crashes, we’ll usually not care a lot whether something is fuzzer-generated or not and fix it anyway, though there are exceptions.

Yeah, we don’t care much about the source either on the frontend side, AFAICT.

However, looking through the GitHub accounts I’m aware of, I can see that some of the code snippets are quite ridiculous; for example, here is one of the cases I sifted through today (again, no GitHub links because I don’t want to call anyone out):

#define d(b, c) b || b
#define aa(b, c) b || b
#define e(g, h, i, j, k, l) m(, g, 0, ) m(, g, h, ) m(, g, h, ) m(, g, 0, )
#define n(o, g, p, q, r)                                                       \
  if (d(__builtin_##g(p, q), ))                                                \
    if (__builtin_##g(p, q) || aa(__builtin_##g(p, q), ))
#define s(g, p, q, i, j, k, l, t, u, v, w, x, y, z, a, b, c, d, e)             \
  n(, g, 0, 0, ) n(, g, 0, q, ) n(, g, 0, 0, ) n(, g, 0, q, ) n(, g, p, q, )   \
      n(, g, p, 0, ) n(, g, p, q, ) n(, g, p, 0, ) n(, g, p, q, )
#define ab(b, c) b || b
#define m(o, g, h, r)                                                          \
  if (ab(__builtin_##g(h), ) || d(__builtin_##g(h), ))                         \
    if (d(__builtin_##g(h), ) || aa(__builtin_##g(h), ))                       \
      if (d(__builtin_##g(h), ) || aa(__builtin_##g(h), ))
#define ac(o, g, p, q, r)                                                      \
  if (ab(__builtin_##g(p, q), ) || d(__builtin_##g##f(p, q), ))                \
  d(__builtin_##g(p, q), ) || q
#define ad(g, p, q, i, j, k, l, t, u, v, w, x, y, z, a, b, c, d, e)            \
  ac(, g, 0, 0, );                                                             \
  ac(, g, 0, q, );                                                             \
  ac(, g, 0, 0, );                                                             \
  ac(, g, 0, q, );                                                             \
  ac(, g, p, q, );                                                             \
  ac(, g, p, q, )
void ae() {
  m(, cacos, 1, ) m(, cacos, 1, ) e(cacos, 2.34567F, , , , )
      e(casin, 2.34567F, , , , ) e(catan, 2.34567F, , , , ) m(, cacosh, 1, ) m(
          , cacosh, 1, ) e(cacosh, 2.34567F, , , , ) e(casinh, 2.34567F, , , , )
          e(catanh, 2.34567F, , , , ) e(csin, 2.34567F, , , , )
              e(ccos, 2.34567F, , , , ) e(ctan, 2.34567F, , , , )
                  e(csinh, 2.34567F, , , , ) e(ccosh, 2.34567F, , , , )
                      e(ctanh, 2.34567F, , , , ) m(, clog, 1, ) e(
                          clog, 2.34567F, , , , ) e(csqrt, 2.34567F, , , , )
                          s(cpow, 1, 0, , , , , , , , , , , , , , ,
                            , ) s(cpow, 1.F, 0, , , , , , , , , , , , , , , , )
                              ad(cpow, 2, 3, , , , , , , , , , , , , , , , );
}
2 Likes

Clang front-end maintainer here.

I have noticed fuzzer-generated bug reports in spurts over time, and I have generally said we don’t really have the bandwidth to deal with a large volume of them. Having said that, we do welcome bugs that our users are likely to run into and that will have a high impact if fixed. From time to time we do get some excellent fuzzer-generated bug reports.

I worry a little that having a specific label is a form of tacit encouragement. If we go down that route, then I think we should document our policy on fuzzer-generated bugs so that folks realize that while we don’t ban them, we have limited bandwidth to process them.

2 Likes

Regarding miscompilation bugs, fuzzers are one of the more effective ways of actually finding these bugs. When real code behaves strangely, it is very hard to be sure that there actually is a miscompilation going on. However, fuzzer-generated code can be constructed in a way that the bug is guaranteed to be a miscompilation (rather than UB or non-determinism or something else). So I am somewhat concerned by the more-or-less explicit statement here that fuzzer-generated miscompilation bugs should be treated with lower priority – that means giving up on proactively finding such bugs, and instead only fixing them once they cause real pain.
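To make this concrete, here is a minimal sketch (written for this post, not taken from any actual fuzzer) of the property that makes such reports trustworthy: the program below has fully defined behavior, so if it prints different values at, say, -O0 and -O2, the difference cannot be blamed on UB and must be a miscompilation.

#include <stdio.h>

int main(void) {
  /* Unsigned arithmetic has defined wraparound and the loop is
     deterministic, so this program has exactly one valid output. */
  unsigned x = 1;
  for (unsigned i = 0; i < 100; ++i)
    x = x * 2654435761u + i;
  /* Any divergence in this output across optimization levels is
     necessarily a compiler bug, not UB or non-determinism. */
  printf("%u\n", x);
  return 0;
}

Real miscompilation fuzzers (differential testers in particular) generate far more elaborate programs, but they preserve exactly this UB-freedom guarantee, so that any behavioral difference between compilers or optimization levels is a genuine bug.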

Crash bugs are different: it is blindingly obvious that LLVM crashing is a bug in the frontend or backend. But I hope the policy can be set in a way that miscompilation bugs are still treated seriously no matter the source; otherwise I worry this will have a negative impact on the reliability of LLVM as a backend. It would be concerning to see tons of resources thrown at developing new features while quality assurance is treated as low priority.

Disclaimer: I am an academic researcher and have worked on fuzzing to find miscompilation bugs. That fuzzing campaign resulted in a total of 12 LLVM bug reports (the frontend the fuzzer was targeting is out-of-tree, so these all used LLVM IR as the input), most of which are already fixed. We were hoping that the community (and the companies relying on LLVM as a critical piece of infrastructure) would appreciate our efforts to find miscompilation bugs before they occur in production.

That is what I hoped for. :yellow_heart:

2 Likes

Right, for miscompilations fuzzer-generated reports are very much appreciated. They also have the nice benefit of being fairly small, which makes it easier to identify the root cause. You can often analyze and fix a fuzzer-generated miscompilation report within the hour, while reducing real-life miscompilations can take days, depending on how bad your luck is :slight_smile:

Knowing that an issue is fuzzer-generated can still be valuable information. For example, we have some well-known miscompilations that are both hard to fix and vanishingly unlikely to occur in real-world code (e.g. the LICM scalar promotion + stack coloring interaction). If another fuzzer re-discovers the issue, that’s not interesting. But if someone were to actually encounter that in real code, my prioritization of the problem would change significantly.


The main area where I would actively discourage fuzzer-generated reports is missed optimizations. If there is no evidence that a pattern also occurs in real-world code, then our default policy should be to decline to fix it.

4 Likes

On the Clang side of things, we get a fair number of fuzzer-generated bug reports, but the majority of the time they are assertion failure/crash bugs rather than miscompilations or usability issues.

Given Hyrum’s Law, catching those crashing bugs can sometimes be useful because users may be hitting them. However, a fair number (most?) of the generated test cases are not anything I would expect to be hit in the wild all that often.

I would support a tag to label a test case that was generated by a fuzzer. It helps to identify issues that may be of lower priority, and it may help identify issues for new contributors to work on (the fixes tend to be low-risk, small changes). That said, I would not want to see the tag used as a signal that an issue is not worth looking into at all.

1 Like

I would support a tag to label a test case that was generated by a fuzzer. It helps to identify issues that may be of lower priority, and it may help identify issues for new contributors to work on…

I’d suggest making the tag settable by the user filing the bug, even if they don’t normally have permission to add tags (which seems to be the default).

I’d further suggest creating other priority-lowering tags and giving unprivileged users permission to add those as well. For instance, I’m currently running other vendors’ target-specific Lit tests against our fork. Failures in such runs obviously aren’t necessarily bugs, and even an assertion failure may just be a low-priority case of asserting where an error message would have been the right choice. But some such bugs are worth filing, and it may make things easier for the maintainers if the filer can mark them suitably: segfaults may contribute to a security vulnerability, for example, and out-of-memory issues can be quite an inconvenience.
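For context, the workflow I mean is simply pointing llvm-lit at another target’s in-tree tests from our own build (the paths here are illustrative):

  build/bin/llvm-lit -sv llvm-project/llvm/test/CodeGen/AArch64

which is exactly the kind of run where a failure may or may not indicate a bug in our fork.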

Other possible reasons for such tags are bugs found with sanitizers and/or extra verification options, and errors in --help text.

This is not something GitHub allows, and I don’t think we’re going to enable the Triage role for all non-contributors.

This is not something GitHub allows…

Ouch, thanks — one more reason I miss working with a real bug database.