Bugpoint Redesign

Hey all,

I wanted to share a proposal <https://docs.google.com/document/d/171ecPTeXw68fbCghdGw_NPBouWvmvUX8vePlbhhHEdA/edit?usp=sharing> to revamp the current go-to IR debugging tool: Bugpoint. I'd love to hear any feedback or general thoughts.

Here's the markdown version of the doc:

### Narrow focus: test-case reduction
The main focus will be a code reduction strategy that produces much smaller test cases that still have the same property as the original one. This will be done via classic delta debugging and by adding some IR-specific reductions (e.g. replacing globals, removing unused instructions, etc.), similar to what already exists, but with more in-depth minimization.
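
To make "classic delta debugging" concrete, here is a minimal Python sketch of the idea. It is a simplified variant written for illustration, not the algorithm the proposal or bugpoint implements: it only tries removing chunks and halves the chunk size when nothing can be removed. The names `ddmin` and `is_interesting` are hypothetical, and the items could be functions, instructions, or plain lines of a `.ll` file.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(items: Sequence[T], is_interesting: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `items` while `is_interesting` keeps returning True.

    Simplified delta debugging: repeatedly try to drop a chunk of the
    input; when no chunk of the current size can be dropped, halve the
    chunk size; stop once single items can no longer be dropped.
    """
    current = list(items)
    if not is_interesting(current):
        raise ValueError("the original input must be interesting")
    chunk = max(len(current) // 2, 1)
    while True:
        progress = False
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if is_interesting(candidate):
                current = candidate  # keep the smaller input, retry at the same offset
                progress = True
            else:
                start += chunk       # this chunk is needed, move to the next one
        if not progress:
            if chunk == 1:
                return current
            chunk //= 2


# Toy property: the "bug" needs both 3 and 17 to be present.
reduced = ddmin(list(range(20)), lambda xs: 3 in xs and 17 in xs)
print(reduced)
```

Because the routine only sees a list and a predicate, the same loop could in principle be pointed at a pass list instead of program contents, which is the second dimension of reduction discussed in this thread.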

Granted, if the community differs on this proposal, the legacy code could still be present in the tool, but with the caveat that the tool would still be documented and designed around delta reduction.

As Hal points out, there are really two dimensions of reduction you can do with bugpoint. One is delta debugging of passes, to figure out which pass is causing the problem, and the other is regular delta debugging of the program itself. Supporting both use cases is important. My personal experience, however, has been to use bugpoint only for delta debugging of code, as I'm trying to work out which features of the input program are causing the pass I'm working on to crash. I can't say what the relative balance of these two use cases is, and it may be that we can solve this by having two versions of bugpoint, one for each problem.

### Command-Line Options
We are proposing to reduce the plethora of bugpoint’s options to just two: an interesting-ness test and the arguments for said test, similar to other delta reduction tools such as CReduce, Delta, and Lithium. The tool should feel less cluttered, and there should be no uncertainty about how to operate it.

I am /strongly/ in favor of going to an interesting-ness test approach in lieu of the current "try and guess what kind of bug you're looking for" approach. Writing correct test detection scripts is definitely a challenge, but you can usually crib from a preexisting script.

A second note is that we can provide much of the current "automatic" functionality in bugpoint via a set of useful test scripts, one for each mode.
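
As an illustration of what one such test script might look like, here is a small Python sketch for the crash case. It assumes the C-Reduce-style convention that exit status 0 means "still interesting"; the helper names `crashes_with` and `main`, and the idea of matching on a message substring, are assumptions made for this example rather than anything bugpoint ships.

```python
import subprocess
from typing import List, Sequence


def crashes_with(command: Sequence[str], needle: str, timeout: float = 60.0) -> bool:
    """Return True if `command` exits non-zero and its output contains `needle`."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # in this sketch a hang is treated as "not interesting"
    return proc.returncode != 0 and needle in (proc.stdout + proc.stderr)


def main(argv: List[str]) -> int:
    """Hypothetical CLI: <needle> <command...>; returns 0 when interesting."""
    needle, *command = argv
    return 0 if crashes_with(command, needle) else 1
```

A wrapper would call `sys.exit(main(sys.argv[1:]))`, with the command being whatever reproduces the crash on the candidate file. Matching on the specific assertion or error text, rather than on "any failure", helps keep the reducer from drifting to a different bug.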

"Finkel, Hal J. via llvm-dev" <llvm-dev@lists.llvm.org> writes:

One concern that I have is that, from personal experience, the ability
for bugpoint to reduce the set of optimization passes applied in order
to reproduce a bug is extremely helpful. I understand your desire to
decouple the logic somewhat, and maybe there's some way to generalize
that functionality by enabling simultaneous delta reduction on some
secondary inputs (some of which may happen to be a pass list), but I'd
like to see us somehow retain that capability to isolate the
problematic set of transformations.

I wonder if this might be better as a separate tool. The functionality
is definitely useful. In fact I'd like to see it enhanced by using
DebugCounters when available. This will require some smarts for
DebugCounters to either report themselves to the tool or for passes to
report their DebugCounter-ness.

bugpoint currently has the ability to debug miscompiles by splitting
the code into a "known good" set of functions (which it puts into a
separate library) and the remainder of the code (which, in theory, is
the smallest part of the code necessary to reproduce the bug). This
has also proven useful in the past. Is this something you intend to
keep?

Yes, this is also very useful and would fit in with whatever tool does
the pass reduction mentioned above. The first step would be to find the
subset of code that is miscompiled. The second would be to do pass
reduction (with use of DebugCounters) over that miscompiled piece of
code. Isn't this how bugpoint basically operates in miscompile mode
now?

My largest set of problems with bugpoint has been that bugpoint's
logic for doing things like loop extraction will itself often
crash, and also, there's no easy way to start with multiple input
files (e.g., what you start with from a program with multiple source
files).

I've always just generated .ll files and linked them manually before
starting the bugpoint process. I'd think it would be straightforward to
have bugpoint/whatever take a set of input files and do the link step
itself.

One other aspect of bugpoint/whatever that will become more important
since flang is now an official project is the ability to specify a
linker to use. Analyzing Fortran codes will require the tool to know
which Fortran compiler to link with since there is no standard Fortran
ABI or runtime interface. The compiler used to link must be the same one
used to generate the IR. When mixing Fortran and C/C++, the tool will
need to know to use the Fortran compiler to do the link (and also link
in the C/C++ runtimes).

                         -David

Diego Treviño via llvm-dev <llvm-dev@lists.llvm.org> writes:

If the test accepts any arguments (excluding the input ll/bc file), they are given via the following flag:
        `--test_args=<test_arguments>`

I worry a little bit about oddball test arguments, for example arguments
with spaces. Maybe that's a corner case we don't care about but to
handle it bugpoint could also support something like:

--test-arg <arg1> --test-arg <arg2> ...

                              -David

At the moment, bugpoint has three major use cases: crash reduction, miscompile reduction, and mutation fuzzing. Out of these, a huge proportion of the interface complexity comes from the miscompile handling.

I generally agree with removing the auto-detection logic. I’ve found it to be extraordinarily error prone and confusing.

Interface wise, I might suggest something in the spirit of sub-tools (e.g. git or svn). As a possible example:
bugpoint crash-reduce
bugpoint miscompile-reduce
bugpoint mutate

In addition to these high-level commands, it may also be useful to expose individual reduction steps. I find myself frequently wanting to run only individual reduction steps (and have hacked up my local bugpoint to allow this) via a wrapper script. Having first class support for “bugpoint reduce-step functions <input.ll>” would be awesome.
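
To show the shape of such an interface, here is a rough Python sketch using `argparse` subcommands. It only models the command-line surface; the program name `bugpoint-sketch` and the positional argument names are assumptions, and no reduction logic is implied.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bugpoint-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    # High-level commands, each taking one input file in this sketch.
    for name in ("crash-reduce", "miscompile-reduce", "mutate"):
        sub.add_parser(name).add_argument("input")

    # Run one named reduction step, e.g. "reduce-step functions input.ll".
    step = sub.add_parser("reduce-step")
    step.add_argument("step")
    step.add_argument("input")
    return parser


ns = build_parser().parse_args(["reduce-step", "functions", "input.ll"])
print(ns.command, ns.step, ns.input)
```

Splitting the commands this way lets each one carry only the options it needs, which is the point about the miscompile handling dominating the interface complexity.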

Another idea would be to move all of the complexity of test formation into a separate command. Rather than having the tool detect which opt to use as part of reduction, instead have a generate command which generates a script which is then used for reduction (i.e. make everything use the custom mode, while still providing helpers to generate). This is probably more natural for crash reduction than for miscompile reduction, but maybe we could make it work for both? Or maybe if we split the two commands (and thus their interface) it doesn’t really matter.

Philip

p.s. Bugpoint is a fairly critical tool. If we start rewriting it, making sure it continues to work through the process will be critical. We don’t have much in the way of testing for it today, and that would need to change.

"Finkel, Hal J. via llvm-dev" <llvm-dev@lists.llvm.org> writes:

One concern that I have is that, from personal experience, the ability
for bugpoint to reduce the set of optimization passes applied in order
to reproduce a bug is extremely helpful. I understand your desire to
decouple the logic somewhat, and maybe there's some way to generalize
that functionality by enabling simultaneous delta reduction on some
secondary inputs (some of which may happen to be a pass list), but I'd
like to see us somehow retain that capability to isolate the
problematic set of transformations.

I wonder if this might be better as a separate tool. The functionality
is definitely useful. In fact I'd like to see it enhanced by using
DebugCounters when available. This will require some smarts for
DebugCounters to either report themselves to the tool or for passes to
report their DebugCounter-ness.

I certainly agree that integration with DebugCounters, or similar, would
be a useful enhancement. I think that this is especially true for
analysis results. I've done this several times manually in order to
narrow down a single problematic alias-analysis query result.

My largest set of problems with bugpoint has been that bugpoint's
logic for doing things like loop extraction will itself often
crash, and also, there's no easy way to start with multiple input
files (e.g., what you start with from a program with multiple source
files).

I've always just generated .ll files and linked them manually before
starting the bugpoint process. I'd think it would be straightforward to
have bugpoint/whatever take a set of input files and do the link step
itself.

You linked them with llvm-link? That can change the result because of
inlining, etc. I've also created wrapper scripts to link in other .ll
files, etc. but the problem is that, often, I don't know which .ll
file contains the miscompiled code. Maybe I should have automated this a long
time ago, but I've always ended up writing shell scripts to try to
iterate over all of the .ll files and run bugpoint on each one in turn
(linking in the remaining ones) to try to find the one being
miscompiled. One problem, of course, is that this scales poorly - a
binary search would be better.
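
A minimal Python sketch of that binary search, under two loud assumptions: exactly one input file is miscompiled, and there is a deterministic oracle that builds the given "suspect" files with the pipeline under test (and the rest with a known-good one) and reports whether the program misbehaves. The names `find_miscompiled` and `fails` are hypothetical.

```python
from typing import Callable, List, Sequence


def find_miscompiled(files: Sequence[str], fails: Callable[[List[str]], bool]) -> str:
    """Bisect `files` down to the single file whose compilation triggers the bug.

    `fails(suspects)` should build `suspects` with the suspect pipeline and
    every other file with a known-good one, then return True if the
    resulting program misbehaves.
    """
    candidates = list(files)
    if not candidates or not fails(candidates):
        raise ValueError("the bug does not reproduce with all files as suspects")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # With exactly one culprit, it is in `half` or in the remainder.
        candidates = half if fails(half) else candidates[len(half):]
    return candidates[0]


files = [f"{name}.ll" for name in "abcdefgh"]
print(find_miscompiled(files, lambda suspects: "f.ll" in suspects))
```

This needs roughly log2(n) builds instead of one per file. If several files interact to produce the miscompile, the single-culprit assumption breaks and a delta-debugging style search over the file set would be the safer choice.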

Thanks again,

Hal

"Finkel, Hal J. via llvm-dev" <llvm-dev@lists.llvm.org> writes:

I've always just generated .ll files and linked them manually before
starting the bugpoint process. I'd think it would be straightforward to
have bugpoint/whatever take a set of input files and do the link step
itself.

You linked them with llvm-link? That can change the result because of
inlining, etc.

True. It's been a very long time since I've done this (like a decade).

I've also created wrapper scripts to link in other .ll files, etc. but
the problem is that, often, I don't know which .ll file contains the
miscompiled code. Maybe I should have automated this a long time ago,
but I've always ended up writing shell scripts to try to iterate over
all of the .ll files and run bugpoint on each one in turn (linking in
the remaining ones) to try to find the one being miscompiled. One
problem, of course, is that this scales poorly - a binary search would
be better.

Yes, this would be nice functionality to have.

                       -David

At the moment, bugpoint has three major use cases: crash reduction, miscompile reduction, and mutation fuzzing. Out of these, a huge proportion of the interface complexity comes from the miscompile handling.

I generally agree with removing the auto-detection logic. I've found it to be extraordinarily error prone and confusing.

I'm not sure if I said this previously, but +1 to this. I don't recall a situation where I wasn't sure whether the problem was the compiler crashing or whether the problem was that the code was miscompiled. Even in a CI-type setup, a failure in the compiler step and in the run step can be distinguished by the relevant scripts.
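
For illustration, a script along these lines can tell the two apart. This is a Python sketch with hypothetical names (`classify`, the three command and output parameters); the actual compile and run commands depend entirely on the setup.

```python
import subprocess
from typing import Sequence


def classify(compile_cmd: Sequence[str], run_cmd: Sequence[str], expected_output: str) -> str:
    """Report which stage failed: the compiler, the program run, or neither."""
    compiled = subprocess.run(compile_cmd, capture_output=True, text=True)
    if compiled.returncode != 0:
        return "compile-failure"  # candidate for crash reduction
    ran = subprocess.run(run_cmd, capture_output=True, text=True)
    if ran.returncode != 0:
        return "run-failure"      # compiled fine but the program failed
    if ran.stdout != expected_output:
        return "wrong-output"     # candidate for miscompile reduction
    return "ok"
```

Each label maps naturally onto one reduction mode, so the caller rather than the tool decides what kind of bug is being chased.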

-Hal

Hi everyone,

Thanks so much for all the feedback; I’ll keep all your comments and suggestions in mind. For the moment I will focus on building the IR-Reduction tool, and once that’s done, I will work on integrating it into the existing bugpoint, either as a sub-tool (as Philip suggested) or as another debug strategy.
And once that’s finished I’ll be working on implementing all your thoughtful suggestions to improve the tool’s functionality.

Cheers,
Diego

I would also welcome improvements that let bugpoint reduce a file more
than it currently does. For instance, the control flow of invoke is never
removed by bugpoint, attributes/metadata are not reduced, etc. This has
required me to manually edit a reduced file further.

Thank you for working on improving bugpoint.

Michael

Hi Diego,

This sounds super awesome, I’d love to see Bugpoint get a thorough rethink.

It sounds like you’re proposing two things though: 1) you’re making the internal design more principled and putting the various interesting-ness tests into a better structure, and 2) you’re proposing removal of the ‘automatic’ stuff (which I agree is not great).

You’re likely to get pushback on this from people who like or use #2. Would it make sense to just deemphasize it by moving the auto feature under an explicit flag or something? That way, it is a strict improvement over what we have now, instead of regressing on a bit of functionality.

-Chris