[GSoC 2024] The 1001 thresholds in LLVM

Description

LLVM has lots of thresholds and flags to avoid “costly cases”. However, it is unclear whether these thresholds are useful, whether their values are reasonable, and what impact they really have. Since there are so many, we cannot do a simple exhaustive search. In some prototype work we introduced a C++ class that can replace hardcoded values and offers control over the threshold, e.g., you can increase the recursion limit from the hardcoded “6” to a different number via a command line flag. In this project we want to explore the thresholds: when they are hit, what it means if they are hit, how we should select their values, and whether we need different “profiles”.

Expected outcomes

Statistical evidence on the impact of various thresholds inside of LLVM’s code base, including compile time changes, impact on transformations, and performance measurements.

Confirmed mentors and their contacts

Required / desired skills

Required:

  • Profiling skills and knowledge of statistical reasoning

Desired:

  • Good understanding of the LLVM code base and optimization flow

Size of the project.

medium

An easy, medium or hard rating if possible

easy

Whoah, this project sounds incredibly cool!

Where could one look at some starter code?

Starter code, hm.

I would suggest two things to start with:

  1. Take a look at the LLVM-IR ComPile dataset: llvm-ml/ComPile · Datasets at Hugging Face. One can use it to do large scale studies.
  2. Look at LLVM to determine different parameters one can “tune”, like command line options, pass ordering, hardcoded constants, etc.

If you want to run things, you can combine those two and do a small study, e.g., what’s the impact on compile time if you change some hardcoded constant to either 1 or 100000. That should give you an idea, though for the real project we need a proper plan; we’ll also run the programs and look at statistics output and other signals.
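
For illustration, here is a minimal sketch (not existing LLVM code; the flag name and the “analysis” are hypothetical stand-ins) of how a hardcoded threshold could be turned into a command line option, so such a study can vary it without editing the source for every run:

#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Hypothetical flag; the old hardcoded value becomes the default.
static cl::opt<unsigned>
    ScanLimit("example-scan-limit",
              cl::desc("Max number of elements to scan (was a hardcoded 6)"),
              cl::init(6));

// Stand-in for an analysis that gives up once the limit is reached.
static unsigned countScanned(unsigned NumElements) {
  unsigned Scanned = 0;
  for (unsigned I = 0; I < NumElements; ++I) {
    if (Scanned >= ScanLimit) // was: if (Scanned >= 6)
      break;                  // threshold hit: bail out to bound compile time
    ++Scanned;                // real analysis work would happen here
  }
  return Scanned;
}

int main(int argc, char **argv) {
  cl::ParseCommandLineOptions(argc, argv);
  errs() << "scanned " << countScanned(1000) << " elements\n";
  return 0;
}

Running the sketch with -example-scan-limit=1 versus -example-scan-limit=100000 then shows how often the limit actually changes behavior.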

Huh, I wasn’t too careful with watching the download size and left myself only 500MB on my home disk when downloading the dataset. :sweat_smile:

I’m guessing to churn through the whole dataset some cloud computing is in order, right?

If I am not wrong, I am guessing we would need some ML algorithms to predict the optimal value for a hard-coded constant, or whether such a hard-coded constant is of any value at all. But given the size of the data, we would first need to reduce the dataset, possibly by feature selection using PCA or by removing redundant data points. Afterward, I suppose it is left to us to benchmark these different thresholds. Can you please provide a link to the C++ class that you introduced? Even if it is not directly useful in the current context, looking at it would certainly allow me to explore more such thresholds. Also, what exactly do you mean by “profiles” here?

and if we need different “profiles”.

That might be an option, but the question immediately is: How do you train such an algorithm?

I don’t think there is “too much data”, you can always sample it.

I didn’t find it when I looked last time. It might be lost. It basically was a wrapper around a number that allowed you to compare it and increment/decrement it. Internally it stored what you did and whether things like comparisons ever yielded both true and false (which means the threshold was hit). One can make them more specific to the use case, e.g., one for decreasing counters, one for upper bounds, etc., to take away some of the guesswork.
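
Since the original class seems to be lost, here is a rough sketch reconstructed only from the description above (so not the actual prototype): a wrapper that behaves like an integer but records every comparison, so a threshold that ever yielded both true and false is reported as “hit”.

#include <cstdio>

class Threshold {
  int Value;
  const char *Name;
  bool SawTrue = false;
  bool SawFalse = false;

  bool record(bool Result) {
    if (Result)
      SawTrue = true;
    else
      SawFalse = true;
    return Result;
  }

public:
  Threshold(const char *Name, int Value) : Value(Value), Name(Name) {}

  // Report on destruction; a real version might hook into LLVM's
  // statistics machinery instead of printing directly.
  ~Threshold() {
    std::printf("%s hit: %s\n", Name, SawTrue && SawFalse ? "yes" : "no");
  }

  // Comparisons are instrumented; more operators could be added as needed.
  bool operator<(int RHS) { return record(Value < RHS); }
  bool operator>(int RHS) { return record(Value > RHS); }

  // Increment/decrement for thresholds that are used as counters.
  Threshold &operator++() { ++Value; return *this; }
  Threshold &operator--() { --Value; return *this; }
};

int main() {
  Threshold RecursionLimit("recursion-limit", 6);
  for (int Depth = 0; Depth < 10; ++Depth)
    if (RecursionLimit < Depth) // recorded: false for depths 0..6, true at 7
      break;                    // both results were seen, so the limit was hit
  return 0;
}

Specialized variants (e.g., a decreasing counter that reports when it reaches zero) could follow the same pattern.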

Maybe a counter is useful to reduce compile time, but modifying it gives a much better vectorization percentage. That would be a good candidate for some sort of user-facing trade-off, aka profiles. You could pick “compile fast” or “executable fast”, in addition to O1/2/3, etc.

I have been going through the ComPile dataset and I realized that, since we have information about the LLVM IR for different compiler flag usages, we can first profile a few compiler flags and check whether the result is correct or not. We can then feed this labeled data into our algorithm and use it to classify other compiler flags, i.e., set thresholds for them. This does seem a little subjective to me at the moment, but I will refine it if you think this is heading in the correct direction.

Indeed, I can sample it for an initial study using Hugging Face’s take or shuffle method, but for a later study, where accuracy would be our priority, I think using the complete dataset would be prudent.

I see, so something that works behind the scenes as the user tries different hardcoded values, establishing benchmarks which can later be used to fine-tune the software for optimal performance. I think we can certainly use something like this for the project as well.

I understand what you mean here. Just one thing that I need clarification on: since there are several things to fine-tune within LLVM, would you be providing a list of them (I guess there are 1001, from the name of the project)? Right now I am looking to fine-tune these parameters:

Optimization options:

  • -O0, -O1, -O2, -O3
  • -Os
  • -Ofast
  • -mllvm

Debugging options:

  • -g, -gline-tables-only, -gline-directives-only
  • -ggdb
  • -Og

Code generation options:

  • -shared, -static, -dynamic
  • -flto

Language options:

  • -std=xxx
  • -fxxx

Apart from that, I have converted the bitcode to the textual LLVM IR representation and have started a small study on it.
Thanks

OpenTuner might be useful:

https://opentuner.org/tutorial/gettingstarted/

It’s small, easy to use, and fits your use case quite well from what I can tell. However, I have used it only once or twice, so I have no deeper knowledge of it.

Can this framework be extended to LLVM IR as well, since I think the project will mostly revolve around the ComPile dataset? Perhaps we could create a separate tool like OpenTuner that is compatible with IR. Perhaps this paper is more useful.

No need to extend OpenTuner; it is not specific to a language.
Here is how to tune the Linux sleep command for the fastest sleep between 1s and 3s:

import opentuner
from opentuner import ConfigurationManipulator
from opentuner import IntegerParameter
from opentuner import MeasurementInterface
from opentuner import Result


class SleepTimeTuner(MeasurementInterface):

    def manipulator(self):
        """
        Define the search space by creating a
        ConfigurationManipulator
        """
        manipulator = ConfigurationManipulator()
        manipulator.add_parameter(IntegerParameter("sleep_time", 1, 3))
        return manipulator

    def run(self, desired_result, input, limit):
        """
        Compile and run a given configuration then
        return performance
        """
        cfg = desired_result.configuration.data
        sleep_cmd = f"sleep {cfg['sleep_time']}"
        run_result = self.call_program(sleep_cmd)
        assert run_result["returncode"] == 0
        return Result(time=run_result["time"])

    def save_final_config(self, configuration):
        """called at the end of tuning"""
        self.manipulator().save_to_file(configuration.data, "mmm_final_config.json")


if __name__ == "__main__":
    argparser = opentuner.default_argparser()
    SleepTimeTuner.main(argparser.parse_args())

Yes, it does seem that OpenTuner can get the job done, and it uses statistical algorithms to arrive at the optimal value for a threshold. @jdoerfert, can you confirm whether or not this is moving in the right direction? I am using the LLVM IR from ComPile, compiling it using llc, and benchmarking the various flags provided using OpenTuner.

Things like:

or

Right, these can be easily tuned using OpenTuner. One last thing I need clarification on: when you gave the ComPile dataset, was I supposed to fine-tune these values themselves? If so, how? With the script that I wrote, I could only tune the values of command line flags.

Some of them are command line flags, others could be made compile-time flags to support the exploration.

Right. So now that I have a pretty good understanding of the problem and what needs to be done:

  1. The first thing to do is to figure out which parts of the code need to be tuned, i.e., the parts that you mentioned above. This will need an in-depth exploration of the LLVM codebase. Though I am familiar with it, finding these specific places would certainly require a thorough go-through.
  2. Tools like OpenTuner can be used for exploration. It also supports adding custom search algorithms. I will search for more such profiling tools and define how the parameters need to be profiled.
  3. I will drop an initial study profiling some of these values soon.

@jdoerfert, I am dropping an initial study. I faced a roadblock while using opt to get -stats for InstructionSimplification: I can’t seem to generate, from CMake while building LLVM, the bitcode file that is required to use opt. I wrote some C++ code:

#include "llvm/ADT/Statistic.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"
#define DEBUG_TYPE "mypassname"
using namespace llvm;

STATISTIC(NumFunctionCalls, "Number of function calls");
STATISTIC(NumInstructions, "Number of instructions processed");

void myFunction() {
  ++NumFunctionCalls;
  for (int i = 0; i < 10; ++i) {
    ++NumInstructions;
  }
}

int main(int argc, char **argv) {
  cl::ParseCommandLineOptions(argc, argv);
  for (int i = 0; i < 5; ++i) {
    myFunction();
  }
  errs() << "Statistics:\n";
  errs() << "  Function Calls: " << NumFunctionCalls << "\n";
  errs() << "  Instructions Processed: " << NumInstructions << "\n";
  return 0;
}

This code is meant to benchmark recursion_limit, and I put it in the examples folder in LLVM. While it builds and runs successfully, I can’t get its bitcode file. Can you please tell me what changes would be required in the CMakeLists.txt file in order to do so?

I’m a bit lost. What is this code about?
If you want IR, you can generate it from some C example via clang (e.g., -emit-llvm -O3), or you can use code from ComPile.
The LLVM test suite allows you to get stats output, compile-time, and runtime numbers for a given compiler or compiler options.
That might be the easiest way to do some initial experimentation.
Large studies should go through ComPile.
