Expose More Inlining Opportunities

Hi Everyone,

During office hours with Johannes Doerfert, we talked about splitting a function into smaller ones to expose more opportunities for function specialization and inlining. We do not have a fleshed-out idea yet; this post is just to gather feedback on what to explore. Let’s look at an example:

void foo(int x) {
  if (x > 10) {
    // lots of code
    ...
   
  } else {
    // little code
    a = 10;
  }

  // rest of foo
  ...
  return;
}


void caller0(...) {
  foo(2);
  ...
  foo(3);
  ...
  foo(15);
}

Here, foo is too big to be inlined at each call site. Also note that specializing foo on x = 2 and on x = 3 yields the same function, but the cost model in FunctionSpecialization does not know that.

Johannes came up with the following transformation:

static void foo_cond_true(int x) {
  // lots of code
  ...
}

static void rest_of_foo(int x) {
  ...
}

void foo(int x) {
  if (x > 10) {
    foo_cond_true(x);
  } else {
    // little code
    a = 10;
  }
  // rest_of_foo must run exactly once on either path
  rest_of_foo(...);
  return;
}


void caller(...) {
  foo(2);
  foo(3);
  ...
  foo(15);
}

Since foo is now small, it can be inlined or specialized at each call site. After inlining and folding the condition x > 10 for each constant argument, the caller becomes:

void caller(...) {
  a = 10;
  rest_of_foo(...);
  a = 10;
  rest_of_foo(...);
  ...
  foo_cond_true(15);
  rest_of_foo(...);
}

The point is that breaking up a big function into smaller ones creates more opportunities for both the inliner and function specialization.

The above example is too simple and doesn’t illustrate all the trade-offs and difficulties involved, but hopefully it gives a rough idea of what we want to achieve. Our first question is:

Does the above seem like a useful transformation? Can we find or construct more test cases? Can we collect statistics from benchmarks that would (perhaps indirectly) justify the need for this transformation?

UPDATE: GCC has a pass_split_functions pass (gcc/ipa-split.cc in the gcc-mirror/gcc repository on GitHub) which, according to the description in its comments, does something similar to what we want. I will collect statistics on how often the GCC pass transforms code on SPEC, and I will also try to measure the performance change with that pass turned off in GCC.

Assuming the answer to the first question is “yes”, our second question is: where should such a transformation happen? These are the possibilities that come to mind:
- the partial inliner
- the inliner / specializer
- a separate pass

Looking forward to your feedback and thanks for reading this!


I thought that instead of many independent passes, you could have very powerful passes with a global view.

There is the new module inliner. You could teach it to do outlining and specialisation.
