Implementing if-conversion as a GSoC 2015 project?

Hi,

Are you guys interested in implementing if-conversion as a GSoC 2015 project? Last year, I did a literature review about approaches of if-conversion and the if-conversion in LLVM. This was the undergraduate thesis of my bachelor degree. It seems that, the if-conversion used in LLVM is a very simple approach instead of following the literature. So I want to implement the approaches in the literature in LLVM and get support from Google for this. Are you guys interested?

Best,

Dear Xiang,

Can you briefly describe the limitations with the current if-conversion in LLVM and how you plan to improve it? I haven't read your thesis, and I (and others) are unlikely to have time to do so.

Regards,

John Criswell

OK, Let me describe. There is nothing wrong with if-conversion in LLVM. The algorithm implemented in LLVM can handle the if(???){do something} and if(???){do something}else{do something else} case very well. But it can handle complicated case like when there are a lot of gotos in the program. The more systematic way to do if-conversion is based on Hyperblock [Scott A. Mahlke et al 1992], which is set of basic blocks with one entrance and one or more exits. The algorithms now implemented in LLVM is very easy to understand, and very easy to implement. But it is not general, and it might not generate code with best efficient.

As we know, if-conversion is a double-edged sword, it might increase the efficiency, it might also make it worse. For example, if the program to be compiled have a code like:
if () {
execute 1000 statements
} else {
execute 1 statements
}
In the case when
is almost always true, it is very efficient because it avoid branching with only executing one more statement as punishment. The improvements of performance from eliminating the branching is always larger than 1 cpu cycle. However, in the case when is almost always false, doing if-conversion is a stupid decision because the improvements of eliminating the branch is very limited here, but the punish, that 1000 more statements have to be executed everytime, really hurt.

There are studies on how to make a balance in whether to do if-conversion or not. There are both static policy [David I. August et al 1997] and dynamic policy [Kim M. Hazelwood and Thomas Conte 2000]. What I want to do for GSoC is to implement these approaches into LLVM.

OK, Let me describe. There is nothing wrong with if-conversion in LLVM. The algorithm implemented in LLVM can handle the if(???){do something} and if(???){do something}else{do something else} case very well. But it can handle complicated case like when there are a lot of gotos in the program. The more systematic way to do if-conversion is based on Hyperblock [Scott A. Mahlke et al 1992], which is set of basic blocks with one entrance and one or more exits. The algorithms now implemented in LLVM is very easy to understand, and very easy to implement. But it is not general, and it might not generate code with best efficient.

Questions:

  • You are not going to get good performance out of hyperblock without global scheduling. Are you planning to adopt the MI schedule to handle HB? Will it be predicate aware?
  • What architecture are you targeting? HB makes more sense for ISA with lots of predicated registers.

Evan

It would help most if you had key examples/benchmarks where if conversion will provide (additional) performance benefits. Literature is a limited guide. For example, in hindsight many early papers on Itanium oversold the potential benefits - which is not necessarily their fault: branch predictors got better over time, global scheduling techniques improved over time etc. In the specific example of the Mahlke paper if-conversion turned out to improve global scheduling in their compiler. With a better global scheduler the result is different.

When you scope the design it will be helpful when the pass can be invoked on demand for a region by loop transformations and in the codegen. Specific example where if conversion enables e.g. vectorization or can eliminate targeted data dependent branches (on top of what can be done today) is where I think the benefits will be. The more specific examples you have the less likely you’ll get lost in the morass of compiler changes that get triggered by the ‘advanced’ conversion schemes.

-Gerolf

When dealing with if-conversion, there are two problems: the first is how to do if-conversion, the second is how to use if-conversion to make the code generated faster. When talking about the first problem, we think nothing about how if-conversion is going to be used.

The way that LLVM deal with the first problem is to consider the case of triangle, diamond, etc. This approach is not complete, i.e. there are cases this approach can’t deal with. A better way to do is the RK algorithm[Park 1992], which can deal convert any control flow graph without loop into a single basic block and “minimize both the number of predicates in use and the of defining operations necessary”. Figure 1 of Park paper is one example that LLVM’s algorithm can’t deal with but the RK algorithm can. Replacing the current algorithm with RK algorithm benefits everywhere when if-conversion is used, including vectorization.

For the second problem, if-conversion might improve efficiency in the following ways:

First, if-conversion can merge several basic block into one large basic block, then global scheduling problem reduces to a local scheduling problem. Actually there are approaches that take advantage of this feature of if-conversion to do global scheduling [NJ Water 1993]. I have no idea about how much improvements implementing if-conversion into it will make.

Second, properly using if-conversion can avoid punishment of failing to predict the branching. As Gerolf says, the improvement of branch predictor and global scheduler might make it not as attractive as in literature.

Third, if-conversion makes it possible to vectorize loops with branching inside. The vectorization has already been implemented in LLVM, so the benefit will be the benefit of RK algorithm.

Unfortunately I don’t have any specific example yet. Benchmark can’t be done before it is implemented. If unluckily the benchmark shows almost no improvement, all the work done to implement this part then becomes useless.

What I plan to do is:

  1. Replace the algorithm of if-conversion with RK algorithm.

  2. Adopt the scheduler to handle hyperblock, as Evan says.

Answer Evan’s question

  • What architecture are you targeting? HB makes more sense for ISA with lots of predicated registers.

Good question. The RK algorithm is architecture independent. And for the scheduler, I haven’t decided yet. It depends on the time table. I don’t know how much time it will take to do the coding, so it’s hard to say now. Candidates with priority might be: the one with most predicated registers, the one with poorest branch predictor, and the one that is most widely used.

Xiang Gao

It would help most if you had key examples/benchmarks where if conversion will provide (additional) performance benefits. Literature is a limited guide. For example, in hindsight many early papers on Itanium oversold the potential benefits - which is not necessarily their fault: branch predictors got better over time, global scheduling techniques improved over time etc. In the specific example of the Mahlke paper if-conversion turned out to improve global scheduling in their compiler. With a better global scheduler the result is different.

When you scope the design it will be helpful when the pass can be invoked on demand for a region by loop transformations and in the codegen. Specific example where if conversion enables e.g. vectorization or can eliminate targeted data dependent branches (on top of what can be done today) is where I think the benefits will be. The more specific examples you have the less likely you’ll get lost in the morass of compiler changes that get triggered by the ‘advanced’ conversion schemes.

-Gerolf