How to optimize conditional recusive functions?

ChuanqiXu · April 6, 2022, 2:57am

Here is the brief example: Compiler Explorer

Simply, GCC would optimize the following code successfully while clang wouldn’t:

int clang_failed(int x, int y) {
    if (x == 0)
        return y;
    else
        return clang_failed(x - 1, y * x);
}

GCC would optimize it into a simple form:

clang_failed(int, int):
        movl    %esi, %eax
        testl   %edi, %edi
        je      .L5
.L2:
        imull   %edi, %eax
        subl    $1, %edi
        jne     .L2
.L5:
        ret

while clang would translate it honestly.

Originally I thought there might be some optimization passes that clang is missing. But I found this is more interesting after I made some simple testing. For example. both GCC and clang would optimize following code:

int success1(int x, int y) {
    if (x == 0)
        return y;
    else
        return success1(x - 1, y++);
}

But clang would optimize following codes while gcc refused to do:

int gcc_failed(int x, int y) {
    if (x == 0)
        return y;
    else
        return gcc_failed(x - 1, y + x);
}

BTW, clang would optimize it as:

gcc_failed(int, int):                       # @gcc_failed(int, int)
        mov     eax, esi
        test    edi, edi
        je      .LBB2_2
        lea     ecx, [rdi - 1]
        lea     edx, [rdi - 2]
        imul    rdx, rcx
        imul    ecx, ecx
        shr     rdx
        add     eax, edi
        add     eax, ecx
        sub     eax, edx
.LBB2_2:
        ret

which looks a little bit longer than the first version of GCC does. I thought the two codes are homogenized.

So the problem looks interesting. Does anyone knows what’s the pass to optimize it in clang? And if anyone know why clang failed to optimize the first one but succeed with the last one?

Thanks,
Chuanqi

jdoerfert · April 6, 2022, 3:43am

It’s not the recursion. That is folded away for this trivial cases.
LLVM doesn’t seem to have a closed form for this recursive/iterative function: Compiler Explorer

ChuanqiXu · April 6, 2022, 3:52am

@jdoerfert

Oh, I guess I found the problem. It is related to the phase ordering. The optimization in clang is disabled by unrolling and vectorization. We could observe that clang could optimize it if we disabled the unrolling and vectorization: Compiler Explorer

Do you think it is an issue or a feature? I mean should we try to fix this one by adjusting the pipeline?

ChuanqiXu · April 6, 2022, 4:00am

Add @aeubanks. Maybe he could help with the phase ordering problem.

efriedma-quic · April 6, 2022, 10:40pm

I’m not sure what you think the problem is with clang’s output. I mean, the vectorizer is probably underestimating the cost of vectorizing a 32-bit multiply without SSE4.1. That doesn’t have anything to do with phase ordering, though…?

ChuanqiXu · April 7, 2022, 3:28am

Do you mean that the default output is better than the output with -fno-unroll-loops -fno-slp-vectorize -fno-vectorize? Oh, I didn’t do benchmarking for it… I’ve just feel the output of the later is shorter…

Topic		Replies	Views
Missing optimization case in not equal (!=) binary operator IR & Optimizations	0	239	December 8, 2022
Weird optimization Using Clang	3	58	February 23, 2015
Question about ExprConstant optimization of IR stage Clang Frontend	0	63	November 28, 2013
Optimize a specific function Clang Frontend	0	53	June 28, 2011
Clang optimizing an infinite loop Clang Frontend	2	323	October 19, 2021

How to optimize conditional recusive functions?

Related Topics