[RFC] Proposing optimization for special kind of switches on enums

I discovered that sometimes you can optimize switch statements
by reducing the number of bits they match on.

For example,

YOU CAN CHECK EXAMPLES ON COMPILER EXPLORER:

  1. unoptimized example
  2. optimized example
// let's say i want to group my enum "values" in groups:
enum groups {
    group_0 = 1 << 0,
    group_1 = 1 << 1,
    group_2 = 1 << 2,
    group_3 = 1 << 3
};

enum values {
    // structure of this enum is now as follows:
    // (number of element << bits reserved for groups)
    //   ... | any groups element belongs to

    // this is convenient as you can later just:
    
    // if (a & group_0)
    //     // to check if element belongs to a group

    // and this is also space efficient, and fast

    a = (1 << 4) | group_1,
    b = (2 << 4) | group_0 | group_3,
    c = (3 << 4) | group_0,
    d = (4 << 4) | group_1,
    e = (5 << 4) | group_2 | group_1,
    f = (6 << 4) | group_3,
};


void test(const char*); // dummy function for demo

void foo(values value) {
    // if for some reason i want to switch..case
    // by such elements, it becomes inefficient,
    // because actual values of elements are large,
    // and switch would generate a big jump table

    // so backend choses tree of branches instead

    // for example, this compiles to tree of ifs
    // on any optimization level on clang 16.0.0
    switch (value) {
    #define my_case(value) \
        case value: test("attempt 0: "#value); break;

        my_case(a)
        my_case(b)
        my_case(c)
        my_case(d)
        my_case(e)
        my_case(f)

    #undef my_case

        default:
            // notice this unreachable, it emphasizes
            // that value must be a valid enum value,
            // so one of: a, b, c, d, e, f.

            // no random integers casted to enum or
            // anything of the sort.
            
            __builtin_unreachable();
    }

    // but, this enum can be rewritten equivalently
    // noticing that all this values can be identified
    // by unique element numbers (because group ids
    // are actually just additional information, that's
    // convenient to use, nothing more, nothing less,
    // they are not very useful when matching values)

}

void bar(values value) {
    // so, to my understanding, THIS IS EQUIVALENT,
    // and considerably more space and time efficient:

    switch (value >> 4) {
    //               ^ number of groups
    
    #define my_case(value) \
        case value >> 4: test("attempt 1: " #value); break;
        //            ^ number of groups

        my_case(a)
        my_case(b)
        my_case(c)
        my_case(d)
        my_case(e)
        my_case(f)

    #undef my_case

        default:
            // notice this unreachable, it emphasizes
            // that value must be a valid enum value,
            // so one of: a, b, c, d, e, f.

            // no random integers casted to enum or
            // anything of the sort.
            
            __builtin_unreachable();
    }
}

This is the best-case scenario, where the defaulted branch is unreachable.

I believe, this optimization can be extended to some more general cases too,
especially if we assume on certain level of optimization, we can assume
that value with enum type can’t be equal to any other value, and the switch
is exhaustive. Or we can have a default check in every switch when the
alternative binary search tree is too large to be faster.

Do you find it interesting? If so, I believe, I can implement it.

As for me, I found this intresting. I believe it may be implemented on IR level, however, if we want to make sure that “default” case is unreachable, the only way I see is to write MLIR pass.

I didn’t quite follow the example above, but SimplifyCFG tries to reduce the range of switches where switch (value) is equivalent to switch (value >> N) in the ReduceSwitchRange() function. Perhaps that could be made smarter?

I looked through the code of llvm/lib/Transforms/Utils/SimplifyCFG.cpp and found the function you told about. There’s an interesting “FIXME” inside it:

  // FIXME: It's possible that optimizing a switch on powers of two might also
  // be beneficial - flag values are often powers of two and we could use a CLZ
  // as the key function.

I believe, @alexpaniman intended to make the same thing that “fixme” describes.
According to git blame, this fix was made on 2016-08-01, so maybe it’s time to fix it…