Handling native i16 types in clang and opt

Alex_Susu · May 21, 2017, 8:22am

Hello.
My target architecture natively supports 16-bit integers (i16).

Whenever I write C programs that use only short types, clang compiles them to LLVM IR that extends the i16 data to i32 to perform arithmetic operations and then truncates the results back to i16. The InstructionCombining (INSTCOMBINE or IC) pass then removes these round-trip conversions, except around the (s)div LLVM IR operation.

Is there a way to avoid these conversions that clang makes back and forth between i16 and i32 when my source program uses only short types?
Otherwise, how can I make the IC pass handle sdiv the way it handles add (sub) and mul? (That is, if the input operands are i16, the add/mul operation will eventually be i16, with any unnecessary conversions to and from i32 removed.)

Thank you,
Alex

topperc · May 21, 2017, 8:40am

Do you have a simple test case you can send? I’m having trouble replicating this on x86-64 with the simplest possible test.

unsigned short foo(unsigned short a, unsigned short b) {
return a + b;
}

This gives IR with no mention of i32. Maybe there’s something misconfigured for your target, or I need a more complex test case.

Eli_Friedman · May 22, 2017, 5:27pm

sdiv in particular is special: it has undefined behavior on overflow. "sdiv i32 -32768, -1" produces "i32 32768", but "sdiv i16 -32768, -1" is undefined.

-Eli

nemanjai · May 29, 2017, 1:30pm

Just a shot in the dark here…

Could it be that Clang (or whatever is adding those trunc/ext’s into the IR) is considering your calling conventions in CallingConv.td?

We certainly get the same behaviour on PPC but I wonder if that’s due to lines like this in PPCCallingConv.td:
CCIfType<[i8], CCPromoteToType<i64>>

I don’t know enough about this code to really know whether the above has any relation to reality, but it seems related.

Alex_Susu · July 25, 2018, 3:12pm

Hello.
I am coming back to this older thread.
I'd also like to thank Peter Lawrence for the insightful answer (see his email below, if interested). I would like to add that the C11 standard, Section 6.3.1.1, describes the integer promotions, which explains why the C language requires short arithmetic to be promoted to the size of int. See also the Stack Overflow question "Implicit type promotion rules".

I would like to give an answer to Craig Topper: indeed, I have a simple and very interesting case where these promotions happen, the Floyd-Warshall algorithm, with the program below (also try the example in the GeeksforGeeks article "Integer Promotions in C"). In all cases, pass -O0 to clang to emit unoptimized LLVM IR.
       #define SIZE 128
       short path[SIZE][SIZE];
       void FloydWarshall() {
         int i, j, k;

         for (k = 0; k < SIZE; k++) {
             for (i = 0; i < SIZE; ++i) {
                 short pik = path[i][k];
                 for (j = 0; j < SIZE; j++) {
                     path[i][j] = path[i][j] < pik + path[k][j] ?
                               path[i][j] : pik + path[k][j];
                 }
             }
         }
       }

The innermost loop's body is translated to the following unoptimized LLVM IR code (see the lines marked with the comment "IMPORTANT"):
         for.body8: ; preds = %for.cond6
           %6 = load i32, i32* %j, align 4
           %idxprom9 = sext i32 %6 to i64
           %7 = load i32, i32* %i, align 4
           %idxprom10 = sext i32 %7 to i64
           %arrayidx11 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom10
           %arrayidx12 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx11, i64 0, i64 %idxprom9
           %8 = load i16, i16* %arrayidx12
           %conv = sext i16 %8 to i32 ; IMPORTANT
           %9 = load i16, i16* %pik
           %conv13 = sext i16 %9 to i32 ; IMPORTANT
           %10 = load i32, i32* %j, align 4
           %idxprom14 = sext i32 %10 to i64
           %11 = load i32, i32* %k, align 4
           %idxprom15 = sext i32 %11 to i64
           %arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom15
           %arrayidx17 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx16, i64 0, i64 %idxprom14
           %12 = load i16, i16* %arrayidx17, align 2, !dbg !61
           %conv18 = sext i16 %12 to i32
           %add = add nsw i32 %conv13, %conv18
           %add = add nsw i16 %9, %12 ; IMPORTANT
           %cmp19 = icmp slt i32 %conv, %add
           %cmp19 = icmp slt i16 %8, %add ; IMPORTANT
           br i1 %cmp19, label %cond.true, label %cond.false

Best regards,
Alex
