Handling native i16 types in clang and opt

Hello.
     My target architecture natively supports 16-bit integers (i16).

     Whenever I write C programs that use only short types, clang compiles the program to LLVM IR and converts the i16 data to i32 to perform arithmetic operations, then truncates the results back to i16. The InstructionCombining (INSTCOMBINE or IC) pass later removes these round-trip conversions, except for the (s)div LLVM IR operation.

     Is there a way to avoid these conversions that clang makes back and forth between i16 and i32, if my source program uses only short types?
     Otherwise, how can I make the IC pass handle sdiv the way it handles add, sub and mul? (That is, if the input operands are i16, the operation eventually becomes i16 as well, with any unnecessary conversion to and from i32 removed.)

   Thank you,
     Alex

Do you have a simple test case you can send? I’m having trouble replicating this on x86-64 with the simplest possible test.

unsigned short foo(unsigned short a, unsigned short b) {
  return a + b;
}

This gives IR with no mention of i32. Maybe there’s something misconfigured for your target, or I need a more complex test case.

sdiv in particular is special: it has undefined behavior on overflow. "sdiv i32 -32768, -1" produces "i32 32768", but "sdiv i16 -32768, -1" is undefined.

-Eli
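
To make the sdiv overflow case above concrete, here is a minimal C sketch. It assumes a target where short is 16 bits and int is wider; the helper names div_promoted and quotient_overflows_short are hypothetical and exist only for this illustration.

```c
#include <limits.h>

/* What C specifies for `short / short`: both operands are promoted to int
   and the division is carried out at int width.
   Assumption: int is wider than short, and b is non-zero. */
int div_promoted(short a, short b) {
    return a / b;   /* a and b are implicitly converted to int here */
}

/* Returns 1 when the int-width quotient does not fit in a short, i.e. when
   the same division performed directly at the narrow width would overflow. */
int quotient_overflows_short(short a, short b) {
    int q = div_promoted(a, b);
    return q < SHRT_MIN || q > SHRT_MAX;
}
```

With a 16-bit short, the only inputs for which quotient_overflows_short returns 1 are SHRT_MIN and -1: the wide division is well defined and yields 32768, while a narrow sdiv on the same inputs would be undefined. This is presumably why a pass cannot narrow sdiv unconditionally the way it narrows add or mul.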

Just a shot in the dark here…

Could it be that Clang (or whatever is adding those trunc/ext’s into the IR) is taking into account your calling conventions in CallingConv.td?

We certainly get the same behaviour on PPC but I wonder if that’s due to lines like this in PPCCallingConv.td:
CCIfType<[i8], CCPromoteToType<i64>>

I don’t know enough about this code to really know whether the above has any relation to reality, but it seems related.

Hello.
     I am coming back to this older thread.
     I'd also like to thank Peter Lawrence for the insightful answer (see his email below, if interested). I would also like to add that the C11 standard, Section 6.3.1.1, describes the integer promotions, which explains why the C language requires short arithmetic to be carried out at the width of int. See also https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules .
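
     The promotion rule can be observed directly from C. The following is only a minimal sketch, assuming a platform where int is wider than short; the function names sum_type_size and add_promoted are hypothetical and used just for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Size in bytes of the type of `short + short`. Because both operands
   undergo integer promotion, the expression has type int, not short.
   sizeof does not evaluate its operand, so no addition is executed. */
size_t sum_type_size(void) {
    short a = 1, b = 2;
    return sizeof(a + b);
}

/* The addition is performed at int width, so a sum that exceeds the range
   of short is still represented exactly in the int result.
   Assumption: int is wider than short. */
int add_promoted(short a, short b) {
    return a + b;
}
```

     Here sum_type_size() equals sizeof(int), and add_promoted(SHRT_MAX, 1) yields SHRT_MAX + 1 rather than wrapping. This matches the sext-to-i32 instructions that clang emits at -O0 for short operands.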

     I would like to give an answer to Craig Topper: indeed, I have a simple and very interesting case where these promotions happen, namely the Floyd-Warshall algorithm, with the program below (also try the example at https://www.geeksforgeeks.org/integer-promotions-in-c/ ). In all cases, pass -O0 to clang so that it emits unoptimized LLVM IR.
       #define SIZE 128
       short path[SIZE][SIZE];
       void FloydWarshall() {
         int i, j, k;

         for (k = 0; k < SIZE; k++) {
             for (i = 0; i < SIZE; ++i) {
                 short pik = path[i][k];
                 for (j = 0; j < SIZE; j++) {
                     path[i][j] = path[i][j] < pik + path[k][j] ?
                               path[i][j] : pik + path[k][j];
                 }
             }
         }
       }

     The innermost loop's body is translated to the following unoptimized LLVM IR code (see the lines with the comment "IMPORTANT"):
         for.body8: ; preds = %for.cond6
           %6 = load i32, i32* %j, align 4
           %idxprom9 = sext i32 %6 to i64
           %7 = load i32, i32* %i, align 4
           %idxprom10 = sext i32 %7 to i64
           %arrayidx11 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom10
           %arrayidx12 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx11, i64 0, i64 %idxprom9
           %8 = load i16, i16* %arrayidx12
           %conv = sext i16 %8 to i32 ; IMPORTANT
           %9 = load i16, i16* %pik
           %conv13 = sext i16 %9 to i32 ; IMPORTANT
           %10 = load i32, i32* %j, align 4
           %idxprom14 = sext i32 %10 to i64
           %11 = load i32, i32* %k, align 4
           %idxprom15 = sext i32 %11 to i64
           %arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom15
           %arrayidx17 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx16, i64 0, i64 %idxprom14
           %12 = load i16, i16* %arrayidx17, align 2, !dbg !61
           %conv18 = sext i16 %12 to i32
           %add = add nsw i32 %conv13, %conv18
           %add = add nsw i16 %9, %12 ; IMPORTANT
           %cmp19 = icmp slt i32 %conv, %add
           %cmp19 = icmp slt i16 %8, %add ; IMPORTANT
           br i1 %cmp19, label %cond.true, label %cond.false

   Best regards,
     Alex