Thanks for the replies!
Regarding the 2nd example, I would have expected LLVM to use the smallest type that is wide enough for the computation (i16). Instead of:
%16 = load i8, ptr %15, align 1, !dbg !41, !tbaa !42
%17 = zext i8 %16 to i32, !dbg !41
%19 = load i8, ptr %18, align 1, !dbg !45, !tbaa !42
%20 = zext i8 %19 to i32, !dbg !45
%21 = mul nuw nsw i32 %20, %17, !dbg !46
%22 = icmp samesign ugt i32 %21, %10, !dbg !48
%23 = select i1 %22, i32 %21, i32 0, !dbg !49
%25 = load i8, ptr %24, align 1, !dbg !50, !tbaa !42
%26 = zext i8 %25 to i32, !dbg !50
%27 = mul nuw nsw i32 %23, %26, !dbg !51
%28 = trunc i32 %27 to i16, !dbg !52
I would expect it to use i16 for the intermediate computations, like so:
%16 = load i8, ptr %15, align 1, !dbg !41, !tbaa !42
%17 = zext i8 %16 to i16, !dbg !41
%19 = load i8, ptr %18, align 1, !dbg !45, !tbaa !42
%20 = zext i8 %19 to i16, !dbg !45
%21 = mul nuw i16 %20, %17, !dbg !46
%22 = icmp ugt i16 %21, %10, !dbg !48
%23 = select i1 %22, i16 %21, i16 0, !dbg !49
%25 = load i8, ptr %24, align 1, !dbg !50, !tbaa !42
%26 = zext i8 %25 to i32, !dbg !50
%zx = zext i16 %23 to i32
%27 = mul nuw nsw i32 %zx, %26, !dbg !51
%28 = trunc i32 %27 to i16, !dbg !52
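To sanity-check that the narrowing is legal, here is a small model of both versions. It is only a sketch: `a`, `b`, `c` stand for the three loaded i8 values, and `threshold` stands for %10, whose definition isn't shown above. I'm assuming its value fits in 16 bits. The product of two zero-extended i8 values is at most 255 * 255 = 65025. That fits in an unsigned i16, so `nuw` survives the narrowing. It exceeds the signed i16 maximum of 32767, so `nsw` would not. The final `trunc` means only the low 16 bits of the last multiply matter anyway.

```python
# Hypothetical model of the two IR variants; a, b, c are the three loaded i8
# values and `threshold` stands in for %10, whose definition is not shown.
MASK16 = 0xFFFF  # keeping the low 16 bits models `trunc ... to i16`


def compute_i32(a: int, b: int, c: int, threshold: int) -> int:
    """Original form: zext i8 -> i32, mul, icmp ugt, select, mul, trunc to i16."""
    prod = a * b                               # %21, at most 255 * 255 = 65025
    sel = prod if prod > threshold else 0      # %22, %23
    return (sel * c) & MASK16                  # %27, %28


def compute_i16(a: int, b: int, c: int, threshold: int) -> int:
    """Narrowed form: the first mul and the compare are done in 16 bits."""
    t16 = threshold & MASK16                   # assumes %10 is available as an i16
    prod = (a * b) & MASK16                    # i16 mul; exact because 65025 < 2**16
    sel = prod if prod > t16 else 0            # unsigned 16-bit compare
    return (sel * c) & MASK16                  # only the low 16 bits are demanded


def narrowing_is_exact(threshold: int, c_values=(0, 1, 2, 255)) -> bool:
    """Compare both forms for every pair of i8 inputs and a few values of c."""
    for a in range(256):
        for b in range(256):
            for c in c_values:
                if compute_i32(a, b, c, threshold) != compute_i16(a, b, c, threshold):
                    return False
    return True


if __name__ == "__main__":
    # The product fits in unsigned i16 (so `nuw` holds) but not in signed i16.
    print(255 * 255 <= 0xFFFF, 255 * 255 > 0x7FFF)  # True True
    print(narrowing_is_exact(1000))    # True: threshold fits in 16 bits
    print(narrowing_is_exact(70000))   # False: threshold does not fit in 16 bits
```

The `70000` case shows that narrowing the compare is only valid if LLVM can prove %10 fits in 16 bits. That may be part of why it keeps i32 here.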
Perhaps there is some cost-model reasoning hidden behind this choice. I would prefer as much computation as possible done on small types: my machine is SIMD, and narrower element types let me make the most of its vector registers.