Multiply i8 operands promotes to i32

Hi,

I am trying to complete the hardware multiplier option for MSP430 backend.

As the hardware multiplier in most MSP430 devices takes i8 and
i16 operands and produces i16 and i32 results, I am lowering MUL_I8 and
MUL_I16. However, the front-end promotes the i8 arguments to i32, performs
a 32-bit multiply and truncates the result to 16 bits, so I never get to
lower MUL_I8 or MUL_I16, only MUL_I32, which is lowered to an external
libcall (__mulsi3) that I don't have.

What should I do to prevent the front-end from promoting to a
32-bit multiply?
If that is not possible, how can I detect when lowering that it is
actually a MUL_I8 or MUL_I16 in order to do the correct lowering?

Thanks in advance,
Pedro

P.S.: I am including the C code and the corresponding LLVM IR.

C code:
void
(const u_int16_t in_data, u_int16_t* out)
{
  u_int8_t kk = in_data&0xFF;
  u_int16_t kk16 = kk * kk;
  *out = kk16;
}

LLVM:
  %1 = load i8* %kk, align 1
  %conv2 = zext i8 %1 to i32
  %2 = load i8* %kk, align 1
  %conv3 = zext i8 %2 to i32
  %mul = mul nsw i32 %conv2, %conv3
  %conv4 = trunc i32 %mul to i16
  store i16 %conv4, i16* %kk16, align 2

Hi Pedro, what is the "front-end" you refer to? Clang?

Ciao, Duncan.

What should I do to prevent the front-end from promoting to a
32-bit multiply?

(I'm assuming you're getting C code from clang.)

You can't, assuming your platform defines "int" to be 32 bits; clang
is just following the C standard. This may seem a little silly in
this case, but clang generally tries to generate math operations as
written.
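
To make the promotion rule concrete, here is a minimal C sketch, assuming a host where "int" is 32 bits as in the scenario above. The helper names square_u8 and product_width_bytes are made up for illustration and do not come from the original post.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the multiply from the original post: both operands are 8-bit,
   but the integer promotions convert them to int before the multiply,
   and only the final conversion narrows the result to 16 bits. */
static uint16_t square_u8(uint8_t kk)
{
    return (uint16_t)(kk * kk);
}

/* Reports the width of the type the multiply is carried out in.
   sizeof does not evaluate its operand; it only inspects the type. */
static size_t product_width_bytes(void)
{
    uint8_t kk = 0;
    return sizeof(kk * kk);
}
```

Because the multiply is written at "int" width in the source, the IR contains a mul at that width plus a trunc, which matches the IR shown in the original post.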

If that is not possible, how can I detect when lowering that it is
actually a MUL_I8 or MUL_I16 in order to do the correct lowering?

At -O0, you don't. __mulsi3 is the obvious lowering, and you're doing
something wrong if your tools don't provide it.

There are probably some interesting issues with the current optimizers
at -O2, but it's hard to discuss that without specific examples.
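
As one concrete data point for that discussion: for the pattern in the original post (zero-extended i8 operands, a 32-bit multiply, truncation to i16), the low 16 bits of the product depend only on the low 16 bits of the operands, so a 16-bit multiply gives the same answer. The C sketch below checks this arithmetic fact only; it says nothing about what the optimizers or the backend actually do, and both function names are hypothetical.

```c
#include <stdint.h>

/* What the IR in the original post computes: widen, multiply, truncate. */
static uint16_t mul32_then_trunc(uint8_t x, uint8_t y)
{
    uint32_t wide = (uint32_t)x * (uint32_t)y;
    return (uint16_t)wide;
}

/* A multiply that never keeps more than 16 bits of state: shift-and-add
   with every intermediate value wrapped to uint16_t. */
static uint16_t mul16_wrapping(uint16_t a, uint16_t b)
{
    uint16_t acc = 0;
    while (b != 0) {
        if (b & 1u) {
            acc = (uint16_t)(acc + a);   /* wrap the sum to 16 bits */
        }
        a = (uint16_t)(a << 1);          /* wrap the shifted operand too */
        b = (uint16_t)(b >> 1);
    }
    return acc;
}
```

For every pair of 8-bit inputs the two functions agree, which is why narrowing the multiply is sound for this particular pattern.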

-Eli

At -O0, you don't. __mulsi3 is the obvious lowering, and you're doing
something wrong if your tools don't provide it.

MSP430 is a 16-bit target, so mulsi is a bit expensive there; mulhi /
mulqi can be implemented via the hardware multiplier.

There are several problems with 16-bit support inside LLVM in general
and msp430 in particular:

1. In some places LLVM expects a 32-bit or 64-bit target (e.g. the i32
length argument of memcpy, etc.)
2. On MSP430 the multiplier is an external device, so you either need
to be sure that there are no muls, etc. inside interrupts, or disable
interrupts while accessing the multiplier.
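
Point 2 can be sketched in C as a critical section around the multi-step register access. This is a host-side simulation, not the attached assembly: the register block, the irq_enabled flag and every function name below are hypothetical stand-ins, and on a real device the accesses would be volatile reads and writes at device-specific addresses.

```c
#include <stdint.h>

/* Simulated multiplier registers (hypothetical layout). */
struct hwmult_sim {
    uint16_t op1;     /* first operand */
    uint16_t op2;     /* second operand; writing it starts the multiply */
    uint16_t res_lo;  /* low word of the result */
    uint16_t res_hi;  /* high word of the result */
};

static struct hwmult_sim mult;   /* the single multiplier shared by all code */
static int irq_enabled = 1;      /* simulated global interrupt enable flag */

/* Simulation only: real hardware produces the result on its own once the
   second operand has been written. */
static void sim_write_op2(uint16_t v)
{
    uint32_t r;
    mult.op2 = v;
    r = (uint32_t)mult.op1 * (uint32_t)mult.op2;
    mult.res_lo = (uint16_t)(r & 0xFFFFu);
    mult.res_hi = (uint16_t)(r >> 16);
}

/* Save the current interrupt state and disable interrupts. */
static int irq_save_and_disable(void)
{
    int old = irq_enabled;
    irq_enabled = 0;
    return old;
}

/* Restore the interrupt state saved by irq_save_and_disable. */
static void irq_restore(int old)
{
    irq_enabled = old;
}

/* 16x16 multiply returning the low 16 bits. An interrupt handler that also
   multiplies cannot overwrite the operands between the writes and the read. */
static uint16_t umul16_guarded(uint16_t a, uint16_t b)
{
    int saved = irq_save_and_disable();
    uint16_t lo;
    mult.op1 = a;
    sim_write_op2(b);
    lo = mult.res_lo;
    irq_restore(saved);
    return lo;
}
```

Restoring the saved state, rather than unconditionally re-enabling interrupts, keeps the routine safe to call from code that already runs with interrupts off.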

I'm attaching some old proof-of-concept code for compiler-rt which
implements the "hi" and "qi" operations. Most probably it needs to be
modified to fit into the current compiler-rt codebase.

mulhi3hw_noint.S (509 Bytes)

mulhi3hw.S (541 Bytes)

mulqi3hw_noint.S (510 Bytes)

mulqi3hw.S (541 Bytes)

umulhi3hw_noint.S (511 Bytes)

umulhi3hw.S (542 Bytes)

umulqi3hw_noint.S (511 Bytes)

umulqi3hw.S (542 Bytes)

Hi,

I am generating an assembly file with llc. If I get MUL_I8 and MUL_I16,
with the cli option "msp430-hwmult-mode" it makes a libcall to the
functions Anton just attached.

However, those functions are not included in the assembler I use (nor is
__mulsi3). I use the Debian package for msp430. I am going to add
Anton's files and a new one for mulsi3. I was developing my own, so this
will save me time. The only thing I might have to change is the address
of the MPY registers, as they depend on the MSP430 you are using
(on the MSP430F5438 the MPY address is 0x4C0, for example).
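
For the new mulsi3 file, one possible decomposition is to build the low 32 bits of the product from three 16x16 -> 32 multiplies. The C sketch below shows the arithmetic only; umul16x16 is a hypothetical stand-in for one hardware multiply, and none of this is taken from Anton's attachments.

```c
#include <stdint.h>

/* Stand-in for a single 16x16 -> 32 hardware multiply (hypothetical). */
static uint32_t umul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Low 32 bits of a 32x32 multiply. The a_hi * b_hi term is shifted left
   by 32 bits and therefore never reaches the low 32 bits, so it is skipped.
   The low 32 bits are the same for signed and unsigned operands. */
static uint32_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    uint32_t low = umul16x16(a_lo, b_lo);
    /* Only the low 16 bits of the cross terms survive the shift below,
       so wrap-around in this sum is harmless. */
    uint32_t cross = umul16x16(a_hi, b_lo) + umul16x16(a_lo, b_hi);

    return low + (cross << 16);
}
```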

Thanks,
Pedro