Mul & div support for wider-than-legal types


  1. Can mul and/or div support be added for big integer types like i256?
  2. What are the limits?
  3. If yes, how should it be done?
    I have experience only with X86 target and know that mul i128 works and div i128 is lowered to function call from compile-rt like library (what works only if you link with such library). Can that support be extended?
  • Paweł

mul can be inlined easily if necessary for arbitrary sizes, but div is
very expensive. Note that div can be expanded to calls even for smaller
sizes, since many (older) architectures don't have hardware division.

What's your use case?


mul can be inlined easily if necessary for arbitrary sizes, but div is very expensive.

Shall I file a bug for "implement FFT in LLVM"?


I didn't say it is the most efficient algorithm :slight_smile: That's why I asked
about the purpose.


I would be happy to have mul and div up to i512 on x86_64 target. My current solution is to replace that instructions with call to my helper functions.

If I understand correctly one big mul can be composed of 4 half-precision mul instructions? Would such solution be accepted by LLVM?

  • Paweł

Can you suggest an algorithm and implementation then?

For short multi-word multiplications, school book approach works fine.
For medium sizes problems, Karatsuba or Toom-Cook is better. For truely
larger problems, it boils down to FFT. Of course, all this needs careful
benchmarking if you want to select a specific one.


Can it be a first step?

  • Paweł

I played with legalizing mul instruction. I implemented naive expanding that requires 8 multiplication of half-size types ( This, of course, does not scale up very well, and it probably doable only up to i512. After this exercise, I have more questions than before.

Is information coming from TargetLowering::getOperationAction() related to mul type legalization? Where do that come from? Can a target request a “custom” lowering for that case?

Depending on a target, some muls divs and other operations are lowered to a call to a function from a “compiler-rt”-like library. The set of this built-in functions seems to be sealed. If I would like implement a support for e.g. mul i256, should I attack it in a similar manner? I.e. by adding a function to such library?

Another solution I think about is to inject a helper function to a module that requires long multiplication. That function should have a weak linkage to be possible merged later by a linker.

Any comments very welcome,
Paweł Bylica