[PATCH] Add mul_hi implementation [v2]

Everything except long/ulong is handled by just casting to the next larger type,
doing the math and then shifting/casting the result.
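
As a rough illustration of the widening approach for the 32-bit case (a
sketch in plain C rather than the patch's OpenCL C; the function names
mul_hi_i32 and mul_hi_u32 are hypothetical and not taken from the patch):

```c
#include <stdint.h>

/* Hypothetical helper names, used for illustration only. */

/* Signed 32-bit: widen to 64 bits, multiply, keep the upper 32 bits.
 * Assumes >> on a negative value is an arithmetic shift, which OpenCL C
 * guarantees and common C compilers provide. */
static int32_t mul_hi_i32(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * (int64_t)y) >> 32);
}

/* Unsigned 32-bit: same idea with unsigned types. */
static uint32_t mul_hi_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}
```

The same pattern would apply to char and short, using the next wider type
and a shift of 8 or 16 bits respectively.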

For 64-bit types, we break the high/low parts of each operand apart, and do
a FOIL-based multiplication.
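
A minimal sketch of what a FOIL-style 64-bit mul_hi can look like, written
in plain C. This is not the code from the patch: the names mul_hi_u64 and
mul_hi_i64 are hypothetical, and the signed variant uses a standard
correction on top of the unsigned result, which may differ from what the
patch does.

```c
#include <stdint.h>

/* Hypothetical helper names, used for illustration only. */

/* Upper 64 bits of the 128-bit product x * y (unsigned). */
static uint64_t mul_hi_u64(uint64_t x, uint64_t y)
{
    /* Split each operand into 32-bit halves. */
    uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;

    /* The four partial products (First, Outer, Inner, Last). */
    uint64_t hi_hi = x_hi * y_hi;
    uint64_t lo_hi = x_lo * y_hi;
    uint64_t hi_lo = x_hi * y_lo;
    uint64_t lo_lo = x_lo * y_lo;

    /* Sum the terms that land in bits 32..95; this cannot overflow
     * 64 bits. Its upper half is the carry into the high word. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Signed variant: take the unsigned high word and subtract the other
 * operand for each negative input (arithmetic is modulo 2^64).
 * Assumes two's-complement conversion between int64_t and uint64_t. */
static int64_t mul_hi_i64(int64_t x, int64_t y)
{
    uint64_t hi = mul_hi_u64((uint64_t)x, (uint64_t)y);
    if (x < 0)
        hi -= (uint64_t)y;
    if (y < 0)
        hi -= (uint64_t)x;
    return (int64_t)hi;
}
```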

v2:
  Discard the Stack Overflow implementation due to copyright concerns.
  - The implementation is still FOIL-based, but no longer uses any of the
    previous code.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>