[RFC] carry-less multiplication instruction

To try to revive this, let’s settle on answers to the open questions here:

  1. What are people’s thoughts on llvm.clmul vs llvm.experimental.bitmanip.clmul?
  2. How do people feel about specifying n-bit returns for n-bit operands (requiring the user to zero-extend the operands if they want the full 2n-bit product)?
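
To make question 2 concrete, here is a rough C sketch of what "n-bit result for n-bit operands" could mean, with n = 32. The function names (clmul32, clmul64, clmul32_full) are hypothetical and only for illustration; they are not the proposed intrinsic names, and the bit-by-bit loop is a reference model rather than how a backend would lower it.

```c
#include <stdint.h>

/* Hypothetical reference model: 32-bit carry-less multiply that returns
 * only the low 32 bits of the product (n-bit result for n-bit operands). */
static uint32_t clmul32(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        if ((b >> i) & 1u) {
            r ^= a << i; /* XOR instead of add; bits shifted past bit 31 are dropped */
        }
    }
    return r;
}

/* Same model at 64 bits. */
static uint64_t clmul64(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r ^= a << i;
        }
    }
    return r;
}

/* Full 64-bit product of two 32-bit operands: zero-extend both operands
 * and use the wider operation. The product of two 32-bit values has at
 * most 63 significant bits, so nothing is lost at 64 bits. */
static uint64_t clmul32_full(uint32_t a, uint32_t b) {
    return clmul64((uint64_t)a, (uint64_t)b);
}
```

Under this reading, clmul32(0x80000000, 2) is 0 because the only set bit is shifted out, while clmul32_full(0x80000000, 2) is 0x100000000. The low 32 bits of the zero-extended result always match the n-bit result.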