`nvvm.mbarrier.try_wait` behaviour

Hi all,

I was looking into adding an nvgpu.mbarrier.try_wait op (a version that takes the output token from nvgpu.mbarrier.arrive instead of a parity argument), but noticed something about the behaviour of the existing nvvm.mbarrier.try_wait.parity op that I think is incorrect.

The op takes a mandatory ticks argument that is supposed to specify the timeout duration before try_wait returns. However, when the op is lowered to LLVM, a loop is inserted around it that retries the wait until it succeeds. As a result, the op always blocks until the phase completes, and ticks only bounds each individual attempt, which makes the argument effectively redundant. This behaviour doesn't match the op description for nvgpu.mbarrier.try_wait.parity.

I think the solution is as follows:

  1. Remove the loop around mbarrier.try_wait and add a boolean output to the op indicating whether or not the wait succeeded.
  2. Make the ticks argument optional (the corresponding suspend-time hint operand is optional in PTX; when it is omitted, a system-dependent time limit is used).
  3. (Possible follow-up) Add an nvgpu.mbarrier.wait op that is blocking (matching the existing behaviour of nvgpu.mbarrier.try_wait.parity).
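To make the difference concrete, here is a rough host-side Python model of the proposed semantics. It is neither MLIR nor the real PTX instructions. `MBarrierModel`, `try_wait_parity`, `wait_parity` and the seconds-based `suspend_hint_s` are hypothetical names, and the 10 ms default only stands in for PTX's system-dependent limit. The model also simplifies phase tracking: a phase counts as complete once the barrier's current parity differs from the one being waited on. `try_wait_parity` makes a single bounded attempt and reports success (items 1 and 2), while `wait_parity` is the blocking retry loop that the current lowering effectively produces (item 3).

```python
import threading
from typing import Optional


class MBarrierModel:
    """Hypothetical host-side model of an mbarrier's phase/parity behaviour (not the NVVM op)."""

    def __init__(self, expected_arrivals: int) -> None:
        self._expected = expected_arrivals
        self._pending = expected_arrivals
        self._phase = 0  # phase counter; the parity is phase & 1
        self._cond = threading.Condition()

    def arrive(self) -> int:
        """Arrive once and return the parity of the phase this arrival belongs to."""
        with self._cond:
            parity = self._phase & 1
            self._pending -= 1
            if self._pending == 0:
                # Last arrival: complete the phase, re-arm, and wake any waiters.
                self._phase += 1
                self._pending = self._expected
                self._cond.notify_all()
            return parity

    def _phase_done(self, parity: int) -> bool:
        # Simplification: the phase with this parity is done once the current parity differs.
        return (self._phase & 1) != parity

    def try_wait_parity(self, parity: int, suspend_hint_s: Optional[float] = None) -> bool:
        """Proposed try_wait: one bounded attempt that returns whether the phase completed."""
        # None models an omitted hint; 0.01 s is an assumed stand-in for the system limit.
        timeout = 0.01 if suspend_hint_s is None else suspend_hint_s
        with self._cond:
            return self._cond.wait_for(lambda: self._phase_done(parity), timeout=timeout)

    def wait_parity(self, parity: int) -> None:
        """Proposed blocking wait: the retry loop the current lowering wraps around try_wait."""
        while not self.try_wait_parity(parity):
            pass


if __name__ == "__main__":
    bar = MBarrierModel(expected_arrivals=1)
    print(bar.try_wait_parity(0, suspend_hint_s=0.001))  # False: phase 0 has not completed
    threading.Timer(0.05, bar.arrive).start()            # another thread arrives later
    bar.wait_parity(0)                                   # blocks until phase 0 completes
    print(bar.try_wait_parity(0))                        # True
```

With the loop folded into the op, callers only ever see the `wait_parity` behaviour. Exposing the boolean lets them choose between the two.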

Obviously, these changes break backwards compatibility. Is a breaking change like this acceptable? I don't want to spend time preparing these changes if they won't be accepted.

Thanks, Chris

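P.S. Another odd inconsistency is the naming: nvvm.mbarrier.test.wait uses a dot, while nvvm.mbarrier.try_wait uses an underscore (PTX itself spells them mbarrier.test_wait and mbarrier.try_wait). Can that sort of thing be fixed?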