Support for xchg opcodes?

This is probably diving a bit deeper into register allocator internals than
any sane person should ever want to go, but I'm curious how one would go
about teaching LLVM about a register swap instruction, for an architecture
where swaps are as cheap as moves.

It would of course be an easy enough pattern to add in a PostRegAlloc pass,
except that at that point most opportunities are gone.

In particular, many of my test functions for the backend I'm working on
produce code like this:

  ; %bb.0: ; %entry
  mov r0, r4
  mov r2, r0

Here the allocator is trying to get parameter r2 of the function into r0
(the result register), and is using r4 as a temporary for the displaced
parameter r0.

This could be replaced by a single:

  exch r0, r2
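
To make the comparison concrete, here is a toy Python model of the two
sequences. It is only a sketch, not LLVM code: the `run` helper, the tuple
encoding and the placeholder register contents are made up for illustration,
and it assumes the "source, destination" operand order used in the listing
above.

```python
def run(program, regs):
    """Execute a list of (opcode, a, b) tuples on a copy of the register file."""
    regs = dict(regs)
    for op, a, b in program:
        if op == "mov":       # mov src, dst (operand order as in the listing)
            regs[b] = regs[a]
        elif op == "exch":    # exch a, b swaps the contents of both registers
            regs[a], regs[b] = regs[b], regs[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

# Hypothetical register contents on function entry.
initial = {"r0": "arg0", "r2": "arg2", "r4": "junk"}

two_moves = run([("mov", "r0", "r4"), ("mov", "r2", "r0")], initial)
one_swap = run([("exch", "r0", "r2")], initial)

print(two_moves)  # r0 holds arg2, displaced arg0 lives in r4
print(one_swap)   # r0 holds arg2, displaced arg0 lives in r2, r4 untouched
```

Both versions leave the old r2 in r0, but they are not drop-in equivalents:
after the swap the displaced r0 value sits in r2 rather than r4, so any later
read of r4 would have to be renamed to r2, and nothing may still expect the
original parameter in r2.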

The swap saves a register, reduces code size, and provides a speed boost - but
I'm unsure where one would even begin looking to add support for this,
even as an academic exercise. Can anyone provide any guidance here? Assuming
LLVM doesn't support it already via a little-documented TII callback of