Parallel stable sort

Hello friends,

I am now working on the parallel stable sort for the OpenMP backend. I’ve looked at the sort implemented by the TBB backend, and it seems to depend on TBB in complex ways. I’m not sure I could translate it into reasonable OpenMP.

However, I have found a parallel stable three-way quicksort algorithm for which Intel provides sample code. The algorithm looks reasonably straightforward, and it appears to perform well.

If there are no objections, I would like to go forward with this implementation. If another algorithm would be preferable, please let me know.

Thank you!