Parallel Sort Operations¶
Key RAJA features shown in this section:
RAJA::sort
operationRAJA::sort_pairs
operationRAJA::stable_sort
operationRAJA::stable_sort_pairs
operation- RAJA comparators for different types of sorts; e.g., less, greater
Below, we present examples of RAJA sequential, OpenMP, and CUDA sort operations and show how different sort orderings can be achieved by passing different RAJA comparators to the RAJA sort template methods. Each comparator is a template type, where the template argument is the type of the values it compares. For a summary of RAJA sort functionality, please see Parallel Sort Operations.
Note
RAJA sort operations use the same execution policy types that
RAJA::forall
loop execution templates do.
Each of the examples below uses the same integer arrays for input and output values. We set the input array and print them as follows:
//
// Define array length
//
const int N = 20;
//
// Allocate and initialize vector data
//
int* in = memoryManager::allocate<int>(N);
int* out = memoryManager::allocate<int>(N);
unsigned* in_vals = memoryManager::allocate<unsigned>(N);
unsigned* out_vals = memoryManager::allocate<unsigned>(N);
std::iota(in , in + N/2, 0);
std::iota(in + N/2, in + N , 0);
std::shuffle(in , in + N/2, std::mt19937{12345u});
std::shuffle(in + N/2, in + N , std::mt19937{67890u});
std::fill(in_vals , in_vals + N/2, 0);
std::fill(in_vals + N/2, in_vals + N , 1);
This generates the following sequence of values in the in
array:
6 7 2 1 0 9 4 8 5 3 4 9 6 3 7 0 1 8 2 5
and the following sequence of (key, value) pairs in the in
and in_vals
arrays:
(6,0) (7,0) (2,0) (1,0) (0,0) (9,0) (4,0) (8,0) (5,0) (3,0)
(4,1) (9,1) (6,1) (3,1) (7,1) (0,1) (1,1) (8,1) (2,1) (5,1)
Unstable Sorts¶
A sequential unstable sort operation is performed by:
RAJA::sort<RAJA::seq_exec>(RAJA::make_span(out, N));
Since no comparator is passed to the sort method, the default less operation
is applied and the result generated in the out
array is non-decreasing sort
on the out
array. The resulting out
array contains the values:
0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
We can be explicit about the operation used in the sort by passing the less operator to the sort method:
RAJA::sort<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
The result in the out
array is the same.
An unstable parallel sort operation using OpenMP multi-threading is accomplished similarly by replacing the execution policy type:
RAJA::sort<RAJA::omp_parallel_for_exec>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
As is commonly done with RAJA, the only difference between this code and the previous one is that the execution policy is different. If we want to run the sort on a GPU using CUDA, we would use a CUDA execution policy. This will be shown shortly.
Stable Sorts¶
A sequential stable sort (less) operation is performed by:
RAJA::stable_sort<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
This generates the following sequence of values in the output array:
0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
Note that the stable sort result is the same as the unstable sort in this case because we are sorting integers. We will show an example of sorting pairs later where this is not the case.
Running the same sort operation on a GPU using CUDA is done by:
RAJA::stable_sort<RAJA::cuda_exec<CUDA_BLOCK_SIZE>>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
Note that we pass the number of threads per CUDA thread block as the template argument to the CUDA execution policy as we do in other cases.
Other Comparators¶
Using a different comparator allows sorting in a different order. Here is a sequential stable sort that uses the greater operator:
RAJA::stable_sort<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::operators::greater<int>{});
This generates the following sequence of values in non-increasing order in the output array:
9 9 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0
Note that the only operators provided by RAJA that are valid to use in sort because they form a strict weak ordering of elements for arithmetic types are less and greater. Also note that the the cuda sort backend only supports RAJA’s operators less and greater.
Sort Pairs¶
Sort Pairs operations generate the same results as the sort operations we have just described. However, an additional array of values is also permuted to match the sorted array so two arrays are passed to sort pairs methods.
Here is a sequential unstable sort pairs that uses the less operator:
RAJA::sort_pairs<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::less<int>{});
This generates the following sequence in the output array:
(0,0) (0,1) (1,0) (1,1) (2,0) (2,1) (3,0) (3,1) (4,0) (4,1)
(5,1) (5,0) (6,1) (6,0) (7,0) (7,1) (8,0) (8,1) (9,1) (9,0)
Note that some of the pairs with equivalent keys stayed in the same order
they appeared in the unsorted arrays like (8,0) (8,1)
, while others are
reversed like (9,1) (9,0)
.
Here is a sequential stable sort pairs that uses the greater operator:
RAJA::stable_sort_pairs<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::greater<int>{});
This generates the following sequence in the output array:
(9,0) (9,1) (8,0) (8,1) (7,0) (7,1) (6,0) (6,1) (5,0) (5,1)
(4,0) (4,1) (3,0) (3,1) (2,0) (2,1) (1,0) (1,1) (0,0) (0,1)
Note that all pairs with equivalent keys stayed in the same order that they appeared in the unsorted arrays.
As you may expect at this point, running an stable sort pairs operation using OpenMP is accomplished by:
RAJA::stable_sort_pairs<RAJA::omp_parallel_for_exec>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::greater<int>{});
This generates the following sequence in the output array (as we saw earlier):
(9,0) (9,1) (8,0) (8,1) (7,0) (7,1) (6,0) (6,1) (5,0) (5,1)
(4,0) (4,1) (3,0) (3,1) (2,0) (2,1) (1,0) (1,1) (0,0) (0,1)
and the only difference is the execution policy template parameter.
Lastly, we show a parallel unstable sort pairs operation using CUDA:
RAJA::sort_pairs<RAJA::cuda_exec<CUDA_BLOCK_SIZE>>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::greater<int>{});
Note
RAJA sorts for the HIP back-end are similar to those for CUDA.
The file RAJA/examples/tut_sort.cpp
contains the complete
working example code.