Parallel Sort Operations¶
This section contains an exercise file RAJA/exercises/sort.cpp
for you to work through if you wish to get some practice with RAJA. The
file RAJA/exercises/sort_solution.cpp
contains complete
working code for the examples discussed in this section. You can use the
solution file to check your work and for guidance if you get stuck. To build
the exercises execute make sort
and make sort_solution
from the build directory.
Key RAJA features shown in this section are:
RAJA::sort
,RAJA::sort_pairs
,RAJA::stable_sort
, andRAJA::stable_sort_pairs
operations and execution policiesRAJA comparators for different types of sorts; e.g., less, greater
We show examples of RAJA sort operations using multiple RAJA execution back-ends and describe how different sort orderings can be achieved by passing different RAJA comparators to the RAJA sort template methods. Each comparator is a template type, where the template argument is the type of the values it compares. For a summary of available RAJA sorts, please see Sort Operations.
Note
RAJA sort operations use the same execution policy types that
RAJA::forall
loop execution templates do.
Note
RAJA sort operations take ‘span’ arguments to express the sequential
index range of array entries used in the sort. Typically, these
span objects are created using the RAJA::make_span
method
as shown in the examples below.
Each of the examples below uses the same integer arrays for input and output values. We set the input array and print them as follows:
//
// Define array length
//
constexpr int N = 20;
//
// Allocate and initialize vector data
//
int* in = memoryManager::allocate<int>(N);
int* out = memoryManager::allocate<int>(N);
unsigned* in_vals = memoryManager::allocate<unsigned>(N);
unsigned* out_vals = memoryManager::allocate<unsigned>(N);
std::iota(in , in + N/2, 0);
std::iota(in + N/2, in + N , 0);
std::shuffle(in , in + N/2, std::mt19937{12345u});
std::shuffle(in + N/2, in + N , std::mt19937{67890u});
std::fill(in_vals , in_vals + N/2, 0);
std::fill(in_vals + N/2, in_vals + N , 1);
std::cout << "\n in keys...\n";
printArray(in, N);
std::cout << "\n in (key, value) pairs...\n";
printArray(in, in_vals, N);
std::cout << "\n";
This produces the following sequence of values in the in
array:
6 7 2 1 0 9 4 8 5 3 4 9 6 3 7 0 1 8 2 5
and the following sequence of (key, value) pairs shown as pairs of values
in the in
and in_vals
arrays, respectively:
(6,0) (7,0) (2,0) (1,0) (0,0) (9,0) (4,0) (8,0) (5,0) (3,0)
(4,1) (9,1) (6,1) (3,1) (7,1) (0,1) (1,1) (8,1) (2,1) (5,1)
Note
In the following sections, we discuss stable and unstable sort operations. The difference between them is that a stable sort preserves the relative order of equal elements, with respect to the sort comparator operation, while an unstable sort may not preserve the relative order of equal elements. For the examples below that use integer arrays, there is no way to tell by inspecting the output whether relative ordering is preserved for unstable sorts. However, the preservation of relative ordering can be seen in the sort pairs examples below.
Unstable Sorts¶
A sequential unstable sort operation is performed by:
std::copy_n(in, N, out);
RAJA::sort<RAJA::seq_exec>(RAJA::make_span(out, N));
Since no comparator is passed to the sort method, the default ‘less’ operator
RAJA::operators::less<int>
is applied and the result generated in the
out
array is a non-decreasing sequence of values from the in
array;
i.e.,:
0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
We can be explicit about the operation used in the sort operation by passing the ‘less’ operator to the sort method manually:
RAJA::sort<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
The result in the out
array is the same as before:
0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
An unstable parallel sort operation using OpenMP multithreading is accomplished similarly by replacing the execution policy type with and OpenMP policy:
RAJA::sort<RAJA::omp_parallel_for_exec>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
As is common with RAJA, the only difference between this code and the previous one is that the execution policy is different. If we want to run the sort on a GPU using CUDA or HIP, we would use a CUDA or HIP execution policy. This is shown in examples that follow.
Stable Sorts¶
A sequential stable sort (less) operation is performed by:
RAJA::stable_sort<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
This generates the following sequence of values in the output array as expected based on the examples we discussed above:
0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
Note that the stable sort result is the same as the unstable sort in this case because we are sorting an array of integers. We will show an example of sorting pairs later where this is not the case.
Running the same sort operation on a GPU using CUDA is done by:
RAJA::stable_sort<RAJA::cuda_exec<CUDA_BLOCK_SIZE>>(RAJA::make_span(out, N),
RAJA::operators::less<int>{});
Note that we pass the number of threads per CUDA thread block as the template
argument to the CUDA execution policy as we do when using RAJA::forall
.
Other Comparators¶
Using a different comparator operator allows sorting in a different order.
Here is a sequential stable sort that uses the ‘greater’ operator
RAJA::operators::greater<int>
:
RAJA::stable_sort<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::operators::greater<int>{});
and similarly for HIP:
RAJA::stable_sort<RAJA::hip_exec<HIP_BLOCK_SIZE>>(
RAJA::make_span(d_out, N),
RAJA::operators::greater<int>{});
Both of these sorts generate the following sequence of values in non-increasing order in the output array:
9 9 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0
Note
The only operators provided by RAJA that are valid to use in sort because they enforce a strict weak ordering of elements for arithmetic types are ‘less’ and ‘greater’. Users may provide other operators for different sorting operations.
Also the RAJA CUDA sort back-end only supports RAJA operators ‘less’ and ‘greater’ because it uses the NVIDIA CUB library.
Sort Pairs¶
Sort pairs operations generate the same results as the sort operations we have just described. Additionally, a second array of values is reordered using the ordering of the first sorted array so two arrays are passed to sort pairs methods.
Note
For RAJA::sort_pairs
algorithms, two arrays are passed. The
first array (keys) will be sorted according to the given
comparator operator. The elements in the second array (values)
will be reordered based on the final order of the first sorted array.
Here is a sequential unstable sort pairs operation that uses the ‘less’ operator:
RAJA::sort_pairs<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::less<int>{});
This generates the following sequence in the output array:
(0,0) (0,1) (1,0) (1,1) (2,0) (2,1) (3,0) (3,1) (4,0) (4,1)
(5,1) (5,0) (6,1) (6,0) (7,0) (7,1) (8,0) (8,1) (9,1) (9,0)
Note that some of the pairs with equivalent keys stayed in the same order
that they appeared in the unsorted arrays like (8,0) (8,1)
, while others are
reversed like (9,1) (9,0)
. This illustrates that relative ordering of
equal elements may not be preserved in an unstable sort.
Here is a sequential stable sort pairs that uses the greater operator:
RAJA::stable_sort_pairs<RAJA::seq_exec>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::greater<int>{});
This generates the following sequence in the output array:
(9,0) (9,1) (8,0) (8,1) (7,0) (7,1) (6,0) (6,1) (5,0) (5,1)
(4,0) (4,1) (3,0) (3,1) (2,0) (2,1) (1,0) (1,1) (0,0) (0,1)
Note that all pairs with equivalent keys stayed in the same order that they appeared in the unsorted input arrays.
As you may expect at this point, running an stable sort pairs operation using OpenMP is accomplished by:
RAJA::stable_sort_pairs<RAJA::omp_parallel_for_exec>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::greater<int>{});
This generates the following sequence in the output array (as we saw earlier):
(9,0) (9,1) (8,0) (8,1) (7,0) (7,1) (6,0) (6,1) (5,0) (5,1)
(4,0) (4,1) (3,0) (3,1) (2,0) (2,1) (1,0) (1,1) (0,0) (0,1)
and the only difference is the execution policy template parameter.
Lastly, we show a parallel unstable sort pairs operation using CUDA:
RAJA::sort_pairs<RAJA::cuda_exec<CUDA_BLOCK_SIZE>>(RAJA::make_span(out, N),
RAJA::make_span(out_vals, N),
RAJA::operators::greater<int>{});
Note
RAJA sorts for the HIP back-end are similar to those for CUDA. The only difference is that a HIP execution policy template parameter type is used.