Iteration Spaces: IndexSets and Segments¶
Key RAJA features shown in this example:
RAJA::forall
loop execution templateRAJA::RangeSegment
(i.e.,RAJA::TypedRangeSegment
) iteration space constructRAJA::TypedListSegment
iteration space constructRAJA::IndexSet
iteration construct and associated execution policies
The example uses a simple daxpy kernel and its usage of RAJA is similar to previous simple loop examples. The example focuses on how to use RAJA index sets and iteration space segments, such as index ranges and lists of indices. These features are important for applications and algorithms that use indirection arrays for irregular array accesses. Combining different segment types, such as ranges and lists in an index set allows a user to launch different iteration patterns in a single loop execution construct (i.e., one kernel). This is something that is not supported by other programming models and abstractions and is unique to RAJA. Applying these concepts judiciously can increase performance by allowing compilers to optimize for specific segment types (e.g., SIMD for range segments) while providing the flexibility of indirection arrays for general indexing patterns.
Note
For the following examples, it is useful to remember that all
RAJA segment types are templates, where the type of the index
value is the template argument. So for example, the basic RAJA
range segment type is RAJA::TypedRangeSegment<T>
. The type
RAJA::RangeSegment
used here (for convenience) is a type alias
for RAJA::TypedRangeSegment<RAJA::Index_type>
, where the
template parameter is a default index type that RAJA defines.
For a summary discussion of RAJA segment and index set concepts, please see Indices, Segments, and IndexSets.
RAJA Segments¶
In previous examples, we have seen how to define a contiguous range of loop
indices [0, N) with a RAJA::RangeSegment
object and use it in a RAJA
loop execution template to run a loop kernel over the range. For example:
RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, N), [=] (IdxType i) {
a[i] += b[i] * c;
});
We can accomplish the same result by enumerating the indices in a
RAJA::TypedListSegment
object. Here, we assemble the indices in a standard
vector, create a list segment from that, and then pass the list segment to the
forall execution template:
std::vector<IdxType> idx;
for (IdxType i = 0; i < N; ++i) {
idx.push_back(i);
}
ListSegType idx_list( &idx[0], idx.size(), host_res );
RAJA::forall<RAJA::seq_exec>(idx_list, [=] (IdxType i) {
a[i] += b[i] * c;
});
Note that we are using the following type aliases:
using IdxType = RAJA::Index_type;
using ListSegType = RAJA::TypedListSegment<IdxType>;
Recall from discussion in Indices, Segments, and IndexSets that RAJA::Index_type
is
a default index type that RAJA defines and which is used in some RAJA
constructs as a convenience for users who want a simple mechanism to apply
index types consistently.
It is important to understand what happens when using list segments. During loop execution, indices stored in the list segment are passed to the loop body one-by-one, effectively mimicking an indirection array except that the indirection does not appear in the loop body. For example, we can reverse the order of the indices, run the loop with a new list segment object, and get the same result since the loop is data-parallel:
std::reverse( idx.begin(), idx.end() );
ListSegType idx_reverse_list( &idx[0], idx.size(), host_res );
RAJA::forall<RAJA::seq_exec>(idx_reverse_list, [=] (IdxType i) {
a[i] += b[i] * c;
});
Alternatively, we can also use a RAJA strided range segment to run the loop in reverse by giving it a stride of -1. For example:
RAJA::forall<RAJA::seq_exec>(RAJA::RangeStrideSegment(N-1, -1, -1),
[=] (IdxType i) {
a[i] += b[i] * c;
});
The fact that RAJA always passes loop index values to lambdas in a kernel explains why we can run a kernel with multiple segment types in a single RAJA construct as we discuss next.
RAJA IndexSets¶
The RAJA::TypedIndexSet
template is a container that can hold
any number of segments, of the same or different types. An index set object
can be passed to a RAJA loop execution method, just like a segment, to
run a loop kernel. When the loop is run, the execution method iterates
over the segments and the loop indices in each segment. Thus, the loop
iterates can be grouped into different segments to partition the iteration
space and iterate over the loop kernel chunks (defined by segments), in
serial, in parallel, or in some specific dependency ordering. Individual
segments can be executed in serial or parallel.
When an index set is defined, the segment types it may hold must be specified as template arguments. For example, here we create an index set that can hold list segments. Then, we add the list segment we created earlier to it and run the loop:
RAJA::TypedIndexSet<ListSegType> is1;
is1.push_back( idx_list ); // use list segment created earlier.
RAJA::forall<SEQ_ISET_EXECPOL>(is1, [=] (IdxType i) {
a[i] += b[i] * c;
});
You are probably wondering: What is the ‘SEQ_ISET_EXECPOL’ type used for the execution policy?
Well, it is similar to execution policy types we have seen up to this point, except that it specifies a two-level policy – one for iterating over the segments and one for executing the iterates defined by each segment. In the example, we specify that we should do each of these operations sequentially by defining the policy as follows:
using SEQ_ISET_EXECPOL = RAJA::ExecPolicy<RAJA::seq_segit,
RAJA::seq_exec>;
Next, we perform the daxpy operation by partitioning the iteration space into two range segments:
RAJA::TypedIndexSet<RAJA::RangeSegment> is2;
is2.push_back( RAJA::RangeSegment(0, N/2) );
is2.push_back( RAJA::RangeSegment(N/2, N) );
RAJA::forall<SEQ_ISET_EXECPOL>(is2, [=] (IdxType i) {
a[i] += b[i] * c;
});
The first range segment is used to run the index range [0, N/2) and the second is used to run the range [N/2, N).
We can also break up the iteration space into three segments, 2 ranges and 1 list:
//
// Collect indices in a vector to create list segment
//
std::vector<IdxType> idx1;
for (IdxType i = N/3; i < 2*N/3; ++i) {
idx1.push_back(i);
}
ListSegType idx1_list( &idx1[0], idx1.size(), host_res );
RAJA::TypedIndexSet<RAJA::RangeSegment, ListSegType> is3;
is3.push_back( RAJA::RangeSegment(0, N/3) );
is3.push_back( idx1_list );
is3.push_back( RAJA::RangeSegment(2*N/3, N) );
RAJA::forall<SEQ_ISET_EXECPOL>(is3, [=] (IdxType i) {
a[i] += b[i] * c;
});
The first range segment runs the index range [0, N/3), the list segment enumerates the indices in the interval [N/3, 2*N/3), and the second range segment runs the range [2*N/3, N). Note that we use the same execution policy as before.
Before we end the discussion of these examples, we demonstrate a few more index set execution policy variations. To run the previous three segment code by iterating over the segments sequentially and executing each segment in parallel using OpenMP multithreading, we would use this policy definition:
using OMP_ISET_EXECPOL1 = RAJA::ExecPolicy<RAJA::seq_segit,
RAJA::omp_parallel_for_exec>;
If we wanted to iterate over the segments in parallel using OpenMP multi-threading and execute each segment sequentially, we would use the following policy:
using OMP_ISET_EXECPOL2 = RAJA::ExecPolicy<RAJA::omp_parallel_for_segit,
RAJA::seq_exec>;
Finally, to iterate over the segments sequentially and execute each segment in parallel on a GPU using either CUDA or HIP kernel, we would use a policy, such as:
using CUDA_ISET_EXECPOL = RAJA::ExecPolicy<RAJA::seq_segit,
RAJA::cuda_exec<CUDA_BLOCK_SIZE>>;
or:
using HIP_ISET_EXECPOL = RAJA::ExecPolicy<RAJA::seq_segit,
RAJA::hip_exec<HIP_BLOCK_SIZE>>;
The file RAJA/examples/tut_indexset-segments.cpp
contains working code
for these examples.