Cooking with Reductions
Please see the following section for an overview of RAJA reductions:
Reductions with RAJA::forall
Here is the setup for a simple reduction example:
const int N = 1000;
int vec[N];
for (int i = 0; i < N; ++i) {
  vec[i] = 1;
}
Here, a simple sum reduction is performed in a C-style for loop:
int vsum = 0;
// Sum the elements of vec sequentially
for (int i = 0; i < N; ++i) {
  vsum += vec[i];
}
The result of this operation is:
vsum == 1000
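For reference, the same sequential result can be obtained with the C++ standard library's std::accumulate. This is only a baseline to compare the RAJA result against; the function name sequential_sum is hypothetical.

```cpp
#include <numeric>  // std::accumulate

// Hypothetical helper: sequential sum of the first n elements of an int
// array, equivalent to the hand-written for loop above.
int sequential_sum(const int* vec, int n) {
  // Start from the identity value 0 and add each element in order.
  return std::accumulate(vec, vec + n, 0);
}
```

With vec filled with 1s and n = 1000, sequential_sum(vec, 1000) returns 1000, matching vsum above.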
RAJA uses policy types to specify how loops and reductions are executed.
The execution policy given to RAJA::forall specifies how the loop is run. The examples below list several RAJA execution policies that could be applied here.
For example, RAJA::seq_exec runs a C-style for loop sequentially on a CPU, while RAJA::cuda_exec_rec_for_reduce<256> runs the loop as a CUDA GPU kernel with 256 threads per block and chooses the other CUDA kernel launch parameters, such as the number of blocks, to optimize performance with reducers:
using exec_policy = RAJA::seq_exec;
// using exec_policy = RAJA::omp_parallel_for_exec;
// using exec_policy = RAJA::omp_target_parallel_for_exec<256>;
// using exec_policy = RAJA::cuda_exec_rec_for_reduce<256>;
// using exec_policy = RAJA::hip_exec_rec_for_reduce<256>;
// using exec_policy = RAJA::sycl_exec<256>;
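To illustrate the idea of a policy type selecting an implementation, here is a minimal, self-contained sketch using tag types and template specialization. The names seq_exec_sketch, blocked_exec_sketch, forall_impl and forall_sketch are hypothetical and are not RAJA types; the "blocked" policy only mimics, sequentially, how a GPU policy splits the iteration space into blocks. This is not how RAJA itself is implemented.

```cpp
// Hypothetical policy tags, loosely modeled on the idea behind RAJA::seq_exec
// and block-size parameters such as <256>. These are NOT RAJA types.
struct seq_exec_sketch {};
template <int BLOCK_SIZE>
struct blocked_exec_sketch {};

// Primary template: one specialization per policy supplies the loop.
template <typename Policy>
struct forall_impl;

// Sequential policy: a plain C-style loop.
template <>
struct forall_impl<seq_exec_sketch> {
  template <typename Body>
  static void run(int begin, int end, Body&& body) {
    for (int i = begin; i < end; ++i) body(i);
  }
};

// "Blocked" policy: walks the range in chunks of BLOCK_SIZE iterations,
// mimicking how a GPU policy groups work into thread blocks (still sequential).
template <int BLOCK_SIZE>
struct forall_impl<blocked_exec_sketch<BLOCK_SIZE>> {
  template <typename Body>
  static void run(int begin, int end, Body&& body) {
    for (int block = begin; block < end; block += BLOCK_SIZE) {
      int block_end = (block + BLOCK_SIZE < end) ? block + BLOCK_SIZE : end;
      for (int i = block; i < block_end; ++i) body(i);
    }
  }
};

// Entry point: the implementation is chosen purely by the template argument.
template <typename Policy, typename Body>
void forall_sketch(int begin, int end, Body&& body) {
  forall_impl<Policy>::run(begin, end, body);
}
```

Switching from forall_sketch<seq_exec_sketch> to forall_sketch<blocked_exec_sketch<256>> changes how iterations are scheduled without touching the loop body, which is the same separation the exec_policy alias above provides.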
The reduction policy specifies how the reduction is done and must match the execution policy. For example, RAJA::seq_reduce does a sequential reduction and can only be used with sequential execution policies. The RAJA::cuda_reduce_atomic policy uses atomics, if possible with the given data type, and can only be used with CUDA execution policies. The same applies to other RAJA back-ends, such as HIP and OpenMP. Here are example RAJA reduction policies; each name indicates which execution policies it works with:
using reduce_policy = RAJA::seq_reduce;
// using reduce_policy = RAJA::omp_reduce;
// using reduce_policy = RAJA::omp_target_reduce;
// using reduce_policy = RAJA::cuda_reduce_atomic;
// using reduce_policy = RAJA::hip_reduce_atomic;
// using reduce_policy = RAJA::sycl_reduce;
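One reason a reduction policy is tied to an execution policy is that a parallel back-end cannot simply let every thread update one shared variable without coordination. A common generic strategy, sketched below assuming a sum over ints, is a two-stage reduction: compute a partial sum per block of iterations, then combine the partial sums. The function blocked_sum_sketch is hypothetical and runs sequentially; it is not how any RAJA reduction policy is implemented.

```cpp
#include <vector>

// Hypothetical two-stage sum reduction. Stage 1: each block of BLOCK_SIZE
// iterations produces a partial sum. Stage 2: the partials are combined.
// Runs sequentially; it only illustrates the pattern.
template <int BLOCK_SIZE>
int blocked_sum_sketch(const int* vec, int n) {
  std::vector<int> partials;  // one partial sum per block
  for (int block = 0; block < n; block += BLOCK_SIZE) {
    int block_end = (block + BLOCK_SIZE < n) ? block + BLOCK_SIZE : n;
    int partial = 0;  // identity value for a sum
    for (int i = block; i < block_end; ++i) partial += vec[i];
    partials.push_back(partial);
  }
  // Stage 2: combine the partial results into the final value.
  int total = 0;
  for (int p : partials) total += p;
  return total;
}
```

Because addition is associative, the grouping into blocks does not change the integer result, which is why such a scheme can reproduce the sequential sum.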
Here, the same sum reduction is performed using RAJA:
RAJA::ReduceSum<reduce_policy, int> vsum(0);
RAJA::forall<exec_policy>(RAJA::RangeSegment(0, N),
  [=](RAJA::Index_type i) {
    // Accumulate into the reduction object
    vsum += vec[i];
});
The result of this operation is:
vsum.get() == 1000
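In the RAJA version, vsum is captured by value in the [=] lambda, yet vsum.get() still returns the full sum. The simplified model below, assuming a purely sequential run, shows one way such an object can behave: every copy shares the same accumulator, and operator+= is const so it can be called on the const copy held by a non-mutable lambda. SumReducerSketch is a hypothetical name; this is not RAJA's implementation and it is not safe for concurrent updates.

```cpp
#include <memory>

// Hypothetical, simplified sum-reducer model (NOT RAJA's implementation).
// All copies share one accumulator, so a copy captured by value in a lambda
// still contributes to the result seen by the original object.
template <typename T>
class SumReducerSketch {
 public:
  explicit SumReducerSketch(T init) : total_(std::make_shared<T>(init)) {}

  // const so it can be called on the const copy inside a non-mutable
  // [=] lambda; it modifies the shared value, not the handle itself.
  const SumReducerSketch& operator+=(T value) const {
    *total_ += value;
    return *this;
  }

  // Returns the reduced value accumulated by all copies.
  T get() const { return *total_; }

 private:
  std::shared_ptr<T> total_;  // shared by every copy of this object
};
```

Declaring SumReducerSketch<int> vsum(0); and calling the same [=] lambda body for i = 0..N-1 leaves vsum.get() == 1000 for the vec defined above.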