Cooking with Reductions

Please see the following section for overview discussion about RAJA reductions:

Reductions with RAJA::forall

Here is the setup for a simple reduction example:

const int N = 1000;

int vec[N];

for (int i = 0; i < N; ++i) {

  vec[i] = 1;

}

Here a simple sum reduction is performed in a for loop:

int vsum = 0;

// Run a kernel using the reduction objects
for (int i = 0; i < N; ++i) {

  vsum += vec[i];

}

The results of these operations will yield the following values:

  • vsum == 1000

RAJA uses policy types to specify how things are implemented.

The forall execution policy specifies how the loop is run by the RAJA::forall method. The following discussion includes examples of several other RAJA execution policies that could be applied. For example RAJA::seq_exec runs a C-style for loop sequentially on a CPU. The RAJA::cuda_exec_with_reduce<256> runs the loop as a CUDA GPU kernel with 256 threads per block and other CUDA kernel launch parameters, like the number of blocks, optimized for performance with reducers.:

using exec_policy = RAJA::seq_exec;
// using exec_policy = RAJA::omp_parallel_for_exec;
// using exec_policy = RAJA::omp_target_parallel_for_exec<256>;
// using exec_policy = RAJA::cuda_exec_with_reduce<256>;
// using exec_policy = RAJA::hip_exec_with_reduce<256>;
// using exec_policy = RAJA::sycl_exec<256>;

The reduction policy specifies how the reduction is done and must match the execution policy. For example RAJA::seq_reduce does a sequential reduction and can only be used with sequential execution policies. The RAJA::cuda_reduce_atomic policy uses atomics, if possible with the given data type, and can only be used with cuda execution policies. Similarly for other RAJA execution back-ends, such as HIP and OpenMP. Here are example RAJA reduction policies whose names are indicative of which execution policies they work with:

using reduce_policy = RAJA::seq_reduce;
// using reduce_policy = RAJA::omp_reduce;
// using reduce_policy = RAJA::omp_target_reduce;
// using reduce_policy = RAJA::cuda_reduce_atomic;
// using reduce_policy = RAJA::hip_reduce_atomic;
// using reduce_policy = RAJA::sycl_reduce;

Here a simple sum reduction is performed using RAJA:

RAJA::ReduceSum<reduce_policy, int> vsum(0);

RAJA::forall<exec_policy>( RAJA::RangeSegment(0, N),
  [=](RAJA::Index_type i) {

  vsum += vec[i];

});

The results of these operations will yield the following values:

  • vsum.get() == 1000

Another option for the execution policy when using the cuda or hip backends are the base policies which have a boolean parameter to choose between the general use cuda/hip_exec policy and the cuda/hip_exec_with_reduce policy.:

// static constexpr bool with_reduce = ...;
// using exec_policy = RAJA::cuda_exec_base<with_reduce, 256>;
// using exec_policy = RAJA::hip_exec_base<with_reduce, 256>;

Another option for the reduction policy when using the cuda or hip backends are the base policies which have a boolean parameter to choose between the atomic cuda/hip_reduce_atomic policy and the non-atomic cuda/hip_reduce policy.:

// static constexpr bool with_atomic = ...;
// using reduce_policy = RAJA::cuda_reduce_base<with_atomic>;
// using reduce_policy = RAJA::hip_reduce_base<with_atomic>;