Reduction Types and Kernels with Multiple Reductions

This section contains an exercise file, RAJA/exercises/reductions.cpp, for you to work through if you wish to get some practice with RAJA. The file RAJA/exercises/reductions_solution.cpp contains complete working code for the examples discussed in this section. You can use the solution file to check your work and for guidance if you get stuck. To build the exercises, execute make reductions and make reductions_solution from the build directory.

Key RAJA features shown in this section are:

  • RAJA::forall loop execution template and execution policies

  • RAJA::TypedRangeSegment iteration space construct

  • RAJA reduction types and reduction policies

In the Sum Reduction: Vector Dot Product exercise, we showed how to use the RAJA sum reduction type. The following example uses five RAJA reduction types: min, max, sum, min-loc, and max-loc.

Note

RAJA ‘min-loc’ and ‘max-loc’ reductions determine the min and max reduction value, respectively, along with an iteration index at which the min/max value is found.

Note

Multiple RAJA reductions can be used together in a single kernel with any RAJA loop kernel execution method, and reduction operations can be combined with any other kernel operations.

Note

Each RAJA reduction type requires a reduction policy that must be compatible with the execution policy for the kernel in which it is used.

We start by allocating an array and initializing its values so that each reduction type produces a distinct, easily checked result. Specifically, the array is initialized to a sequence of alternating values (‘1’ and ‘-1’). Then, two values near the middle of the array are set to ‘-100’ and ‘100’:

//
// Define array length
//
  constexpr int N = 1000000;

//
// Allocate array data and initialize data to alternating sequence of 1, -1.
//
  int* a = memoryManager::allocate<int>(N);

  for (int i = 0; i < N; ++i) {
    if ( i % 2 == 0 ) {
      a[i] = 1;
    } else {
      a[i] = -1; 
    }
  }

//
// Set min and max loc values
//
  constexpr int minloc_ref = N / 2;
  a[minloc_ref] = -100;

  constexpr int maxloc_ref = N / 2 + 1;
  a[maxloc_ref] = 100;

We also define a range segment to iterate over the array:

  RAJA::TypedRangeSegment<int> arange(0, N);

With these parameters and data initialization, the code example presented below will generate the following results:

  • the sum will be zero

  • the min will be -100

  • the max will be 100

  • the min loc will be N/2

  • the max loc will be N/2 + 1
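The zero sum can be checked by hand. With N = 1000000, the alternating 1, -1 values cancel because N is even; index N/2 = 500000 is even, so its original value 1 is replaced by -100, and index N/2 + 1 is odd, so its original -1 is replaced by 100. Since -100 and 100 each appear exactly once, the min-loc and max-loc indices are also unambiguous.

```latex
\sum_{i=0}^{N-1} a_i
  = \underbrace{\sum_{i=0}^{N-1} (-1)^i}_{=\,0 \text{ since } N \text{ is even}}
  + \underbrace{(-100 - 1)}_{i = N/2 \text{ (even)}}
  + \underbrace{\bigl(100 - (-1)\bigr)}_{i = N/2 + 1 \text{ (odd)}}
  = 0 - 101 + 101 = 0
```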

A sequential kernel that uses each of these reduction types, followed by statements that print the reduced values, is:

  using EXEC_POL1   = RAJA::seq_exec;
  using REDUCE_POL1 = RAJA::seq_reduce;
 
  RAJA::ReduceSum<REDUCE_POL1, int> seq_sum(0);
  RAJA::ReduceMin<REDUCE_POL1, int> seq_min(std::numeric_limits<int>::max());
  RAJA::ReduceMax<REDUCE_POL1, int> seq_max(std::numeric_limits<int>::min());
  RAJA::ReduceMinLoc<REDUCE_POL1, int> seq_minloc(std::numeric_limits<int>::max(), -1);
  RAJA::ReduceMaxLoc<REDUCE_POL1, int> seq_maxloc(std::numeric_limits<int>::min(), -1);

  RAJA::forall<EXEC_POL1>(arange, [=](int i) {
    
    seq_sum += a[i];

    seq_min.min(a[i]);
    seq_max.max(a[i]);

    seq_minloc.minloc(a[i], i);
    seq_maxloc.maxloc(a[i], i);

  });

  std::cout << "\tsum = " << seq_sum.get() << std::endl;
  std::cout << "\tmin = " << seq_min.get() << std::endl;
  std::cout << "\tmax = " << seq_max.get() << std::endl;
  std::cout << "\tmin, loc = " << seq_minloc.get() << " , " 
                               << seq_minloc.getLoc() << std::endl;
  std::cout << "\tmax, loc = " << seq_maxloc.get() << " , " 
                               << seq_maxloc.getLoc() << std::endl;

Note that each reduction object takes an initial value at construction. Within the kernel, each reduction is updated with the operator or method that matches its type (e.g., ‘+=’ for sum, ‘min()’ for min, ‘minloc()’ for min-loc). After the kernel executes, the reduced value computed by each reduction object is retrieved by calling its ‘get()’ method. The min-loc/max-loc index values are obtained using ‘getLoc()’ methods.

For multithreaded parallel execution via OpenMP, the exercise can be run with the execution and reduction policies:

  using EXEC_POL2   = RAJA::omp_parallel_for_exec;
  using REDUCE_POL2 = RAJA::omp_reduce;

Similarly, the kernel containing the reductions can be run in parallel on a GPU using CUDA policies:

  using EXEC_POL3   = RAJA::cuda_exec<CUDA_BLOCK_SIZE>;
  using REDUCE_POL3 = RAJA::cuda_reduce;

or HIP policies:

  using EXEC_POL3   = RAJA::hip_exec<HIP_BLOCK_SIZE>;
  using REDUCE_POL3 = RAJA::hip_reduce;