.. ##
.. ## Copyright (c) Lawrence Livermore National Security, LLC and other
.. ## RAJA Project Developers. See top-level LICENSE and COPYRIGHT
.. ## files for dates and other details. No copyright assignment is required
.. ## to contribute to RAJA.
.. ##
.. ## SPDX-License-Identifier: (BSD-3-Clause)
.. ##

.. _feat-multi-reductions-label:

=========================
MultiReduction Operations
=========================

RAJA provides multi-reduction types that allow users to perform a number of
reduction operations, determined at runtime, in kernels launched using
``RAJA::forall``, ``RAJA::kernel``, and ``RAJA::launch`` methods in a
portable, thread-safe manner. Users may use as many multi-reduction objects
in a loop kernel as they need. If a kernel needs only a small, fixed number
of reductions, standard RAJA reduction objects can be used instead. The
available RAJA multi-reduction types are described in this section.

.. note:: All RAJA multi-reduction types are located in the namespace
          ``RAJA``.

.. note:: * Each RAJA multi-reduction type is templated on a
            **multi-reduction policy** and a **reduction value type** for
            the multi-reduction variable. The **multi-reduction policy type
            must be compatible with the execution policy used by the kernel
            in which it is used.** For example, in a CUDA kernel, a CUDA
            multi-reduction policy must be used.
          * Each RAJA multi-reduction type accepts an **initial reduction
            value or values** at construction (see below).
          * Each RAJA multi-reduction type has a ``get`` method to access
            reduced values after kernel execution completes.

Please see the following section for a description of reducers:

  * :ref:`feat-reductions-label`.

Please see the following cook book section for guidance on policy usage:

  * :ref:`cook-book-multi-reductions-label`.

--------------------
MultiReduction Types
--------------------

RAJA supports three common multi-reduction types:

* ``MultiReduceSum< multi_reduce_policy, data_type >`` - Sum of values.
* ``MultiReduceMin< multi_reduce_policy, data_type >`` - Min value.
* ``MultiReduceMax< multi_reduce_policy, data_type >`` - Max value.

and two less common bitwise multi-reduction types:

* ``MultiReduceBitAnd< multi_reduce_policy, data_type >`` - Bitwise 'and' of
  values (i.e., ``a & b``).
* ``MultiReduceBitOr< multi_reduce_policy, data_type >`` - Bitwise 'or' of
  values (i.e., ``a | b``).

.. note:: ``RAJA::MultiReduceBitAnd`` and ``RAJA::MultiReduceBitOr``
          reduction types are designed to work on integral data types
          because **in C++, at the language level, there is no such thing
          as a bitwise operator on floating-point numbers.**

-----------------------
MultiReduction Examples
-----------------------

The following examples illustrate basic usage of RAJA multi-reduction types.

Here is a simple RAJA multi-reduction example that shows how to use a sum
multi-reduction type::

  const int N = 1000;
  const int B = 10;

  //
  // Initialize an array of length N with all ones, and another array
  // with integers between 0 and B-1 (the bin index of each element)
  //
  int vec[N];
  int bins[N];
  for (int i = 0; i < N; ++i) {
    vec[i] = 1;
    bins[i] = i % B;
  }

  // Create a sum multi-reduction object with a size of B, and initial
  // values of zero
  RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(B, 0);

  // Run a kernel using the multi-reduction object; the OpenMP execution
  // policy is compatible with the OpenMP multi-reduction policy
  RAJA::forall<RAJA::omp_parallel_for_exec>(RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {
      vsum[bins[i]] += vec[i];
  });

  // After the kernel has run, extract the reduced values
  int my_vsums[B];
  for (int bin = 0; bin < B; ++bin) {
    my_vsums[bin] = vsum[bin].get();
  }

Each of the ``B = 10`` bins receives ``N / B = 100`` elements, each with
value 1, so the reduced values are:

* my_vsums[0] == 100
* my_vsums[1] == 100
* my_vsums[2] == 100
* my_vsums[3] == 100
* my_vsums[4] == 100
* my_vsums[5] == 100
* my_vsums[6] == 100
* my_vsums[7] == 100
* my_vsums[8] == 100
* my_vsums[9] == 100

Here is the same example, but with the multi-reduction initialized from, and
its results written back to, values stored in a container::

  const int N = 1000;
  const int B = 10;

  //
  // Initialize an array of length N with all ones, and
  // another array with integers between 0 and B-1
  //
  int vec[N];
  int bins[N];
  for (int i = 0; i < N; ++i) {
    vec[i] = 1;
    bins[i] = i % B;
  }

  // Create a vector with a size of B, and initial values of zero
  std::vector<int> my_vsums(B, 0);

  // Create a multi-reducer initialized with the size and values of my_vsums
  RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(my_vsums);

  // Run a kernel using the multi-reduction object
  RAJA::forall<RAJA::omp_parallel_for_exec>(RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {
      vsum[bins[i]] += vec[i];
  });

  // After the kernel has run, extract the reduced values back into my_vsums
  vsum.get_all(my_vsums);

As in the previous example, the reduced values are:

* my_vsums[0] == 100
* my_vsums[1] == 100
* my_vsums[2] == 100
* my_vsums[3] == 100
* my_vsums[4] == 100
* my_vsums[5] == 100
* my_vsums[6] == 100
* my_vsums[7] == 100
* my_vsums[8] == 100
* my_vsums[9] == 100

Here is an example of a bitwise-or multi-reduction::

  const int N = 128;
  const int B = 8;

  //
  // Initialize an array of length N with integers between 0 and B-1
  //
  int bins[N];
  for (int i = 0; i < N; ++i) {
    bins[i] = i % B;
  }

  // Create a bitwise-or multi-reduction object with a size of B, and
  // initial values of zero
  RAJA::MultiReduceBitOr< RAJA::omp_multi_reduce, int > vor(B, 0);

  // Run a kernel using the multi-reduction object
  RAJA::forall<RAJA::omp_parallel_for_exec>(RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {
      vor[bins[i]] |= i;
  });

  // After the kernel has run, extract the reduced values
  int my_vors[B];
  for (int bin = 0; bin < B; ++bin) {
    my_vors[bin] = vor[bin].get();
  }

After the kernel completes, the reduced values are:

* my_vors[0] == 120 == 0b1111000
* my_vors[1] == 121 == 0b1111001
* my_vors[2] == 122 == 0b1111010
* my_vors[3] == 123 == 0b1111011
* my_vors[4] == 124 == 0b1111100
* my_vors[5] == 125 == 0b1111101
* my_vors[6] == 126 == 0b1111110
* my_vors[7] == 127 == 0b1111111

The reduced values range from 120 in bin 0 to 127 in bin 7.
In binary representation (i.e., bits), :math:`120 = 0b1111000` and
:math:`127 = 0b1111111`. The bins are chosen so that every index ``i`` in
bin ``b`` has the same remainder modulo 8, namely ``b``; that is,
``i = 8q + b`` with ``q`` ranging from 0 to 15. Consequently, all indices in
a bin share the same lowest 3 bits (equal to ``b``), while their upper 4 bits
(equal to ``q``) take every value from ``0b0000`` to ``0b1111``. Because
bitwise-or keeps every bit that is set in at least one operand, all four
upper bits end up set, and the lowest 3 bits equal the bin number. Hence
``my_vors[b] == 0b1111000 + b == 120 + b``.

-----------------------
MultiReduction Policies
-----------------------

For more information about available RAJA multi-reduction policies and
guidance on which to use with RAJA execution policies, please see
:ref:`multi-reducepolicy-label`.