MultiReduction Operations¶
RAJA provides multi-reduction types that allow users to perform a runtime number
of reduction operations in kernels launched using RAJA::forall, RAJA::kernel,
and RAJA::launch methods in a portable, thread-safe manner. Users may
use as many multi-reduction objects in a loop kernel as they need. If a small
fixed number of reductions is required in a loop kernel then standard RAJA reduction objects can be
used. Available RAJA multi-reduction types are described in this section.
Note
All RAJA multi-reduction types are located in the namespace RAJA.
Also
Note
Each RAJA multi-reduction type is templated on a multi-reduction policy and a reduction value type for the multi-reduction variable. The multi-reduction policy type must be compatible with the execution policy used by the kernel in which it is used. For example, in a CUDA kernel, a CUDA multi-reduction policy must be used.
Each RAJA multi-reduction type accepts an initial reduction value or values at construction (see below).
Each RAJA multi-reduction type has a ‘get’ method to access reduced values after kernel execution completes.
Please see the following sections for a description of reducers:
Please see the following cook book sections for guidance on policy usage:
MultiReduction Types¶
RAJA supports three common multi-reduction types:
MultiReduceSum< multi_reduce_policy, data_type >- Sum of values.MultiReduceMin< multi_reduce_policy, data_type >- Min value.MultiReduceMax< multi_reduce_policy, data_type >- Max value.
and two less common bitwise multi-reduction types:
MultiReduceBitAnd< multi_reduce_policy, data_type >- Bitwise ‘and’ of values (i.e.,a & b).MultiReduceBitOr< multi_reduce_policy, data_type >- Bitwise ‘or’ of values (i.e.,a | b).
Note
RAJA::MultiReduceBitAnd and RAJA::MultiReduceBitOr reduction types are designed to work on integral data types because in C++, at the language level, there is no such thing as a bitwise operator on floating-point numbers.
MultiReduction Examples¶
Next, we provide a few examples to illustrate basic usage of RAJA multi-reduction types.
Here is a simple RAJA multi-reduction example that shows how to use a sum multi-reduction type:
const int N = 1000;
const int B = 10;
//
// Initialize an array of length N with all ones, and another array to
// integers between 0 and B-1
//
int vec[N];
int bins[N];
for (int i = 0; i < N; ++i) {
vec[i] = 1;
bins[i] = i % B;
}
// Create a sum multi-reduction object with a size of B, and initial
// values of zero
RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(B, 0);
// Run a kernel using the multi-reduction object
RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {
vsum[bins[i]] += vec[i];
});
// After kernel is run, extract the reduced values
int my_vsums[B];
for (int bin = 0; bin < B; ++bin) {
my_vsums[bin] = vsum[bin].get();
}
The results of these operations will yield the following values:
my_vsums[0] == 100
my_vsums[1] == 100
my_vsums[2] == 100
my_vsums[3] == 100
my_vsums[4] == 100
my_vsums[5] == 100
my_vsums[6] == 100
my_vsums[7] == 100
my_vsums[8] == 100
my_vsums[9] == 100
Here is the same example but using values stored in a container:
const int N = 1000;
const int B = 10;
//
// Initialize an array of length N with all ones, and another array to
// integers between 0 and B-1
//
int vec[N];
int bins[N];
for (int i = 0; i < N; ++i) {
vec[i] = 1;
bins[i] = i % B;
}
// Create a vector with a size of B, and initial values of zero
std::vector<int> my_vsums(B, 0);
// Create a multi-reducer initalized with size and values from my_vsums
RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(my_vsums);
// Run a kernel using the multi-reduction object
RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {
vsum[bins[i]] += vec[i];
});
// After kernel is run, extract the reduced values back into my_vsums
vsum.get_all(my_vsums);
The results of these operations will yield the following values:
my_vsums[0] == 100
my_vsums[1] == 100
my_vsums[2] == 100
my_vsums[3] == 100
my_vsums[4] == 100
my_vsums[5] == 100
my_vsums[6] == 100
my_vsums[7] == 100
my_vsums[8] == 100
my_vsums[9] == 100
Here is an example of a bitwise-or multi-reduction:
const int N = 128;
const int B = 8;
//
// Initialize an array of length N to integers between 0 and B-1
//
int bins[N];
for (int i = 0; i < N; ++i) {
bins[i] = i % B;
}
// Create a bitwise-or multi-reduction object with initial value of '0'
RAJA::MultiReduceBitOr< RAJA::omp_multi_reduce, int > vor(B, 0);
// Run a kernel using the multi-reduction object
RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {
vor[bins[i]] |= i;
});
// After kernel is run, extract the reduced values
int my_vors[B];
for (int bin = 0; bin < B; ++bin) {
my_vors[bin] = vor[bin].get();
}
The results of these operations will yield the following values:
my_vors[0] == 120 == 0b1111000
my_vors[1] == 121 == 0b1111001
my_vors[2] == 122 == 0b1111010
my_vors[3] == 123 == 0b1111011
my_vors[4] == 124 == 0b1111100
my_vors[5] == 125 == 0b1111101
my_vors[6] == 126 == 0b1111110
my_vors[7] == 127 == 0b1111111
The results of the multi-reduction start at 120 and increase to 127. In binary representation (i.e., bits), \(120 = 0b1111000\) and \(127 = 0b1111111\). The bins were picked in such a way that all the integers in a bin had the same remainder modulo 8 so their last 3 binary digits were all the same while their upper binary digits varied. Because bitwise-or keeps all the set bits, the upper bits are all set because at least one integer in that bin set them. The last 3 bits were the same in all the integers so the last 3 bits are the same as the remainder modulo 8 of the bin number.
MultiReduction Policies¶
For more information about available RAJA multi-reduction policies and guidance on which to use with RAJA execution policies, please see MultiReduction Policies.