Atomic Operations: Computing a Histogram
This section contains an exercise file RAJA/exercises/atomic-histogram.cpp
for you to work through if you wish to get some practice with RAJA. The
file RAJA/exercises/atomic-histogram_solution.cpp contains complete
working code for the examples discussed in this section. You can use the
solution file to check your work and for guidance if you get stuck. To build
the exercises, execute make atomic-histogram and make atomic-histogram_solution
from the build directory.
Key RAJA features shown in this exercise are:
RAJA::forall kernel execution template and execution policies
RAJA::TypedRangeSegment iteration space construct
RAJA atomic add operation and RAJA atomic operation policies
The example uses an integer array of length ‘N’ randomly initialized with values in the interval [0, M).
constexpr int M = 20;
constexpr int N = 100000;
int* array = memoryManager::allocate<int>(N);
int* hist = memoryManager::allocate<int>(M);
for (int i = 0; i < N; ++i) {
array[i] = rand() % M;
}
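Note that every kernel below only adds to the ‘hist’ array, so its bins must start at zero (the snippets in this section do not show that initialization). As a point of reference, here is a plain C++ sketch of the sequential histogram the kernels parallelize; the compute_hist helper is hypothetical and is not part of the exercise files:

```cpp
#include <vector>

// Hypothetical helper, not part of the exercise files: the sequential
// histogram computation that the parallel kernels reproduce.
std::vector<int> compute_hist(const std::vector<int>& array, int M)
{
  std::vector<int> hist(M, 0);  // bins start at zero; the loop only adds
  for (int v : array) {
    hist[v] += 1;  // no atomics needed: a single thread owns every bin
  }
  return hist;
}
```

Since each of the N elements lands in exactly one of the M bins, the bin counts always sum back to N, which is a handy sanity check for the parallel versions as well.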
Each kernel iterates over the array and accumulates the number of occurrences of each value in [0, M) in another array named ‘hist’. The kernels use atomic operations for the accumulation, which allow multiple threads to update the same memory location concurrently without data races. The example shows how to use RAJA portable atomic operations and how they are used similarly across different programming model back-ends.
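To make the data-race point concrete outside of RAJA, here is a small sketch using std::atomic and std::thread from the C++ standard library. The atomic_hist helper is hypothetical and purely illustrative; fetch_add plays the same role here that RAJA::atomicAdd plays in the kernels below:

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper for illustration: two threads accumulate into shared
// bins; fetch_add makes each increment atomic, so no updates are lost.
std::vector<int> atomic_hist(const std::vector<int>& values, int M)
{
  std::vector<std::atomic<int>> hist(M);
  for (auto& h : hist) h.store(0);

  auto worker = [&](std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
      hist[values[i]].fetch_add(1);  // race-free read-modify-write
    }
  };

  // Split the input between two threads that update 'hist' concurrently.
  std::thread t1(worker, 0, values.size() / 2);
  std::thread t2(worker, values.size() / 2, values.size());
  t1.join();
  t2.join();

  std::vector<int> out(M);
  for (int b = 0; b < M; ++b) out[b] = hist[b].load();
  return out;
}
```

With a plain `hist[values[i]] += 1` in the worker, the two threads could interleave their read-modify-write sequences and drop counts; the atomic version always produces the exact histogram.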
Note
Each RAJA atomic operation requires an atomic policy type parameter that must be compatible with the execution policy for the kernel in which it is used. This is similar to the reduction policies we described in Sum Reduction: Vector Dot Product.
For a complete description of supported RAJA atomic operations and atomic policies, please see Atomic Operations.
All code snippets described below use the stride-1 iteration space range:
RAJA::TypedRangeSegment<int> array_range(0,N);
Here is the OpenMP version:
RAJA::forall<RAJA::omp_parallel_for_exec>(array_range, [=](int i) {
RAJA::atomicAdd<RAJA::omp_atomic>(&hist[array[i]], 1);
});
One is added to a slot in the ‘hist’ array when a value associated
with that slot is encountered. Note that the RAJA::atomicAdd
operation uses an OpenMP atomic policy, which is compatible with the OpenMP
kernel execution policy.
The CUDA and HIP versions are similar:
RAJA::forall<RAJA::cuda_exec<CUDA_BLOCK_SIZE>>(array_range, [=] RAJA_DEVICE (int i) {
RAJA::atomicAdd<RAJA::cuda_atomic>(&hist[array[i]], 1);
});
and:
RAJA::forall<RAJA::hip_exec<HIP_BLOCK_SIZE>>(array_range, [=] RAJA_DEVICE (int i) {
RAJA::atomicAdd<RAJA::hip_atomic>(&hist[array[i]], 1);
});
Here, the atomic add operations use CUDA and HIP atomic policies, which are compatible with the CUDA and HIP kernel execution policies, respectively.
Note that RAJA provides an auto_atomic
policy for easier usage and improved portability. This policy chooses the
proper atomic operation for the execution policy used to run the kernel.
Specifically, when OpenMP is enabled, the OpenMP atomic policy will be used,
which is correct in either a sequential or OpenMP execution context.
Otherwise, the sequential atomic policy will be applied. Similarly, if the
atomic operation is encountered in a CUDA or HIP execution context, the
corresponding GPU back-end atomic policy will be applied.
For example, here is the CUDA version that uses the ‘auto’ atomic policy:
RAJA::forall<RAJA::cuda_exec<CUDA_BLOCK_SIZE>>(array_range, [=] RAJA_DEVICE (int i) {
RAJA::atomicAdd<RAJA::auto_atomic>(&hist[array[i]], 1);
});
and the HIP version:
RAJA::forall<RAJA::hip_exec<HIP_BLOCK_SIZE>>(array_range, [=] RAJA_DEVICE (int i) {
RAJA::atomicAdd<RAJA::auto_atomic>(&hist[array[i]], 1);
});
The same CUDA and HIP kernel execution policies as in the previous examples are used.
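Whichever back-end runs the kernel, a simple sanity check is that the bin counts sum back to N, since each array element lands in exactly one bin. The check_hist helper below is hypothetical and is not part of the solution file:

```cpp
// Hypothetical sanity check, not part of the solution file: with every
// element counted exactly once, the M bin counts must sum to N.
bool check_hist(const int* hist, int M, int N)
{
  long total = 0;
  for (int b = 0; b < M; ++b) {
    if (hist[b] < 0) return false;  // a count can never be negative
    total += hist[b];
  }
  return total == static_cast<long>(N);
}
```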