Atomic Operations¶
RAJA provides portable atomic operations that can be used to update values at arbitrary memory locations while avoiding data races. They are described in this section.
Note
All RAJA atomic operations are in the namespace RAJA
.
Note
Each RAJA atomic operation is templated on an atomic policy type, which must be compatible with the execution policy used by the kernel in which it is used. For example, in a CUDA kernel, a CUDA atomic policy type must be used.
For more information about available RAJA atomic policies, please see Atomic Policies.
Note
RAJA support for CUDA atomic operations may be specific to the compute architecture for which the code is compiled. Please see CUDA Atomics Architecture Dependencies for more information.
RAJA currently supports two different implementations of atomic operations via the same basic interface. The default implementation is the original one developed in RAJA and which has been available for several years. Alternatively, one can choose an implementation based on DESUL at compile time. Please see DESUL Atomics Support for more information. Eventually, we plan to deprecate the original RAJA implementation and provide only the DESUL implementation. The RAJA atomic interface is expected to change when we switch over to DESUL atomic support. Specifically, the atomic policy noted above will no longer be used.
Please see the following tutorial sections for detailed examples that use RAJA atomic operations:
Atomic Operations¶
RAJA atomic support the most common atomic operations.
Note
Each atomic method described below returns the value of the potentially modified argument (i.e., *acc) immediately before the atomic operation is applied, in case a user requires it.
Arithmetic¶
atomicAdd< atomic_policy >(T* acc, T value)
- Addvalue
to\*acc
.atomicSub< atomic_policy >(T* acc, T value)
- Subtractvalue
from\*acc
.
Min/max¶
atomicMin< atomic_policy >(T* acc, T value)
- Set\*acc
to min of\*acc
andvalue
.atomicMax< atomic_policy >(T* acc, T value)
- Set\*acc
to max of\*acc
andvalue
.
Increment/decrement¶
atomicInc< atomic_policy >(T* acc)
- Add 1 to\*acc
.atomicDec< atomic_policy >(T* acc)
- Subtract 1 from\*acc
.atomicInc< atomic_policy >(T* acc, T compare)
- Add 1 to\*acc
if\*acc
<compare
, else set\*acc
to zero.atomicDec< atomic_policy >(T* acc, T compare)
- Subtract 1 from\*acc
if\*acc
!= 0 and\*acc
<=compare
, else set\*acc
tocompare
.
Bitwise operations¶
atomicAnd< atomic_policy >(T* acc, T value)
- Bitwise ‘and’ equivalent: Set\*acc
to\*acc
&value
. Only works with integral data types.atomicOr< atomic_policy >(T* acc, T value)
- Bitwise ‘or’ equivalent: Set\*acc
to\*acc
|value
. Only works with integral data types.atomicXor< atomic_policy >(T* acc, T value)
- Bitwise ‘xor’ equivalent: Set\*acc
to\*acc
^value
. Only works with integral data types.
Replace¶
atomicExchange< atomic_policy >(T* acc, T value)
- Replace\*acc
withvalue
.atomicCAS< atomic_policy >(T* acc, Tcompare, T value)
- Compare and swap: Replace\*acc
withvalue
if and only if\*acc
is equal tocompare
.
Here is a simple example that shows how to use an atomic operation to compute an integral sum on a CUDA GPU device:
//
// Use CUDA UM to share data pointer with host and device code.
// RAJA mechanics work the same way if device data allocation
// and host-device copies are done with traditional cudaMalloc
// and cudaMemcpy.
//
int* sum = nullptr;
cudaMallocManaged((void **)&sum, sizeof(int));
cudaDeviceSynchronize();
*sum = 0;
RAJA::forall< RAJA::cuda_exec<BLOCK_SIZE> >(RAJA::TypedRangeSegment<int>(0, N),
[=] RAJA_DEVICE (int i) {
RAJA::atomicAdd< RAJA::cuda_atomic >(sum, 1);
});
After this kernel executes, the value reference by ‘sum’ will be ‘N’.
AtomicRef¶
RAJA also provides an interface similar to the C++20 std::atomic_ref
,
but which works for arbitrary memory locations. The class
RAJA::AtomicRef
provides an object-oriented interface to the
atomic methods described above. For example, after the following operations:
double val = 2.0;
RAJA::AtomicRef<double, RAJA::omp_atomic > sum(&val);
sum++;
++sum;
sum += 1.0;
the value of ‘val’ will be 5.
CUDA Atomics Architecture Dependencies¶
The implementations for RAJA atomic operations may vary depending on which CUDA architecture is available and/or specified when RAJA is configured for compilation. The following rules apply when the following CUDA architecture level is chosen:
CUDA architecture is lower than `sm_35`
Certain atomics will be implemented using CUDA atomicCAS (Compare and Swap).
CUDA architecture is `sm_35` or higher
CUDA native 64-bit unsigned atomicMin, atomicMax, atomicAnd, atomicOr, atomicXor are used.
CUDA architecture is `sm_60` or higher
CUDA native 64-bit double atomicAdd is used.
DESUL Atomics Support¶
RAJA provides the ability to use DESUL Atomics as an alternative to the default implementation of RAJA atomics. DESUL atomics are considered an experimental feature in RAJA at this point and may impact the performance of some atomic functions. While DESUL atomics typically yields better or similar performance to RAJA default atomics, some atomic operations may perform worse when using DESUL.
To enable DESUL atomics, pass the option to CMake when configuring a RAJA
build: -DRAJA_ENABLE_DESUL_ATOMICS=On
.
Enabling DESUL atomics alters RAJA atomic functions to be wrapper-functions for their DESUL counterparts. This removes the need for user code changes to switch between DESUL and RAJA implementations for the most part. The exception to this is when RAJA atomic helper functions are used instead of the backward-compatible API functions specified by Atomic Operations. By helper functions, we mean the RAJA atomic methods which take an atomic policy object as the first argument, instead of specifying the atomic policy type as a template parameter.
DESUL atomic functions are compiled with the proper back-end implementation
based on the scope in which they are called, which removes the need to specify
atomic policies for target back-ends. As a result, atomic policies such as
RAJA::cuda_atomic
or RAJA::omp_atomic
are ignored when DESUL is
enabled, but are still necessary to pass in as parameters to the RAJA API.
This will likely change in the future when we switch to use DESUL atomics
exclusively and remove the default RAJA atomic operations.