Local Array

This section introduces RAJA local arrays. A RAJA::LocalArray is an array object with one or more dimensions whose memory is allocated when a RAJA kernel is executed and only lives within the scope of the kernel execution. To motivate the concept and usage, consider a simple C example in which we construct and use two arrays in nested loops:

for(int k = 0; k < 7; ++k) { //k loop

  int a_array[7][5];
  int b_array[5];

  for(int j = 0; j < 5; ++j) { //j loop
    a_array[k][j] = 5*k + j;
    b_array[j] = 7*j + k;
  }

  for(int j = 0; j < 5; ++j) { //j loop
    printf("%d %d \n",a_array[k][j], b_array[j]);
  }

}

Here, two stack-allocated arrays are defined inside the outer ‘k’ loop and used in both inner ‘j’ loops.

This loop pattern may also be written using RAJA local arrays in a RAJA::kernel_param kernel. We show this next, and then discuss its constituent parts:

//
// Define two local arrays
//

using RAJA_a_array = RAJA::LocalArray<int, RAJA::Perm<0, 1>, RAJA::SizeList<7,5> >;
RAJA_a_array kernel_a_array;

using RAJA_b_array = RAJA::LocalArray<int, RAJA::Perm<0>, RAJA::SizeList<5> >;
RAJA_b_array kernel_b_array;


//
// Define the kernel execution policy
//

using POL = RAJA::KernelPolicy<
              RAJA::statement::For<1, RAJA::seq_exec,
                RAJA::statement::InitLocalMem<RAJA::cpu_tile_mem, RAJA::ParamList<0, 1>,
                  RAJA::statement::For<0, RAJA::seq_exec,
                    RAJA::statement::Lambda<0>
                  >,
                  RAJA::statement::For<0, RAJA::seq_exec,
                    RAJA::statement::Lambda<1>
                  >
                >
              >
            >;


//
// Define the kernel
//

RAJA::kernel_param<POL> ( RAJA::make_tuple(RAJA::TypedRangeSegment<int>(0,5),
                                           RAJA::TypedRangeSegment<int>(0,7)),
                          RAJA::make_tuple(kernel_a_array, kernel_b_array),

  // Lambda 0: initialize the local arrays (same values as the C example)
  [=] (int j, int k, RAJA_a_array& a_array, RAJA_b_array& b_array) {
    a_array(k, j) = 5*k + j;
    b_array(j) = 7*j + k;
  },

  // Lambda 1: print the local array values
  [=] (int j, int k, RAJA_a_array& a_array, RAJA_b_array& b_array) {
    printf("%d %d \n", a_array(k, j), b_array(j));
  }

);

The RAJA version defines two RAJA::LocalArray types, one two-dimensional and one one-dimensional, and creates an instance of each type. The template arguments for the RAJA::LocalArray types are:

  • Array data type

  • Index striding order (see View and Layout for details)

  • Array dimensions
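
To make the roles of these template arguments concrete, the following is a minimal sketch of a fixed-size two-dimensional array in plain C++. It is not RAJA's implementation: the names SketchLocalArray2D and sketch_fill_and_get are hypothetical, the striding order is hard-wired to the identity permutation (the last index has unit stride), and the extents 7 and 5 are taken from the C example above.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-dimensional local array; NOT RAJA code.
//   T      : array data type
//   N0, N1 : extents of dimensions 0 and 1, fixed at compile time
// The striding order is fixed to the identity permutation, so index i1
// has unit stride and index i0 has stride N1.
template <typename T, int N0, int N1>
struct SketchLocalArray2D {
  T data[N0 * N1];

  // Map a pair of indices to an offset into the flat storage.
  static constexpr int offset(int i0, int i1) { return i0 * N1 + i1; }

  T& operator()(int i0, int i1) {
    // Debug-only bounds check; each index must lie within its own extent.
    assert(i0 >= 0 && i0 < N0 && i1 >= 0 && i1 < N1);
    return data[offset(i0, i1)];
  }
};

// Fill a 7 x 5 array with the values used in the C example and
// return the entry stored at (k, j).
inline int sketch_fill_and_get(int k, int j) {
  SketchLocalArray2D<int, 7, 5> a{};
  for (int kk = 0; kk < 7; ++kk) {
    for (int jj = 0; jj < 5; ++jj) {
      a(kk, jj) = 5 * kk + jj;
    }
  }
  return a(k, j);
}
```

The point of the sketch is that the extents must be listed in the same order as the indices used to access the array: an access a(k, j) with k in [0, 7) and j in [0, 5) requires extents (7, 5).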

The local array instances are passed to the kernel in a tuple after the iteration space tuple.

The kernel policy is a two-level nested loop policy (see Complex Loops (RAJA::kernel) for information about RAJA kernel policies) with a RAJA::statement::InitLocalMem statement inserted between the nested ‘For’ statements. This statement allocates the memory for the local arrays when the kernel executes. Its first two template parameters are the memory type, here RAJA::cpu_tile_mem, and a RAJA::ParamList<0, 1> that specifies which parameter tuple entries correspond to the local arrays. The first lambda expression initializes the local arrays, and the second lambda expression prints their values.
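
As a reading aid, the structure of the policy can be written out as an ordinary loop nest. The sketch below is hand-written plain C++, not code produced by RAJA; the function name sketch_kernel_loops is hypothetical, and the values are recorded in a vector instead of being printed so they can be inspected. Each comment names the policy statement that the line is assumed to correspond to.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Plain C++ rendering of the loop structure described by the kernel policy.
// Returns the (a_array, b_array) value pairs in the order they would be printed.
inline std::vector<std::pair<int, int>> sketch_kernel_loops() {
  std::vector<std::pair<int, int>> out;

  for (int k = 0; k < 7; ++k) {              // For<1, seq_exec>: outer 'k' loop

    // InitLocalMem: the arrays come into existence here, once per 'k'
    // iteration, and go out of scope at the end of the iteration.
    std::array<std::array<int, 5>, 7> a_array{};
    std::array<int, 5> b_array{};

    for (int j = 0; j < 5; ++j) {            // first For<0, seq_exec>
      a_array[k][j] = 5 * k + j;             // Lambda<0>: initialize
      b_array[j] = 7 * j + k;
    }

    for (int j = 0; j < 5; ++j) {            // second For<0, seq_exec>
      out.emplace_back(a_array[k][j], b_array[j]);  // Lambda<1>: record values
    }
  }

  assert(out.size() == 35);                  // 7 outer x 5 inner iterations
  return out;
}
```

Because the arrays are declared inside the outer loop, both inner loops see the same storage within one ‘k’ iteration, which is the behavior the InitLocalMem statement provides between the nested ‘For’ statements.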

Note

RAJA::LocalArray types support arbitrary dimensions and extents in each dimension.

Memory Policies

RAJA::LocalArray supports CPU stack-allocated memory and CUDA or HIP GPU shared memory and thread private memory. See Local Array Memory Policies for a discussion of available memory policies.