Iteration Spaces: IndexSets and Segments

Key RAJA features shown in this example:

  • RAJA::forall loop execution template
  • RAJA::RangeSegment (i.e., RAJA::TypedRangeSegment) iteration space construct
  • RAJA::TypedListSegment iteration space construct
  • RAJA::IndexSet iteration construct and associated execution policies

The example uses a simple daxpy kernel and its usage of RAJA is similar to previous simple loop examples. The example focuses on how to use RAJA index sets and iteration space segments, such as index ranges and lists of indices. These features are important for applications and algorithms that use indirection arrays for irregular array accesses. Combining different segment types, such as ranges and lists in an index set allows a user to launch different iteration patterns in a single loop execution construct (i.e., one kernel). This is something that is not supported by other programming models and abstractions and is unique to RAJA. Applying these concepts judiciously can increase performance by allowing compilers to optimize for specific segment types (e.g., SIMD for range segments) while providing the flexibility of indirection arrays for general indexing patterns.

Note

For the following examples, it is useful to remember that all RAJA segment types are templates, where the type of the index value is the template argument. So for example, the basic RAJA range segment type is RAJA::TypedRangeSegment<T>. The type RAJA::RangeSegment used here (for convenience) is a type alias for RAJA::TypedRangeSegment<RAJA::Index_type>, where the template parameter is a default index type that RAJA defines.

For a summary discussion of RAJA segment and index set concepts, please see Indices, Segments, and IndexSets.

RAJA Segments

In previous examples, we have seen how to define a contiguous range of loop indices [0, N) with a RAJA::RangeSegment object and use it in a RAJA loop execution template to run a loop kernel over the range. For example:

  RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, N), [=] (IdxType i) {
    a[i] += b[i] * c;
  });

We can accomplish the same result by enumerating the indices in a RAJA::TypedListSegment object. Here, we assemble the indices in a standard vector, create a list segment from that, and then pass the list segment to the forall execution template:

  std::vector<IdxType> idx;
  for (IdxType i = 0; i < N; ++i) {
    idx.push_back(i); 
  } 

  ListSegType idx_list( &idx[0], idx.size(), host_res );

  RAJA::forall<RAJA::seq_exec>(idx_list, [=] (IdxType i) {
    a[i] += b[i] * c;
  });

Note that we are using the following type aliases:

using IdxType = RAJA::Index_type;
using ListSegType = RAJA::TypedListSegment<IdxType>;

Recall from discussion in Indices, Segments, and IndexSets that RAJA::Index_type is a default index type that RAJA defines and which is used in some RAJA constructs as a convenience for users who want a simple mechanism to apply index types consistently.

It is important to understand what happens when using list segments. During loop execution, indices stored in the list segment are passed to the loop body one-by-one, effectively mimicking an indirection array except that the indirection does not appear in the loop body. For example, we can reverse the order of the indices, run the loop with a new list segment object, and get the same result since the loop is data-parallel:

  std::reverse( idx.begin(), idx.end() ); 

  ListSegType idx_reverse_list( &idx[0], idx.size(), host_res );

  RAJA::forall<RAJA::seq_exec>(idx_reverse_list, [=] (IdxType i) {
    a[i] += b[i] * c;
  });

Alternatively, we can also use a RAJA strided range segment to run the loop in reverse by giving it a stride of -1. For example:

  RAJA::forall<RAJA::seq_exec>(RAJA::RangeStrideSegment(N-1, -1, -1), 
    [=] (IdxType i) {
    a[i] += b[i] * c;
  });

The fact that RAJA always passes loop index values to lambdas in a kernel explains why we can run a kernel with multiple segment types in a single RAJA construct as we discuss next.

RAJA IndexSets

The RAJA::TypedIndexSet template is a container that can hold any number of segments, of the same or different types. An index set object can be passed to a RAJA loop execution method, just like a segment, to run a loop kernel. When the loop is run, the execution method iterates over the segments and the loop indices in each segment. Thus, the loop iterates can be grouped into different segments to partition the iteration space and iterate over the loop kernel chunks (defined by segments), in serial, in parallel, or in some specific dependency ordering. Individual segments can be executed in serial or parallel.

When an index set is defined, the segment types it may hold must be specified as template arguments. For example, here we create an index set that can hold list segments. Then, we add the list segment we created earlier to it and run the loop:

  RAJA::TypedIndexSet<ListSegType> is1;

  is1.push_back( idx_list );  // use list segment created earlier.
  
  RAJA::forall<SEQ_ISET_EXECPOL>(is1, [=] (IdxType i) {
    a[i] += b[i] * c;
  });

You are probably wondering: What is the ‘SEQ_ISET_EXECPOL’ type used for the execution policy?

Well, it is similar to execution policy types we have seen up to this point, except that it specifies a two-level policy – one for iterating over the segments and one for executing the iterates defined by each segment. In the example, we specify that we should do each of these operations sequentially by defining the policy as follows:

  using SEQ_ISET_EXECPOL = RAJA::ExecPolicy<RAJA::seq_segit,
                                            RAJA::seq_exec>;

Next, we perform the daxpy operation by partitioning the iteration space into two range segments:

  RAJA::TypedIndexSet<RAJA::RangeSegment> is2;
  is2.push_back( RAJA::RangeSegment(0, N/2) );
  is2.push_back( RAJA::RangeSegment(N/2, N) );
  
  RAJA::forall<SEQ_ISET_EXECPOL>(is2, [=] (IdxType i) {
    a[i] += b[i] * c;
  });

The first range segment is used to run the index range [0, N/2) and the second is used to run the range [N/2, N).

We can also break up the iteration space into three segments, 2 ranges and 1 list:

//
// Collect indices in a vector to create list segment
//
  std::vector<IdxType> idx1;
  for (IdxType i = N/3; i < 2*N/3; ++i) {
    idx1.push_back(i);
  }

  ListSegType idx1_list( &idx1[0], idx1.size(), host_res );

  RAJA::TypedIndexSet<RAJA::RangeSegment, ListSegType> is3;
  is3.push_back( RAJA::RangeSegment(0, N/3) );
  is3.push_back( idx1_list );
  is3.push_back( RAJA::RangeSegment(2*N/3, N) );
 
  RAJA::forall<SEQ_ISET_EXECPOL>(is3, [=] (IdxType i) {
    a[i] += b[i] * c;
  });

The first range segment runs the index range [0, N/3), the list segment enumerates the indices in the interval [N/3, 2*N/3), and the second range segment runs the range [2*N/3, N). Note that we use the same execution policy as before.

Before we end the discussion of these examples, we demonstrate a few more index set execution policy variations. To run the previous three segment code by iterating over the segments sequentially and executing each segment in parallel using OpenMP multithreading, we would use this policy definition:

  using OMP_ISET_EXECPOL1 = RAJA::ExecPolicy<RAJA::seq_segit,
                                             RAJA::omp_parallel_for_exec>;

If we wanted to iterate over the segments in parallel using OpenMP multi-threading and execute each segment sequentially, we would use the following policy:

  using OMP_ISET_EXECPOL2 = RAJA::ExecPolicy<RAJA::omp_parallel_for_segit,
                                             RAJA::seq_exec>;

Finally, to iterate over the segments sequentially and execute each segment in parallel on a GPU using either CUDA or HIP kernel, we would use a policy, such as:

  using CUDA_ISET_EXECPOL = RAJA::ExecPolicy<RAJA::seq_segit,
                                             RAJA::cuda_exec<CUDA_BLOCK_SIZE>>;

or:

  using HIP_ISET_EXECPOL = RAJA::ExecPolicy<RAJA::seq_segit,
                                            RAJA::hip_exec<HIP_BLOCK_SIZE>>;

The file RAJA/examples/tut_indexset-segments.cpp contains working code for these examples.