The DASH Multidimensional Array (NArray)
Instantiation
dash::NArray
is an N-dimensional array container class template. Its
constructor requires at least two template arguments, one for the
element type (int
, double
, ...) and one for the dimension (N
).
The following example creates a two-dimensional integer matrix with 40 rows and 30 columns:
dash::NArray<int, 2> matrix(40, 30); // 1200 elements
The number of required runtime constructor arguments depends on the
dimension of the dash::NArray
. Except for a default constructed
dash::NArray
object, which requires no arguments and is used for
delayed allocation (see [Delayed Allocation]), we need to specify at
least the extents in each dimension. This can be achieved by either
passing an object of type dash::SizeSpec<N>
or by giving the extent
in each dimension individually. For example:
// use SizeSpec
dash::SizeSpec<3> sspec(20, 30, 40); // 20x30x40 elements
dash::NArray<int, 3> mat1(sspec);
dash::NArray<int, 3> mat2(dash::SizeSpec<3>(5, 5, 5));
// specify extents directly, 10000 elem in a 4D cube
dash::NArray<int, 4> mat3(10, 10, 10, 10);
// unallocated matrices, separate .allocate() call needed
dash::NArray<int, 3> mat4, mat5;
// .allocate() accepts similar inputs as the constructor
mat4.allocate(20, 40, 10); xxx
mat5.allocate(sspec); xxx
Further optional template arguments to specify the index type and
storage order (also called memory arrangement) can be provided
when instantiating the dash::NArray
object. An example
multidimensional array with column-major layout and long as the index
type is shown below:
dash::NArray<int, 5, dash::COL_MAJOR, long>
matrix(100, 100, 100, 100, 100) ; // 10^10 or approx. 2^33.2 elements
In this example, the number of elements exceeds the range of a 32 bit
index type (the default is ssize_t
which can be changed using a
build option in DASH) and a 64 bit index type is thus required.
Additionally, DASH offers both column-major and row-major storage for the elements (row-major is the default). These options determine the way in which the multi-dimensional index space is linearized and mapped on to the one-dimensional memory that computers work with.
For two-dimensional arrays this linearization can either happen row by row (aka. row-major storage) or column by column (aka. column-major storage). For arbitrary dimensions these definitions can be suitably extended and row-major then means that elements $(i, j,..., n)$ and $(i, j,..., n + 1)$ are stored next to each other, while in the case of column-major storage $(i, j,..., n)$ and $(i + 1, j, ...)$ are stored next to each other (not taking into account any possible data distribution among multiple units).
Fig. xxx visualizes the layout of elements in a two-dimensional
dash::NArray
, both with row-major storage. A final optional
template argument specifies the type of the data distribution pattern
used by the dash::NArray
to determine the mapping of elements to
units and finding their storage location. The following example shows
the most general form, where all template parameters are explicitly
specified.
using T = int; // element value type
const int NDim = 2; // number of dimensions
const auto Arr = dash::COL_MAJOR; // memory arrangement
using PatT = dash::Pattern<NDim>; // pattern type
using IndexT = PatT::index_type; // index type
dash::NArray<T, NDim, Arr, IndexT, PatT> mat;
A dash::NArray
is always allocated over a dash::Team
, i.e., a
group of units that contribute storage to hold the data for the
container. A team can be specified as an optional last constructor
argument. If no team is explicitly specified, it defaults to
dash::Team::All()
, the group of all units that exits when the
program starts.
dash::Team & t = dash::Team::All().split (2);
// 100x100 elements allocated over t:
dash::NArray<int,2> mat(100,100,t);
Data Distribution
DASH offers a wide range of data distribution schemes using the
dash::Pattern
class template. Each NArray
has an associated
pattern, which can be queried with the .pattern()
method. The
pattern is set when the NArray
is created and a user-defined pattern
can be specified explicitly:
dash::Pattern<2> pat = ...
dash::Matrix<2,int> mat(pat);
A pattern's job is to provide the mapping between local and global
index spaces, taking into account data distribution and memory
layout. This functionality is needed by the NArray
and the other
DASH containers for accessing and iterating over their elements. Note
that In contrast to a container, a pattern is independent of an
element type. Its construction is not a collective operation and
does not cause any global memory allocation.
DASH offers several types of predefined data distribution
patterns. The most basic pattern is the blocked pattern, where each
unit holds one or more contiguous blocks of elements. A blocked
pattern can be instantiated using the dash::Pattern
class template.
// 2D pattern, blocked in the first dimension
dash::Pattern<2> pat1(16, 10, dash::BLOCKED, dash::NONE);
// 2D pattern, blocked in the second dimension
dash::Pattern<2> pat2(16, 10, dash::NONE, dash::BLOCKED);
// 2D default blocked pattern
dash::Pattern<2> pat3(16,10); // same as pat1
Per default the pattern is blocked (dash::BLOCKED
) in the first
dimension and there is no distribution in the other dimensions
(dash::NONE
). Two examples for a blocked pattern are shown in
Fig. \ref{fig:blocked_pattern}. In both
cases, the pattern specifies a two-dimensional $16\times 10$ index
space that is distributed over four units. In the left example, the
first dimension is specified as dash::BLOCKED
while no distribution
is requested in the second dimension, so each unit receives 4 complete
rows. In the second example, the blocking is done over the second
dimension. Note that the block size is determined by dividing the
number of columns by the number of units and rounding up to the next
integer. So the number of block columns in this example is $\left
\lceil{10/4}\right \rceil = 3$ and the last unit (unit 3) is
underfilled.
More details on the DASH pattern concept can be found in Sect. xxx.
Accessing Elements
The individual elements of a NArray
can be accessed in one of the
following ways:
- By specifying
N
coordinate values, e.g.,mat[i][j][k]
andmat(i,j,k)
. - By using a linear index, e.g.,
mat.elem(33)
. - By dereferencing an iterator, e.g.,
*(mat.begin())
.
Access using Coordinates
An element in an N-dimensional array mat
can be accessed by
specifying its N-dimensional coordinate vector
$(i_0,i_1,\dots,i_{N-1})$. Naturally, all coordinate values must be
within the allowed range for each dimension, i.e., $0\leq{}i_k<$
mat.extent(k)
.
To compute the sum of all elements in a three-dimensional matrix one could for example use the following loop nest:
for( auto i=0; i<mat.extent(0); i++ ) {
for( auto j=0; j<mat.extent(1); j++ ) {
for( auto k=0; k<mat.extent(2); k++ ) {
sum+=mat[i][j][k];
}
}
}
Note that mat[i][j][k]
is interpreted as ((mat[i])[j])[k]
. From
left to right, each application of a subscript operator returns a
matrix view with a dimension reduced by one until the last subscript
operates on a one-dimensional view and returns a global reference to
the requested element (see Sect. xxx for a discussion of global
references).
Another and potentially more efficient way to access an element is to
directly specify the coordinates in Fortran style by using round
brackets in the form mat(i, j, k)
, which yields the same element as
mat[i][j][k]
. Besides N individual coordinates, the coordinate
vector can be also be specified as a std::array
of length N or by
using an initializer list with N arguments. The same forms of input
are also supported for the .at()
method. The difference is that
.at
performs a range check on each access and throws an xxx
exception if an out-of-bounds condition is encountered. The following
example illustrates these options to access a matrix using
coordinates.
dash::Matrix<int, 3> mat(10, 10, 10);
int i=2, j=3, k=5;
std::array<int, 3> coords = {i, j, k};
auto r1 = mat[i][j][k]; // chained subscripts
auto r2 = mat.at(i, j, k); // N arguments
auto r3 = mat.at(coords); // std::array
auto r4 = mat.at({i, j, k}); // initializer list
auto r5 = mat(i, j, k); // N arguments
auto r6 = mat(coords); // std::array
auto r7 = mat({i, j, k}); // initializer list
Access with Linear Indices and Iterators
The elements in a DASH NArray
mat
can be accessed in their
linearized storage order by using the mat.elem(gidx)
method. gidx
is a global linear index with $0\leq{}$ gidx
$<$ mat.size()
. The
following example initializes all elements of a two-dimensional
NArray
to their position in the linear storage order.
// 2D pattern, 8 rows and 5 columns
dash::Pattern<2, dash::ROW_MAJOR> pat1(8, 5); // row major
dash::Pattern<2, dash::COL_MAJOR> pat2(8, 5); // column major
dash::Matrix <int, 2, ...> mat1(pat1);
dash::Matrix <int, 2, ...> mat2(pat2);
if( myid==0 ) {
// set the i-th element in the respective storage order to 'i'
for(int i=0; i<mat1.size(); ++i ) { mat1.elem(i)=i; } @\warn@
for(int i=0; i<mat2.size(); ++i ) { mat2.elem(i)=i; } @\warn@
}
dash::barrier();
// mat1: mat2:
// 0 1 2 3 4 0 8 16 24 32
// 5 6 7 8 9 1 9 17 25 33
// 10 11 12 13 14 2 10 18 26 34
// 15 16 17 18 19 3 11 19 27 35
// 20 21 22 23 24 4 12 20 28 36
// 25 26 27 28 29 5 13 21 29 37
// 30 31 32 33 34 6 14 22 30 38
// 35 36 37 38 39 7 15 23 31 39
A similar linear iteration over all elements in a matrix can be achieved using the iterator interface. In analogy to the container classes offered by the STL the DASH containers offer iterator types and the ability to walk over the elements contained in the matrix. For example:
using matrix_t = dash::Matrix<int, 2>;
matrix_t mat(20, 10);
// search for '42' staring from the back
bool found = false;
for (matrix_t::reverse_iterator it = mat.rbegin();
it != mat.rend(); ++it) { @\warn@
if ((*it) == 0) {
found = true; break;
}
}
// set all elements to 0, starting from the front
for (matrix_t::iterator it = mat.begin(); it != mat.end(); ++it) {
(*it) = 0;
}
Working with Local Data
DASH provides distributed data structures where each processing unit can conveniently access any data element. However, accessing remote data may require significantly more time and energy than working on locally stored data. This is true in distributed memory systems, where data has to be transmitted over a network connection, but the same consideration also holds on shared memory NUMA systems, where memory access times depend on data placement. To efficiently support both NUMA and distributed memory systems, DASH thus offers a strong notion of data locality. Each element in a container has a well defined owning unit and it is straightforward to implement the so-called owner-computes pattern, where several units collectively operate on a container, each one handling its locally stored elements. Data distribution patterns (see Sec. xxx ) define the home location for each element when a DASH container is instantiated.
To determine if an element is local to the calling unit, the method
is_local()
is available. is_local()
can be used with global
references and global pointers and returns whether the referenced
object resides in the calling unit's local memory. For example, in the
following code fragment all units iterate over all elements in an NArray
matrix, but only the owning unit prints a line stating its
ownership.
auto myid = dash::myid();
dash::Matrix<int, 2> matrix(8, 15);
for (int i = 0; i < matrix.extent(0); i++) {
for (int j = 0; j < matrix.extent(1); j++) {
if (matrix(i, j).is_local()) {
cout << "Element (" << i << "," << j << ") is "
<< "owned by me! (unit " << myid << ")" << std::endl;
}
}
}
// prints the following:
// Element (0,0) is owned by me! (unit 0)
// Element (2,0) is owned by me! (unit 1)
// Element (4,0) is owned by me! (unit 2)
// ...
Note that while this approach realizes the owner-computes pattern, it
is not very efficient, since each unit inspects all elements and only
then finds out whether to print a message or not. A more effective way
to exploit data locality is to only iterate over the locally stored
elements in the first place. In DASH this can be achieved constructing
a container's local view using the .local specifier. Using
.local` on a container restricts the access to only those
elements that are local to a unit and, since global pointers and
references are no longer needed in the purely local setting, instead
of returning a global reference, a regular C++ reference to the
element is returned when using the local view:
dash::Matrix <double, 2> mat(8,15);
mat.local(0,0) = 33; // set element at local coordinates (0,0) to 33
auto loc = mat.local; // this unit's local view on the matrix
loc(1,1) = foo(); // set local elem (1,1) to result of foo()
cout << loc.elem(5); // print the sixth local element
// get the address of the first local element
double *ptr = &(loc.elem(0));
It is important to note that when using the local view specifier, all
coordinates and indices used to access elements have local meaning as
well. I.e., mat.local.elem(0)
is the first element stored locally,
mat.local.at({3,5})
is the element with local coordinates $(3,5)$,
etc. Other than the different semantics, the same element access
operations that are available for global addressing can also be used
for the local view, including iterator-based access.
Since iterating over all locally stored elements is a common
operation, the dash::NArray
template class offers a shorthand to get
the local begin and end iterators: mat.lbegin()
is short
for mat.local.begin()
and mat.lend()
is short
for mat.local.end()
. Naturally these iterators can be used
with C++ 11 range-based for loops:
// set all my local elements to my DASH unit ID
for( auto it=mat.lbegin(); it!=mat.lend(); ++it ) {
(*it)=dash::myid();
}
// print local portion of the matrix using a range-based loop
for( auto el : mat.local )
cout<<el<<" ";
cout<<endl;
Fig.~xxx shows an example for global and local addressing. Here, a $(7
\times 4)$ matrix of char
elements is distributed cyclic by rows
among three units. The global view is identical for all units and
allows element access by global coordinates or global index. For
example mat(1,3)
is the same as mat.elem(7)
and resolves to
element 'h'
. The local view is different for each unit so that
mat.local(1,3)
takes on the values 'p'
, 't'
, or 'x'
for unit
0, 1, and 2, respectively.
Should it become necessary to manually map between local and global
indices or coordinates, the container's pattern can be used. For
example, the global index for a unit's first local matrix element can
be determined as follows: gidx = mat.pattern().global(0)
. See
Sect.~xxx for more information on the mapping
functions offered by a pattern.