Scientific I/O in HDF5
We will now show how HDF5 can be used to store the following three types of datasets commonly found in scientific computing:
- rectilinear 3D grids with balanced, cubic partitioning;
- rectilinear 3D grids with unbalanced, rectangular partitioning; and
- contiguous but size-varying arrays, such as those found in adaptive mesh refinement.
Balanced 3D Grids
Assume we have a 3D grid of known dimensions grid_size[3] and a partitioning partition[3] that divides the grid into equally sized blocks, one per MPI task. To write a task's block into an HDF5 file, we first map the task's rank to its block in the grid and select the corresponding region of the file's dataspace:
/* map the 1D MPI rank to a block in the 3D grid */
grid_index[0] = mpi_rank % partition[0];
grid_index[1] = (mpi_rank / partition[0]) % partition[1];
grid_index[2] = mpi_rank / (partition[0] * partition[1]);
/* calculate the size of the block (same for all tasks) */
block_size[0] = grid_size[0] / partition[0];
block_size[1] = grid_size[1] / partition[1];
block_size[2] = grid_size[2] / partition[2];
/* the file will hold the entire grid */
filespace = H5Screate_simple(3, grid_size, NULL);
/* while the hyperslab specifies only the block */
offset[0] = grid_index[0] * block_size[0];
offset[1] = grid_index[1] * block_size[1];
offset[2] = grid_index[2] * block_size[2];
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, block_size, NULL);
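The selection above only describes where the block lands in the file. To complete the write, each task also needs a memory dataspace for its local block and a dataset to write into. The following is a minimal sketch of those remaining steps; the file name grid.h5, the dataset name grid, the double-precision element type, and the data buffer are placeholders rather than part of the example above:
hid_t memspace, fapl, file, dset, dxpl;
/* memory dataspace describing the task's local block */
memspace = H5Screate_simple(3, block_size, NULL);
/* open the file for parallel access through MPI-IO */
fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
file = H5Fcreate("grid.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
/* create a dataset spanning the full grid and write this task's block collectively */
dset = H5Dcreate(file, "grid", H5T_NATIVE_DOUBLE, filespace,
                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, data);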
Unbalanced 3D Grids
Now suppose that not all tasks share the same block size. If we are only given the dimensions of the local block on each task as block_size[3], we need to gather the block sizes from all tasks in order to calculate the offsets. The offset of a given block in each dimension is simply the sum of the sizes of the blocks that precede it in that dimension. These can be calculated with the following code:
/* gather the block dimensions of every task into all_block_sizes */
MPI_Allgather(block_size, 3, MPI_INT, all_block_sizes, 3, MPI_INT, MPI_COMM_WORLD);
/* sum the sizes of the blocks that precede this one in each dimension */
offset[0] = offset[1] = offset[2] = 0;
for (i = 0; i < grid_index[0]; i++) {
    offset[0] += all_block_sizes[3*get_rank_of_block(i, grid_index[1], grid_index[2]) + 0];
}
for (i = 0; i < grid_index[1]; i++) {
    offset[1] += all_block_sizes[3*get_rank_of_block(grid_index[0], i, grid_index[2]) + 1];
}
for (i = 0; i < grid_index[2]; i++) {
    offset[2] += all_block_sizes[3*get_rank_of_block(grid_index[0], grid_index[1], i) + 2];
}
Note that we can calculate the MPI rank of a given grid index using the formula:
get_rank_of_block(i,j,k) = i + j * partition[0] + k * partition[0] * partition[1]
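This is simply the inverse of the rank-to-index mapping used in the balanced case. A direct C helper is sketched below; it assumes the partition array is visible in the enclosing scope, and while the name get_rank_of_block matches the calls above, this particular implementation is not part of the original code:
/* linearize a 3D block index into an MPI rank; inverse of the
   rank-to-index mapping used for the balanced grid */
int get_rank_of_block(int i, int j, int k)
{
    return i + j * partition[0] + k * partition[0] * partition[1];
}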
1D AMR Arrays
Storing a flattened 1D AMR array is similar to the unbalanced 3D grid case. If each MPI task holds nelems elements in its local array, an MPI_Allreduce gives the total number of elements, which sets the size of the dataset on disk, and MPI_Scan gives each task's offset into it. Because MPI_Scan computes an inclusive prefix sum, each task subtracts its own count to obtain its starting offset:
/* nelems, sum, and offset are assumed to be hsize_t (an unsigned 64-bit
   integer in HDF5), hence the MPI_UNSIGNED_LONG_LONG datatype below */
memspace = H5Screate_simple(1, &nelems, NULL);
/* the file dataspace holds the elements of every task */
MPI_Allreduce(&nelems, &sum, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
filespace = H5Screate_simple(1, &sum, NULL);
/* inclusive prefix sum of the element counts ... */
MPI_Scan(&nelems, &offset, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
offset -= nelems; /* ... minus our own count gives the starting offset */
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &nelems, NULL);
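As a side note, MPI_Exscan computes the exclusive prefix sum directly and removes the need for the subtraction; its result is undefined on rank 0, so the offset must be zero-initialized first. A brief sketch under the same assumptions about variable types:
offset = 0; /* MPI_Exscan's result is undefined on rank 0, so set it explicitly */
MPI_Exscan(&nelems, &offset, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &nelems, NULL);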