NERSC: Powering Scientific Discovery for 50 Years

Introduction to Scientific I/O

The HDF5 Library

The Hierarchical Data Format v5 (HDF5) library is a portable I/O library for storing scientific data in a database-like organization. HDF5's 'object database' data model lets users work with data objects and the relationships between them, rather than with the byte-level layout of the data file. Additional information can be found in the HDF5 Tutorial from the HDF Group.

An HDF5 file has a root group, named /, under which you can add groups, datasets of various shapes, single-value attributes, and links among groups and datasets. The HDF5 library provides a 'logical view' of the file's contents as a graph, as in Figure 3.1. The flexibility to create arbitrary graphs of objects allows HDF5 to express a wide variety of data models. The library transparently handles how these logical objects map to the actual bytes in the file. In many ways, HDF5 provides an abstracted filesystem-within-a-file that is portable to any system with the HDF5 library, regardless of the underlying storage type, filesystem, or conventions about binary data (such as byte order, or 'endianness').
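
To make the endianness point concrete, here is a minimal sketch in plain C that does not use the HDF5 API at all. The helper names host_is_little_endian, store_be32, and load_be32 are hypothetical, invented for this illustration. It shows that the in-memory byte layout of an integer depends on the host, and that a value stays readable everywhere once its byte order on disk is fixed. HDF5 records the byte order of each stored datatype in the file and converts on read, so application code normally does not need to do this by hand.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* inspect the lowest-addressed byte */
    return first_byte == 1;
}

/* Serialize v into out[0..3] in big-endian order, regardless of host. */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

/* Rebuild the value from four big-endian bytes, regardless of host. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Writing a raw uint32_t with fwrite would instead copy the host's own byte order to disk, which is exactly the portability problem a self-describing format avoids.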

Figure 3.1. An example HDF5 file showing an attribute, a group, and two datasets. Datasets are stored as flattened arrays within the file, and attribute and group information is part of the HDF5 metadata (not to be confused with the filesystem's metadata for the file).
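
As a rough illustration of what "flattened" means, the sketch below computes where a multidimensional index lands in a row-major (C-order) array, which is the ordering used by the HDF5 C API. It assumes the default contiguous dataset layout, and the offset is counted in elements from the start of the dataset's raw data, not in bytes from the start of the file. The function names flat_index_2d and flat_index are hypothetical and are not part of HDF5.

```c
#include <stddef.h>

/* Offset of element (i, j) in a row-major 2-D array with ny columns. */
size_t flat_index_2d(size_t i, size_t j, size_t ny)
{
    return i * ny + j;
}

/* General case: offset of idx[0..rank-1] in an array of shape dims[0..rank-1].
 * The last dimension varies fastest, as in C arrays. */
size_t flat_index(const size_t *idx, const size_t *dims, int rank)
{
    size_t offset = 0;
    for (int d = 0; d < rank; d++)
        offset = offset * dims[d] + idx[d];   /* Horner-style accumulation */
    return offset;
}
```

For a 4 x 6 dataset, element (2, 3) is therefore the element at offset 2 * 6 + 3 = 15 in the flattened array.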

The example file shown in Figure 3.1 could be created with the following C code:

#include <hdf5.h>
#include <stdint.h>

int main(void)
{
    /* Illustrative shape and data (assumed here; any rank and extents work) */
    enum { NX = 4, NY = 6 };
    int rank = 2;
    hsize_t dims[2] = {NX, NY};
    hsize_t maxdims[2] = {NX, NY};   /* fixed size: same as dims */
    float somedata0[NX][NY] = {{0}};
    float somedata1[NX][NY] = {{0}};
    int32_t attr_value = 42;
    hid_t file_id, space_id, attr_id, dset_id, group_id;

    /* create the file */
    file_id = H5Fcreate("myfile.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* create attribute0 on the root group */
    space_id = H5Screate(H5S_SCALAR);
    attr_id = H5Acreate(file_id, "attribute0", H5T_NATIVE_INT32, space_id, H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr_id, H5T_NATIVE_INT32, &attr_value);   /* takes a pointer to the value */
    H5Aclose(attr_id);
    H5Sclose(space_id);

    /* create dataset0 */
    space_id = H5Screate_simple(rank, dims, maxdims);
    dset_id = H5Dcreate(file_id, "dataset0", H5T_NATIVE_FLOAT, space_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, somedata0);
    H5Dclose(dset_id);
    H5Sclose(space_id);

    /* create group0 */
    group_id = H5Gcreate(file_id, "/group0", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    /* and dataset1 */
    space_id = H5Screate_simple(rank, dims, maxdims);
    dset_id = H5Dcreate(group_id, "dataset1", H5T_NATIVE_FLOAT, space_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, somedata1);
    H5Dclose(dset_id);
    H5Sclose(space_id);
    H5Gclose(group_id);

    /* finished! */
    H5Fclose(file_id);
    return 0;
}