A Standard for Neuroscience Data
Berkeley Lab researchers have developed a computational framework for standardizing neuroscience data worldwide
December 16, 2014
Contact: Linda Vu, +1 510 495 2402, lvu@lbl.gov
Thanks to standardized image file formats—like JPEG, PNG or TIFF—which store information every time you take a digital photo, you can easily share selfies and other pictures with anybody connected to a computer, mobile phone or the Internet. Nobody needs to download any special software to see your picture.
But in many science fields—like neuroscience—sharing data isn’t that simple because no standard data format exists. So in November 2014, the Neurodata Without Borders initiative—which is supported by the Kavli Foundation, GE, Janelia Farm, Allen Institute for Brain Science and the International Neuroinformatics Coordinating Facility (INCF)—hosted a hackathon to consolidate ideas for designing and implementing a standard neuroscience file format. BRAINformat, a neuroscience data standardization framework developed at the Lawrence Berkeley National Laboratory (Berkeley Lab), is among the candidates selected for further investigation. It is now a strong contender to contribute to the development of a community-wide data format and storage standard for neuroscience research. (BRAINformat is free to use and can be downloaded from the web.)
“This issue of standardizing data formats and sharing files isn’t unique to neuroscience. Many science areas, including the global climate community, have grappled with this,” says Oliver Ruebel, the Berkeley Lab computational scientist who developed BRAINformat. “Sharing data allows researchers to do larger, more comprehensive studies. This in turn increases confidence in scientific results and ultimately leads to breakthroughs.”
In conjunction with this work, Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC) is working with Jeff Teeters and Fritz Sommer of the Redwood Center for Theoretical Neuroscience at UC Berkeley on the Collaborative Research in Computational Neuroscience (CRCNS) data-sharing portal, which will allow neuroscience researchers worldwide to easily share files without having to download any special software.
Both BRAINformat and CRCNS are being developed as part of a tri-institutional partnership between Berkeley Lab, UC Berkeley and UC San Francisco (UCSF). The computational tools could also help facilitate the White House’s Brain Research through Advancing Innovative Neurotechnologies Initiative, the BRAIN Initiative.
Dealing With the Deluge of Brain Data
In 2013, President Barack Obama challenged the neuroscience community to gain fundamental insights into how the mind develops and functions, and to discover new ways to address brain diseases and trauma. He called this the BRAIN Initiative.
This work is expected to generate a deluge of data for the neuroscience community. After all, measuring activity from a fraction of neurons in the brain of a single mouse could generate almost as much data as the 17-mile-long Large Hadron Collider. So before researchers can even begin taking measurements, they must first develop a standard format for labeling and organizing data, sharing files, and scaling up analytical and visualization methods and software to handle massive amounts of information.
“Neuroscience is currently a field of individual principal investigators, doing individual experiments, and analyzing that data on customized software. This means that data is stored in many different formats and described in different ways, which hinders community access to data,” says Kristofer Bouchard, a neuroscientist at Berkeley Lab. “As data volumes grow, we are going to need more people to look at the same data in different ways.”
Berkeley Lab is actively seeking ways to expand its contribution to the BRAIN Initiative, and as a scientist in the Computational Research Division (CRD), Ruebel is familiar with helping scientists from a variety of disciplines organize, store, access, analyze and share massive, complex datasets.
To come up with a convention for labeling, organizing, storing and accessing neuroscience data, Ruebel worked closely with Bouchard, on applications from UCSF neurosurgeon Edward Chang and Berkeley Lab physicist Peter Denes, to design BRAINformat using open-source Hierarchical Data Format (HDF) technologies. Over the last 15 years, HDF has helped a variety of scientific disciplines organize and share their data. One prominent user of HDF is NASA's Earth Observing System, the primary data repository for understanding global climate change.
In addition to supporting data format standardization, HDF is optimized to run on supercomputers. So by building BRAINformat on this technology, neuroscientists will be able to use supercomputers to process and analyze their massive datasets.
“This work really highlights the unique strength of a Berkeley Lab, UC Berkeley and UCSF partnership,” says Denes. “UCSF is renowned for its clinical and experimental neuroscience experience with in vivo cortical electrophysiology; UC Berkeley contributes world-class expertise in theoretical neuroscience, statistical learning and data analysis; and Berkeley Lab brings supercomputing and applied mathematics expertise together with electronics and micro- and nano-fabrication.”
Denes heads Berkeley Lab’s contingent of the tri-institutional partnership to develop instrumentation and computational methods for recording neuroscience data. In addition to tools for dealing with the data deluge, the BRAIN Initiative is going to require new hardware to collect more data at higher resolution and process it in real time. Researchers will also need novel algorithms for analyzing data. The tri-institutional partnership is leveraging tools and expertise from different areas of science to tackle these challenges as well.
“Berkeley Lab’s strength has always been in science of scale,” says Prabhat, a Berkeley Lab computational scientist. “Over the years, many science areas have struggled with issues of file format standardization, as well as managing and sharing massive datasets, and our staff built similar infrastructures for them. This isn’t a new problem; with BRAINformat and the CRCNS portal we’ve just extended these solutions to the field of neuroscience.”
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.