Deep Sky Project Provides a Portal into Data Universe
March 30, 2009
Every night approximately 3,000 astronomical files flow to the National Energy Research Scientific Computing (NERSC) Center from automated sky scanning systems all over the world for archiving. After a decade of collecting, the center currently holds over 8 million images, making this one of the largest troves of ground-based celestial images available.
Now, a multidisciplinary team of astronomers, computer scientists, and engineers from NERSC is collaborating to develop a user-friendly database system and interface to instantly serve up high-resolution cosmic reference images to astronomers around the globe. The effort is called the Deep Sky project, and team members say the tools and infrastructure used for this database could eventually help other scientific disciplines share massive datasets as well.
“The whole concept of this project is to efficiently streamline access to massive amounts of data on NERSC’s computers, basically providing a nice portal to astronomy data that we have acquired over the years,” says Peter Nugent, a staff scientist in the Scientific Computing Group within the Computational Research Division (CRD) and the Analytics Group at NERSC. He is also project lead for Deep Sky.
As an astronomer and mathematician, Nugent outlined a strategy for organizing and delivering the data to the astronomical research community, and also developed the algorithms that facilitate the celestial database’s search capabilities. He notes that astronomy is just one of many disciplines supported by NERSC that generate a large amount of data through simulation, observation or experiment, and these users similarly want to share their data with collaborators all over the world. Nugent believes that these researchers can use the Deep Sky project as a model for building their own extreme data-serving systems in the future.
“One of NERSC’s overriding goals is to help scientists store, process, manage, search and retrieve the ever-vaster data sets that are being produced in almost all areas of science,” says Cecilia Aragon, software infrastructure lead for the Deep Sky project. Aragon is a member of the NERSC Analytics Group and the Computational Research Division’s (CRD’s) Data Intensive Systems Group.
Infrastructure to Serve Data
The automated QUEST Camera, formerly mounted on the 48-inch Oschin Telescope at Palomar Observatory in Southern California, has been one of the biggest contributors to NERSC’s astronomical archive so far. Over the past nine years, QUEST has been snapping pictures of the northern sky nightly. After each observing run, the camera’s observations were sent to the center’s High Performance Storage System (HPSS), where much of the data was archived on tape.
Because it takes time to mount a tape and seek to the right data, scientists agree that tape storage is not useful for analyzing large datasets. Once astronomers retrieved the raw tape data, they then had to process it, meaning they had to correct the images for distortions caused by the camera and Earth’s atmosphere before they could analyze them.
To streamline this effort and make observations instantly accessible and useful to researchers, members of the Deep Sky team developed a system to automatically copy all of the archived raw data and process it. Then, they incorporated the processed data into the NERSC Global File System, where the information is stored on disks. Similar to the way an MP3 device is able to instantly search a database of music files and immediately play a song, this system allows users to query the Deep Sky database and instantly pull up processed observations for analysis.
“As an astronomer, instant access to the Deep Sky dataset is extremely valuable. I can search the images for potential events – supernovae or gamma ray bursts – by looking for dots that appear for a while and then disappear,” says Nugent. “I can then follow up on any oddities with even more powerful tools like NASA’s Hubble Space Telescope to find out what is really going on.”
According to Randy Kersnick, a member of NERSC’s Software Integration Group who designed the interface, all of the tools used to design and create the Deep Sky database system and interface were open source software that was adapted to suit the needs of this user community. By researching many open source database and interface software tools, members of the Deep Sky team now have substantial knowledge of what can be used and tweaked to serve different scientific datasets to users across a diverse set of research disciplines. Kersnick notes that, in the future, this knowledge will help NERSC instantly serve data to scientists studying everything from climatology to high energy physics.
“The Deep Sky project is not so much about archiving data; rather, the point is to actually make data accessible to the whole astronomical community,” says Janet Jacobsen, the project’s database infrastructure lead and member of the NERSC Analytics Team and CRD Visualization Group.
Astronomical Impact
The true value of a cosmic image lies in the number of photons, or light, that it contains. The further away an object is, the longer it will take for its light to reach Earth. Astronomers learn more about objects in deep space by capturing more of their photons with a variety of telescopes. If multiple images are taken of the same object, or patch of sky, astronomers will typically digitally layer the images, one on top of another, essentially combining the photons of a variety of images to get one clear picture. This process is called “co-adding.”
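As a rough illustration of co-adding, the Python sketch below averages several exposures of the same patch of sky pixel by pixel. It is not the Deep Sky pipeline: the function name `coadd` and the tiny sample images are hypothetical, the exposures are assumed to be already aligned and equally sized, and a plain mean stands in for whatever weighting or outlier rejection a production system would use.

```python
from statistics import mean


def coadd(images):
    """Return the pixel-wise mean of aligned, equally sized 2-D images.

    Each image is a list of rows, and each row is a list of pixel values.
    Averaging is a scaled sum, so the source signal is preserved while
    random noise tends to cancel as more exposures are stacked.
    """
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(row) != cols for row in img):
            raise ValueError("all images must share the same shape")
    return [
        [mean(img[r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical data: three noisy 2x2 exposures of the same patch of sky.
exposures = [
    [[10.0, 1.0], [0.0, 2.0]],
    [[12.0, 0.0], [1.0, 1.0]],
    [[11.0, 2.0], [2.0, 0.0]],
]
stacked = coadd(exposures)
print(stacked)
```

In this toy example the bright pixel in the top-left corner stays bright in the stacked image, while the scattered noise in the other pixels averages toward a common background level.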
According to Nugent, the temporal coverage in the NERSC archive is unique because the telescopes that contribute to this database have captured tens to hundreds of pictures of the same patch of sky over nine years. All of these images are co-added before entering the Deep Sky database on the NERSC Global File System, allowing researchers to instantly access high-resolution, high-quality reference images.
“Co-adding a variety of pictures of the same piece of sky not only allows us to better see faint objects, it also allows us to spot changes in the sky. If a bright spot has moved from one image to the next, it could be an asteroid, comet or planet; if a spot appears in a few images and then disappears, it could be a cosmic event like a supernova,” says Nugent.
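The idea of spotting a new dot against a reference image can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method the Deep Sky team uses: the function name `find_candidates`, the sample pixel values, and the brightness threshold are all hypothetical, and both images are assumed to be aligned and on the same brightness scale.

```python
def find_candidates(reference, new_image, threshold):
    """Return (row, col) positions where new_image is brighter than the
    reference by more than threshold.

    Both images are lists of rows of pixel values with the same shape.
    """
    candidates = []
    for r, (ref_row, new_row) in enumerate(zip(reference, new_image)):
        for c, (ref_px, new_px) in enumerate(zip(ref_row, new_row)):
            # A large positive difference means something appeared here
            # that is absent from the deep reference image.
            if new_px - ref_px > threshold:
                candidates.append((r, c))
    return candidates


# Hypothetical data: a flat reference and a new exposure with one bright dot.
reference = [[1.0, 1.0], [1.0, 1.0]]
tonight = [[1.1, 9.0], [0.9, 1.0]]
print(find_candidates(reference, tonight, threshold=5.0))
```

A position flagged this way is only a candidate. Deciding whether it is a supernova, a moving object such as an asteroid, or an artifact requires comparing further exposures and follow-up observations, as Nugent describes.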
“The Deep Sky system is going to be quite important as the Palomar Transient Factory (PTF) progresses because it shows us everything that was observed before and allows us to follow up on what is new,” says Robert Quimby, software lead of the Palomar Transient Factory and postdoctoral fellow at the California Institute of Technology (Caltech).
The PTF is an automated camera that replaces QUEST on the Oschin Telescope at Palomar Observatory. It will scan the northern sky nightly in search of fleeting cosmic events, like supernova explosions and gamma ray bursts. In addition to the PTF, the Synoptic All-Sky Infrared Survey (SASIR) will also add to Deep Sky’s collection of astronomical data. SASIR telescopes will search for the source of the universe’s expansion. The QUEST camera, which was formerly mounted at Palomar Observatory, is currently en route to the La Silla Schmidt telescope in Chile, where it will start up again observing the southern skies and continue funneling nightly observations to NERSC.
A prototype of the Deep Sky system is currently available. The full production system will be launched later this year. The Deep Sky Project was partially funded by the Scientific Discovery through Advanced Computing (SciDAC) Computational Astrophysics Consortium, a project led by Stan Woosley of the University of California at Santa Cruz.
Most of the astronomical data archived at NERSC was collected by the Nearby Supernova Factory (SNfactory), a project that seeks to measure the accelerating expansion of the universe with Type Ia supernovae. As part of its collaborations with the Near Earth Asteroid Tracking (NEAT) team at JPL/Caltech from 1999 to 2003 and the Palomar-QUEST Consortium from 2004 to 2008, the SNfactory project worked with DOE's Energy Sciences Network (ESnet) and the High Performance Wireless Research and Education Network (HPWREN), which is supported by the National Science Foundation, to establish a high-speed network connection between the Palomar Observatory and NERSC. For almost a decade, this link enabled nightly observations to be funneled into the center's mass storage system for archiving. Future missions, like PTF, will also be able to use the Palomar-NERSC connection to send thousands of cosmic images directly into the center's archive.
ESnet is a high-speed network that connects more than 40 DOE research facilities across the US and thousands of scientists around the world. It is a highly reliable, capability-class, feature-rich networking infrastructure specifically tailored to the needs of science. All data stored at NERSC traverses ESnet on its way to researchers all over the globe.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.