The LSST and big data science

A new kind of telescope will need a new kind of astronomer.
By | Published: December 15, 2017 | Last updated on May 18, 2023
A depiction of what the completed LSST observatory will look like atop El Peñon summit, Chile.
Construction of the Large Synoptic Survey Telescope (LSST) in Chile is about halfway between first brick and first light. Its 3-ton camera, built with National Science Foundation support, will be the largest digital instrument ever built for ground-based astronomy and will take pictures fast enough to capture the entire southern sky every three nights.  According to a TED talk by Andy Connolly, Professor of Astronomy at the University of Washington and Team Lead for LSST Simulations, the Hubble Space Telescope would need 120 years to image an equivalent area of sky.

Imaging at this rate will generate about 15 terabytes (15 trillion bytes) of raw data per night and 30 petabytes over the 10-year life of the survey. (A petabyte is approximately the amount of data in 200,000 movie-length DVDs.) Even after processing, that’s still a 15 PB (15,000 TB) store.
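For readers who want to check the scale, the figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes decimal units (1 TB = 10^12 bytes, matching the "15 trillion bytes" quoted above), and the "implied" values are simply what the article's round numbers work out to, not official LSST specifications.

```python
# Back-of-the-envelope check of the data volumes quoted in the article.
# Assumption: decimal units, as in "15 terabytes (15 trillion bytes)".
TB = 10**12
PB = 10**15

raw_per_night = 15 * TB        # raw data per observing night (from the article)
raw_survey_total = 30 * PB     # raw data over the whole survey (from the article)
survey_years = 10              # survey length (from the article)
dvds_per_petabyte = 200_000    # the article's DVD comparison

# How many observing nights do the two raw-data figures imply?
implied_nights = raw_survey_total // raw_per_night
implied_nights_per_year = implied_nights / survey_years

# What DVD capacity does the "200,000 DVDs per petabyte" comparison imply?
implied_dvd_bytes = PB // dvds_per_petabyte

# The full raw survey expressed in DVDs.
total_dvds = (raw_survey_total // PB) * dvds_per_petabyte

print(f"Implied observing nights: {implied_nights} ({implied_nights_per_year:.0f} per year)")
print(f"Implied DVD capacity: {implied_dvd_bytes / 10**9:.0f} GB")
print(f"Raw survey in DVDs: {total_dvds:,}")
```

On these round numbers, the nightly and survey totals are consistent with roughly 200 observing nights a year, and the DVD comparison corresponds to a disc of about 5 GB.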

Such huge datasets will give astronomers a ten-year time-lapse “movie” of the southern sky, yielding new subject matter for time-domain studies and a deeper understanding of the dynamic behavior of the universe. They will also change the way science is done – astronomer-and-telescope is giving way to astronomer-and-data as an engine of new knowledge.

Preparing the information

The LSST’s biggest strength may be its ability to capture transients – rare or changing events usually missed in narrow-field searches and static images. The good news is that software will alert astronomers almost immediately when a transient is detected to enable fast follow-up observations by other instruments. The not-so-good news is that up to 10 million such events are possible each night. With detection rates like these, good data handling is essential.

An innovative method developed by the LSST Data Management team will allow the storage of large volumes of data for rapid access.
The LSST Data Management Team is designing user tools that can operate on a variety of computing systems without the need for large downloads, all based on open-source software.  Their system includes two basic types of products: those produced for nightly observing and those produced for annual science releases.

Nightly processing will subtract two exposures of each image field to quickly highlight changes. The data stream from the camera will be pipeline processed and continuously updated in real time, with a transient alert triggered within 60 seconds of completing an image readout.
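The idea behind subtracting two exposures can be illustrated with a toy example. The sketch below is not the LSST pipeline: the function names, the tiny 3-by-3 "images" and the threshold are all hypothetical, and a real system would also have to align the exposures, match their blurring and model the noise before subtracting. It only shows why a steady source cancels out while a new one stands out.

```python
from typing import List, Tuple

Image = List[List[float]]  # a tiny image as rows of pixel brightness values


def subtract_exposures(new: Image, reference: Image) -> Image:
    """Return the pixel-by-pixel difference new - reference."""
    same_shape = len(new) == len(reference) and all(
        len(a) == len(b) for a, b in zip(new, reference)
    )
    if not same_shape:
        raise ValueError("exposures must have the same shape")
    return [
        [n - r for n, r in zip(new_row, ref_row)]
        for new_row, ref_row in zip(new, reference)
    ]


def flag_changes(diff: Image, threshold: float) -> List[Tuple[int, int]]:
    """List (row, column) positions whose brightness changed by more than threshold."""
    return [
        (r, c)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if abs(value) > threshold
    ]


# Hypothetical data: a steady star sits at the center of both exposures,
# and a new source appears in the bottom-right corner of the later one.
reference = [
    [10.0, 10.0, 10.0],
    [10.0, 50.0, 10.0],
    [10.0, 10.0, 10.0],
]
new = [
    [10.0, 10.0, 10.0],
    [10.0, 50.0, 10.0],
    [10.0, 10.0, 90.0],
]

diff = subtract_exposures(new, reference)
candidates = flag_changes(diff, threshold=5.0)
print(candidates)
```

The unchanged star at the center subtracts to zero, so only the corner pixel is flagged as a candidate transient.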

Data compiled into scheduled science releases will get considerable reprocessing to ensure that all contents are consistent, that false detections are filtered and that faint signal sources are confirmed. Reprocessing will also classify objects using both standard categories (position, movement, brightness, etc.) and dimensions derived mathematically from the data themselves. Products will be reprocessed at time intervals from nightly to annually, which means that their quality will improve as additional observations are accumulated.

Preparing the science

The LSST program includes Science Collaborations, teams of scientists and technical experts that work to grow the observatory’s science agendas. There are currently eight collaborations in such areas as galaxies, dark energy and active galactic nuclei. One of the most distinctive, however, is the Informatics and Statistics Science Collaboration (ISSC), which, unlike other teams, doesn’t focus on a specific astronomy topic but cuts across them all. New methods will be needed to handle heavy computational loads, to optimize data representations, and to guide astronomers through the discovery process. The ISSC focuses on such new approaches to ensure that astronomers realize the best return from the anticipated flood of new data.

“Data analysis is changing because of the volume of data we’re facing,” says Kirk Borne, an astrophysicist and data scientist with Booz Allen Hamilton, and a core member of the ISSC. “Traditional data analysis is more about fitting a physical model to observed data. When I was growing up, we didn’t have sample sizes like this. We were trying to understand a particular phenomenon with our small sample sets. Now, it’s more unsupervised. Instead of asking ‘tell me about my model,’ you ask ‘tell me what you know.’ Data become the model, which means that more is different.”  

LSST data will almost certainly expand the chances for surprise. “When we start adding different measurement domains like gravitational wave physics and neutrino astrophysics for exploration,” adds Borne, “we start seeing these interesting new associations. Ultraluminous infrared galaxies are connected with colliding starbursting galaxies, for example, but it was a discovery made by combining optical radiation with infrared. Quasars were discovered when people compared bright radio observations of galaxies with optical images of galaxies.”

A depiction of the observatory interior.
Preparing the people

The LSST Data Management Team is starting to orient the astronomy community to what’s coming with a series of conferences and workshops. “We try to cover as many meetings as we can, giving talks and hosting hack sessions,” says William O’Mullane, the team Project Manager.

Science notebooks, which allow users to collaborate, analyze data and publish their results online, will be an integral tool for LSST research communities and one that’s being introduced early. “We rolled out JupyterLab [an upgraded type of science notebook] at a recent workshop,” he adds, “which is a much faster way to get people working with the stack [the image manipulation code set].”

The next generation of big data astronomers is also being groomed through graduate curricula and a special fellowship program. “Getting students involved early is a very good thing, both for the field and for them,” says Mario Juric, Associate Professor of Astronomy at the University of Washington, and the LSST Data Management System Science Team Coordinator. “Students need to understand early on what it’s like to do large-scale experiments, to design equipment and software, and to collaborate with very large teams. Astronomy today is entering the age of big data just like particle physics did 20 or 30 years ago.

“We also have a Data Science Fellowship Program,” adds Juric, “a cooperative effort a few of us initiated in 2015 to educate the next generation of astronomer data scientists through a two-year series of workshops.” The program is funded by the LSST Corporation, a non-profit organization dedicated to enabling science with the telescope, and student interest has been intense. Only about a dozen people were admitted from among 200 applicants in a recent selection cycle.

Artist’s conception of telescope optics, with people shown for scale.
Telescope data are being packaged for a wide audience, too. The LSST Education and Public Outreach (EPO) program is working to involve classrooms, citizen scientists and the general public as deeply in big data astronomy as they want (or dare) to go. Primary EPO goals are to help educators integrate real LSST data into classrooms and introductory astronomy courses, and to help non-specialists access LSST data in ways similar to those of professional astronomers.  Working through platforms like Zooniverse, almost anyone will be able to conduct serious research projects.  “Citizen volunteers should be thought of as members of the science collaboration,” says Amanda Bauer, Head of LSST EPO.

The future IS the data

The LSST will cement an age where software is as critical to astronomy as the telescope. “When I was in graduate school,” says Juric, “I worked on the Sloan Digital Sky Survey (SDSS) and I didn’t touch a telescope; I did all my research out of a database. I know many students who have done the same. So we’re already seeing that kind of migration.”

O’Mullane agrees. “Large surveys like SDSS, Gaia and now LSST provide enough data for a different approach,” he says. “Astronomers are not always reaching for a telescope. In fact, missions like LSST basically only offer you the archive; you can’t even request the observatory to make a specific observation.”

Observatory construction on the El Peñon summit, Chile as of November 2017.
Given the enormous information streams that LSST will deliver, it soon won’t be possible for scientists to directly look at even a representative fraction of available data. Instead, they’ll increasingly rely on skillful manipulation of algorithms to examine relationships within the entirety of a dataset. The best insights will be obtained by those who ask the best questions of all those numbers.  

And, because more people will have ready access to those data, the biggest discoveries may come not only from the professionals, but from dedicated amateurs working at home on their laptops.