From the September 2018 issue

Seeking the unknown in cosmic data

In an era of large-scale surveys, citizen science projects, and new machine-learning techniques, unexpected discoveries should be expected.
By | Published: September 24, 2018
Using the Australian Square Kilometre Array Pathfinder, Ray Norris plans to carry out the Evolutionary Map of the Universe (EMU) survey to investigate radio sources in the sky. Although 2.5 million radio sources are already known, EMU expects to take on another 70 million more.
SKA organization

In 1964, astronomers Arno A. Penzias and Robert W. Wilson found themselves cleaning pigeon poop out of the Holmdel Horn Antenna, a radio telescope in New Jersey. The data from the instrument had weird, persistent noise that they couldn’t get rid of. They tried looking for places where errant radiation signals could sneak in, and even redesigned a part of the telescope, but the noise endured. When nothing else seemed to work, they trapped two pigeons that had taken up roost in the telescope and scrubbed out their droppings, yet the noise remained. Unbeknownst to them, they were trying to remove a fundamental signature of our universe — the cosmic microwave background.

Just over a decade later, Wilson and Penzias won the Nobel Prize in Physics for their serendipitous discovery of cosmic microwave background (CMB) radiation. Although the pair initially had been searching for a halo around the Milky Way, they instead found the first light of the universe, left over from right after the Big Bang, when photons of light were just bursting forth.

Wilson and Penzias aren’t alone in making fortuitous breakthroughs. Indeed, unexpected discoveries are almost a hallmark of astronomy. William Herschel discovered Uranus in 1781 while looking for binary stars, originally identifying the planet as a comet. Engineer Linda Morabito found volcanoes on Io while fiddling with image contrast to better see the background stars behind the jovian moon. Physicist Karl Jansky stumbled upon radio waves emanating from the center of the Milky Way while trying to improve trans-Atlantic phone calls. A U.S. spy satellite originally detected gamma-ray bursts while looking for covert nuclear bomb explosions in the 1960s. And these are just a few examples.

While astronomy has progressed through dogged reconciliation of theory and observation, it has also greatly benefited from things no one could have expected at the time. Two decades on, the Hubble Space Telescope has fulfilled its key goals, but it has also discovered proplyds (a type of planet-forming disk around a young star), unveiled dark energy, and showed that a seemingly empty section of the night sky was actually burgeoning with untold numbers of galaxies. Nobody expected these discoveries when Hubble launched.

The 15-meter Holmdel Horn Antenna was built in 1959 at Bell Laboratories in Holmdel, New Jersey, with the goal of performing pioneering work related to satellite communications. While attempting to use the antenna for research in 1964, radio astronomers Robert Wilson and Arno Penzias accidentally made a discovery worthy of the Nobel Prize in Physics.

With rapid advances in technology, astronomy is emerging into an era of big surveys. Ultra-high-resolution imaging and new collection techniques now allow for unprecedented amounts of data to be recorded and stored. Astronomers are already becoming overwhelmed with more data than they have time to process. When scientists barely have the time to search the data for what they’re looking for, how can they be expected to catch the paradigm-shifting details no one could have imagined?

This new era of big-data astronomy requires a new way of looking at data, and astronomers are developing ways to improve their chances of discovering the unexpected. Perhaps the way to remain on the cutting-edge of astronomy is to look where no one has looked before.

Leaving no stone unturned

Many unexpected discoveries in astronomy were possible simply because of new technology. Galileo’s telescope allowed for unprecedented views of the sky, uncovering shocking surprises. Similarly, the Hubble Space Telescope allowed astronomers to look deeper into the universe than ever before, revealing unimaginable phenomena. Now, Ray Norris — an astronomer at Western Sydney University and the Commonwealth Scientific and Industrial Research Organisation in Australia — will tackle an under-observed section of the universe with a new survey using radio waves.

The Evolutionary Map of the Universe (EMU) survey will use the Australian Square Kilometre Array Pathfinder to study radio sources in the night sky. Its goal is to combine breadth and depth to seek out fainter sources spread across a wider field of view than previously attempted. Currently, 2.5 million radio sources are known. EMU expects to find 70 million more.

Radio sources are often among the most energetic and explosive objects in the sky. Black holes, supernovae, and rapidly rotating neutron stars (pulsars) are all known to emit radio waves. The EMU survey expects to find many objects in the early universe — some of known types and some new — that can tell us how the first stars and galaxies formed. And Norris has spent a lot of time thinking about how best to make those unexpected discoveries.

“As telescopes develop, we’ll be getting more and more data,” says Norris. “The problem is finding the things you don’t expect, which hide among the things you can recognize and the noise in the data.”


Background and instrumental noise, as Penzias and Wilson know all too well, can be hard to quantify. As telescopes and instruments become increasingly complex, it becomes harder to understand the signatures they leave in the data. But being able to distinguish between noise and small, unexpected signals is key in the search for the unknown.

To tease out the unexpected discoveries, Norris is helping develop a project known as the Widefield ouTlier Finder, or WTF. With the specific goal of aiding in unexpected discoveries, WTF will use complex algorithms and cloud computing to pull out unusual signals in the data and reduce the huge quantities of data to manageable amounts.

Seek and you will find

If you can’t find somewhere new to look, you can try looking harder than anyone else. This method paid off for astronomer Jocelyn Bell. While a graduate student at the University of Cambridge, Bell was charged with studying data from quasars — distant active galaxies — coming from a radio telescope. Amid all the signals, she noticed a source that varied too fast to be a quasar. She had discovered a new type of star: pulsars.

This technique is the basis of surveys like the Large Synoptic Survey Telescope (LSST), which will survey the entire sky every few nights from its location in north-central Chile.

“There are some quantifiable reasons why everyone believes LSST to be a major revolution for very rare objects and events,” says Željko Ivezić, project scientist of LSST and professor of astronomy at the University of Washington in Seattle. “The volume, the high dimensionality of measurements, and the measurement precision all bode well for unexpected discoveries and discoveries of rare objects and events.”

A comparison of the field of views for a typical 8-meter-class telescope and the uniquely designed Large Synoptic Survey Telescope (LSST).
LSST Corporation; Astronomy: Roen Kelly

Set to begin watching the night sky in 2021, LSST will provide continual surveillance of the heavens in an exhaustive manner. With a wide field of view the size of 40 Full Moons, LSST will take images at multiple wavelengths ranging from visible to near-infrared. Every clear night, it will log as much as 30 terabytes of data. Over its 10-year expected lifetime, LSST plans on imaging each section of the sky a thousand times, creating more than 30 trillion observations of 40 billion celestial objects.

This style of observation will naturally single out objects that change in brightness, such as pulsars, supernovae, and distant quasars, as well as moving objects, like asteroids and other small bodies in our solar system. The scientists hope it may also help identify new variable activity in the night sky.

Large-scale surveys have been carried out before, but never to the extent that LSST will go. Previously, the most extensive survey was the Sloan Digital Sky Survey (SDSS), which imaged only a quarter of the sky. LSST will use a telescope nearly three times larger, providing twice the resolution across a wider range of wavelengths of light, and it will view a greater portion of the cosmos.

With the immense amounts of data LSST will produce nightly, it will be essential for researchers to stay on top of the imagery. For this, LSST scientists have developed systems that automatically process images by looking for differences between two exposures of the same section of the sky that were taken at different times. This automation will allow any changes to be noticed within a minute. However, the road to understanding the flagged events will still require hours upon hours of analysis.

“We need to develop these tools so that they can operate on these quadrillions of numbers,” says Ivezić. “Today we have these tools if you want to apply them to millions or even billions of objects. But if you want to scale them up by a factor of a thousand, it’s not a trivial thing. These tools can mean the difference between amorphous piles of ones and zeros and potentially paradigm-shifting discoveries.”

In 2014, NASA released this updated version of the iconic Hubble Ultra Deep Field image. The original zoomed in on a tiny section of apparently empty sky in the Southern Hemisphere using both visible and near-infrared light. For Hubble Ultra Deep Field 2014, astronomers collected and included ultraviolet data, which helps reveal the youngest, largest, and hottest stars in the universe.
NASA, ESA, H. Teplitz and M. Rafelski (IPAC/Caltech), A. Koekemoer and Z. Levay (STScI), R. Windhorst (Arizona State University)
Crowdsourcing astronomy

Astronomers have determined that as many as 400 billion stars exist in our galaxy, while likely hundreds of billions of galaxies exist in the observable universe. And, with the help of new large-scale surveys, these numbers could keep growing. Thanks to computers, scientists no longer have to hand-count dots on photographic plates. But even with machines, there is still far more data out there than any scientific cohort, no matter how dedicated, can tackle.

Enlisting the public’s help in scientific endeavours dates back more than a century to birdwatchers tracking aviary migration patterns across North America. But it wasn’t until the rise of the internet and the online gaming culture that citizen science projects really took off. The idea is simple: Engage the public by having them explore real images to identify simple objects or patterns in a fun, gamelike way. With citizen science, the types of routine analyses that would typically require months of work by a few scientists can now be done by many more science enthusiasts at their leisure.

One of the first groups to enlist the public’s help at the data processing stage was a team of scientists at NASA’s Ames Research Center. Using data collected by the Viking orbiters, which were sent to Mars in the 1970s, the team developed ClickWorkers, an online site where the public could identify and map craters on the martian surface, in 2000. The initial results showed the public was both enthusiastic about helping and capable of performing tasks accurately. Soon after, the project was expanded.

“The majority of people participated because they wanted to be a part of research,” says Lucy Fortson, an astrophysicist at the University of Minnesota who has worked extensively with citizen science projects. “They felt that they wanted to do something meaningful with their extra time.”

Today there are numerous citizen science projects in astronomy, such as CosmoQuest, Milky Way Project, and perhaps most famously, Galaxy Zoo. In Galaxy Zoo, the public is asked to identify the type of galaxy shown: Is it a disk? Is it edge-on? Is there a central bulge? These features can be quickly identified by eye, but natural variations can make them exceedingly difficult for computers to recognize and categorize.

“Humans are actually very well designed to picking out serendipitous discoveries in image datasets,” Fortson says. “By virtue of evolution, humans have developed this amazing visual cortex that can differentiate the unknown unknowns from the knowns.”

Of course, using the untrained public doesn’t come without its challenges. People make mistakes. Luckily, the large number of people involved in the identification can be used to create averages and a group consensus, which, over the long run, can be even more accurate than a single scientist’s identification. In Galaxy Zoo, 40 different individuals examine each galaxy to create a trusted identification. By carefully processing the results, individual people can even be weighted differently depending on their identification success rate. In this way, people whose identifications generally don’t agree with the group consensus can be flagged for rejection, so they don’t skew the end results.

Rise of the machines

Once the masses have identified and categorized thousands of images, significant work remains to analyze the data. This is where computers finally come in. These machines are the heavy lifters, allowing for complex calculations and comparisons that the human brain would be hard-pressed to match on its own. While machines historically can only do exactly what they are told, a subset of computers are being taught to think on their own.

Astronomers are using a type of artificial intelligence, called machine learning, to get computers to teach themselves how to find patterns in the data. A specific method of machine learning known as artificial neural networks was designed based on how the brain functions. These neural networks draw connections in vast webs of data, just as the human brain does. To create these networks, a scientist starts by showing the computer a “training set,” which is a series of examples containing what the computer is looking for — such as spiral galaxies. Over time, and with enough examples, the computer will become adept at identifying spiral galaxies, despite their wide range of appearances. At this point, the scientist can provide the computer with a sample of unidentified galaxies, and the machine will return those that fit the criteria it has assessed.


Machines can also be taught a much more difficult task: assessing how objects and their characteristics relate to one another. For example, scientists have used artificial neural networks to investigate how galaxies form clusters and how that grouping affects the numbers of stars the galaxies produce. Only with the assistance of computers are the scientists able to compare the many physical properties at play, such as galaxy mass, distance between galaxies, and previous interactions between galaxies. And by comparing many hundreds of thousands of galaxies, scientists are able to make broad conclusions about our universe that are unbiased by small irregularities.

When encoded properly, artificial neural networks can provide profound insight to scientists; however, they can also be easily misused. For example, if the training set is not extensive enough, the computer will draw the wrong conclusions. Or, as astronomers are fond of repeating, “Garbage in, garbage out.”

The other drawback to artificial neural networks is that they require vast datasets to “learn” from. Luckily, in the era of large-scale surveys, vast datasets are common. This means that artificial neural networks can quickly turn the problem of too much data into an advantage. The larger the training set — which citizen scientists can help bolster — the better the results.

The future of unexpected discoveries

“Our ability to collect these humongous datasets is developing in parallel with our ability to interpret these huge datasets,” says Ivezić. “Both directions are important — people who collect data and people who develop tools to analyze and interpret. Otherwise we’d just be stuck with a huge pile of zeros and ones we couldn’t make sense out of.”

With the combination of large-scale surveys, a legion of citizen scientists, and new machine learning techniques, it seems many new unexpected discoveries will soon emerge from the darkness. But as for the nature of those discoveries? Only time can tell.

Mara Johnson-Groh is a science writer and photographer who writes about everything under the Sun, and even things beyond it.