Searching for reality
They say that every architect has, stuck in his desk drawer, a plan for the world's tallest skyscraper; probably every computer company similarly has a plan for the world's fastest supercomputer. At one time, that particular contest was always won by Seymour Cray. Currently, the world's fastest computer is Tianhe-1A, in China. But one day soon, it's going to be Blue Waters, an IBM-built machine filling 9,000 square feet at the National Center for Supercomputing Applications at the University of Illinois at Champaign-Urbana.
It's easy to forget - partly because Champaign-Urbana is not a place you visit by accident - how mainstream-famous NCSA and its host, UIUC, used to be. NCSA is the place from which Mosaic emerged in 1993. UIUC was where Arthur C. Clarke's HAL was turned on, on January 12, 1997. Clarke's choice was not accidental: my host, researcher Robert McGrath tells me that Clarke visited here and saw the seminal work going on in networking and artificial intelligence. And somewhere he saw the first singing computer, an IBM 7094 haltingly rendering "Daisy Bell." (Good news for IBM: at that time they wouldn't have had to pay copyright clearance fees on a song that was, in 1961, 69 years old.)
So much was invented here: Telnet, for example.
"But what have they done for us lately?" a friend in London wondered.
NCSA's involvement with supercomputing began when Larry Smarr, having worked in Europe and admired the access non-military scientists had to high-performance computers, wrote a letter to the National Science Foundation proposing that the NSF should fund a supercomputing center for use by civilian scientists. They agreed, and the first version of NCSA was built in 1986. Typically, a supercomputer is commissioned for five years; after that it's replaced with the fastest next thing. Blue Waters will have more than 300,000 8-core processors and be capable of a sustained rate of 1 petaflop and a peak rate of 10 petaflops. The transformer room underneath can provide 24 megawatts of power - as energy-efficiently as possible. Right now, the space where Blue Waters will go is a large empty white space broken up by black plug towers. It looks like a set from a 1950s science fiction film.
On the consumer end, we're at the point now where a five-year-old computer pretty much answers most normal needs. Unless you're a gamer or a home software developer, the pressure to upgrade is largely off. But this is nowhere near true at the high end of supercomputing.
"People are never satisfied for long," says Tricia Barker, who showed us around the facility. "Scientists and engineers are always thinking of new problems they want to solve, new details they want to see, and new variables they want to include." Planned applications for Blue Waters include studying storms to understand why some produce tornadoes and some don't. In the 1980s, she says, the data points were kilometers apart; Blue Waters will take the mesh down to 10 meters.
"It's why warnings systems are so hit and miss," she explains. Also on the list are more complete simulations to study climate change.
Every generation of supercomputers gets closer to simulating reality and increases the size of the systems we can simulate in a reasonable amount of time. How much further can it go?
They speculate, she said, about how, when, and whether exaflops can be reached: 2018? 2020? At all? Will the power requirements outstrip what can reasonably be supplied? How big would it have to be? And could anyone afford it?
In the end, of course, it's all about the data. The 500 petabytes of storage Blue Waters will have is only a small piece of the gigantic data sets that science is now producing. Across campus, also part of NCSA, senior research scientist Ray Plante is part of the Large Synoptic Survey Telescope project which, when it gets going, will capture a third of the sky every night on 3 gigapixel cameras with a wide field of view. The project will allow astronomers to see changes over a period of days, allowing them to look more closely at phenomena such as bursters and supernovae, and study dark energy.
Astronomers have led the way in understanding the importance of archiving and sharing data, partly because the telescopes are so expensive that scientists have no choice about sharing them. More than half the Hubble telescope papers, Plante says, are based on archival research, which means research conducted on the data after a short period in which research is restricted to those who proposed (and paid for) the project. In the case of LSST, he says, there will be no proprietary period: the data will be available to the whole community from Day One. There's a lesson here for data hogs if they care to listen.
Listening to Plante - and his nearby colleague Joe Futrelle - talk about the issues involved in storing, studying, and archiving these giant masses of data shows some of the issues that lie ahead for all of us. Many of today's astronomical studies rely on statistics, which in turn requires matching data sets that have been built into catalogues without necessarily considering who might in future need to use them: opening the data is only the first step.
So in answer to my friend: lots. I saw only about 0.1 percent of it.