Heidelberg Talk Tells How to "Fool the Masses"
June 29, 2004
HEIDELBERG, Germany — David Bailey, chief technologist for the Computational Research and NERSC Center divisions, delivered a tongue-in-cheek yet still serious presentation here on June 25, reminding attendees at the 2004 International Supercomputer Conference that hype and exaggeration still loom large in the field. As an invited speaker, Bailey drew one of the largest and most enthusiastic audiences of the conference to his talk on "12 Ways to Fool the Masses." His talk during the "Future Trends" session was an update of one of his best-known papers.
According to Bailey, the tendency to exaggerate or misrepresent data about the performance of supercomputers really took off during the early days of parallel computing. Extravagant claims about performance were pervasive, but it wasn't the first time that scientists had fooled themselves. As an example, Bailey cited research on the speed of light: measurements made in 1945 came in higher than the figures cited by scientists in the 1930s and early 1940s. The speed of light didn't change during that time; the measurements simply became more precise. Among the factors that can lead researchers to believe false findings are sloppy methodology and analysis, pervasive biases, and wishful thinking, Bailey said.
Between 1990 and 1994, the high-performance computing community was subject to "extravagant hyping of successes" and a tendency to emphasize the positive while ignoring the shortcomings of the early parallel computing systems, Bailey said. Such tendencies, he noted, helped lead to the downfall of some vendors and to cutbacks in government research funding.
One example cited by Bailey, and which drew appreciative nods and chuckles from the audience, was a claim by a scientist that his research was performed on a 65,536-processor computer. Under questioning, the author admitted he had used a system with only 8,192 processors, and then had multiplied his performance figures by a factor of eight.
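A rough sketch of why that trick misleads (illustrative only, not from Bailey's slides; the measured rate and the 0.2% serial fraction are assumptions): scaling from 8,192 to 65,536 processors rarely delivers a clean factor of eight, because parallel efficiency drops as processor counts grow. A simple Amdahl's law model makes the gap visible.

```python
# Illustrative sketch (not from the talk): why multiplying measured
# performance by 65536/8192 = 8 overstates a real 65,536-processor run.
# Assumes a hypothetical code with a 0.2% serial fraction.

def amdahl_speedup(p, serial_fraction=0.002):
    """Speedup on p processors for a code with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

measured_gflops_8k = 100.0                                 # assumed rate measured on 8,192 CPUs
claimed_gflops_64k = measured_gflops_8k * (65536 / 8192)   # the "multiply by 8" extrapolation

# A more honest extrapolation scales by the ratio of modeled speedups.
scaling = amdahl_speedup(65536) / amdahl_speedup(8192)
modeled_gflops_64k = measured_gflops_8k * scaling

print(f"Claimed (linear x8):           {claimed_gflops_64k:7.1f} Gflop/s")
print(f"Modeled (Amdahl, 0.2% serial): {modeled_gflops_64k:7.1f} Gflop/s")
```

Under these assumed numbers the linear extrapolation claims 800 Gflop/s while the model predicts only about 105, an overstatement of nearly eight-fold.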
Other examples included unfair comparisons of tuned codes run on parallel systems against untuned codes run on vector computers, and the use of inefficient algorithms that produced artificially high megaflop/s rates. And when all else fails, "just show pretty pictures and animations and don't talk about performance," he said. Bailey said such practices ranged from stretching the truth to "outright fraud." In fact, he noted, scientists were sometimes as guilty as vendor marketing departments of distorting performance figures.
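The inefficient-algorithm trick is easy to see with a small worked example (the flop counts are standard, but the achieved rates below are assumed for illustration): a dense solver applied to a tridiagonal system performs enormously more arithmetic at a flattering flop rate, yet the specialized solver reaches the answer far sooner.

```python
# Illustrative sketch (assumed rates, not from the talk): an inefficient
# algorithm can report a higher Mflop/s rate yet take longer to get the answer.
# Compare solving a tridiagonal system of size n by dense LU (O(n^3) flops,
# cache-friendly, runs near peak) versus the Thomas algorithm (O(n) flops,
# memory-bound).

n = 10_000

dense_flops = (2 / 3) * n**3          # dense LU factorization flop count
thomas_flops = 8 * n                  # tridiagonal (Thomas) solver flop count

dense_rate = 800e6                    # assumed: 800 Mflop/s achieved by dense kernels
thomas_rate = 80e6                    # assumed: 80 Mflop/s achieved by the memory-bound sweep

dense_time = dense_flops / dense_rate
thomas_time = thomas_flops / thomas_rate

print(f"Dense LU : {dense_rate/1e6:5.0f} Mflop/s, time to solution {dense_time:10.3f} s")
print(f"Thomas   : {thomas_rate/1e6:5.0f} Mflop/s, time to solution {thomas_time:10.6f} s")
# The dense solver "wins" on Mflop/s but loses badly on the metric that matters.
```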
Today, such hype and overselling of capabilities can be found in discussions about scientific grids, he added. The key to avoiding such pitfalls? Design and use intelligent benchmarks that provide honest and consistent performance data on a variety of systems, he said. View slides from David Bailey's talk.
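One way to read that prescription (a minimal sketch of my own, not Bailey's specification; the workload and report fields are assumptions): fix the problem precisely, time it end to end, and report wall-clock time to solution alongside the configuration, rather than quoting a flop rate in isolation.

```python
# Minimal sketch of an "honest" benchmark report: a fixed, well-defined
# problem, timed end to end, with the configuration stated explicitly.
# The workload here is a stand-in; any real benchmark would define its own.
import platform
import time

def solve_fixed_problem(n=2_000_000):
    """Stand-in workload: a fixed-size reduction, always the same problem."""
    return sum(i * i for i in range(n))

start = time.perf_counter()
result = solve_fixed_problem()
elapsed = time.perf_counter() - start

print("Benchmark report")
print("  problem          : sum of squares, n = 2,000,000 (fixed size)")
print(f"  result check     : {result}")
print(f"  time to solution : {elapsed:.3f} s")
print(f"  system           : {platform.platform()}, Python {platform.python_version()}")
```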
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy. Learn more about computing sciences at Berkeley Lab.