NERSC Users Learn Code Optimization Tips and Tricks at 1st Hackathon
March 6, 2015
A one-day "Hack-A-Thon" held February 25 at NERSC’s Oakland Scientific Facility during the annual NERSC Users Group (NUG) meeting has been deemed a rousing success by organizers and attendees alike.
The event was designed to give users the opportunity to optimize a computationally intensive piece of code—either their own or a sample kernel supplied by NERSC—with an eye toward preparing the code to run well on Cori, the Intel Xeon Phi processor-based system slated to arrive at NERSC in 2016.
During the hackathon, NERSC staff were on hand to suggest optimization strategies, help users work with coding tools, and answer questions. Attendees did the hands-on work of refactoring code to make it run faster, while remote viewers followed along and posed questions via a chat interface.
“This was a great learning opportunity for users who are trying to figure out how to optimize their codes for Cori,” said Richard Gerber, NERSC User Services Group lead. “They got to learn about optimization strategies and were introduced to new tools that will help them do this.”
For example, many attendees had their first look at Intel’s VTune Amplifier, a performance analysis tool aimed at developers of serial and multithreaded applications. One attendee used VTune to identify a hot loop in his code and to determine whether it was bound by memory bandwidth. The loop performed a large number of floating-point operations but was not being vectorized because of a loop-carried dependence on the variable IN:
for (many iterations) {
    ... many flops ...
    et = exp(outcome1)
    tt = pow(outcome2, 3)
    IN = IN * et + tt
}
Experts from NERSC and Intel helped him restructure the code to remove the dependence on IN from the flop-intensive loop: the values of et and tt were stored in temporary arrays et(:) and tt(:), and IN was then computed from those arrays in a separate loop. This change allowed the flop-heavy loop to vectorize and sped up his entire application run by 30 percent.
“When asked at the end how many people would use VTune in the future on their own, I believe all attendees raised their hands,” said Jack DeSlippe, an HPC consultant at NERSC who led the hackathon.
Some attendees also participated in a hackathon competition using NERSC’s canned kernels. Balint Joo of Jefferson National Accelerator Facility was the winner. Balint identified the hotspot in the unoptimized bgw.f90 kernel, added OpenMP threading, and used VTune’s bandwidth collection capability to determine that poor memory locality was the source of imperfect OpenMP scaling. He then reordered loops and eventually achieved a roughly 12x speedup over the original (non-threaded) code.
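Loop reordering for memory locality can be illustrated with a small C sketch. This is not the bgw.f90 kernel: the element-wise scaling operation and the function names scale_column_order and scale_row_order are hypothetical stand-ins. C stores two-dimensional data in row-major order, so the innermost loop should run over the last index to touch consecutive addresses; Fortran, the language of bgw.f90, is column-major, so there the innermost loop should run over the first index instead. The OpenMP directive on the outer loop is only active when the code is compiled with OpenMP support (for example, -fopenmp in GCC); otherwise the loops run serially.

```c
#include <assert.h>
#include <stddef.h>

/* Poor locality for a row-major C array: the inner loop walks down a
 * column, so consecutive iterations touch addresses n doubles apart,
 * and each thread strides through memory. */
void scale_column_order(const double *a, double *b, size_t n, double s)
{
    assert(n == 0 || (a != NULL && b != NULL));
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++) {
            b[i * n + j] = s * a[i * n + j];
        }
    }
}

/* Interchanged loops: the inner loop now walks along a row, so each
 * thread streams through contiguous memory. Every element is still
 * written exactly once, so the result is identical. */
void scale_row_order(const double *a, double *b, size_t n, double s)
{
    assert(n == 0 || (a != NULL && b != NULL));
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            b[i * n + j] = s * a[i * n + j];
        }
    }
}
```

Because each (i, j) element is computed independently, interchanging the loops changes only the memory access pattern, not the result; for large n the row-order version is typically much friendlier to caches and hardware prefetchers.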
Based on the success of the hackathon and feedback from users, NERSC plans to hold additional code optimization hackathons this year.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.