
Rebecca Hartman-Baker

Rebecca J. Hartman-Baker Ph.D.
Group Leader
Phone: (510) 486-4810
Fax: (510) 486-6459
User Engagement Group
1 Cyclotron Road
Mail Stop 59R3103
Berkeley, CA 94720 US

Biographical Sketch

Rebecca Hartman-Baker leads the User Engagement Group at NERSC. She is a computational scientist with expertise in the development of scalable parallel algorithms for the petascale and beyond. Her other research interests include inverse and ill-posed problems, numerical optimization methods, and developing effective techniques for training users of HPC resources. She joined NERSC from iVEC in Australia, where she coached two teams to the Student Cluster Competition at the annual Supercomputing conference, led the HPC training program for a time, and was in charge of the decision-making process that determined the architecture of the petascale supercomputer at the Pawsey Supercomputing Centre.

She began her career at Oak Ridge National Laboratory, where, as a postdoc, she was a member of the R&D 100 Award-winning team for her work on the initial implementation of load balancing in MADNESS. As a scientific computing liaison in the Oak Ridge Leadership Computing Facility, she worked with a wide variety of scientific fields, including chemistry, nuclear physics, and logistics, and led the liaison task in the scientific computing group.

Rebecca earned a PhD in computer science, with a certificate in computational science and engineering, from the University of Illinois at Urbana-Champaign, and a BS in Physics from the University of Kentucky.

Selected Service and Synergistic Activities

Honors and Awards

Journal Articles

Mary Ann Leung, Katharine Cahill, Rebecca Hartman-Baker, Paige Kinsley, Lois Curfman McInnes, Suzanne Parete-Koon, Sreeranjani Ramprakash, Subil Abraham, Lacy Beach Barrier, Gladys Chen, Lizanne DeStefano, Scott Feister, Sam Foreman, Daniel Foreman, Daniel Fulton, Lipi Gupta, Yun He, Anjuli Jain Figueroa, Murat Keceli, Talia Capozzoli Kessler, Kellen Leland, Charles Lively, Keisha Moore, Wilbur Ouma, Michael Sandoval, Rollin Thomas, and Alvaro Vazquez-Mayagoitia, "Intro to HPC Bootcamp: Engaging new communities through energy justice projects", Journal of Computational Science Education, March 2024, 15:49-56, doi: 10.22369/issn.2153-4136/15/1/10

Yun (Helen) He, Rebecca Hartman-Baker, "Best Practices for NERSC Training", Journal of Computational Science Education, April 2022, 13:23-26, doi: 10.22369/issn.2153-4136/13/1/4

Abe Singer, Shane Canon, Rebecca Hartman-Baker, Kelly L. Rowland, David Skinner, Craig Lant, "What Deploying MFA Taught Us About Changing Infrastructure", HPCSYSPROS19: HPC System Professionals Workshop, November 2019, doi: 10.5281/zenodo.3525375

NERSC is not the first organization to implement multi-factor authentication (MFA) for its users. We had seen multiple talks by other supercomputing facilities who had deployed MFA, but as we planned and deployed our MFA implementation, we found that nobody had talked about the more interesting and difficult challenges, which were largely social rather than technical. Our MFA deployment was a success, but, more importantly, much of what we learned could apply to any infrastructure change. Additionally, we developed the sshproxy service, a key piece of infrastructure technology that lessens user and staff burden and has made our MFA implementation more amenable to scientific workflows. We found great value in using robust open-source components where we could and developing tailored solutions where necessary.

Osni A. Marques, David E. Bernholdt, Elaine M. Raybourn, Ashley D. Barker, Rebecca J. Hartman-Baker, "The HPC Best Practices Webinar Series", Journal of Computational Science Education, January 2019, doi: 10.22369/issn.2153-4136/10/1/19

In this contribution, we discuss our experiences organizing the Best Practices for HPC Software Developers (HPC-BP) webinar series, an effort for the dissemination of software development methodologies, tools and experiences to improve developer productivity and software sustainability. HPC-BP is an outreach component of the IDEAS Productivity Project and has been designed to support the IDEAS mission to work with scientific software development teams to enhance their productivity and the sustainability of their codes. The series, which was launched in 2016, has just presented its 22nd webinar. We summarize and distill our experiences with these webinars, including what we consider to be “best practices” in the execution of both individual webinars and a long-running series like HPC-BP. We also discuss future opportunities and challenges in continuing the series.

Vetter, Jeffrey S.; Brightwell, Ron; Gokhale, Maya; McCormick, Pat; Ross, Rob; Shalf, John; Antypas, Katie; Donofrio, David; Humble, Travis; Schuman, Catherine; Van Essen, Brian; Yoo, Shinjae; Aiken, Alex; Bernholdt, David; Byna, Suren; Cameron, Kirk; Cappello, Frank; Chapman, Barbara; Chien, Andrew; Hall, Mary; Hartman-Baker, Rebecca; Lan, Zhiling; Lang, Michael; Leidel, John; Li, Sherry; Lucas, Robert; Mellor-Crummey, John; Peltz Jr., Paul; Peterka, Thomas; Strout, Michelle; Wilke, Jeremiah, "Extreme Heterogeneity 2018 - Productive Computational Science in the Era of Extreme Heterogeneity: Report for DOE ASCR Workshop on Extreme Heterogeneity", December 2018, doi: 10.2172/1473756

Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, Woo-Sun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Concurrency and Computation: Practice and Experience, August 2017, 30, doi: 10.1002/cpe.4291

The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon‐Phi “Knights Landing” (KNL) nodes. Compared to the Xeon‐based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine‐grain parallelization; vectorization; and use of the high‐bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running jobs issues, and a few successful user stories on KNL are presented.

Robert J. Harrison, Gregory Beylkin, Florian A. Bischoff, Justus A. Calvin, George I. Fann, Jacob Fosso-Tande, Diego Galindo, Jeff R. Hammond, Rebecca Hartman-Baker, Judith C. Hill, Jun Jia, Jakob S. Kottmann, M-J. Yvonne Ou, Laura E. Ratcliff, Matthew G. Reuter, Adam C. Richie-Halford, Nichols A. Romero, Hideo Sekino, William A. Shelton, Bryan E. Sundahl, W. Scott Thornton, Edward F. Valeev, Álvaro Vázquez-Mayagoitia, Nicholas Vence, Yukina Yokoi, "MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation", SIAM Journal on Scientific Computing, October 27, 2016, 38:S123-S142,

Rebecca J. Hartman-Baker, Daniel J. Grimwood, Valerie Maxville, "Evaluating Parallel Programming Tools to Support Code Development for Accelerators", Procedia Computer Science, 2014, 2076-2079,

Conference Papers

Yao Xu, Zhengji Zhao, Rohan Garg, Harsh Khetawat, Rebecca Hartman-Baker, Gene Cooperman, "MANA-2.0: A Future-Proof Design for Transparent Checkpointing of MPI at Scale", Second International Symposium on Checkpointing for Supercomputing, held in conjunction with SC21. Conference website: https://supercheck.lbl.gov/, November 15, 2021,

Deborah J. Bard, Mark R. Day, Bjoern Enders, Rebecca J. Hartman-Baker, John Riney III, Cory Snavely, Gabor Torok, "Automation for Data-Driven Research with the NERSC Superfacility API", Lecture Notes in Computer Science, Springer International Publishing, 2021, 333, doi: 10.1007/978-3-030-90539-2_22

Gabor Torok, Mark R. Day, Rebecca J. Hartman-Baker, Cory Snavely, "Iris: allocation banking and identity and access management for the exascale era", SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020, 42:1-11, doi: 10.5555/3433701.3433756

Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, Woo-Sun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Cray User Group 2017, Redmond, WA. Best Paper First Runner-Up, May 12, 2017,

Mario Melara, Todd Gamblin, Gregory Becker, Robert French, Matt Belhorn, Kelly Thompson, Peter Scheibel, Rebecca Hartman-Baker, "Using Spack to Manage Software on Cray Supercomputers", Cray User Group 2017, 2017,

Rebecca J. Hartman-Baker, Hai Ah Nam, "Optimizing Nuclear Physics Codes on the XT5", Proceedings of CUG 2011, 2011,

Presentation/Talks

Zhengji Zhao, Introduction to the CRI standard 1.0, A BOF presentation in the International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), November 15, 2022,

Tarun Malviya, Zhengji Zhao, Rebecca Hartman-Baker, Gene Cooperman, Extending MPI API Support in MANA, A lightning talk presented in SuperCheck-SC22 held in conjunction with SC22, November 14, 2022,

Rebecca Hartman-Baker, Zhengji Zhao, Checkpoint/Restart Project Update, CR Collaboration Day, August 9, 2022,

Zhengji Zhao, Rebecca Hartman-Baker, Checkpoint/Restart (C/R) Vision at NERSC, C/R project update meeting (internal), August 13, 2021,

Prashant Singh Chouhan, Harsh Khetawat, Neil Resnik, Jain Twinkle, Rohan Garg, Gene Cooperman, Rebecca Hartman-Baker and Zhengji Zhao, Improving scalability and reliability of MPI-agnostic transparent checkpointing for production workloads at NERSC, First International Symposium on Checkpointing for Supercomputing. Conference website: https://supercheck.lbl.gov/archive/supercheck21, February 4, 2021,

Zhengji Zhao, Rebecca Hartman-Baker, and Gene Cooperman, Deploying Checkpoint/Restart for Production Workloads at NERSC, A presentation at SC20 State of the Practice Talks, November 17, 2020,

Rebecca Hartman-Baker, Craypat and Reveal, NERSC New User Training, February 23, 2017,

Rebecca Hartman-Baker, Accounts and Allocations, NERSC New User Training, February 23, 2017,

Rebecca Hartman-Baker, NERSC Overview, NERSC New User Training, February 23, 2017,

Rebecca J. Hartman-Baker, Past, Present, and Future Parallel Programming Paradigms, March 24, 2016,

Reports

Zhengji Zhao, Rebecca Hartman-Baker, "Checkpoint/Restart Vision and Strategies for NERSC’s Production Workloads", https://escholarship.org/uc/item/48v5r5rj, August 20, 2021,