2025 Summer Research Projects
Berkeley Lab’s Computing Sciences summer program offers college undergraduates, graduate students, and faculty opportunities to collaborate with NERSC staff on various science and technology research projects.
NERSC staff started posting projects in January 2025 for the upcoming summer. This page will be updated as more projects open and others close.
How to Apply
To apply to an open project, email the project mentor(s) directly.
Mentors determine who may join a research project, but interns should also fill out a Computing Sciences Area summer employment application. This makes them eligible to be hired by mentors if they are selected. Being accepted to a project is not the same as being hired and on the payroll. It is important to complete both steps.
Prospective summer interns should submit employment applications as soon as possible, even before being accepted to a project. Because Computing Sciences uses a single application across all its summer programs, interns’ applications may also be considered for research projects sponsored by the Energy Sciences Network (ESnet), Scientific Data (SciData) Division, and the Applied Mathematics and Computational Research (AMCR) Division. Likewise, prospective summer interns may apply directly to multiple programs across all Computing Sciences Area divisions.
Quantum Computing
Open Projects
- Auxiliary field quantum Monte Carlo using hybrid classical-quantum computing
- Development and testing of a quantum protocol for polynomial transformations of data sequences using quantum signal processing
No Longer Accepting Applications
- Evaluating the performance of quantum algorithms for solving differential equations in the NISQ era and beyond
Auxiliary field quantum Monte Carlo using hybrid classical-quantum computing
Science/CS Domains
Quantum computing, high performance computing, quantum Monte Carlo, pySCF
Project Description
For this summer internship, a graduate-level student will use pySCF and CUDA-Q to perform ground-state energy calculations of a hydrogen molecule using auxiliary field quantum Monte Carlo (AFQMC) simulations. The goal of this project is to benchmark performance metrics for a classically solved AFQMC calculation against one augmented by quantum computing in the loop.
For the initial part of the project, the graduate student will reproduce a Python-based code that calculates the ground-state energy of a hydrogen molecule using AFQMC; this code is expected to use the pySCF module. Upon completion, the graduate student will incorporate a quantum computing algorithm into the calculations to perform unbiased sampling of the walkers and to compute overlaps and local energies. This part of the code will initially use the CUDA-Q or PennyLane module to build a quantum circuit that prepares a walker state, followed by a similar step using Qiskit and Cirq to benchmark the performance differences among the three quantum computing frameworks.
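As a much-simplified warm-up for the ideas involved (not AFQMC itself), the sketch below estimates a hydrogen-atom ground-state energy by variational Monte Carlo: walkers are sampled from a trial distribution and the local energy is averaged. All names and the trial wavefunction here are illustrative, not part of the project code.

```python
import numpy as np

# Variational Monte Carlo warm-up (NOT AFQMC): estimate the hydrogen-atom
# ground-state energy with the trial wavefunction psi(r) = exp(-alpha * r).
# The radial density |psi|^2 r^2 is a Gamma(3, 1/(2*alpha)) distribution,
# so walkers can be sampled directly. Atomic units throughout.

def local_energy(r, alpha):
    # E_L = -(1/2) (laplacian psi)/psi - 1/r = -alpha^2/2 + (alpha - 1)/r
    return -0.5 * alpha**2 + (alpha - 1.0) / r

def vmc_energy(alpha, n_walkers=20000, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.gamma(shape=3.0, scale=1.0 / (2.0 * alpha), size=n_walkers)
    return local_energy(r, alpha).mean()

# alpha = 1 is the exact ground state: E_L is constant at -0.5 Hartree,
# so the estimator has zero variance.
print(vmc_energy(1.0))
# A non-optimal alpha gives a higher (variational) energy, near -0.48.
print(vmc_energy(0.8))
```

AFQMC generalizes this picture: walkers live in a space of Slater determinants, and overlaps and local energies are the quantities a quantum circuit could help compute.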
By completing this project, the graduate student will gain hands-on experience in high performance computing, quantum computing frameworks, and profiling toolkits. Additionally, the project will provide experience in using HPC with quantum computing in the loop for physics problems, with applicability beyond ground-state energy calculations.
Desired Skills/Background
Experience with pySCF, Qiskit, Cirq, PennyLane, and quantum Monte Carlo
Mentors
Neil Mehta (neilmehta@lbl.gov), Katherine Klymko (kklymo@lbl.gov), Ermal Rrapaj (ermalrrapaj@lbl.gov), Jack Deslippe (jrdeslippe@lbl.gov)
Development and testing of a quantum protocol for polynomial transformations of data sequences using quantum signal processing
Science/CS Domain(s)
Quantum signal processing, data encoding
Project Description
Purpose
Develop a protocol that computes polynomial transformations on data sequences and test that protocol on a shot-based simulator.
Method
Implement Python code to develop and test the protocol using a shot-based circuit simulator in Qiskit.
Overview
Recent advancements in efficient data encoding into quantum states have enabled quantum-based operations on such data. Notably, the QCrank encoder [1] facilitates the input of sequences of real values into quantum processing units (QPUs). Additionally, quantum signal processing (QSP) techniques [2] provide a practical framework for performing computations on scalar polynomials.
The goal of this internship is to integrate these two approaches and develop the mathematical foundations for a protocol that computes low-degree polynomials on sequences of dozens of real numbers.
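The QSP primitive underlying [2] can be illustrated with plain NumPy, without a simulator: repeated applications of the signal rotation W(x), interleaved with phase rotations, realize a polynomial in x as a matrix element. The sketch below (an illustrative stand-in, not the project's protocol) checks the textbook case where all phases are trivial and the polynomial is a Chebyshev polynomial.

```python
import numpy as np

# Minimal quantum-signal-processing (QSP) illustration: the signal unitary
# W(x) = [[x, i*sqrt(1-x^2)], [i*sqrt(1-x^2), x]] equals exp(i*theta*X) with
# x = cos(theta). With all phases zero, <0| W(x)^d |0> is the degree-d
# Chebyshev polynomial T_d(x); tuning the phases tunes the polynomial.

def signal_unitary(x):
    s = np.sqrt(1.0 - x * x)
    return np.array([[x, 1j * s], [1j * s, x]])

def z_phase(phi):
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])

def qsp_polynomial(x, phases):
    # U = e^{i phi_0 Z} * prod_k [ W(x) e^{i phi_k Z} ]
    U = z_phase(phases[0])
    for phi in phases[1:]:
        U = U @ signal_unitary(x) @ z_phase(phi)
    return U[0, 0]

d = 5
for x in np.linspace(-0.9, 0.9, 7):
    p = qsp_polynomial(x, np.zeros(d + 1))       # all phases trivial
    assert np.isclose(p.real, np.cos(d * np.arccos(x)))  # T_d(x)
```

The project's protocol would apply such polynomial transformations not to a single scalar but to QCrank-encoded data sequences.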
This protocol will be implemented in the Qiskit shot-based simulator and tested on NERSC systems. Existing implementations of QCrank [3], QSPpack [4], and pyQSP [5] will serve as foundational resources.
Stretch goals for this project include developing extensions to multivariate polynomials and improving the encoding schemes to take further advantage of data sparsity.
The intern will have the opportunity to contribute significantly to this cutting-edge field with the potential to co-author a publication upon successful implementation.
References
[1] QCrank protocol
[2] QSP theory
[3] QCrank-light reference code
[4] QSPpack reference code
[5] pyQSP reference code
Desired Skills/Background
- Understanding of the mathematical foundation of quantum computations
- Experience in Python, Mathematica, and Qiskit
- Familiarity with HPC environments and containers
Mentors
Jan Balewski (balewski@lbl.gov), Daan Camps (dcamps@lbl.gov)
(Applications Closed) Evaluating the performance of quantum algorithms for solving differential equations in the NISQ era and beyond
Science/CS Domains
Scientific computing: Partial differential equations, Quantum computing
Project Description
In this project, we aim to explore and evaluate the field of quantum algorithms for differential equations. We will build on recent results for near-term variational algorithms [1] and their realization on quantum hardware [2] and study more general theoretical frameworks for solving partial differential equations [3].
Our primary goal is to identify a set of relevant test problems and implement them using both near-term variational algorithms and longer-term scalable quantum algorithms. Secondary project goals could include
- Classical simulation of the implemented algorithms using NERSC’s Perlmutter system,
- Proof-of-concept demonstrations of the most promising problems using the IBM Quantum systems available at NERSC, and
- Resource estimation to evaluate the scalability of the identified approaches.
References
[1] Solving nonlinear differential equations with differentiable quantum circuits
[2] Variational quantum algorithms for nonlinear problems
[3] Quantum simulation of partial differential equations via Schrodingerisation: technical details
Desired Skills/Background
- Background in scientific computing and familiarity with partial differential equations
- Solid foundation in quantum computing
- Experience with Python
- Nice to have: Experience with quantum hardware runs
Mentor(s)
(This project is no longer accepting applications.) Daan Camps, Jan Balewski
Data/Machine Learning
- “Fair Universe”: Uncertainty-aware large-compute-scale AI platform for fundamental science
- Performance analysis of scientific AI workloads for next-gen HPC systems
- Developing Agentic AI Systems for HPC Operations, User Support, and Scientific Data Analysis
- Phenomenological model building at scale
- AI foundation models for protein-protein understanding and design
- Force-fitting molecular dynamics potential using AI-in-the-loop
“Fair Universe”: Uncertainty-aware large-compute-scale AI platform for fundamental science
Project Description
We are building a supercomputer-scale AI ecosystem for sharing datasets, training large models, and hosting machine-learning challenges and benchmarks. This ecosystem will initially be used for an ML challenge series based on novel datasets, progressively rolling out tasks of increasing difficulty and focusing on discovering and minimizing the effects of systematic uncertainties in physics. You will work with a multidisciplinary team, including machine learning researchers and physicists, to build AI challenges, models, and software that exploit supercomputers at NERSC. Our first challenge was an accepted competition at NeurIPS 2024, and we are now working on further competitions and platform development. Projects could range from software development on the machine learning platform, to running the challenge itself, to refining the datasets, tasks, and metrics.
Desired Skills/Background
- Required: Python development and some machine learning frameworks (e.g. PyTorch)
- Nice-to-have (dataset-related projects only): high-energy physics or cosmology experience
Mentors
Wahid Bhimji (wbhimji@lbl.gov), Chris Harris (CJH@lbl.gov)
Performance analysis of scientific AI workloads for next-gen HPC systems
Science/CS Domain(s)
machine learning, performance analysis and optimization
Project Description
Explore performance analysis and optimization of scientific AI workloads on next-generation HPC systems through this internship project. The focus is on addressing the increasing complexity and computational requirements of scientific AI, especially in the context of foundational models for science. The goal is to improve efficiency and scalability by looking into sophisticated parallelism techniques and advanced AI computing hardware. The intern will research existing tools for high-performance AI training and contribute to performance modeling tools that expose training bottlenecks for future models on future hardware.
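One standard building block of such performance modeling is the roofline model, which predicts whether a kernel is limited by compute or by memory bandwidth. The sketch below uses made-up peak numbers (they are placeholders, not the specs of any NERSC system) purely to show the shape of the analysis.

```python
# Toy roofline model: attainable throughput is capped either by the compute
# peak or by memory bandwidth times arithmetic intensity (FLOPs per byte).
# The peaks below are hypothetical placeholders for illustration only.

PEAK_FLOPS = 20e12     # hypothetical compute peak, FLOP/s
PEAK_BW = 1.5e12       # hypothetical memory bandwidth, bytes/s

def attainable_flops(arithmetic_intensity):
    """Roofline: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# A low-intensity kernel (e.g. 8 FLOPs/byte) lands on the bandwidth roof;
# a large GEMM (e.g. 100 FLOPs/byte) hits the compute roof.
for ai in (8, 100):
    print(ai, attainable_flops(ai))
```

A performance-modeling tool for training workloads extends this idea per layer and per parallelism strategy, which is where the training bottlenecks show up.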
Desired Skills/Background
- Required: Python, experience with ML and/or performance analysis, software engineering
- Nice-to-have: Data parallelism, model parallelism, AI hardware
Mentors
Steven Farrell (sfarrell@lbl.gov), Shashank Subramanian (shashanksubramanian@lbl.gov)
Developing Agentic AI Systems for HPC Operations, User Support, and Scientific Data Analysis
Science/CS Domain(s)
AI, HPC, scientific data analysis, software engineering
Project Description
As HPC systems grow in complexity, AI-driven automation is becoming essential for optimizing operations, assisting users, and accelerating scientific discovery. This internship project focuses on developing Agentic AI systems at NERSC—autonomous AI-driven assistants capable of performing tasks such as real-time system monitoring, user support, and scientific data analysis. The intern will explore frameworks for AI agents, integrate them with NERSC's infrastructure, and develop prototypes that can interact with users, automate troubleshooting, or analyze data workflows. This project offers the opportunity to work at the intersection of AI, HPC, and science, contributing to the next generation of intelligent HPC tools.
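The core of such an agent is a loop in which a model chooses a tool, the harness executes it, and the observation is fed back until the model can answer. The sketch below is a hypothetical, rule-based stand-in for an LLM-driven agent; the tool names and outputs are invented for illustration.

```python
# Minimal agentic loop sketch. The "model" is a hypothetical rule-based
# stand-in for an LLM; the tools are toy versions of HPC-operations actions.

def check_queue(system):           # toy tool: pretend to query a scheduler
    return f"{system}: 12 jobs pending"

def node_health(system):           # toy tool: pretend to poll monitoring
    return f"{system}: all nodes up"

TOOLS = {"check_queue": check_queue, "node_health": node_health}

def fake_model(question, observations):
    # Stand-in policy: gather one observation per tool, then answer.
    seen = {name for name, _ in observations}
    for name in TOOLS:
        if name not in seen:
            return ("call", name, "perlmutter")
    return ("answer", "; ".join(result for _, result in observations))

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action = fake_model(question, observations)
        if action[0] == "answer":
            return action[1]
        _, tool, arg = action
        observations.append((tool, TOOLS[tool](arg)))
    return "step limit reached"

out = run_agent("Is Perlmutter healthy and busy?")
print(out)
```

A real prototype would replace `fake_model` with an LLM call and the toy tools with NERSC monitoring and support APIs.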
Desired Skills/Background
- Required: Python, experience with AI/ML or software development, interest in AI-driven automation
- Nice-to-have: Experience with LLMs, scientific data workflows, HPC systems, API development
Mentor(s)
Steven Farrell (sfarrell@lbl.gov)
Phenomenological model building at scale
Science/CS Domain(s)
Particle phenomenology, machine learning, high-performance computing
Project Description
Theoretical models in fundamental physics are often designed through intuitive reasoning. With high-performance computing, we can now systematically explore the mathematical space of theories for models that fit observed data. This project includes multiple components to which the summer intern might contribute:
- Design a pipeline to convert mathematical theories into UFO models that can be interpreted by physics simulators (MadGraph / micrOMEGAs)
- Automate a model evaluation workflow, including using machine learning
- Design visualization and interpretability tools, with machine learning, to understand what the search algorithm is finding in high dimensions
- Design better reward structures for the search algorithm
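The reward-guided search idea can be sketched in a few lines: candidate "models" are points in a parameter space, a reward scores agreement with observed data, and a search proposes local modifications. Everything below is a toy stand-in; the real project scores theories through physics simulators rather than a closed-form reward.

```python
import numpy as np

# Toy reward-guided search: hill climbing toward parameters that maximize a
# reward. The "observed data" optimum and the reward itself are invented for
# illustration; a real reward compares simulator output to measurements.

rng = np.random.default_rng(42)
target = np.array([0.3, -1.2, 2.0])          # toy optimum

def reward(params):
    # Higher is better; peaks at `target`.
    return -np.sum((params - target) ** 2)

params = np.zeros(3)
best = reward(params)
for _ in range(5000):
    candidate = params + rng.normal(scale=0.1, size=3)
    r = reward(candidate)
    if r > best:
        params, best = candidate, r

print(params, best)   # converges near `target`, reward near 0
```

Designing better reward structures amounts to reshaping `reward` so the search is guided efficiently through a very high-dimensional theory space.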
Desired Skills/Background
- Required: Experience with Python for visualization and data processing
- Nice-to-have: Experience with Mathematica, FeynRules, SymPy; A background in particle phenomenology; PyTorch and prior experience training neural networks
Mentors
Wahid Bhimji (wbhimji@lbl.gov), Aishik Ghosh (aishikghosh@lbl.gov)
AI foundation models for protein-protein understanding and design
Science/CS Domain(s)
deep learning, computational biology, high performance computing
Project Description
Synthetic biology is a rapidly growing field with many promising applications in energy, medicine, and materials. In this project, you will work on developing large-scale AI foundation models to aid in the design of novel proteins with enhanced functional properties. You will have access to high performance computing resources at NERSC to train models and will explore state-of-the-art deep-learning approaches to protein sequence and structure data, such as protein language models and graph neural networks.
Desired Skills/Background
- Required: Python, deep learning
- Nice-to-have: Experience with protein data, HPC systems, LLMs, GNNs, parallel model training
Mentors
Steven Farrell (sfarrell@lbl.gov)
Force-fitting molecular dynamics potential using AI-in-the-loop
Science/CS Domains
molecular dynamics, high performance computing, machine learning, programming models, GPU computing
Project Description
For this summer internship, the graduate-level student will use the TestNN SNAP code. This machine learning interatomic potential (MLIAP) for molecular dynamics showcases a seamless integration of C++ and Python through a hybrid codebase, leveraging Pybind11 to facilitate efficient data exchange between the two runtimes. So far, the prototype code has been tested for energy fitting through two scientific use cases involving tantalum and tungsten. For this project, the student will use a similar strategy to test force fitting.
This project involves several tasks, including understanding the SNAP MLIAP as well as neural network algorithms in PyTorch. TestNN SNAP uses PyTorch both for the Python data-read interface, via the “hippynn” MLIAP codebase, and for the neural network part of the code; the SNAP potential part of the code is written in Kokkos. A typical TestNN SNAP iteration involves the following steps: 1) reading the atom positions and reference energies using hippynn; 2) initiating the AI-in-the-loop part of the code, where the bispectrum coefficients computed by the SNAP (Kokkos) part of the code are sent to the neural network; 3) passing the bispectrum coefficients through the neural network to produce an energy prediction; and 4) updating the weights of the neural network via backpropagation against the reference energies.
Building on this, the project will use the force-computation aspect of SNAP to construct a loss function that includes both energy and force terms. Because the SNAP portion is not written in PyTorch, an efficient algorithm is needed to propagate force gradients through it.
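The shape of a combined energy/force loss can be illustrated with a toy potential, where forces are the negative gradient of the energy. The quadratic "potential" and finite-difference gradient below are stand-ins chosen for clarity; the real code evaluates the SNAP + neural-network model and needs a far more efficient gradient path than finite differences.

```python
import numpy as np

# Toy combined energy + force loss. Forces are F = -dE/dx, computed here by
# central finite differences purely for illustration; the project needs an
# efficient algorithm because the SNAP (Kokkos) part sits outside PyTorch's
# automatic differentiation.

def energy(x, k):
    return 0.5 * k * np.sum(x ** 2)          # toy potential with parameter k

def forces(x, k, h=1e-6):
    f = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        f[i] = -(energy(xp, k) - energy(xm, k)) / (2 * h)
    return f

def loss(k, x, e_ref, f_ref, w_e=1.0, w_f=0.1):
    de = energy(x, k) - e_ref
    df = forces(x, k) - f_ref
    return w_e * de ** 2 + w_f * np.sum(df ** 2)

x = np.array([0.5, -1.0, 0.2])
e_ref, f_ref = energy(x, 2.0), forces(x, 2.0)   # "reference" data from k = 2
print(loss(2.0, x, e_ref, f_ref))               # zero at the true parameter
print(loss(1.0, x, e_ref, f_ref))               # larger for a wrong parameter
```

Weighting the force term (`w_f`) against the energy term is itself a fitting choice that affects the quality of the trained potential.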
By completing this project, the graduate student would gain hands-on experience in high performance computing, parallel programming, and profiling and optimization toolkits, as well as familiarity with the NERSC programming environment. Additionally, the project would provide experience in using HPC and AI-in-the-loop for physics problems with wider applicability.
Desired Skills/Background
Experience with
- parallel programming (C/C++/Python),
- GPU computing,
- machine learning, and
- molecular dynamics.
NERSC Mentors
Neil Mehta (neilmehta@lbl.gov), Nick Lubbers (nlubbers@lanl.gov), Jack Deslippe (jrdeslippe@lbl.gov)
Workflow Capability
- Building streaming pipelines for HPC-in-the-loop electron microscopy analysis
- Intelligent resource management and parallelization of distributed data streaming pipelines
- Streaming experimental data between containerized operators with RDMA and shared memory
- Pathfinding for advanced HPC communication patterns
Building streaming pipelines for HPC-in-the-loop electron microscopy analysis
Science/CS Domains
machine learning, experimental materials science, streaming data analysis, integrated research infrastructure
Project Description
The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. This project will focus on the development of new operators for data pipelines at the National Center for Electron Microscopy (NCEM) at Berkeley Lab. Such analysis could include tomography, ptychography, and closed-loop microscope control with machine learning/AI. The intern will be introduced to the interactEM project, work with NERSC mentors and NCEM staff to select operators to implement, and eventually test these operators with real microscope data. This work will improve the efficiency of real electron microscopy experiments.
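The flow-based idea can be sketched with chained Python generators, where each "operator" consumes a stream and emits a transformed stream. In interactEM the operators are containerized processes wired together by the framework; the operator names and data below are invented for illustration.

```python
# Flow-based pipeline sketch: each operator is an independent stage over a
# stream of messages. Here operators are generators; in interactEM they are
# containerized components connected by the framework.

def source(frames):
    for frame in frames:                 # e.g. detector frames from NCEM
        yield frame

def background_subtract(stream, background):
    for frame in stream:
        yield [pixel - background for pixel in frame]

def mean_intensity(stream):
    for frame in stream:
        yield sum(frame) / len(frame)

frames = [[10, 12, 14], [20, 22, 24]]
pipeline = mean_intensity(background_subtract(source(frames), background=10))
result = list(pipeline)
print(result)    # -> [2.0, 12.0]
```

New operators for tomography or ptychography would slot into such a pipeline as additional stages between the source and the sink.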
Desired Skills/Background
- Required: Python development and machine learning frameworks
- Nice-to-have: Container technologies, familiarity with HPC environments
Mentor(s)
Sam Welborn (swelborn@lbl.gov), Chris Harris (cjh@lbl.gov), Bjoern Enders (benders@lbl.gov)
Intelligent resource management and parallelization of distributed data streaming pipelines
Science/CS domains
Resource management, parallel computing, flow-based programming
Project Description
The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. This summer intern project will involve developing methods to scale and schedule operators intelligently according to workflow load by integrating multisource, real-time metrics from heterogeneous resources (e.g., Perlmutter compute nodes and edge servers). Further, the intern will contribute to enabling data parallelism for interactEM by scaling operators to fit the needs of the running pipeline.
Desired Skills/Background
- Required: Python, parallel computing, container technologies, familiarity with HPC environments
- Nice-to-have: C++
Mentor(s)
Sam Welborn (swelborn@lbl.gov), Chris Harris (cjh@lbl.gov), Bjoern Enders (benders@lbl.gov)
Streaming experimental data between containerized operators with RDMA and shared memory
Science/CS Domains
networking, high performance computing (HPC), RDMA
Project Description
The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. Currently, interactEM uses ZeroMQ to send messages between operators over TCP/IP. This summer intern project will involve employing RDMA and shared memory for inter-operator communication to improve data throughput between operators.
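The shared-memory half of this idea can be sketched with the Python standard library: a producer writes a frame into a named segment, and a consumer attaches by name and reads it without the payload crossing a socket. This is a minimal sketch only; a real inter-operator design also needs synchronization (e.g. a small control message announcing that the buffer is ready), and RDMA would replace the socket path for cross-node transfers.

```python
from multiprocessing import shared_memory
import numpy as np

# Producer writes a frame into a named shared-memory segment; a consumer
# attaches to the same segment by name and copies it out. No payload bytes
# travel over TCP, unlike the current ZeroMQ transport.

frame = np.arange(12, dtype=np.float32).reshape(3, 4)

# Producer side: create the segment and copy the frame in.
shm = shared_memory.SharedMemory(create=True, size=frame.nbytes)
src = np.ndarray(frame.shape, dtype=frame.dtype, buffer=shm.buf)
src[:] = frame

# Consumer side: attach by name and read.
view = shared_memory.SharedMemory(name=shm.name)
received = np.ndarray(frame.shape, dtype=frame.dtype, buffer=view.buf).copy()

view.close()
shm.close()
shm.unlink()
print(received.sum())    # -> 66.0
```

In the project, this pattern would be wrapped behind the same operator interface that currently sits on top of ZeroMQ, so pipelines can switch transports transparently.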
Desired Skills/Background
- Required: C++, Python, RDMA
- Nice-to-have: pybind11/nanobind
Mentor(s)
Sam Welborn, Chris Harris, Bjoern Enders
Pathfinding for advanced HPC communication patterns
Science/CS Domains
network interconnects; high performance computing (HPC); distributed computing
Project Description
Future workflows are increasingly exercising the limits of high-performance computing (HPC) networks – especially in the high-performance data analysis and AI/ML space. This is because many modern data science workflows make use of flexible resources and arbitrary point-to-point communication patterns. In contrast, traditional HPC communication libraries, like MPI, rely on Single Program Multiple Data (SPMD) communication patterns in order to achieve high performance.
At the moment, this forces developers to make unpleasant compromises: If flexibility is desired (for example, by dynamically scaling up and down the number of nodes as needed), then the algorithm needs to build on a slow transport protocol.
This project strives to alleviate this situation and thereby improve the performance of modern workflows at NERSC by exploring the performance of modern HPC communication libraries, such as
- UDP over the high-speed network,
- UCX, and
- Modern MPI extensions
on the Perlmutter system under ad-hoc communication patterns.
We will also explore modern network discovery tooling and PMIx to determine if and how UCX and MPI can be flexibly scaled during runtime.
Desired Skills/Background
- Proficiency in Julia and C
- Experience with HPC systems and Slurm
- Experience with Linux and Git
Mentor(s)
Application Performance
- Load Balancing Algorithms: Exploring Advancement
Load Balancing Algorithms: Exploring Advancement
Science/CS Domains
Load balancing; algorithms; high performance computing (HPC); distributed computing
Project Description
We are seeking enthusiastic summer students to investigate ways to improve the overall performance of large-scale AMReX simulations through load balance (LB) algorithm advancements. Load balancing is extremely important for large-scale, massively parallel simulations. Current LB algorithms are generally simplistic, as calculations must be done at runtime and are dependent on the reduced data users choose to collect and pass to them. However, as both hardware and simulations increase in complexity and power, having the best possible LB is an increasing priority for an HPC code’s large-scale performance.
This investigation seeks to specifically improve the LB capabilities of AMReX, the block-structured, GPU-enabled, mesh-and-particle framework used by a wide variety of large-scale simulation codes and frameworks. Specific directions of interest for this summer include
- Comparing the use of different space-filling curves in AMReX’s space-filling curve algorithm and
- Developing and comparing uniform-machine load balancing techniques and adding the capability to optimize for diverse architectures.
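The space-filling-curve direction above can be sketched briefly: boxes are ordered along a curve (here a Morton/Z-order curve), then the ordered list is cut into contiguous chunks of roughly equal work. AMReX's actual algorithm is more sophisticated; this toy version only shows why the choice of curve matters, since it controls which boxes land adjacent and therefore the locality of each rank's chunk.

```python
# Toy space-filling-curve load balancing: order 2D boxes by Morton (Z-order)
# key, then greedily cut the ordered list into contiguous per-rank chunks.

def morton_key(x, y, bits=8):
    # Interleave the bits of x and y to get the Z-order index.
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def sfc_partition(boxes, weights, nranks):
    order = sorted(range(len(boxes)), key=lambda i: morton_key(*boxes[i]))
    target = sum(weights) / nranks
    ranks, chunk, acc = [], [], 0.0
    for i in order:
        chunk.append(boxes[i])
        acc += weights[i]
        if acc >= target and len(ranks) < nranks - 1:
            ranks.append(chunk)
            chunk, acc = [], 0.0
    ranks.append(chunk)
    return ranks

boxes = [(x, y) for y in range(4) for x in range(4)]   # a 4x4 grid of boxes
parts = sfc_partition(boxes, [1.0] * len(boxes), nranks=4)
print([len(p) for p in parts])    # -> [4, 4, 4, 4]
```

Swapping `morton_key` for a Hilbert or other curve, or making `weights` reflect measured runtime data, is exactly the kind of comparison the project would carry out at scale.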
In this project, selected summer students will create computational tools to perform statistical analyses of LB data, investigate how to present the analysis for publication, and test the algorithms on AMReX applications, time permitting. Students will also be able to explore other potential improvements identified during the investigation.
LB improvements could have far-reaching, long-term impacts, given AMReX’s international range of users and machines.
Desired Skills/Background
- Experience with C++ and Python
- Experience with algorithm development
- Experience with statistics / statistical analysis
- Experience with parallel codes and parallelization (MPI and/or OpenMP)
- Experience performing scientific research
- Experience with literature surveying and algorithm design
Mentors
Kevin Gott (kngott@lbl.gov), Rebecca Hartman-Baker
Infrastructure
Power/Energy Efficiency
Power Analysis of HPC Applications
Science/CS Domains
application power usage, power management, high-performance computing
Project Description
As High-Performance Computing (HPC) systems continue to scale, power consumption has become a critical limiting factor. Understanding the power signature of current production workloads is essential to address this limit and continue to advance scientific computing at scale. This project aims to understand the power characteristics of applications at the National Energy Research Scientific Computing Center (NERSC) and investigate how standard power management strategies impact these workloads. The insights gained from this research will illuminate pathways to enhance power efficiency in operational settings and achieve optimal system performance within predefined power budgets.
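A first step in any power characterization is turning sampled power telemetry into energy. The sketch below integrates a power time series with the trapezoid rule; the sample values are made up for illustration.

```python
import numpy as np

# Integrate a sampled power trace (e.g. per-node telemetry) to get energy.
# Timestamps in seconds, power in watts, energy in joules. Sample values
# below are illustrative only.

def energy_joules(t, p):
    t, p = np.asarray(t, float), np.asarray(p, float)
    # Trapezoid rule: sum of 0.5 * (p[i] + p[i+1]) * (t[i+1] - t[i])
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))

t = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [100.0, 100.0, 100.0, 100.0, 100.0]   # a flat 100 W trace
print(energy_joules(t, p))                # -> 400.0 J over 4 s
```

Comparing such per-application energy profiles under different power-management settings (e.g. frequency caps) is the kind of analysis this project would run against NERSC workload data.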
Desired Skills/Background
- Required: Python (with NumPy and Pandas)
- Nice-to-have: Experience using HPC systems; computer science/physics background
Mentors
Zhengji Zhao (zzhao@lbl.gov), Brian Austin, Nicholas Wright