2025 Summer Research Projects
NERSC is a global leader in high performance computing (HPC) and data science. We empower researchers with the tools needed to tackle some of the world’s most pressing scientific challenges.
Every summer, we offer graduate students paid internships that allow them to collaborate with NERSC staff on various science and technology research projects.
NERSC staff started posting projects in January 2025 for the upcoming summer. This page will be updated as more projects open and others close.
How to Apply
To apply to an open project, email the project mentor(s) directly. Projects no longer accepting applications will be marked as their positions are filled.
Quantum Computing
Open Projects
- Auxiliary field quantum Monte Carlo using hybrid classical-quantum computing
No Longer Accepting Applications
- Development and testing of a quantum protocol for polynomial transformations of data sequences using quantum signal processing (Applications closed)
- Evaluating the performance of quantum algorithms for solving differential equations in the NISQ era and beyond (Applications closed)
Auxiliary field quantum Monte Carlo using hybrid classical-quantum computing
Science/CS Domains
Quantum computing, high performance computing, quantum Monte Carlo, pySCF
Project Description
For this summer internship, a graduate-level student will use pySCF and CUDA-Q to perform ground-state energy calculations of a hydrogen molecule using auxiliary field quantum Monte Carlo (AFQMC) simulations. The goal of this project is to benchmark performance metrics for a classically solved AFQMC calculation versus one augmented by quantum computing in the loop.
For the initial part of the project, the graduate student will reproduce a Python-based code that calculates the ground-state energy of a hydrogen molecule using AFQMC. This Python code is expected to use the pySCF module. Upon completion, the graduate student will incorporate a quantum computing algorithm into the calculation to perform unbiased sampling of the walkers and to compute overlaps and local energies. This part of the code will initially use the CUDA-Q or PennyLane module to build a quantum circuit that prepares a walker state. A similar step will then be performed using Qiskit and Cirq to benchmark the performance differences between the three quantum computing frameworks.
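To make the classical starting point concrete, here is a minimal sketch (illustration only, not project code; it assumes pySCF is installed, and the 0.74 Å bond length and STO-3G basis are arbitrary choices) that computes Hartree-Fock and full-CI reference energies for a hydrogen molecule, the kind of baseline an AFQMC result would be checked against.

```python
from pyscf import gto, scf, fci

# Hydrogen molecule at an illustrative bond length, in a minimal basis
mol = gto.M(atom="H 0 0 0; H 0 0 0.74", basis="sto-3g", unit="Angstrom")

mf = scf.RHF(mol)                # mean-field (Hartree-Fock) reference / trial wavefunction
e_hf = mf.kernel()

e_fci, _ = fci.FCI(mf).kernel()  # exact (full CI) energy in this small basis
print(f"RHF energy: {e_hf:.6f} Ha")
print(f"FCI energy: {e_fci:.6f} Ha")
```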
By completing this project, the graduate student will gain hands-on experience in high performance computing, quantum computing languages, and profiling toolkits. Additionally, the results of this project will provide experience in using HPC and quantum computing-in-the-loop for physics problems with wider applicability for problems beyond ground-state energy calculations.
Desired Skills/Background
Experience with pySCF, Qiskit, Cirq, PennyLane, and quantum Monte Carlo
Mentors
Neil Mehta (neilmehta@lbl.gov), Katherine Klymko (kklymo@lbl.gov), Ermal Rrapaj (ermalrrapaj@lbl.gov), Jack Deslippe (jrdeslippe@lbl.gov)
(Applications Closed) Development and testing of a quantum protocol for polynomial transformations of data sequences using quantum signal processing
This project is no longer accepting applicants.
Science/CS Domains
Quantum signal processing, data encoding
Project Description
Purpose
This project aims to develop a protocol for computing polynomial transformations on data sequences and test it on a shot-based simulator.
Method
Implement Python code to develop and test the protocol using a shot-based circuit simulator in Qiskit.
Overview
Recent advancements in efficient data encoding into quantum states have enabled quantum-based operations on such data. Notably, the QCrank encoder [1] facilitates the input of sequences of real values into quantum processing units (QPUs). Additionally, quantum signal processing (QSP) techniques [2] provide a practical framework for performing computations on scalar polynomials.
The goal of this internship is to integrate these two approaches and develop the mathematical foundations for a protocol that computes low-degree polynomials on sequences of dozens of real numbers.
This protocol will be implemented in the Qiskit shot-based simulator and tested on NERSC systems. Existing implementations of QCrank [3], QSPpack [4], and pyQSP [5] will serve as foundational resources.
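As a toy illustration of the shot-based workflow only (this is not the QCrank or QSP circuit; the per-qubit angle encoding, data values, and shot count are invented, and the qiskit and qiskit-aer packages are assumed), the sketch below encodes a short sequence of real values into rotation angles, runs the circuit on a shot-based simulator, and recovers estimates of the values from the measured counts.

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

data = np.array([0.2, 0.5, 0.8])         # toy sequence of real values in [0, 1]
angles = 2 * np.arcsin(np.sqrt(data))    # choose angles so P(|1>) on each qubit equals the value

qc = QuantumCircuit(len(data))
for q, theta in enumerate(angles):
    qc.ry(theta, q)
qc.measure_all()

backend = AerSimulator()
shots = 4096
counts = backend.run(transpile(qc, backend), shots=shots).result().get_counts()

# Recover each value from the per-qubit |1> frequency (Qiskit bitstrings list qubit 0 last)
est = np.zeros(len(data))
for bits, c in counts.items():
    for q in range(len(data)):
        if bits[::-1][q] == "1":
            est[q] += c
est /= shots
print("encoded:", data, "estimated:", np.round(est, 3))
```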
Stretch goals for this project include developing extensions to multivariate polynomials and improving the encoding schemes to take further advantage of data sparsity.
The intern will have the opportunity to contribute significantly to this cutting-edge field with the potential to co-author a publication upon successful implementation.
References
[1] QCrank protocol
[2] QSP theory
[3] QCrank-light reference code
[4] QSPpack reference code
[5] pyQSP reference code
Desired Skills/Background
- Understanding of the mathematical foundation of quantum computations
- Experience in Python, Mathematica, and Qiskit
- Familiarity with HPC environments and containers
Mentors
Jan Balewski, Daan Camps
(Applications Closed) Evaluating the performance of quantum algorithms for solving differential equations in the NISQ era and beyond
Science/CS Domains
Scientific computing: Partial differential equations, Quantum computing
Project Description
In this project, we aim to explore and evaluate the field of quantum algorithms for differential equations. We will build on recent results for near-term variational algorithms [1] and their realization on quantum hardware [2] and study more general theoretical frameworks for solving partial differential equations [3].
Our primary goal is to identify a set of relevant test problems and implement them using both near-term variational algorithms and longer-term scalable quantum algorithms. Secondary project goals could include:
- Classical simulation of the implemented algorithms on NERSC’s Perlmutter system,
- Proof-of-concept demonstrations of the most promising problems using the IBM Quantum systems available at NERSC, and
- Resource estimation to evaluate the scalability of the identified approaches.
References
[1] Solving nonlinear differential equations with differentiable quantum circuits
[2] Variational quantum algorithms for nonlinear problems
[3] Quantum simulation of partial differential equations via Schrodingerisation: technical details
Desired Skills/Background
- Background in scientific computing and familiarity with partial differential equations
- Solid foundation in quantum computing
- Experience with Python
- Nice to have: Experience with quantum hardware runs
Mentors
Daan Camps, Jan Balewski
Data/Machine Learning
- AI agent development for performance metrics
- Inference-as-a-service for the DUNE experiment
- “Fair Universe”: Uncertainty-aware large-compute-scale AI platform for fundamental science
- Performance analysis of scientific AI workloads for next-gen HPC systems
- Developing Agentic AI Systems for HPC Operations, User Support, and Scientific Data Analysis
- Phenomenological model building at scale
- AI foundation models for protein understanding and design
- Force-fitting molecular dynamics potential using AI-in-the-loop
AI agent development for performance metrics
Science/CS Domains
High performance computing, AI/machine learning
Project Description
High performance computing (HPC) systems at NERSC generate vast amounts of data related to system performance, job scheduling, incident tracking, and internal communications. Efficiently monitoring and analyzing this data is crucial for optimizing resource allocation, improving system reliability, and enhancing user experience. However, the current approach to gathering and interpreting performance metrics is fragmented, requiring manual queries across multiple platforms such as Slurm job schedulers, ServiceNow incident management, and Slack internal communications.
As an AI intern, you will work on developing an AI-powered agent to gather, process, and analyze data from various sources, including Slurm databases, ServiceNow tables, and internal Slack communications. Your work will help our team understand system performance metrics, optimize workflows, and provide real-time insights to stakeholders.
The project will begin with an initial research phase to assess available data sources and define key performance indicators. The AI agent will then be developed iteratively, with continuous testing and refinement based on real-world usage. The internship position will support this development, focusing on data integration, AI model training, and system deployment.
Your duties will include the following:
- Data Integration: Work with APIs to extract and consolidate data from Slurm, ServiceNow, and Slack into a unified system.
- AI & NLP: Develop natural language processing (NLP) capabilities for the AI agent to interpret queries about performance metrics and generate responses.
- Analytics: Conduct data analysis to calculate key performance indicators (KPIs), such as job completion times, incident resolution times, and communication metrics (a minimal example appears after this list).
- Real-Time Monitoring: Design and implement real-time monitoring systems to track performance and notify stakeholders of any anomalies or trends.
- Collaboration: Partner with the NERSC team to ensure the AI agent meets operational needs and integrates with existing systems.
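As a minimal example of the Analytics duty above (assumptions: sacct is available and the caller may query these jobs; the fields, time window, and KPI are arbitrary illustrations), Slurm accounting records can be pulled and summarized with pandas:

```python
import io
import subprocess

import pandas as pd

# Pull one week of accounting records (sacct must be on PATH and the caller must be
# permitted to query these jobs; the fields and time window are arbitrary examples).
cmd = [
    "sacct", "--allusers", "--noheader", "--parsable2",
    "--starttime", "now-7days",
    "--format", "JobID,Partition,State,ElapsedRaw",
]
raw = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

df = pd.read_csv(io.StringIO(raw), sep="|",
                 names=["JobID", "Partition", "State", "ElapsedRaw"])
df = df[~df["JobID"].astype(str).str.contains(".", regex=False)]  # keep whole jobs, drop steps

# Example KPI: median elapsed time (hours) of completed jobs per partition
completed = df[df["State"].str.startswith("COMPLETED")]
kpi = (completed.groupby("Partition")["ElapsedRaw"].median() / 3600).round(2)
print(kpi)
```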
As an AI intern, you will
- Work at the forefront of high performance computing and AI in scientific research,
- Gain valuable hands-on experience in AI, data integration, and performance analytics,
- Enjoy a flexible, remote-friendly working environment,
- Collaborate with world-class researchers and experts in the field, and
- Contribute to important projects with real-world applications.
Desired Skills/Background
- Currently pursuing a degree in computer science, data science, AI, or a related field.
- Experienced with API integration and data extraction.
- Strong knowledge of data analysis, performance metrics, and predictive modeling.
- Proficiency in Python and relevant data science libraries (e.g., Pandas, TensorFlow, PyTorch).
- Excellent communication skills and a proactive, independent work style.
- Familiarity with Slurm, ServiceNow, or Slack integrations would be ideal but not required.
Mentor
To apply, please email Kadidia Konate (kadidiakonate@lbl.gov) your resume, a brief cover letter, and any relevant portfolio or project examples by April 15, 2025.
Inference-as-a-Service for the DUNE Experiment
Science/CS Domains
High performance computing, machine learning, inference optimization, distributed computing
Project Description
The Deep Underground Neutrino Experiment (DUNE) at Fermilab requires efficient workflows to process large-scale data and simulations.
This project aims to work with Fermilab physicists to deploy an inference-as-a-service framework on the Perlmutter supercomputer. The goal is to optimize the simulation workflow by offloading model inferencing to a dedicated inference service, which will also serve external requests via an externally accessible endpoint.
The project will focus on the following tasks:
- Deploying an inference-as-a-service framework on Perlmutter.
- Developing an external-facing endpoint to serve inference requests from external users.
- Optimizing model inference workflows to reduce simulation runtimes.
- Ensuring scalability and efficient resource utilization of the inference service.
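To make the client side of such a service concrete, here is a minimal sketch using the Triton HTTP client, one of the inference frameworks listed below but not necessarily the one the project will adopt; the endpoint URL, model name, and tensor names/shapes are placeholders.

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholder endpoint and model/tensor names; a real deployment would sit behind an
# externally reachable URL in front of the Perlmutter-hosted service.
client = httpclient.InferenceServerClient(url="localhost:8000")

inp = httpclient.InferInput("INPUT__0", [1, 128], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 128).astype(np.float32))
out = httpclient.InferRequestedOutput("OUTPUT__0")

response = client.infer(model_name="dune_demo_model", inputs=[inp], outputs=[out])
print(response.as_numpy("OUTPUT__0").shape)
```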
Desired Skills/Background
- Experience with machine learning inference frameworks (TensorFlow, ONNX, Triton Inference Server, etc.).
- Familiarity with high performance computing environments.
- Experience with containerization and distributed computing.
- Knowledge of networking and API development for external service access.
Mentors
Pengfei Ding (pding@lbl.gov), Andrew Naylor (anaylor@lbl.gov)
“Fair Universe”: Uncertainty-aware large-compute-scale AI platform for fundamental science
Project Description
We are building a supercomputer-scale AI ecosystem for sharing datasets, training large models, and hosting machine-learning challenges and benchmarks. This ecosystem will initially be used for an ML challenge series based on novel datasets, progressively rolling out tasks of increasing difficulty and focusing on discovering and minimizing the effects of systematic uncertainties in physics. You will work with a multidisciplinary team, including machine learning researchers and physicists, to build AI challenges, models, and software that exploit supercomputers at NERSC. Our first challenge was an accepted competition at NeurIPS 2024, and we are now working on further competitions and platform development. Projects could range from software development on the machine learning platform, to running the challenge itself, to refining the datasets, tasks, and metrics.
Desired Skills/Background
- Required: Python development and some machine learning frameworks (e.g. PyTorch)
- Nice-to-have (dataset-related projects only): high-energy physics or cosmology experience
Mentors
Wahid Bhimji (wbhimji@lbl.gov), Chris Harris (CJH@lbl.gov)
Performance analysis of scientific AI workloads for next-gen HPC systems
Science/CS Domains
machine learning, performance analysis + optimization
Project Description
Explore performance analysis and optimization of scientific AI workloads on next-generation HPC systems through this internship project. The focus is on addressing the increasing complexity and computational requirements of scientific AI, especially in the context of foundation models for science. The goal is to improve efficiency and scalability by investigating sophisticated parallelism techniques and advanced AI computing hardware. The intern will research existing tools for high-performance AI training and contribute to performance modeling tools that expose training bottlenecks for future models on future hardware.
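As one possible starting point (a minimal sketch rather than the project tooling; the toy model, batch size, and step count are arbitrary, and PyTorch is assumed), the built-in PyTorch profiler already exposes per-operator time breakdowns of a training loop:

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

# Toy model and data, only to give the profiler something to measure
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 1024), torch.randn(64, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x, y = model.cuda(), x.cuda(), y.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):                                  # a few training steps to profile
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```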
Desired Skills/Background
- Required: Python, experience with ML and/or performance analysis, software engineering
- Nice-to-have: Data parallelism, model parallelism, AI hardware
Mentors
Steven Farrell (sfarrell@lbl.gov), Shashank Subramanian (shashanksubramanian@lbl.gov)
Developing Agentic AI Systems for HPC Operations, User Support, and Scientific Data Analysis
Science/CS Domains
AI, HPC, scientific data analysis, software engineering
Project Description
As HPC systems grow in complexity, AI-driven automation is becoming essential for optimizing operations, assisting users, and accelerating scientific discovery. This internship project focuses on developing Agentic AI systems at NERSC—autonomous AI-driven assistants capable of performing tasks such as real-time system monitoring, user support, and scientific data analysis. The intern will explore frameworks for AI agents, integrate them with NERSC's infrastructure, and develop prototypes that can interact with users, automate troubleshooting, or analyze data workflows. This project offers the opportunity to work at the intersection of AI, HPC, and science, contributing to the next generation of intelligent HPC tools.
Desired Skills/Background
- Required: Python, experience with AI/ML or software development, interest in AI-driven automation
- Nice-to-have: Experience with LLMs, scientific data workflows, HPC systems, API development
Mentor
Steven Farrell (sfarrell@lbl.gov)
Phenomenological model building at scale
Science/CS Domain(s)
Particle phenomenology, machine learning, high-performance computing
Project Description
Theoretical models in fundamental physics are often designed through intuitive reasoning. With high-performance computing, we can now systematically explore the mathematical space of theories for models that fit observed data. This project includes multiple components to which the summer intern might contribute:
- Design a pipeline to convert mathematical theories into UFO models that can be interpreted by physics simulators (MadGraph / micrOMEGAs)
- Automate a model evaluation workflow, including using machine learning
- Design visualization and interpretability tools, with machine learning, to understand what the search algorithm is finding in high dimensions
- Design better reward structures for the search algorithm
Desired Skills/Background
- Required, either one of:
- A background in particle phenomenology / group theory
- Experience with AI agents
- Nice-to-have: Experience with Mathematica, FeynRules, SymPy; PyTorch and prior experience training neural networks
Mentors
Wahid Bhimji (wbhimji@lbl.gov), Aishik Ghosh (aishikghosh@lbl.gov)
AI foundation models for protein understanding and design
Science/CS Domain(s)
deep learning, computational biology, high performance computing
Project Description
Synthetic biology is a rapidly growing field with many promising applications in energy, medicine, and materials. In this project, you will work on developing large-scale AI foundation models to aid in the design of novel proteins with enhanced functional properties. You will have access to high performance computing resources at NERSC to train models and will explore state-of-the-art deep-learning approaches to protein sequence and structure data, such as protein language models and graph neural networks.
Desired Skills/Background
- Required: Python, deep learning
- Nice-to-have: Experience with protein data, HPC systems, LLMs, GNNs, parallel model training
Mentor
Steven Farrell (sfarrell@lbl.gov)
Force-fitting molecular dynamics potential using AI-in-the-loop
Science/CS Domains
molecular dynamics, high performance computing, machine learning, programming models, GPU computing
Project Description
For this summer internship, the graduate-level student will be using the TestNN SNAP code. This machine learning interatomic potential (MLIAP) for molecular dynamics showcases a seamless integration of C++ and Python through a hybrid codebase, leveraging pybind11 to facilitate efficient data exchange between the two runtimes. Currently, the prototype code has been tested for energy fitting through two scientific use cases involving tantalum and tungsten. For the proposed summer internship project, the student will use a similar strategy to test force fitting.
This project involves several tasks, including understanding the SNAP MLIAP as well as the neural-network algorithms in PyTorch. TestNN SNAP uses PyTorch both for the Python data-read interface, through the “hippynn” MLIAP codebase, and for the neural-network part of the code. The SNAP potential part of the code is written in Kokkos. A typical TestNN SNAP iteration involves the following steps: 1) reading the atom positions and reference energies using hippynn; 2) initiating the AI-in-the-loop part of the code, in which the bispectrum coefficients calculated by the SNAP (Kokkos) part of the code are sent to the neural network; 3) passing the bispectrum coefficients through the neural network to produce an energy prediction; and 4) updating the weights of the neural network based on the reference energies using backpropagation.
This project will involve using the force-computation aspect of SNAP to build, via an efficient algorithm, a loss function that includes both energy and force terms, even though SNAP is not written in PyTorch.
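To illustrate the structure of such an energy-plus-force loss (a toy sketch only: a differentiable placeholder stands in for the Kokkos SNAP bispectrum computation, and the configuration, reference values, and loss weights are made up), forces can be obtained through autograd as the negative gradient of the predicted energy with respect to the atom positions:

```python
import torch
import torch.nn as nn

def descriptors(positions):
    """Toy differentiable stand-in for the Kokkos SNAP bispectrum descriptors."""
    r = torch.cdist(positions, positions)                 # pairwise distances
    return torch.stack([r.sum(dim=1), (r ** 2).sum(dim=1)], dim=1)

energy_net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

def energy_and_forces(positions):
    positions = positions.clone().requires_grad_(True)
    e_total = energy_net(descriptors(positions)).sum()    # sum of per-atom energies
    forces = -torch.autograd.grad(e_total, positions, create_graph=True)[0]
    return e_total, forces

# One training step on a single made-up reference configuration
pos = torch.rand(8, 3)                                    # 8 atoms
e_ref, f_ref = torch.tensor(-3.0), torch.zeros(8, 3)      # made-up reference energy and forces

opt = torch.optim.Adam(energy_net.parameters(), lr=1e-3)
e_pred, f_pred = energy_and_forces(pos)
w_force = 0.1                                             # relative weight of the force term
loss = (e_pred - e_ref) ** 2 + w_force * ((f_pred - f_ref) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
print("loss:", float(loss))
```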
By completing this project, the graduate student would gain hands-on experience in high performance computing, parallel programming, and profiling and optimization toolkits. Additionally, the project would provide experience in using HPC and AI-in-the-loop for physics problems with wider applicability, as well as in the NERSC programming environment.
Desired Skills/Background
Experience with
- parallel programming (C/C++/Python),
- GPU computing,
- machine learning, and
- molecular dynamics.
Mentors
Neil Mehta (neilmehta@lbl.gov), Nick Lubbers (nlubbers@lanl.gov), Jack Deslippe (jrdeslippe@lbl.gov)
Workflow Capability
- CI portal deployment on Spin for HPC applications
- Helm chart development for persistent edge services for HPC workflows
- Building streaming pipelines for HPC-in-the-loop electron microscopy analysis
- Intelligent resource management and parallelization of distributed data streaming pipelines
- Streaming experimental data between containerized operators with RDMA and shared memory
- Pathfinding for advanced HPC communication patterns
CI portal deployment on Spin for HPC applications
Science/CS Domain(s)
Web development, Kubernetes, CI/CD, authentication, logging, monitoring
Project Description
This project focuses on deploying and enhancing a continuous integration (CI) portal, designed specifically to run CI jobs for high performance computing (HPC) applications, on Spin, NERSC’s Kubernetes-based platform. The CI portal serves as a web-based interface that accepts CI jobs from GitLab and GitHub, which are then executed on Perlmutter via the Superfacility API (SFAPI).
The portal is intended to support developers in building, testing, and deploying HPC applications by providing a seamless and user-friendly interface for submitting and monitoring CI jobs that run in an HPC environment. Enhancements will focus on improving usability, security, and monitoring, ensuring that HPC-focused CI workflows are reliable, efficient, and easy to manage.
The project will focus on the following tasks:
- Developing an intuitive web interface for submitting and managing CI jobs targeting HPC systems.
- Implementing robust authentication and access control tailored for secure HPC resource usage.
- Enhancing logging and monitoring capabilities to provide clear visibility into CI job status and outcomes.
- Ensuring seamless integration with GitHub, GitLab, and SFAPI for efficient execution of HPC CI workflows.
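As a rough sketch of the web-interface piece (FastAPI is only one possible framework, the route names and payload handling are illustrative, and submission to Perlmutter through SFAPI is left as a stub), a minimal endpoint that accepts GitHub or GitLab webhooks and records CI jobs might look like this:

```python
# Run with, e.g.: uvicorn ci_portal:app --reload  (module name hypothetical)
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
jobs: list[dict] = []          # in-memory job log; a real portal would use a database

@app.post("/ci/webhook")
async def receive_webhook(request: Request):
    payload = await request.json()
    # GitHub push events carry repository.full_name; GitLab uses project.path_with_namespace
    repo = (payload.get("repository", {}).get("full_name")
            or payload.get("project", {}).get("path_with_namespace"))
    if repo is None:
        raise HTTPException(status_code=400, detail="unrecognized webhook payload")
    job = {"id": len(jobs) + 1, "repo": repo, "status": "queued"}
    jobs.append(job)
    # TODO: submit the corresponding CI job to Perlmutter through the Superfacility API
    return job

@app.get("/ci/jobs")
async def list_jobs():
    return jobs
```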
Desired Skills/Background
- Experience with web development (HTML, CSS, JavaScript, React, or similar frameworks).
- Familiarity with Kubernetes and containerized applications.
- Experience with authentication mechanisms (OAuth, JWT, etc.).
- Knowledge of CI/CD pipelines, job scheduling, and familiarity with HPC workflows is a plus.
Mentors
Justin Cook (jscook@lbl.gov), Pengfei Ding (pding@lbl.gov)
Helm chart development for persistent edge services for HPC workflows
Science/CS Domains
DevOps, Kubernetes, software packaging, distributed computing
Project Description
This project focuses on developing Helm charts to enable the deployment and management of persistent edge services that support high performance computing (HPC) workflows on Spin, a Kubernetes-based platform. These persistent edge services play a critical role in connecting HPC jobs to external workflow managers, databases, and user-facing portals that remain available across job executions.
Targeted services include examples like the FireWorks workflow manager (requiring persistent MongoDB databases and web portals) and HAProxy load balancers for inference services deployed on Perlmutter. The goal is to provide robust, reusable Helm charts that simplify deploying these services and ensure they can reliably support HPC workflows that demand continuous availability and dynamic scaling.
The project will focus on the following tasks:
- Developing, testing, and refining Helm charts for persistent edge services.
- Ensuring compatibility and stability on Spin’s Kubernetes deployments.
- Automating deployment, scaling, and lifecycle management for services that interface with HPC workflows.
- Enhancing Helm chart documentation to support community adoption and ease of use.
Desired Skills/Background
- Experience with Kubernetes and Helm.
- Understanding of software packaging, containerization, and deployment automation.
- Familiarity with persistent databases (e.g., MongoDB), web services, and networking/load-balancing concepts.
- Experience with cloud-native application development and distributed computing environments.
Mentors
Johannes Blaschke (jpblaschke@lbl.gov), Pengfei Ding (pding@lbl.gov), Rui Liu (rui.liu@lbl.gov)
Building streaming pipelines for HPC-in-the-loop electron microscopy analysis
Science/CS Domains
machine learning, experimental materials science, streaming data analysis, integrated research infrastructure
Project Description
The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. This project will focus on the development of new operators for data pipelines for the National Center for Electron Microscopy (NCEM) at Berkeley Lab. Such analysis could include tomography, ptychography, and closed-loop microscope control with machine learning/AI. The intern will be introduced to the interactEM project, work with NERSC mentors and NCEM staff to target operators to implement, and eventually test these operators with real microscope data. Your work will improve the efficiency of real electron microscopy experiments.
Desired Skills/Background
- Required: Python development and machine learning frameworks
- Nice-to-have: Container technologies, familiarity with HPC environments
Mentors
Sam Welborn (swelborn@lbl.gov), Chris Harris (cjh@lbl.gov), Bjoern Enders (benders@lbl.gov)
Intelligent resource management and parallelization of distributed data streaming pipelines
Science/CS domains
Resource management, parallel computing, flow-based programming
Project Description
The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. This summer intern project will involve developing methods to scale and schedule operators intelligently according to workflow load by integrating multisource, real-time metrics from heterogeneous resources (e.g., Perlmutter compute nodes and edge servers). Further, the intern will contribute to enabling data parallelism for interactEM by scaling operators to fit the needs of the running pipeline.
Desired Skills/Background
- Required: Python, parallel computing, container technologies, familiarity with HPC environments
- Nice-to-have: C++
Mentors
Sam Welborn, Chris Harris, Bjoern Enders
Streaming experimental data between containerized operators with RDMA and shared memory
Science/CS Domains
networking, high performance computing (HPC), RDMA
Project Description
The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. Currently, interactEM uses ZeroMQ to send messages between operators over TCP/IP. This summer intern project will involve employing RDMA and shared memory for inter-operator communication to improve data throughput between operators.
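For reference, the current TCP-based pattern looks roughly like the toy PUSH/PULL pair below (not the actual interactEM wiring; the port, frame size, and threading setup are only illustrative). This is the baseline that an RDMA or shared-memory transport would aim to outperform.

```python
import threading

import numpy as np
import zmq

ENDPOINT = "tcp://127.0.0.1:5555"   # illustrative port, not the real interactEM configuration
N_FRAMES = 5

def producer():
    sock = zmq.Context.instance().socket(zmq.PUSH)
    sock.bind(ENDPOINT)
    for _ in range(N_FRAMES):
        frame = np.random.rand(512, 512).astype(np.float32)   # stand-in detector frame
        sock.send(frame.tobytes())                            # blocks until a PULL peer connects
    sock.close()

def consumer():
    sock = zmq.Context.instance().socket(zmq.PULL)
    sock.connect(ENDPOINT)
    for _ in range(N_FRAMES):
        frame = np.frombuffer(sock.recv(), dtype=np.float32).reshape(512, 512)
        print("received frame, mean =", float(frame.mean()))
    sock.close()

t = threading.Thread(target=producer)
t.start()
consumer()
t.join()
```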
Desired Skills/Background
- Required: C++, Python, RDMA
- Nice-to-have: pybind11/nanobind
Mentors
Sam Welborn, Chris Harris, Bjoern Enders
Pathfinding for advanced HPC communication patterns
Science/CS Domains
network interconnects; high performance computing (HPC); distributed computing
Project Description
Future workflows are increasingly exercising the limits of high performance computing (HPC) networks – especially in the high-performance data analysis and AI/ML space. This is because many modern data science workflows make use of flexible resources and arbitrary point-to-point communication patterns. In contrast, traditional HPC communication libraries, like MPI, rely on Single Program Multiple Data (SPMD) communication patterns in order to achieve high performance.
At the moment, this forces developers to make unpleasant compromises: If flexibility is desired (for example, by dynamically scaling up and down the number of nodes as needed), then the algorithm needs to build on a slow transport protocol.
This project strives to alleviate this situation and, therefore, to improve the performance of modern workflows at NERSC by exploring the performance of modern HPC communication libraries, such as
- UDP over the high-speed network,
- UCX, and
- Modern MPI extensions
on the Perlmutter system under ad-hoc communication patterns.
We will also explore modern network discovery tooling and PMIx to determine if and how UCX and MPI can be flexibly scaled during runtime.
Desired Skills/Background
- Proficiency in Julia and C
- Experience with HPC systems and Slurm
- Experience with Linux and Git
Mentor
Application Performance
- Load balancing algorithms: Exploring advancement
- HPC application and workflow performance tracing with OpenTelemetry
HPC application and workflow performance tracing with OpenTelemetry
Science/CS Domains
Performance monitoring, observability, tracing, eBPF, distributed systems, Kubernetes, Slurm
Project Description
Understanding and optimizing the performance of HPC applications and workflows is essential for maximizing resource utilization and computational efficiency. This project aims to leverage OpenTelemetry, combined with eBPF-based system-level monitoring and user-defined tracing within job scripts and applications, to deliver detailed, real-time observability for workloads running in both Kubernetes clusters and Slurm-managed HPC environments.
The outcome will help HPC users and developers gain deep insights into workflow and application behavior, identify performance bottlenecks, and improve execution efficiency across distributed and heterogeneous systems.
The project will focus on the following tasks:
- Integrating OpenTelemetry for distributed tracing of HPC workloads.
- Implementing eBPF-based samplers to collect low-level system and kernel metrics.
- Designing user-defined tracing hooks for job scripts and HPC applications to enable customizable observability.
- Ensuring seamless compatibility with Kubernetes-based and Slurm-based HPC workflows.
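As a minimal sketch of such user-defined tracing hooks (assuming the opentelemetry-api and opentelemetry-sdk packages are installed; a console exporter stands in for whatever collector backend the project adopts, and the span and attribute names are invented), a job-script step can be wrapped in spans like this:

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("hpc.job.demo")

with tracer.start_as_current_span("job") as job_span:
    job_span.set_attribute("slurm.job_id", "123456")   # invented attribute name
    with tracer.start_as_current_span("stage_in"):
        time.sleep(0.1)                                # stand-in for data staging
    with tracer.start_as_current_span("solver"):
        time.sleep(0.3)                                # stand-in for the main computation
    with tracer.start_as_current_span("stage_out"):
        time.sleep(0.1)                                # stand-in for writing results
```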
Desired Skills/Background
- Experience with performance monitoring and observability tools (OpenTelemetry, Prometheus, etc.).
- Familiarity with both Kubernetes and Slurm workload management systems.
- Understanding of eBPF and low-level system tracing and monitoring techniques.
- Programming experience in Go, Python, or C++.
Mentors
Pengfei Ding (pding@lbl.gov), Dhruva Kulkarni (dkulkarni@lbl.gov), Rui Liu (rui.liu@lbl.gov)
Load balancing algorithms: Exploring advancement
Science/CS Domains
Load balancing; algorithms; high performance computing (HPC); distributed computing
Project Description
We are seeking enthusiastic summer students to investigate ways to improve the overall performance of large-scale AMReX simulations through load balance (LB) algorithm advancements. Load balancing is extremely important for large-scale, massively parallel simulations. Current LB algorithms are generally simplistic, as calculations must be done at runtime and are dependent on the reduced data users choose to collect and pass to them. However, as both hardware and simulations increase in complexity and power, having the best possible LB is an increasing priority for an HPC code’s large-scale performance.
This investigation seeks to specifically improve the LB capabilities of AMReX, the block-structured, GPU-enabled, mesh-and-particle framework used by a wide variety of large-scale simulation codes and frameworks.
Specific directions of interest for this summer include the following:
- Comparing the use of different space-filling curves in AMReX’s space-filling curve algorithm (a toy sketch of this approach appears at the end of this description).
- Developing and comparing uniform-machine load balancing techniques, adding the capability to optimize across diverse architectures.
In this project, the selected summer students will create computational tools to perform statistical analyses of LB data, investigate how to present the analysis for publication, and, time permitting, test the algorithms on AMReX applications. Students will also be able to explore other potential improvements identified during the investigation.
LB improvements could have far-reaching, long-term impacts, given AMReX’s international range of users and machines.
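As a rough, non-AMReX illustration of the space-filling-curve direction above (the box coordinates, per-box weights, and greedy chunking rule are all made up for the example), boxes can be ordered along a Morton curve and split into roughly equal-weight contiguous chunks:

```python
import numpy as np

def morton2d(x, y, bits=16):
    """Interleave the bits of integer coordinates x and y (Z-order / Morton code)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return code

rng = np.random.default_rng(0)
n_boxes, n_ranks = 64, 4
coords = rng.integers(0, 256, size=(n_boxes, 2))     # made-up box (i, j) indices
weights = rng.uniform(1.0, 10.0, size=n_boxes)       # made-up per-box cost estimates

order = np.argsort([morton2d(int(x), int(y)) for x, y in coords])

# Greedy split of the curve-ordered boxes into n_ranks contiguous, roughly equal-weight chunks
target = weights.sum() / n_ranks
assignment = np.empty(n_boxes, dtype=int)
rank, acc = 0, 0.0
for idx in order:
    assignment[idx] = rank
    acc += weights[idx]
    if acc >= target and rank < n_ranks - 1:
        rank, acc = rank + 1, 0.0

per_rank = [weights[assignment == r].sum() for r in range(n_ranks)]
print("load per rank:", np.round(per_rank, 1))
print("imbalance (max/mean):", round(max(per_rank) / np.mean(per_rank), 2))
```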
Desired Skills/Background
- Experience with C++ and Python
- Experience with algorithm development
- Experience with statistics/statistical analysis
- Experience with parallel codes and parallelization (MPI and/or OpenMP)
- Experience performing scientific research
- Experience with literature surveying and algorithm design
Mentors
Kevin Gott (kngott@lbl.gov), Rebecca Hartman-Baker
Infrastructure
No projects at this time.
High Performance Computing (HPC) Architecture
Power analysis of HPC applications
Science/CS Domain
Application power usage, power management, high performance computing
Project Description
As high performance computing (HPC) systems continue to scale, power consumption has become a critical limiting factor. Understanding the power signature of current production workloads is essential to overcome this limit and continue to advance scientific computing at scale.
This project aims to understand the power characteristics of applications at the National Energy Research Scientific Computing Center (NERSC) and to investigate how standard power management strategies impact these workloads. The insights gained from this research will illuminate pathways to enhance power efficiency in operational settings and achieve optimal system performance within predefined power budgets.
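As a small illustration of the kind of power-signature analysis involved (the samples, sampling rate, and column names below are synthetic; real data would come from NERSC telemetry and accounting systems), basic metrics for a single job can be computed with NumPy and Pandas:

```python
import numpy as np
import pandas as pd

# Synthetic 1 Hz node-power samples for a single one-hour job
t = np.arange(0, 3600, 1.0)                                   # seconds
rng = np.random.default_rng(1)
power_w = 450 + 50 * np.sin(2 * np.pi * t / 600) + rng.normal(0, 10, t.size)
df = pd.DataFrame({"time_s": t, "node_power_w": power_w})

# Basic elements of a power signature
avg_p = df["node_power_w"].mean()
peak_p = df["node_power_w"].max()
p = df["node_power_w"].to_numpy()
dt = np.diff(df["time_s"].to_numpy())
energy_kwh = np.sum(0.5 * (p[1:] + p[:-1]) * dt) / 3.6e6      # trapezoidal integral, J -> kWh

print(f"average {avg_p:.0f} W, peak {peak_p:.0f} W, energy {energy_kwh:.2f} kWh")
```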
Skills/Background
- Required: Python (with Numpy and Pandas)
- Desired: Experience in using HPC systems; background in computer science or physics
Mentors
Zhengji Zhao (zzhao@lbl.gov), Brian Austin, Nicholas Wright
Application-agnostic HPC profiling
Science/CS Domains
Computer Architecture & Engineering
Project Description
Understanding the algorithms running on today’s supercomputing systems is essential for designing next-generation systems. Historically, this has been done using detailed examinations of a small number of the most popular applications and making cautious inferences about the remaining applications. However, this approach requires significant computing and human time.
Recent improvements in the ability to monitor the activity of individual components within the supercomputer are opening the door to a complementary approach to understanding the workload in a way that is both lightweight and application-agnostic. In this project, you will identify and interpret sources of monitoring data and contribute to the analysis framework used to transform raw data into information about which hardware performance characteristics are most valuable to NERSC's scientific computing workload.
Desired Skills/Background
- Experience with Python
- Familiarity with basic methods of statistical analysis.
- Acquaintance with building blocks of computer systems (e.g. CPUs, GPUs, NICs, memory).