2025 Summer Research Projects

NERSC is a global leader in high performance computing (HPC) and data science. We empower researchers with the tools needed to tackle some of the world’s most pressing scientific challenges.

Every summer, we offer graduate students paid internships that allow them to collaborate with NERSC staff on various science and technology research projects.

NERSC staff started posting projects in January 2025 for the upcoming summer. This page will be updated as more projects open and others close.

How to apply

To apply to an open project, email the project mentor(s) directly. Projects no longer accepting applications will be marked as their positions are filled.

Quantum computing

Open projects

Auxiliary field quantum Monte Carlo using hybrid classical-quantum computing

No longer seeking applicants

Auxiliary field quantum Monte Carlo using hybrid classical-quantum computing

Science/CS domains

Quantum computing, high performance computing, quantum Monte Carlo, pySCF

Project description

For this summer internship, a graduate-level student will use pySCF and Cuda-Q to solve ground-state energy calculations of a hydrogen molecule using the auxiliary field quantum Monte Carlo (AFQMC) simulations. The goal of this project will be to benchmark performance metrics for a classically solved AFQMC calculation versus that augmented by quantum computing in the loop.

For the initial part of the project, the graduate student will reproduce a Python-based code to calculate the ground-state energy of a hydrogen molecule using AFQMC. This Python code is expected to use the pySCF module. Upon completion, the graduate student will incorporate a quantum computing algorithm in the calculations to perform unbiased sampling of the walkers, compute overlaps and local energies. This part of the code will initially use the Cuda-Q or Pennylane module for building a quantum circuit to prepare a walker state. This will be followed by performing a similar step using Qiskit and Cirq to benchmark the performance differences between the three quantum computing languages.

By completing this project, the graduate student will gain hands-on experience in high performance computing, quantum computing languages, and profiling toolkits. Additionally, the results of this project will provide experience in using HPC and quantum computing-in-the-loop for physics problems with wider applicability for problems beyond ground-state energy calculations.

Desired skills/background

Experience with pySCF, qiskit, cirq, pennylane, and Quantum Monte Carlo

Mentors

Neil Mehta (neilmehta@lbl.gov), Katherine Klymko (kklymo@lbl.gov), Ermal Rrapaj (ermalrrapaj@lbl.gov), Jack Deslippe (jrdeslippe@lbl.gov)

»Back to Quantum computing

Development and testing of a quantum protocol for polynomial transformations of data sequences using quantum signal processing (closed)

This project is no longer seeking applicants.

Science/CS domains

Quantum signal processing, data encoding

Project description

Purpose

This project aims to develop a protocol for computing polynomial transformations on data sequences and test it on a shot-based simulator.

Method

Implement Python code to develop and test the protocol using a shot-based circuit simulator in Qiskit

Overview

Recent advancements in efficient data encoding into quantum states have enabled quantum-based operations on such data. Notably, the QCrank encoder [1] facilitates the input of sequences of real values into quantum processing units (QPUs). Additionally, quantum signal processing (QSP) techniques [2] provide a practical framework for performing computations on scalar polynomials.

The goal of this internship is to integrate these two approaches and develop the mathematical foundations for a protocol that computes low-degree polynomials on sequences of dozens of real numbers.

This protocol will be implemented in the Qiskit shot-based simulator and tested on NERSC systems. Existing implementations of QCrank [3], QSPpack [4], and pyQSP [5] will serve as foundational resources.

Stretch goals for this project include developing extensions to multivariate polynomials and improving the encoding schemes to take further advantage of data sparsity.

The intern will have the opportunity to contribute significantly to this cutting-edge field with the potential to co-author a publication upon successful implementation.

References

[1] QCrank protocol
[2] QSP theory
[3] QCrank-light reference code
[4] QSPpack reference code
[5] pyQSP reference code

Desired skills/background

Understanding of the mathematical foundation of quantum computations
Experience in Python, Mathematica, and Qiskit
Familiarity with HPC environments and containers

Mentors

Jan Balewski, Daan Camps

»Back to Quantum computing

Evaluating the performance of quantum algorithms for solving differential equations in the NISQ era and beyond (closed)

This project is no longer seeking applicants.

Science/CS domains

Scientific computing: Partial differential equations, Quantum computing

Project description

In this project, we aim to explore and evaluate the field of quantum algorithms for differential equations. We will build on recent results for near-term variational algorithms [1] and their realization on quantum hardware [2] and study more general theoretical frameworks for solving partial differential equations [3].

Our primary goal is to identify a set of relevant test problems and implement them using both near-term variational algorithms and longer-term scalable quantum algorithms. Secondary project goals could include

A classical simulation of the algorithms that were implemented using NERSC’s Perlmutter system,
Proof-of-concept demonstrations of the most promising problems using the IBM Quantum systems available at NERSC and,
Resource estimation to evaluate the scalability of the identified approaches.

References

[1] Solving nonlinear diﬀerential equations with diﬀerentiable quantum circuits
[2] Variational quantum algorithms for nonlinear problems
[3] Quantum simulation of partial differential equations via Schrodingerisation: t echnical details

Desired skills/background

Background in scientific computing and familiarity with partial differential equations
Solid foundation in quantum computing
Experience with Python
Nice to have: Experience with quantum hardware runs

Mentors

(This project is no longer accepting applications.) Daan Camps, Jan Balewski

»Back to Quantum computing

Data/Machine learning

AI agent development for performance metrics

Science/CS domains

High performance computing, AI/machine learning

Project description

High performance computing (HPC) systems at NERSC generate vast amounts of data related to system performance, job scheduling, incident tracking, and internal communications. Efficiently monitoring and analyzing this data is crucial for optimizing resource allocation, improving system reliability, and enhancing user experience. However, the current approach to gathering and interpreting performance metrics is fragmented, requiring manual queries across multiple platforms such as Slurm job schedulers, ServiceNow incident management, and Slack internal communications.

As an AI intern, you will work on developing an AI-powered agent to gather, process, and analyze data from various sources, including Slurm databases, ServiceNow tables, and internal Slack communications. Your work will help our team understand system performance metrics, optimize workflows, and provide real-time insights to stakeholders.

The project will begin with an initial research phase to assess available data sources and define key performance indicators. The AI agent will then be developed iteratively, with continuous testing and refinement based on real-world usage. The internship position will support this development, focusing on data integration, AI model training, and system deployment.

Your duties will include the following:

Data integration: Work with APIs to extract and consolidate data from Slurm, ServiceNow, and Slack into a unified system.
AI & NLP: Develop natural language processing (NLP) capabilities for the AI agent to interpret queries about performance metrics and generate responses.
Analytics: Conduct data analysis to calculate key performance indicators (KPIs), such as job completion times, incident resolution times, and communication metrics.
Real-Time Monitoring: Design and implement real-time monitoring systems to track performance and notify stakeholders of any anomalies or trends.
Collaboration: Partner with the NERSC team to ensure the AI agent meets operational needs and integrates with existing systems.

As an AI intern, you will

Work at the forefront of high performance computing and AI in scientific research,
Gain valuable hands-on experience in AI, data integration, and performance analytics,
Enjoy a flexible, remote-friendly working environment,
Collaborate with world-class researchers and experts in the field, and,
Contribute to important projects with real-world applications.

Desired skills/background

Currently pursuing a degree in computer science, data science, AI, or a related field.
Experienced with API integration and data extraction.
Strong knowledge of data analysis, performance metrics, and predictive modeling.
Proficiency in Python and relevant data science libraries (e.g., Pandas, TensorFlow, PyTorch).
Excellent communication skills and a proactive, independent work style.
Familiarity with Slurm, ServiceNow, or Slack integrations would be ideal but not required.

Mentor

To apply, please email Kadidia Konate (kadidiakonate@lbl) your resume, a brief cover letter, and any relevant portfolio or project examples by April 15, 2025.

»Back to Data/Machine learning

Inference-as-a-Service for the DUNE experiment

Science/CS Domains

High performance computing, machine learning, inference optimization, distributed computing

Project Description

The Deep Underground Neutrino Experiment (DUNE) at Fermilab requires efficient workflows to process large-scale data and simulation.

This project aims to work with Fermilab physicists to deploy an inference-as-a-service framework on the Perlmutter supercomputer. The goal is to optimize the simulation workflow by offloading model inferencing to a dedicated inference service, which will also serve external requests via an externally accessible endpoint.

The project will focus on the following tasks:

Deploying an inference-as-a-service framework on Perlmutter.
Developing an external-facing endpoint to serve inference requests from external users.
Optimizing model inference workflows to reduce simulation runtimes.
Ensuring scalability and efficient resource utilization of the inference service.

Desired skills/background

Experience with machine learning inference frameworks (TensorFlow, ONNX, Triton Inference Server, etc.).
Familiarity with high performance computing environments.
Experience with containerization and distributed computing.
Knowledge of networking and API development for external service access.

Mentors

Pengfei Ding (pding@lbl.gov), Andrew Naylor (anaylor@lbl.gov)

»Back to Data/Machine learning

“Fair Universe:” Uncertainty-aware large-compute-scale AI platform for fundamental science

Project description

We are building a supercomputer-scale AI ecosystem for sharing datasets, training large models, and hosting machine-learning challenges and benchmarks. This ecosystem will be initially exploited for an ML challenge series based on novel datasets and progressively rolling in tasks of increasing difficulty, focussing on discovering and minimizing the effects of systematic uncertainties in physics. You will work with a multidisciplinary team, including machine learning researchers and physicists, to build AI challenges, models, and software that exploit supercomputers at NERSC. Our first challenge was an accepted competition at NeurIPS 2024, and we are now working on further competitions and platform development. Projects could range from software development on the machine learning platform to running the challenge itself to refining the datasets, tasks, and metrics.

Desired skills/background

Required: Python development and some machine learning frameworks (e.g. PyTorch)
Nice-to-have (dataset-related projects only): high-energy physics or cosmology experience

Mentors

Wahid Bhimji (wbhimji@lbl.gov), Chris Harris (CJH@lbl.gov)

»Back to Data/Machine learning

Performance analysis of scientific AI workloads for next-gen HPC systems

Science/CS domains

machine learning, performance analysis + optimization

Project description

Explore performance analysis and optimization of scientific AI workloads on next-generation HPC systems through this internship project. The focus is on addressing the increasing complexity and computational requirements of scientific AI, especially in the context of foundational models for science. The goal is to improve efficiency and scalability by looking into sophisticated parallelism techniques and advanced AI computing hardware. The intern will research existing tools for high-performant AI training and contribute to performance modeling tools that expose training bottlenecks for future models on future hardware.

Desired skills/background

Required: Python, experience with ML and/or performance analysis, software engineering
Nice-to-have: Data parallelism, model parallelism, AI hardware

Mentors

Steven Farrell (sfarrell@lbl.gov), Shashank Subramanian (shashanksubramanian@lbl.gov)

»Back to Data/Machine learning

Developing agentic AI systems for HPC operations, user support, and scientific data analysis

Science/CS domains

AI, HPC, scientific data analysis, software engineering

Project description

As HPC systems grow in complexity, AI-driven automation is becoming essential for optimizing operations, assisting users, and accelerating scientific discovery. This internship project focuses on developing Agentic AI systems at NERSC—autonomous AI-driven assistants capable of performing tasks such as real-time system monitoring, user support, and scientific data analysis. The intern will explore frameworks for AI agents, integrate them with NERSC's infrastructure, and develop prototypes that can interact with users, automate troubleshooting, or analyze data workflows. This project offers the opportunity to work at the intersection of AI, HPC, and science, contributing to the next generation of intelligent HPC tools.

Desired skills/background

Required: Python, experience with AI/ML or software development, interest in AI-driven automation
Nice-to-have: Experience with LLMs, scientific data workflows, HPC systems, API development

Mentor

Steven Farrell (sfarrell@lbl.gov)

» Back to Data/Machine learning

Phenomenological model building at scale

Science/CS domain(s)

Particle phenomenology, machine learning, high-performance computing

Project description

Theoretical models in fundamental physics are often designed through intuitive reasoning. With high performance computing, we can now systematically explore the mathematical space of theories for models that fit observed data. This project includes multiple components to which the summer intern might contribute:

Design a pipeline to covert mathematical theories into UFO models that can be interpreted by physics simulators (MadGraph / micrOMEGAs)
Automate a model evaluation workflow, including using machine learning
Design visualization and interpretability tools, with machine learning, to understand what the search algorithm is finding in high dimensions
Design better reward structures for the search algorithm

Desired skills/background

Required

One of the following:

Background in particle phenomenology/group theory, or
Experience with AI agents

Nice-to-have

Experience with any of the following:

Mathematica
FeynRules
SymPy
PyTorch
Neural network training

Mentors

Wahid Bhimji (wbhimji@lbl.gov), Aishik Ghosh (aishikghosh@lbl.gov)

» Back to Data/Machine learning

AI foundation models for protein understanding and design

Science/CS domains

deep learning, computational biology, high performance computing

Project description

Synthetic biology is a rapidly growing field with many promising applications in energy, medicine, and materials. In this project, you will work on developing large-scale AI foundation models to aid in the design of novel proteins with enhanced functional properties. You will have access to high performance computing resources at NERSC to train models and will explore state-of-the-art deep-learning approaches to protein sequence and structure data, such as protein language models and graph neural networks.

Desired skills/background

Required: Python, deep learning
Nice: Experience with protein data, HPC systems, LLMs, GNNs, parallel model training

Mentor

Steven Farrell (sfarrell@lbl.gov)

» Back to Data/Machine learning

Force-fitting molecular dynamics potential using AI-in-the-loop

Science/CS domains

molecular dynamics, high performance computing, machine learning, programming models, GPU computing

Project description

For this summer internship, the graduate-level student will be using TestNN SNAP code. This machine learning interatomic potential (MLIAP) for molecular dynamics showcases a seamless integration of C++ and Python through a hybrid codebase, leveraging Pybind11 to facilitate efficient data exchange between the two runtimes. Currently, the prototype code has been tested for energy fitting through two scientific use cases involving tantalum and tungsten. For the proposed summer internship project, the student will use a similar strategy to test force-fitting.

This project would involve several tasks, including understanding the SNAP MLIAP as well as neural net algorithms in pytorch. TestNN SNAP uses Pytorch for the Python data read interface through the “hippynn” MLIAP codebase, as well as the neural net part of the code. The SNAP potential part of the code is written in Kokkos. The typical TestNN SNAP iteration involves the following:

Reading the atom positions and reference energies using hippynn,
Initiating the AI-in-the-loop part of the code, where the bispectrum coefficients calculated by the neural net are sent to the SNAP part of the code,
Passing the bispectrum coefficients through a neural network to produce an energy prediction and
Updating the weights of the neural network based on the reference energies using backpropagation.

This project will involve using the force-computation aspect of SNAP to build a loss function that includes both energy and force computations, despite the fact that SNAP is not written in Pytorch, using an efficient algorithm.

By completing this project, the graduate student would gain hands-on experience in high performance computing, parallel programming, and profiling and optimization toolkits. Additionally, the results of this project would provide experience in using HPC and AI-in-the-loop for physics problems with wider applicability and the NERSC programming environment.

Desired skills/background

Experience with

Parallel programming (C/C++/Python),
GPU computing,
Machine learning, and
Molecular dynamics.

Mentors

Neil Mehta (neilmehta@lbl.gov ), Nick Lubbers (nlubbers@lanl.gov), Jack Deslippe (jrdeslippe@lbl.gov)

» Back to Data/Machine learning

Workflow capability

Open projects

No longer accepting applications

CI portal deployment on Spin for HPC applications

Science/CS domain(s)

Web development, Kubernetes, CI/CD, authentication, logging, monitoring

Project description

This project focuses on deploying and enhancing a continuous integration (CI) portal on Spin, NERSC’s Kubernetes-based platform specifically designed to run CI jobs for high performance computing (HPC) applications. The CI portal serves as a web-based interface that accepts CI jobs from GitLab and GitHub, which are then executed on Perlmutter via the Superfacility API (SFAPI).

The portal is intended to support developers in building, testing, and deploying HPC applications by providing a seamless and user-friendly interface for submitting and monitoring CI jobs that run in an HPC environment. Enhancements will focus on improving usability, security, and monitoring, ensuring that HPC-focused CI workflows are reliable, efficient, and easy to manage.

The project will focus on the following tasks:

Developing an intuitive web interface for submitting and managing CI jobs targeting HPC systems.
Implementing robust authentication and access control tailored for secure HPC resource usage.
Enhancing logging and monitoring capabilities to provide clear visibility into CI job status and outcomes.
Ensuring seamless integration with GitHub, GitLab, and SFAPI for efficient execution of HPC CI workflows.

Desired skills/background

Experience with web development (HTML, CSS, JavaScript, React, or similar frameworks).
Familiarity with Kubernetes and containerized applications.
Experience with authentication mechanisms (OAuth, JWT, etc.).
Knowledge of CI/CD pipelines, job scheduling, and familiarity with HPC workflows is a plus.

Mentors

Justin Cook (jscook@lbl.gov), Pengfei Ding (pding@lbl.gov)

»Back to Workflow capability

Helm chart development for persistent edge services for HPC workflows

Science/CS domains

DevOps, Kubernetes, software packaging, distributed computing

Project description

This project focuses on developing Helm charts to enable the deployment and management of persistent edge services that support high performance computing (HPC) workflows on Spin, a Kubernetes-based platform. These persistent edge services play a critical role in connecting HPC jobs to external workflow managers, databases, and user-facing portals that remain available across job executions.

Targeted services include examples like the FireWorks workflow manager (requiring persistent MongoDB databases and web portals) and HAProxy load balancers for inference services deployed on Perlmutter. The goal is to provide robust, reusable Helm charts that simplify deploying these services and ensure they can reliably support HPC workflows that demand continuous availability and dynamic scaling.

The project will focus on the following tasks:

Developing, testing, and refining Helm charts for persistent edge services.
Ensuring compatibility and stability on Spin’s Kubernetes deployments.
Automating deployment, scaling, and lifecycle management for services that interface with HPC workflows.
Enhancing Helm chart documentation to support community adoption and ease of use.

Desired skills/background

Experience with Kubernetes and Helm.
Understanding of software packaging, containerization, and deployment automation.
Familiarity with persistent databases (e.g., MongoDB), web services, and networking/load-balancing concepts.
Experience with cloud-native application development and distributed computing environments.

Mentors

Johannes Blaschke (jpblaschke@lbl.gov), Pengfei Ding (pding@lbl.gov), Rui Liu (rui.liu@lbl.gov)

»Back to Workflow capability

Building streaming pipelines for HPC-in-the-loop electron microscopy analysis (closed)

This project is no longer seeking applicants.

Science/CS domains

machine learning, experimental materials science, streaming data analysis, integrated research infrastructure

Project description

The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. This project will focus on the development of new operators for data pipelines for the National Center for Electron Microscopy (NCEM) at Berkeley lab. Such analysis could include tomography, ptychography, and closed-loop microscope control with machine learning/AI. The intern will be introduced to the interactEM project, work with NERSC mentors and NCEM staff to target operators to implement, and eventually test these operators with real microscope data. Your work will improve the efficiency of real electron microscopy experiments.

Desired skills/background

Required: Python development and machine learning frameworks
Nice-to-have: Container technologies, familiarity with HPC environments

Mentors

Sam Welborn, Chris Harris, Bjoern Enders

»Back to Workflow capability

Intelligent resource management and parallelization of distributed data streaming pipelines (closed)

This project is no longer accepting applications.

Science/CS domains

Resource management, parallel computing, flow-based programming

Project description

The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. This summer intern project will involve developing methods to scale and schedule operators intelligently according to workflow load by integrating multisource, real-time metrics from heterogeneous resources (e.g., Perlmutter compute nodes and edge servers). Further, the intern will contribute to enabling data parallelism for interactEM by scaling operators to fit the needs of the running pipeline.

Desired skills/background

Required: Python, parallel computing, container technologies, familiarity with HPC environments
Nice-to-have: C++

Mentors

Sam Welborn, Chris Harris, Bjoern Enders

»Back to Workflow capability

Streaming experimental data between containerized operators with RDMA and shared memory (closed)

This project is no longer accepting applications.

Science/CS domains

networking, high performance computing (HPC), RDMA

Project description

The interactEM project uses a flow-based programming model to enable building data streaming pipelines from containerized components called operators. Currently, interactEM uses ZeroMQ to send messages between operators over TCP/IP. This summer intern project will involve employing RDMA and shared memory for inter-operator communication to improve data throughput between operators.

Desired skills/background

Required: C++, Python, RDMA
Nice-to-have: pybind11/nanobind

Mentors

Sam Welborn, Chris Harris, Bjoern Enders

»Back to Workflow capability

Pathfinding for advanced HPC communication patterns

Science/CS Domains

network interconnects; high performance computing (HPC); distributed computing

Project Description

Future workflows are increasingly exercising the limits of high performance computing (HPC) networks – especially in the high-performance data analysis and AI/ML space. This is because many modern data science workflows make use of flexible resources and arbitrary point-to-point communication patterns. In contrast, traditional HPC communication libraries, like MPI, rely on Single Program Multiple Data (SPMD) communication patterns in order to achieve high performance.

At the moment, this forces developers to make unpleasant compromises: If flexibility is desired (for example, by dynamically scaling up and down the number of nodes as needed), then the algorithm needs to build on a slow transport protocol.

This project strives to alleviate this situation and, therefore, to improve the performance of modern workflows are NERSC by exploring the performance of modern HPC communication libraries, such as

UDP over the high-speed network,
UCX, and
Modern MPI extensions

on the Perlmutter system under ad-hoc communication patterns.

We will also explore modern network discovery tooling and PMIx to determine if and how UCX and MPI can be flexibly scaled during runtime.

Desired skills/background

Proficiency in Julia and C
Experience with HPC systems and Slurm
Experience with using Linux and Git

Mentor

Johannes Blaschke

» Back to Workflow capability

Application Performance

Open project

HPC application and workflow performance tracing with OpenTelemetry

No longer seeking applicants

Load balancing algorithms: Exploring advancement (Closed)

HPC application and workflow performance tracing with OpenTelemetry

Science/CS domains

Performance monitoring, observability, tracing, eBPF, distributed systems, Kubernetes, Slurm

Project description

Understanding and optimizing the performance of HPC applications and workflows is essential for maximizing resource utilization and computational efficiency. This project aims to leverage OpenTelemetry, combined with eBPF-based system-level monitoring and user-defined tracing within job scripts and applications, to deliver detailed, real-time observability for workloads running in both Kubernetes clusters and Slurm-managed HPC environments.

The outcome will help HPC users and developers gain deep insights into workflow and application behavior, identify performance bottlenecks, and improve execution efficiency across distributed and heterogeneous systems.

The project will focus on the following tasks:

Integrating OpenTelemetry for distributed tracing of HPC workloads.
Implementing eBPF-based samplers to collect low-level system and kernel metrics.
Designing user-defined tracing hooks for job scripts and HPC applications to enable customizable observability.
Ensuring seamless compatibility with Kubernetes-based and Slurm-based HPC workflows.

Desired skills/background

Experience with performance monitoring and observability tools (OpenTelemetry, Prometheus, etc.).
Familiarity with both Kubernetes and Slurm workload management systems.
Understanding of eBPF and low-level system tracing and monitoring techniques.
Programming experience in Go, Python, or C++.

Mentors

Pengfei Ding (pding@lbl.gov), Dhruva Kulkarni (dkulkarni@lbl.gov), Rui Liu (rui.liu@lbl.gov)

»Back to Application performance

Load balancing algorithms: Exploring advancement (closed)

This project is no longer seeking applicants.

Science/CS domains

Load balancing; algorithms; high performance computing (HPC); distributed computing

Project description

We are seeking enthusiastic summer students to investigate ways to improve the overall performance of large-scale AMReX simulations through load balance (LB) algorithm advancements. Load balancing is extremely important for large-scale, massively parallel simulations. Current LB algorithms are generally simplistic, as calculations must be done at runtime and are dependent on the reduced data users choose to collect and pass to them. However, as both hardware and simulations increase in complexity and power, having the best possible LB is an increasing priority for an HPC code’s large-scale performance.

This investigation seeks to specifically improve the LB capabilities of AMReX, the block-structured, GPU-enabled, mesh-and-particle framework used by a wide variety of large-scale simulation codes and frameworks.

Specific directions of interest for this summer include the following:

Comparing the use of different space-filling curves in AMReX’s space-filling curve algorithm.
Developing and comparing uniform-machine load balancing techniques to add the capability to optimize using diverse architectures.

In this project, the selected summer students will create computational tools to perform statistical analyses of LB data, investigate how to present the analysis for publication, and, time permitting, test the algorithms on AMReX applications. Students will also be able to explore other potential improvements identified during the investigation.

LB improvements could have far-reaching, long-term impacts, given AMReX’s international range of users and machines.

Desired skills/background

Experience with C++ and Python
Experience with algorithm development
Experience with statistics/statistical analysis
Experience with parallel codes and parallelization (MPI and/or OpenMP)
Experience performing scientific research
Experience with literature surveying and algorithm design

Mentors

Kevin Gott, Rebecca Hartman-Baker

»Back to Application performance

Infrastructure

No projects at this time.

High performance computing (HPC) architecture

Power analysis of HPC applications

Science/CS domains

Application power usage, power management, high performance computing

Project description

As high performance computing (HPC) systems continue to scale, power consumption has become a critical limiting factor. Understanding the power signature of current production workloads is essential to overcome this limit and continue to advance scientific computing at scale.

This project aims to understand the power characteristics applications at the National Energy Scientific Computing Center (NERSC) and investigate how the standard power management strategies impact these workloads. The insights gained from this research will illuminate pathways to enhance power efficiency in operational settings and achieve optimal system performance within predefined power budgets.

Skills/background

Required: Python (with Numpy and Pandas)
Desired: Experience in using HPC systems, computer sciences/physics

Mentors

Zhengji Zhao (zzhao@lbl.gov), Brian Austin, Nicholas Wright

Application-agnostic HPC profiling

Science/CS domains

Computer Architecture & Engineering

Project description

Understanding the algorithms running on today’s supercomputing systems is essential for designing next-generation systems. Historically, this has been done using detailed examinations of a small number of the most popular applications and making cautious inferences about the remaining applications. However, this approach requires significant computing and human time.

Recent improvements in the ability to monitor the activity of individual components within the supercomputer are opening the door to a complementary approach to understanding the workload in a way that is both lightweight and application-agnostic. In this project, you will identify and interpret sources of monitoring data and contribute to the analysis framework used to transform raw data into information about which hardware performance characteristics are most valuable to NERSC's scientific computing workload.

Desired skills/background

Experience with Python
Familiarity with basic methods of statistical analysis.
Acquaintance with building blocks of computer systems (e.g. CPUs, GPUs, NICs, memory).

Mentors

Brian Austin, Dhruval Kulkarni