FES Requirements Worksheet
1.1. Project Information - Plasma materials interaction
Document Prepared By | Xianzhu Tang
Project Title | Plasma materials interaction
Principal Investigator | Xianzhu Tang
Participating Organizations | Los Alamos National Laboratory
Funding Agencies | DOE SC, DOE NNSA, NSF, NOAA, NIH, Other:
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
This project combines kinetic modeling of the boundary plasma with atomistic modeling of the wall material's response to plasma irradiation in order to understand the physics of plasma-materials interaction under fusion reactor conditions. The plasma modeling is based on solving the six-dimensional kinetic equation in the sheath and presheath regions. The materials modeling is based on molecular dynamics, accelerated molecular dynamics, and kinetic Monte Carlo simulations. These involve a suite of simulation codes developed at Los Alamos National Laboratory: VPIC, SPASM, and TAD/AMDF. One example of a specific research objective is understanding tritium permeation and trapping in displacement-damaged tungsten.
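To make the atomistic side of this workflow concrete, the following is a minimal, illustrative sketch of a textbook residence-time (BKL) kinetic Monte Carlo step of the kind referred to above. The event list and rates are invented placeholders for illustration only; this is not the project's production kMC setup.

    // Illustrative sketch: one loop of a textbook residence-time (BKL) kinetic
    // Monte Carlo algorithm. Event rates are invented placeholders, not tungsten data.
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937 rng(12345);
        std::uniform_real_distribution<double> u01(0.0, 1.0);

        // Hypothetical event catalogue: escape rates (1/s) for a defect hopping
        // between trap sites; the values are placeholders.
        std::vector<double> rates = {1.0e3, 5.0e2, 2.0e2};

        double t = 0.0;                 // simulated time
        for (int step = 0; step < 10; ++step) {
            double ktot = 0.0;
            for (double k : rates) ktot += k;

            // Select an event with probability proportional to its rate.
            double r = u01(rng) * ktot;
            std::size_t chosen = 0;
            double acc = 0.0;
            for (std::size_t i = 0; i < rates.size(); ++i) {
                acc += rates[i];
                if (r <= acc) { chosen = i; break; }
            }

            // Advance the clock by an exponentially distributed waiting time.
            t += -std::log(u01(rng)) / ktot;
            std::printf("step %d: event %zu fired, t = %.3e s\n", step, chosen, t);
        }
        return 0;
    }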
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
VPIC: A particle-in-cell code that solves the six-dimensional kinetic equation together with the Maxwell equations; optimized to minimize data traffic (a generic sketch of the particle push at the core of this method is given below, after the code descriptions).
SPASM: A standard molecular dynamics code capable of high-performance computing on hybrid architectures; it models the dynamic behavior of materials under extreme conditions.
TAD/AMDF: LANL's accelerated molecular dynamics code and framework, which incorporates the three acceleration methods (parallel replica dynamics, temperature-accelerated dynamics, and hyperdynamics) to allow atomistic modeling of diffusive transport in solid materials.
All codes use MPI/POSIX threads.
VPIC, SPASM, and AMDF have achieved petaflop performance on LANL's Roadrunner.
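As an illustration of the particle-in-cell method mentioned under VPIC above, here is a minimal, generic sketch of the Boris particle push in prescribed electric and magnetic fields. It is a textbook single-particle version, not VPIC's implementation, and it omits the field gather/scatter and the Maxwell solve.

    // Generic sketch of the Boris particle push used in particle-in-cell methods.
    // Not VPIC's implementation; fields are uniform and prescribed for simplicity.
    #include <array>
    #include <cstdio>

    using Vec3 = std::array<double, 3>;

    Vec3 cross(const Vec3& a, const Vec3& b) {
        return {a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0]};
    }

    // Advance one particle's velocity and position by dt in given E and B fields.
    void boris_push(Vec3& x, Vec3& v, const Vec3& E, const Vec3& B,
                    double q, double m, double dt) {
        const double h = 0.5 * q * dt / m;

        Vec3 vminus, t, s, vprime, vplus;
        for (int i = 0; i < 3; ++i) { vminus[i] = v[i] + h * E[i]; t[i] = h * B[i]; }

        const double t2 = t[0]*t[0] + t[1]*t[1] + t[2]*t[2];
        for (int i = 0; i < 3; ++i) s[i] = 2.0 * t[i] / (1.0 + t2);

        Vec3 c1 = cross(vminus, t);                       // half rotation
        for (int i = 0; i < 3; ++i) vprime[i] = vminus[i] + c1[i];
        Vec3 c2 = cross(vprime, s);                       // full rotation
        for (int i = 0; i < 3; ++i) vplus[i] = vminus[i] + c2[i];

        for (int i = 0; i < 3; ++i) { v[i] = vplus[i] + h * E[i]; x[i] += v[i] * dt; }
    }

    int main() {
        Vec3 x{0, 0, 0}, v{1.0, 0, 0};
        Vec3 E{0, 0, 0}, B{0, 0, 1.0};    // uniform B along z: circular gyration
        for (int n = 0; n < 5; ++n) boris_push(x, v, E, B, -1.0, 1.0, 0.1);
        std::printf("x = (%.3f, %.3f, %.3f)\n", x[0], x[1], x[2]);
        return 0;
    }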
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
The TAD code employs a global constrained minimization (currently conjugate gradient), whose convergence (and hence scalability) can be improved algorithmically.
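For reference, the following is a minimal sketch of a generic nonlinear conjugate-gradient minimizer (Polak-Ribiere direction with a simple backtracking line search) applied to a toy two-dimensional test function. It only illustrates the class of algorithm in question; it is not TAD's actual minimization routine, and the objective function is a placeholder.

    // Generic nonlinear conjugate-gradient minimizer on a toy 2D problem
    // (Rosenbrock function). Illustrative only; not the TAD routine.
    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstdio>

    using Vec = std::array<double, 2>;

    double f(const Vec& x) {                       // toy objective
        double a = 1.0 - x[0], b = x[1] - x[0]*x[0];
        return a*a + 100.0*b*b;
    }
    Vec grad(const Vec& x) {                       // its analytic gradient
        double b = x[1] - x[0]*x[0];
        return {-2.0*(1.0 - x[0]) - 400.0*x[0]*b, 200.0*b};
    }

    int main() {
        Vec x{-1.2, 1.0};
        Vec g = grad(x);
        Vec d{-g[0], -g[1]};                       // initial steepest-descent direction

        for (int it = 0; it < 2000; ++it) {
            // Backtracking line search along the current direction d.
            double alpha = 1.0, fx = f(x);
            Vec xn = x;
            for (int k = 0; k < 60; ++k) {
                xn = {x[0] + alpha*d[0], x[1] + alpha*d[1]};
                if (f(xn) < fx) break;
                alpha *= 0.5;
            }

            Vec gn = grad(xn);
            // Polak-Ribiere coefficient; restart to steepest descent if negative.
            double num = gn[0]*(gn[0]-g[0]) + gn[1]*(gn[1]-g[1]);
            double den = g[0]*g[0] + g[1]*g[1];
            double beta = std::max(0.0, num/den);

            d = {-gn[0] + beta*d[0], -gn[1] + beta*d[1]};
            x = xn; g = gn;
            if (std::sqrt(g[0]*g[0] + g[1]*g[1]) < 1e-8) break;
        }
        std::printf("minimum found near (%.4f, %.4f)\n", x[0], x[1]);
        return 0;
    }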
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
Facilities Used or Using | NERSC, OLCF, ALCF, NSF Centers, Other: Roadrunner
Architectures Used | Cray XT, IBM Power, BlueGene, Linux Cluster, Other: Roadrunner Opteron/Cell hybrid (the data below are for the AMDF code)
Total Computational Hours Used per Year | A few tens of millions of Core-Hours
NERSC Hours Used in 2009 | 0 Core-Hours
Number of Cores Used in Typical Production Run | 10,000
Wallclock Hours of Single Typical Production Run | 24
Total Memory Used per Run | 1,000 GB
Minimum Memory Required per Core | 0.1 GB
Total Data Read & Written per Run | 10 GB
Size of Checkpoint File(s) | 0.1 GB
Amount of Data Moved In/Out of NERSC | Tens of GB per day
On-Line File Storage Required (For I/O from a Running Job) | 1 TB
Off-Line Archival Storage Required | 10 TB
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
Computational Hours Required per Year | 50 million
Anticipated Number of Cores to be Used in a Typical Production Run | 120,000
Anticipated Wallclock Hours to be Used in a Typical Production Run Using the Number of Cores Given Above | 24
Anticipated Total Memory Used per Run | 12,000 GB
Anticipated Minimum Memory Required per Core | 0.1 GB
Anticipated Total Data Read & Written per Run | 100 GB
Anticipated Size of Checkpoint File(s) | 0.1 GB
Anticipated Amount of Data Moved In/Out of NERSC |
Anticipated On-Line File Storage Required (For I/O from a Running Job) |
Anticipated Off-Line Archival Storage Required |
4b. What changes to codes, mathematical methods, and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years?
A new global minimization/search algorithm for TAD.
There is also a development plan to couple the plasma and materials modeling on the fly.
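A purely hypothetical sketch of what such an on-the-fly coupling loop could look like is given below. The function names, the exchanged quantities (wall fluxes and wall response coefficients), and all numerical values are assumptions made for illustration; the real coupling interfaces remain to be designed.

    // Hypothetical sketch of an on-the-fly plasma/materials coupling loop.
    // All names, exchanged quantities, and values are illustrative assumptions.
    #include <cstdio>

    struct WallFlux  { double ion_flux;  double heat_flux;     };  // plasma -> wall
    struct WallState { double recycling; double sputter_yield; };  // wall -> plasma

    // Placeholder plasma step: would advance the kinetic sheath/presheath model.
    WallFlux advance_plasma(const WallState& wall, double dt) {
        (void)dt;  // unused in this stub
        return {1.0e22 * wall.recycling, 1.0e6 * (1.0 + wall.sputter_yield)};
    }

    // Placeholder materials step: would advance MD/AMD/kMC under the incident flux.
    WallState advance_material(const WallFlux& flux, double dt) {
        double damage = flux.ion_flux * dt * 1.0e-24;              // toy scaling
        return {0.99 - damage, 0.01 + damage};
    }

    int main() {
        WallState wall{0.99, 0.01};
        const double dt = 1.0e-6;                                  // coupling interval (s)

        for (int step = 0; step < 5; ++step) {
            WallFlux flux = advance_plasma(wall, dt);              // plasma sends fluxes
            wall = advance_material(flux, dt);                     // wall sends its response
            std::printf("step %d: recycling = %.4f\n", step, wall.recycling);
        }
        return 0;
    }

In practice the two sides would likely run as separate parallel codes exchanging such quantities at each coupling interval, but that design choice is not fixed here.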
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
We will need NERSC help with performance tuning and I/O optimization.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Except for TAD, all of our codes worked well on Roadrunner. We have not yet tried GPUs, though there is a GPU pilot project associated with SPASM.
5. New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
With a 50X increase in computational power, we will have a much improved chance of understanding the key physical and chemical processes of plasma-materials interaction under fusion reactor conditions. Specifically, the boundary plasma will be understood at the kinetic level, while the materials response on diffusive time scales will be understood at the atomistic level. The rate and pathway information will be essential for understanding and modeling the materials behavior at the meso- and macro-scale. This will build the scientific foundation to overcome the extreme challenges facing fusion materials in the plasma-facing components of a fusion reactor.
On the importance of expanded HPC resources, we note that the study of the long-time kinetics of materials typically operates in two modes: integration of very long trajectories, and thorough analysis of these trajectories to extract the key processes that control the evolution of the system. The first phase, when carried out with Parallel Replica Dynamics, benefits from large-scale, high-node-count computational capabilities. A 50x increase in computing power would potentially translate into a 50x increase in the time horizon that can be directly probed, allowing us to further bridge the gap between computationally amenable and technologically relevant timescales. Such an increase is important because rare events that do not manifest in shorter simulations impede our ability to extrapolate the behavior of the material to longer times. The second phase often requires a high-throughput, extensive exploration of the critical kinetic steps along the paths generated in the first phase. Increasing the available computing power for this step would significantly improve the robustness of our models of long-time evolution by identifying alternative competing pathways. In summary, our investigation would greatly benefit from massively parallel resources that target both very large individual simulations and large numbers of smaller-scale simulations. Our memory and storage requirements are usually modest, and we do not foresee a need for a proportional increase in their availability.
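To illustrate the parallel-replica reasoning above, the following toy Monte Carlo demo shows that with N independent replicas whose escape times are exponentially distributed, the wall-clock time to the first escape shrinks by a factor of N while the total simulated trajectory time accumulated across the replicas recovers the physical escape time, i.e., an N-fold boost in accessible time horizon. The replica count and rate are arbitrary placeholders, not tied to any production AMDF run.

    // Toy illustration of the parallel-replica speedup: the minimum of N
    // exponential escape times has rate N*k, so wall-clock time drops by ~N
    // while the summed simulated time across replicas is boosted by ~N.
    #include <algorithm>
    #include <cstdio>
    #include <random>

    int main() {
        const int    N = 50;          // number of replicas (stand-in for a 50x boost)
        const double k = 1.0;         // escape rate per replica (arbitrary units)
        const int    trials = 100000;

        std::mt19937 rng(2024);
        std::exponential_distribution<double> escape(k);

        double mean_wall = 0.0, mean_simulated = 0.0;
        for (int t = 0; t < trials; ++t) {
            // Wall-clock time until the first of N replicas escapes.
            double first = escape(rng);
            for (int r = 1; r < N; ++r) first = std::min(first, escape(rng));
            mean_wall      += first;       // what we actually wait for
            mean_simulated += N * first;   // total trajectory time accumulated
        }
        mean_wall /= trials; mean_simulated /= trials;

        std::printf("mean wall-clock to escape : %.4f (expected 1/(N*k) = %.4f)\n",
                    mean_wall, 1.0 / (N * k));
        std::printf("simulated-time boost      : %.1fx over a single replica\n",
                    mean_simulated / mean_wall);
        return 0;
    }

With N = 50 the printed boost comes out at roughly 50x, mirroring the 50x time-horizon argument made in the text.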