Frank Tsung
FES Requirements Worksheet
1.1. Project Information - Large Scale Particle-in-Cell Simulations of Laser Plasma Interactions Relevant to Inertial Fusion Energy
Document Prepared By | Frank Tsung
Project Title | Large Scale Particle-in-Cell Simulations of Laser Plasma Interactions Relevant to Inertial Fusion Energy
Principal Investigator | Frank Tsung
Participating Organizations | UCLA
Funding Agencies | DOE SC   DOE NSA   NSF   NOAA   NIH   Other:
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
The goal of this project is to use state-of-the-art particle-in-cell (PIC) tools, such as OSIRIS and UPIC, to study parametric instabilities under conditions relevant to inertial fusion energy (IFE). These instabilities can absorb, deflect, or reflect the laser, and they generate hot electrons which can degrade compression. However, it is not enough simply to eliminate these interactions, because in some exotic schemes, such as shock ignition, the fast electrons create a shock which can trigger ignition and enhance gain. It is therefore critical to gain a thorough understanding of these instabilities rather than simply eliminating them. Because of their highly nonlinear nature (which involves both wave-particle and wave-wave interactions), particle-in-cell codes, which are based on first principles, are best suited to study them.
The UCLA computer simulation group has a long history of expertise in particle-in-cell simulations as well as parallel computing. In the past few years, we have applied this expertise to the study of laser plasma interactions. Some of our past accomplishments include:
(i) Used the parallel PIC code OSIRIS to observe (for the first time) the high-frequency hybrid instability (HFHI).
(ii) Identified the importance of convective modes in two-plasmon decay.
(iii) Showed the importance of plasma wave convection in the recurrence of stimulated Raman scattering (SRS).
(iv) Found that multi-dimensional plasma waves become localized due to wave-particle effects even in the absence of plasma wave self-focusing.
With NIF (the National Ignition Facility) coming online, this is the perfect time to apply both the expertise of the UCLA group and the HPC resources of NERSC to study the various laser plasma interactions (LPIs) that can occur under IFE-relevant conditions. In the next 3-5 years, we plan to tackle the following problems at NERSC:
(i) 2D simulations of SRS involving multiple speckles or multiple laser beams.
(ii) Effects of overlapping laser beams on the two-plasmon decay/HFHI instabilities near the quarter-critical surface.
(iii) Two-dimensional studies of the SRS/2ωp instability under shock-ignition-relevant conditions.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
OSIRIS is a fully explicit, multi-dimensional, fully relativistic, parallelized PIC code. It is written in Fortran 95 and takes advantage of advanced object-oriented programming techniques. This compartmentalization allows for a highly optimized core code and simplifies modifications while maintaining full parallelization, which is done using domain decomposition with MPI. There are 1D, 2D, and 3D versions that can be selected at compile time. In addition, one of OSIRIS's strongest attributes is its sophisticated array of diagnostic and visualization packages with interactive GUIs that can rapidly process large datasets (cf. the visualization section). These tools can also be used to analyze data generated by our other PIC codes.
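For reference, the update cycle and domain decomposition described above can be sketched as follows. This is an illustrative Python/mpi4py sketch, not OSIRIS source (OSIRIS implements the same cycle in Fortran 95, in 1D/2D/3D, with a relativistic Boris push and a charge-conserving current deposit); the 1D electrostatic-style push and all names here are simplifications chosen for brevity.

```python
# Illustrative sketch of one explicit PIC step with 1D MPI domain decomposition.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

nx_local, dx, dt = 64, 1.0, 0.1                              # cells per rank, cell size, time step
x0, x1 = rank * nx_local * dx, (rank + 1) * nx_local * dx    # this rank's subdomain
length = size * nx_local * dx                                # global (periodic) box length
x = x0 + np.random.rand(1000) * (x1 - x0)                    # local particle positions
v = np.random.randn(1000)                                    # local particle velocities
E = np.zeros(nx_local)                                       # local grid field

def pic_step(x, v, E):
    # 1. Gather: interpolate the grid field to the particles (linear weighting here).
    i = np.clip(((x - x0) / dx).astype(int), 0, nx_local - 1)
    # 2. Push: advance velocities and positions (simple leapfrog; charge/mass set to 1).
    v = v + E[i] * dt
    x = x + v * dt
    # 3. Exchange: hand particles that left the subdomain to the neighboring rank
    #    (assumes a particle moves less than one subdomain per step).
    out_l = np.stack([x[x < x0], v[x < x0]])
    out_r = np.stack([x[x >= x1], v[x >= x1]])
    keep = (x >= x0) & (x < x1)
    x, v = x[keep], v[keep]
    in_r = comm.sendrecv(out_l, dest=left, source=right)
    in_l = comm.sendrecv(out_r, dest=right, source=left)
    x = np.concatenate([x, in_l[0], in_r[0]]) % length       # periodic global wrap
    v = np.concatenate([v, in_l[1], in_r[1]])
    # 4. Deposit + field solve (omitted): OSIRIS deposits current from the particles
    #    and advances E and B with an FDTD Maxwell solver before the next step.
    return x, v

x, v = pic_step(x, v, E)
```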
Recently, we have added dynamic load balancing, perfectly matched layer absorbing boundary conditions [vay:02], and an optimized version of the higher-order particle shapes [esirkepov:01]. The use of higher-order shape functions combined with current smoothing and compensation can dramatically reduce numerical heating and improve energy conservation without modifying the dispersion relation of plasma waves.
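For concreteness, a minimal Python sketch of a second-order particle shape and a compensated binomial current filter is given below. This is not OSIRIS source; the shape order and filter coefficients in an actual run are input-deck choices, and the (1,2,1)/4 smoother with a (-1,6,-1)/4 compensator shown here is one common textbook combination.

```python
import numpy as np

def quadratic_weights(delta):
    """Quadratic B-spline weights for a particle offset `delta` (in cell units,
    -0.5 <= delta <= 0.5) from its nearest grid point; returns the weights
    assigned to grid points (i-1, i, i+1). They sum to 1 for any delta."""
    return np.array([0.5 * (0.5 - delta) ** 2,
                     0.75 - delta ** 2,
                     0.5 * (0.5 + delta) ** 2])

def smooth_and_compensate(j):
    """Apply one (1,2,1)/4 binomial pass and then a (-1,6,-1)/4 compensator to a
    periodic 1D current array. The compensator flattens the filter response at
    long wavelengths, so resolved plasma waves are left essentially untouched
    while grid-scale noise is strongly damped."""
    for c_left, c_mid, c_right in ([0.25, 0.5, 0.25], [-0.25, 1.5, -0.25]):
        j = c_mid * j + c_left * np.roll(j, 1) + c_right * np.roll(j, -1)
    return j

print(quadratic_weights(0.2).sum())        # 1.0: charge is conserved by the shape
print(smooth_and_compensate(np.ones(8)))   # a uniform current passes through unchanged
```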
OSIRIS also has packages for physics beyond the standard PIC algorithm, including tunnel and impact ionization as well as a binary collision operator. There are two field ionization models: the ADK model and a simple barrier suppression model. These algorithms could also be used to model electron-positron pair creation. Due to the presence of the grid (cells), particles in PIC codes have finite size, so collisions are modified from point-particle collisions, especially when the impact parameter is comparable to the cell size, typically a Debye length. For smaller impact parameters, the effects of collisions are greatly reduced in PIC codes. In order to study the effects of collisions at absolute, rather than normalized, plasma densities and temperatures, it is therefore useful to explicitly add a Coulomb collision model to the PIC algorithm. We have implemented a binary collision module for OSIRIS using both the method of T. Takizuka and H. Abe [takizuka:77] and that of Nanbu [nanbu:97]. We have generalized these methods for relativistic temperatures and extended them to handle particles of different weights (useful, for instance, in a density gradient). The algorithm has been tested by comparing the relaxation times obtained from simulations of a two-species plasma out of equilibrium. It was also extensively tested to guarantee that the proper Jüttner distribution functions are reached in equilibrium at relativistic temperatures.
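The core of such a binary collision step can be sketched as follows. This is a non-relativistic, equal-weight Python illustration of a Takizuka-Abe-style pair scattering (the function and parameter names are ours, not the OSIRIS module's, and the variance expression is quoted from the standard non-relativistic formulation); OSIRIS generalizes this to relativistic temperatures and unequal weights.

```python
import numpy as np

def collide_pair(v1, v2, m1, m2, q1, q2, density, coulomb_log, dt, eps0=8.854e-12):
    """Scatter one randomly paired (v1, v2) within a cell; SI units, 3-vectors."""
    rng = np.random.default_rng()
    u = v1 - v2                                   # relative velocity
    umag = np.linalg.norm(u)
    if umag == 0.0:
        return v1, v2
    mr = m1 * m2 / (m1 + m2)                      # reduced mass
    # Sample tan(theta/2) from a Gaussian whose variance follows Takizuka & Abe (1977).
    var = (q1 * q2) ** 2 * density * coulomb_log * dt / (
        8 * np.pi * eps0 ** 2 * mr ** 2 * umag ** 3)
    delta = rng.normal(0.0, np.sqrt(var))
    cos_t = (1 - delta ** 2) / (1 + delta ** 2)   # scattering angle in the CM frame
    sin_t = 2 * delta / (1 + delta ** 2)
    phi = rng.uniform(0, 2 * np.pi)
    # Rotate the relative velocity by (theta, phi), preserving its magnitude.
    uhat = u / umag
    e1 = np.cross(uhat, [0.0, 0.0, 1.0])
    if np.linalg.norm(e1) < 1e-12:                # u parallel to z: pick another axis
        e1 = np.cross(uhat, [0.0, 1.0, 0.0])
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(uhat, e1)
    u_new = umag * (cos_t * uhat + sin_t * (np.cos(phi) * e1 + np.sin(phi) * e2))
    # Share the change so total momentum and kinetic energy are conserved exactly.
    v_cm = (m1 * v1 + m2 * v2) / (m1 + m2)
    return v_cm + (m2 / (m1 + m2)) * u_new, v_cm - (m1 / (m1 + m2)) * u_new

# e.g. two electrons in a dense plasma (illustrative numbers only)
v1n, v2n = collide_pair(np.array([1e6, 0.0, 0.0]), np.zeros(3),
                        9.11e-31, 9.11e-31, -1.6e-19, -1.6e-19,
                        density=1e26, coulomb_log=10.0, dt=1e-16)
```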
The code is highly optimized on a single processor, scales very efficiently on massively parallel computers, and is easily portable between different compilers and hardware architectures. To date, it has been ported to Intel, AMD, IBM PowerPC, and BlueGene processors running a large variety of operating systems (Mac OS X, AIX, and Linux, among others). For each of these platforms, the parallel scalability has been good regardless of the network configuration. On the Atlas machine at LLNL, 80% efficiency was achieved on 4,096 CPUs for a fixed-size problem (strong scaling) with significant communication overhead (only 512x512x256 cells and 1 billion particles were used). More recently, OSIRIS was ported to the Argonne BlueGene cluster Intrepid (8,192 quad-core nodes, 32,768 processors; www.alcf.anl.gov). The code is 97% efficient on 32,768 CPUs with weak scaling and 86% efficient with strong scaling.
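For clarity, the efficiency figures quoted above follow the usual definitions of strong and weak scaling; the snippet below (with made-up timings, not the measured Atlas or Intrepid data) shows how such numbers are computed.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling keeps the total problem fixed; ideal scaling halves the
    runtime when the core count doubles, so efficiency = (t_ref*n_ref)/(t_n*n)."""
    return (t_ref * n_ref) / (t_n * n)

def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling grows the problem with the core count so work per core is
    fixed; ideal scaling keeps the runtime constant, so efficiency = t_ref/t_n."""
    return t_ref / t_n

# Example: a fixed-size run taking 100 s on 512 cores and 14.5 s on 4,096 cores
# gives 100*512 / (14.5*4096) ~ 0.86, i.e. 86% strong-scaling efficiency.
print(strong_scaling_efficiency(100.0, 512, 14.5, 4096))
```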
Another code, UPIC, developed by Dr. Viktor Decyk of the UCLA simulation group, is being used as a testbed for the GPU platform. The UCLA Parallel PIC Framework (UPIC) is a unified environment for the rapid construction of new parallel PIC codes. It provides trusted components from UCLA’s long history of PIC development, in an easily accessible form, as well as a number of sample main codes to illustrate how to build various kinds of codes. UPIC contains support for electrostatic, Darwin, and fully electromagnetic plasma models, as well as relativistic particles.
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
OSIRIS scales with better than 60% efficiency on more than 64,000 cores of the Cray XT5 Jaguar, so there are no significant bottlenecks at this point.
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
Facilities Used or Using | NERSC   OLCF   ALCF   NSF Centers   Other: LLNL/Atlas
Architectures Used | Cray XT   IBM Power   BlueGene   Linux Cluster   Other:
Total Computational Hours Used per Year | 3,250,000 Core-Hours
NERSC Hours Used in 2009 | 0 Core-Hours
Number of Cores Used in Typical Production Run | 2,048
Wallclock Hours of Single Typical Production Run | 100
Total Memory Used per Run | 1,200 GB
Minimum Memory Required per Core | 0.6 GB
Total Data Read & Written per Run | 4,000 GB
Size of Checkpoint File(s) | 1,200 GB
Amount of Data Moved In/Out of NERSC | GB per
On-Line File Storage Required (For I/O from a Running Job) | TB and Files
Off-Line Archival Storage Required | TB and Files
3d. Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
Computational Hours Required per Year | 40,000,000
Anticipated Number of Cores to be Used in a Typical Production Run | 100,000
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above | 400 Hours
Anticipated Total Memory Used per Run | 600,000 GB
Anticipated Minimum Memory Required per Core | 6 GB
Anticipated Total Data Read & Written per Run | 500,000 GB
Anticipated Size of Checkpoint File(s) | 600,000 GB
Anticipated Amount of Data Moved In/Out of NERSC | 600,000 GB per month
Anticipated On-Line File Storage Required (For I/O from a Running Job) | TB and Files
Anticipated Off-Line Archival Storage Required | TB and Files
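As a rough consistency check (a hand calculation from the table above, not additional data), the anticipated per-run memory follows directly from the core count and the memory per core:

```python
cores = 100_000                 # anticipated cores per production run
mem_per_core_gb = 6             # anticipated minimum memory per core
print(cores * mem_per_core_gb)  # 600,000 GB total memory per run, matching the table
```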
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.
In order to perform the calculations described here, subcycling of ions may be required to save CPU time.
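As a schematic illustration of what ion subcycling means here (a hypothetical sketch, not OSIRIS code): electrons are advanced every step at the electron time step, while the much heavier and slower ions are advanced only every few steps with a proportionally larger time step, cutting the ion push and deposit cost accordingly.

```python
# Schematic ion subcycling loop: with nsub = 10, the ion push/current-deposit
# work is done one-tenth as often, at a larger but still well-resolved ion time step.
def advance(nsteps=1000, nsub=10):
    electron_pushes = ion_pushes = 0
    for step in range(nsteps):
        electron_pushes += 1          # push and deposit electrons with time step dt
        if step % nsub == 0:
            ion_pushes += 1           # push and deposit ions with time step nsub*dt
    return electron_pushes, ion_pushes

print(advance())   # (1000, 100): ion work reduced by a factor of nsub
```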
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).
The OSIRIS code has shown excellent scaling on more than 100,000 cores, and we do not expect any new architectural requirements in the near future. One complication is that for simulations larger than 100 TB, it may be impossible to checkpoint, and thus queueing policies will need to change.
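To make the concern concrete (the bandwidth figure below is an assumption chosen for illustration, not a NERSC specification), writing a single checkpoint of the anticipated size already takes several hours:

```python
checkpoint_gb = 600_000                  # anticipated checkpoint size from the table in 4a
write_bandwidth_gb_per_s = 50            # assumed aggregate file-system write bandwidth
hours = checkpoint_gb / write_bandwidth_gb_per_s / 3600
print(f"{hours:.1f} hours per checkpoint")   # ~3.3 hours at the assumed bandwidth
```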
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
Due to the large memory requirements of future simulations, a higher-bandwidth file server for I/O and checkpointing will be needed.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Viktor Decyk of our group has ported his code UPIC to the GPU. This work, which relies on streaming of data, will also improve performance on other advanced architectures.
5. New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
With a one-order-of-magnitude increase, we can study the effects of multiple (in this case, more than two) beams on the excitation of SRS/2ωp instabilities in NIF-relevant regimes. With a two-order-of-magnitude increase, however, we can finally perform full 3D simulations of parametric instabilities using parameters relevant to NIF. Higher-dimensional effects, such as side loss or wavefront bending, could then be investigated in full 3D geometry.