Minutes of the ERSUG/EXERSUG Meeting
Pacific Northwest National Laboratory
Richland, WA
Jan 12-13, 1995
Prepared by: Brian Hingerty
Vice-Chair/Secretary
beh@ornl.gov
Agenda
Added topics from EXERSUG (Jack Byers)
--EXERSUG interaction with ESSC, DCCC
--How to get EXERSUG more involved
--EXERSUG membership: how to encourage new blood?
--EXERSUG change of chair; need to elect new vice-chair
--Fallout and reaction to the EXERSUG letter
--Fallout and reaction to the EXERSUG contribution to the OER Summaries
(a shortened version of the green book; our deadline, I think, is Dec 30)
--Various prioritizing and dialogue issues raised during the EXERSUG letter writing:
-- What positive/negative effects should we expect from pushing for SMPs?
-- Push too hard: endanger the MPP.
-- Push too hard: oversell SMPs beyond what they can really do.
-- Don't push hard enough: get an inappropriate mix and weaker models.
---from Braams:
Please make sure that there
is a suitable occasion there for a frank discussion about priorities. I
would like us to be informed about, and to contribute to, changes in NERSC's
thinking about the MPP acquisition, in view of market developments over
the past nine months.
--Jardin: NERSC cost-effectiveness
--Herrmannsfeldt: history lesson on the effectiveness of the present model:
a strong central facility vs. spreading money out to local sites
--All: what is the proper mix of high-end SMPs at NERSC vs. (probably
lower-end) platforms at local sites?
Morning Session (Thursday, Jan 12, 1995)
GENERAL NOTES:
- Jack Byers graduated to emeritus status at the end of this meeting,
and Brian Hingerty has become the new chairperson.
- Rick Kendall was chosen as the new vice-chair/secretary as of the end
of this meeting.
- Reaction to the Fusion letter at DOE was not favorable as per J-N Leboeuf.
Welcoming Remarks - Jack Byers
------------------------------
Jack Byers' comments on need for a new green book for DOE
---------------------------------------------------------
EXERSUG members:
Note Kitchens' appeal for a stronger ERSUG.
We all need to work on this. E-mail between ourselves and Kitchens
clearly isn't enough. We need plans, suggestions, and mechanisms we can use
that we now don't have or don't use.
What follows is also a push for us to get going on the green book.
I will need help. I am starting to work with McCoy and Mirin (from NERSC)
on the division of work between NERSC and EXERSUG.
I am presently struggling with my version of a statement, from the users'
point of view, of needs and requirements, and trying to see that it
fits in with a statement of the NERSC vision by Mike McCoy.
When he and I reach some partial agreement I will send it to you
for editing, modification, etc. My present idea is that the users'
statement ought to be independent of NERSC, OSC, or anybody else.
If that makes sense, the NERSC vision would naturally stand as a response
to the users' statement of needs.
It might make sense to plan to have the ERSUG users' statement targeted
elsewhere also, i.e., not to use it only for the green book. This
might serve as an initial action in making ERSUG stronger. Ideas for targets?
I will need help from all of you, at least on the science accomplishments
sections in your areas. If you can't do this yourselves, please at least
take responsibility for farming it out to people in your discipline.
Potter has agreed to do the climate section.
I have a lot of good material (3 ERPUS papers) on QCD. I will take a first cut
at pulling out a statement of needs and accomplishments from those papers.
But I will need a high energy physics person to at least edit that and perhaps
even rewrite what I can do.
There is some more material from ERPUS that you might use as starting
points, though the most complete ERPUS papers seemed to be the QCD papers and
the ocean modeling paper by Malone. Contact me for a list of what
I have. I don't think I have anything from the ERPUS papers of Leboeuf,
Hammett, Colella, Kendall, and others.
You also should look at the previous green book to see what is involved.
If you don't have copies, e-mail Kitchens for them.
There is a possibility that NERSC will hold a workshop to bring the
green book together, early next year. This is NOT to suggest that we
are off the hook, but rather to point out that all of the rough drafting must
be complete by then, and probably we should try to have each individual
science section completed in final form, so that the meeting could then fill in
the holes, stitch together the pieces, and make coherent summary statements.
Washington View - Tom Kitchens
------------------------------
-A new science committee is being formed for oversight and guidance
-10% cuts are coming pretty much across the board
-Distributed Computing Committee (DCCC) needs more interaction from
the users (ERSUG and EXERSUG). How will what they are developing
affect what the users need etc.
Welcome from PNL - Rick Kendall
-------------------------------
-evening meeting in conference room 2 of the Tower Inn
NERSC Production Environment: Plans for 1995-96
-----------------------------------------------
General Overview - Bill McCurdy
-------------------------------
-microprocessor revolution: symmetric multiprocessors (SMPs) are now available
-high-end computers are not selling
-defense programs in DOE are interested in computers
-DCCC evolving (AFS, X-windows etc)
-need for integrated environment
-unified production environment
-SMP: symmetric multiprocessor (shared memory, 32 processors)
Mike McCoy - Unified Production Environment
-------------------------------------------
-SMP from SGI 12 nodes available in 6 months
-RFP Jan 95 (winner June/July)
-PEP delivery Aug 95
-FCM Aug 96
-draft write-ups available by request (mgm@nersc.gov)
-DCA - development computer assimilation
Unification of the Production Environment - Moe Jette
-----------------------------------------
Systems Administration Thrusts
------------------------------
-Provide our clients with the ability to exploit the diverse computational
and storage resources of NERSC with the ease of a single computer system.
Local Services
--------------
-authentication
-storage
-computation
-batch
-networking
NERSC Services
--------------
-authentication
-storage
-computation
-batch
-networking
Hardware Components
-------------------
-Diverse Computation Resources
Vector Supercomputers
Massively Parallel Supercomputers
Workstations (SAS and Desktop)
-Diverse Storage Resources
Andrew File System (AFS)
Distributed File Service (DFS)
Common File System (CFS)
National Storage Lab (NSL) technology based
High Performance Storage System (HPSS)
-High Speed Interconnect
Local Area Network (LAN)
Wide Area Network (WAN,ESNET)
Software Components
-------------------
-Uniform operating system (present)
UNIX (preferably POSIX compliant)
-Global Authentication (1996-first half)
Single-use passwords and Kerberos
-NERSC Resource Allocation and Accounting (1995- first half)
Centralized User Bank (CUB)
-Global File Systems (1995-1996)
AFS Server (present)
DFS Server (1995-first half)
AFS and DFS Integrated with Archive (1996+)
-Global Batch and Interactive Computing (1995-2nd half)
Network queuing environment (NQE)
portable batch system (PBS)
load sharing facility (LSF)
global job submission, monitoring and execution
Other System Components
-----------------------
-Security (1995-1st half)
protection of client and system information
-Integrated management (1995-1st half)
project leader
uniform environment for system administrators
uniform environment for clients
integrated "trouble ticket" system
-Licensed software (1995-2nd half)
convenient access to third-party software
Historical NERSC Development Paradigm
-------------------------------------
SAS-pre-processing
post-processing
Supercomputer
-------------
code generation
compilation
libraries
run
debug
performance analysis
post-processing
characteristics:
----------------
explicit file transfers
code development done on supercomputer
some pre- and post-processing done on SAS
slow response
limited tool set
Integrated Development Paradigm
-------------------------------
NERSC Interface
---------------
authentication
code generation
compilation
libraries
run
debug
performance analysis
pre-processing
post-processing
NERSC Services
-------------
supercomputers
other compute servers
global file system
load sharing
remote execution
common integrated toolkit
compatibility tools
global resource allocation and accounting
Characteristics:
---------------
global file system
single NERSC login
integrated, common software development tool kit
remote execution of user and system processes
Development Environment Milestones
----------------------------------
-porting codes to massively parallel systems
-access to massively parallel systems
-support of special parallel processing (SPP)
-cray C90 for capability computing
-provide common home directories (through AFS)
-enhance SAS development environment
-document "how to use the unified production environment"
-partner with our clients
-acquire a symmetric multiprocessor (SMP)
-encourage MPP vendors to support integrated tool-kits
-encourage load sharing
-POSIX compliance
Development Environment
-----------------------
-many tools on all platforms
-compilers
-linkers/loaders
-math libraries
-C++ class libraries
-debuggers
-performance analyzers
-some tools only on selected platforms
-source code generators (GUI builders)
-documentation preparation tools
-computer aided software engineering (CASE) tools
-source code control systems
------------------------------------------------------
Break
------------------------------------------------------
Mass Storage - Steve Louis
---------------------------
Project Leader, High Performance Storage
----------------------------------------
The Storage Role at NERSC
-------------------------
-Key element of the Unified Production Environment
-Long-Term and High-Performance Storage (HPSS)
-Medium-Term and Mid-Scale Storage (AFS)
-Provide solutions for use of new storage hardware
-New storage integration architectures (NSL)
-Cooperative software development (HPSS)
-Provide services that cannot be duplicated locally
-Capacities in the hundreds of terabytes
-Transfer rates in the hundreds of megabytes per second
-Continuous 24-hour/day 7-day/week operation
NERSC Strategic Storage Goals
-----------------------------
-High service quality: reliability,availability, security (COTS)
-Scalable I/O facilities to narrow the "storage gap" (HPSS)
-Archival storage as local/shared file system (UPE,HPSS)
-Support for heterogeneous client environments (UPE,HPSS)
-Support for large data management systems (HPSS,HPCC)
-Policies that balance resources with user demand (UPE,CUB)
-Flexible administration of quotas and charging (UPE,CUB)
-Import/export mechanisms for "user-owned" media (???)
Milestones for Storage Hardware
-------------------------------
-Acquisition of NSL-Technology Base System
-IBM 3494 robotic systems (Nov 94)
-IBM RS/6000 and 100GB disk (Jan 95)
-NSL-Unitree commercial software (Feb 95)
-System integration and startup (Mar 95)
-Interim upgrades to Base System
-Additional Ports on HIPPI switch (now)
-New SCSI-2 or HIPPI disk array (Spring 95)
-New NTP tape and storage units (Summer 95)
-Possible early conversion to HPSS (Fall 95)
-Acquisition of a Fully Configured Storage System in FY 96
-Implementation Plan to DOE (Mar 95)
-Specifications written (Jun 95)
-Vendor solicitation (Aug 95)
-Vendor selection and award (Nov 95)
-Delivery of hardware/software (Jan 96)
-System integration and startup (Mar 96)
Description of IBM RISC System 6000 Model 7015-R24
--------------------------------------------------
Typical HPSS Configuration
--------------------------
-Available from Steve Louis (louis@nersc.gov)
NERSC Storage Infrastructure Costs
----------------------------------
Budget Year Disk Cost in $/MB Tape Cost in $/MB
----------- ----------------- -----------------
FY86 $16.70 N/A
FY87 $45.28 (1) N/A
FY88 $14.10 $0.320
FY89 $12.39 $0.265
FY90 $10.94 $0.236
FY91 $ 9.44 $0.104
FY92 $ 7.44 $0.092
FY93 $ 5.44 $0.045
FY94 $ 3.34 $0.026
FY95 $ 2.50 (est.) $0.015 (est.)
FY96 $ 1.75 (est.) $0.005 (est.)
FY2001 $ 0.35 (est.) $0.0001 (est.)
(1) Includes new mainframes, software, servers, adapters, controllers
What Could I Have ONLINE (1) for $1,000,000 (2)
-----------------------------------------------
Year Disk Devices (3) Tape Devices (4,5)
---- ---------------- ------------------
1996 2 TB on 200 drives 1 TB on 50 drives
2001 15 TB on 400 drives 25 TB on 50 drives
2006 100 TB on 1000 drives 500 TB on 50 drives
1)Represents accessible data without mount operations
2)Drive costs only (excludes servers, controllers, robotics)
3)10GB/disk in '96; 37.5 GB/disk in '01; 100 GB/disk in '06
4)20 GB/tape in '96; 500 GB/tape in '01; 10,000 GB/tape in '06
5)CFS's current ONLINE to Total tape ratio is 1:1,000
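As a cross-check, the capacities in the table follow directly from the per-device capacities given in footnotes (3) and (4); a quick arithmetic sketch (drive counts and GB/device are taken from the table and footnotes above):

```python
# Cross-check of the "$1,000,000 online" table: capacity = drives x GB/device.
# Per-device capacities come from footnotes (3) and (4) above.
disk = {1996: (200, 10), 2001: (400, 37.5), 2006: (1000, 100)}   # (drives, GB/disk)
tape = {1996: (50, 20),  2001: (50, 500),   2006: (50, 10_000)}  # (drives, GB/tape)

for year in (1996, 2001, 2006):
    d_n, d_gb = disk[year]
    t_n, t_gb = tape[year]
    print(f"{year}: disk {d_n * d_gb / 1000:g} TB on {d_n} drives, "
          f"tape {t_n * t_gb / 1000:g} TB on {t_n} drives")
```

The computed figures match the table row for row (e.g., 200 drives x 10 GB = 2 TB of disk in 1996).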
-------------------------------------------------------------------
User Services and Information Systems - Jean Shuler
-------------------------------------
Building on a Foundation
------------------------
Traditional Role
----------------
-Provide technical consulting services - act as a user advocate
-Bring issues to attention
-Coordinate and collaborate with NERSC scientists and researchers
-Develop and provide technical training
-Provide Software Quality Assurance - administer logging and
tracking system
-Provide current, accurate information to documentation system
-Provide searching/browsing software for information development
and retrieval system- move all documentation to Web server
Navigation Tools - provide help
----------------
User Service - Greater emphasis on collaborations with NERSC staff to best
------------ utilize the new technologies
User Services and Information Systems
------------------------------------
New Focus Area Goals
--------------------
-Provide new Single Interface Information Delivery System based on
standards
-Develop training utilizing new media - video on demand, video
tele-conferencing, etc.
-Provide technical expertise for collaboration and coordination with
NERSC scientists and researchers - parallel programming techniques,
visualization, optimization
Use of Netscape - for users to obtain information
---------------
Information Delivery System
---------------------------
-Video on demand
-Applications training
-NERSC documentation
-MAN pages
-CrayDoc system
-Vendor Help Packages (NAG)
-NERSC online documentation
-Web information database
-Logging and tracking system
*The goal of the information delivery system is to deliver the needed
information anytime, anywhere.
Examples of Databases to be linked with browsing and search tools
-----------------------------------------------------------------
-NERSC developed documentation (Intrograph, Accounting)
-Web databases (NERSC Home Page, ERPUS talks, ESnet)
-BUFFER newsletter
-REMEDY logging and tracking system
-MAN pages
-Bulletin Boards
-CrayDocs and other vendor help packages
User Services - Building on a Foundation
----------------------------------------
In this Age of Information and Technological Revolution we will meet
the needs of the research community through:
-Delivering information in a faster, more effective manner through a
single interface
-Providing technical expertise for collaboration and coordination with
computational scientists and researchers
-Providing new training methods and media for facilitating information
exchange
-Providing continued traditional consulting and support services
------------------------------------------------------------------------
SMP's...where do they fit in, what do they do? - Brent Gorda
----------------------------------------------
-Symmetric Multiprocessors - SMPs
-gaining power
-similarities and differences to existing systems discussed
------------------------------------------------------------------------
SPP Workshop - Bruce Curtis
------------
-Held at NERSC Dec 1994
-Attendees: Brown, Greef (LBL); Hingerty (ORNL);
            Dimits, Byers (LLNL); Minkoff (ANL);
            Mankofsky (SAI); Reutter (LBL);
            Gai, Kendall, McCarthy, Schenter (PNL);
            Pavlo, Vahala, Vahala (IPP, W&M, ODU)
-Topics included:
-SPP Environment
-Vector Performance
-MPP Overview
-Message Passing
-Parallel Performance
-I/O Optimization
-System Time
-Case Studies
-Copies of slides, video tapes available now. Document will be available
soon. Send U.S. mail address to curtis@nersc.gov
-Next Workshop (Proposed)
-May 1995
-Targeted at applicants for SPP96, instead of 'winners' (but current
SPP users welcome)
-1 1/2 days of presentations, 1 1/2 days (optional) of hands-on work
-SPP96 to coincide with Fiscal Year 1996, and will be aligned with ERDP.
-SPP 1995
-Aydemir --Nonlinear Gyrofluid Calculation of Tokamak Transport
1000 CRUs 4GB
-Bell --Numerical Simulation of the Three-Dimensional Reacting Flow
in a Pulse Combustor using an Adaptive, Cartesian, Multi-fluid
Algorithm 2000 CRUs 4GB
-Cahill --Studies in Lattice Gauge Theory 250 CRUs 1GB
-Cohen --Toroidal Gyrokinetic PIC Simulation Using Quasi-ballooning
Coordinates 3600 CRUs 22GB
-Chen --Simulation of Alpha/Energetic-particle Driven Instabilities
in Tokamak Plasmas 300 CRUs 1GB
-Dunning --Study of Solvent Cation Interactions;Chemistry on Oxide
Surfaces using Ab Initio Molecular Dynamics; Determination
of Physical and Electronic Properties of Fluorinated
Polymers 10000 CRUs
-Fu --Gyrokinetic MHD Hybrid Simulation of MHD Modes Destabilized by
Energetic Particles 1500 CRUs
-Hammett --Gyrofluid Simulations of Tokamak Plasma Turbulence
1000 CRUs .5GB
-Hingerty --Atomic Resolution Views of Carcinogen Modified Closed
Circular DNA that can Super-coil 1000 CRUs 1GB
-Huesman --Experimental Medicine: Clinical Diagnostic, and Isotopic
Imaging Studies 250 CRUs 1GB
-Kogut --Simulations of Quenched QCD at Finite Density and Temperature
16000 CRUs 40GB
-LeBoeuf --High Resolution Gyro-Landau Fluid Plasma Turbulence
Calculations at the Core of Tokamaks 4000 CRUs
-K.-H. Lee --High Resolution Imaging of Electrical Conductivity Using
Low Frequency Electromagnetic Fields 300 CRUs
-W.W. Lee --Gyrokinetic Simulation of Tokamak Plasmas - Investigation
of Micro-turbulence and Core Transport Using Three-
Dimensional Toroidal Particle Codes 4000 CRUs 18GB
-Lester --Quantum Monte Carlo for Molecules 500 CRUs 2GB
-Mankofsky --3D EM and EM-PIC Simulation with ARGUS 250 CRUs 1GB
-Soni --Hadronic Matrix Elements of Heavy-Light Mesons
14800 CRUs 392.4 GB
-Stevens --Benchmarking Comparison of Computational Chemistry Codes
with MPPs 1500 CRUs 3GB
-Vahala --Lattice Boltzmann Approach to Turbulence in Divertor Plasmas
800 CRUs 8GB
SPP 1995
--------
Started December 1, 1994. During the first month of SPP95, the performance
of jobs improved substantially, averaging about 13 CPUs vs.
about 8 CPUs for the first month of each of the previous two years of SPP. The
workload has been moderate, however, probably due in part to the holidays.
Recent Runs
-----------
P.I.        Avg. CPUs    Total GF/wall sec
----        ---------    -----------------
Cohen 14.4 5.3
Leboeuf 13.4 4.5
Soni 14.5 5.0
Vahala 15.3 9.0
------------------------------------------------------------------------
Adjourn for lunch
Closed ExERSUG Lunch Meeting - Hills Street Deli
Afternoon Session - Thurs Jan 12,1995
LAN - Tony Hain - Video-conference - White Room
---
-Local Area Network (LAN)
-video-conference from NERSC
-on M-Bone
------------------------------------------------------------------------
ESNET - Jim Leighton - Video-conference - White Room
-----
Reports and Issues of Current Interest
--------------------------------------
-ESNET report
-bytes double in 6-8 months!
-T3 expansion (leading edge) can have problems
-WAN (Wide Area Network)
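Leighton's observation that traffic doubles every 6-8 months implies roughly 2.8x to 4x growth per year; the arithmetic is a simple compound-growth calculation (the doubling period is the only input):

```python
# If traffic doubles every d months, one year contains 12/d doublings,
# so yearly growth is 2**(12/d).
for d in (6, 7, 8):
    print(f"doubling every {d} months -> x{2 ** (12 / d):.1f} per year")
```

At the stated 6-8 month doubling rate this gives annual growth factors between about 2.8 and 4.0.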
------------------------------------------------------------------------
Preparation for the MPP - Tammy Welcome
-----------------------
NERSC provides several means whereby researchers can prepare for
MP computing.
-Collaboration with NERSC staff
-MPP Access Program
-MPP Workshop
NERSC is collaborating with researchers to parallelize C90 capability codes
---------------------------------------------------------------------------
dtem- LeBoeuf (ORNL), 2.1% of C90
-fluid simulation of plasma turbulence
-developed new convolution algorithm (used also in KITE) which minimizes
memory usage, programmed inner loops in assembly language achieving 10X
speedup for that phase of the code
-currently parallelizing for T3D using message passing
xg3e- Cohen (LLNL), 1.5% of C90
-gyrokinetic PIC plasma simulation
-ported to T3D using PVM
-currently retro-fitting new production code into existing framework
lu.x- Soni (Brookhaven), 7.9% of C90
-lattice quantum chromodynamics
-Soni will be collaborating with MILC directly
-NERSC may help tune application performance on the PEP
..to parallelize applications for the MPP access program
kite- Lynch/Leboeuf (ORNL)
-fluid simulation of plasma turbulence
-ported to T3D using PVM
-tuned convolution algorithm (see dtem)
-in future will tune matrix transpose communications
..and to parallelize and enable applications for the h4p
ParFlow/SLIM - Ashby/Tompson(LLNL)
-chemical migration (SLIM) in ground water simulation (ParFlow)
-ported SLIM to C90 with plans to parallelize for T3D
ardra - Dorr (LLNL)
-simulation of nuclear well logging devices
-ported to T3D using PVM
-tuned to minimize communication overhead and maximize single
processor performance
mdcask - Diaz De La Rubia (LLNL)
- 3-D molecular dynamics modeling ion beam implantation
- ported to T3D using PVM
- developed distributed application running over WAN that allows
interactive program control/input and permits real-time
visualization of data
- work described in invited paper at the April HPC Symposium
..more parallelization and enablement
camille - Mirin (LLNL)
- global climate model
- ported to T3D using portable message passing macro library
- currently tuning with shmem calls
icf3d - Kershaw (LLNL)
-study interaction of radiation (diffusive) with matter - to be used
mainly for Inertial Confinement Fusion
-currently being developed in C++ on the T3D using shmem
-development of Parallel Data Distribution Preprocessor
- cgscf - Mailhiot (LLNL) - simulation of advanced materials design
-development of dynamic time-sharing scheduled environment on the Cray T3D
MPP Access Program provides computer resources for the development of
---------------------------------------------------------------------
parallel applications
---------------------
Initially, user develops parallel applications using a small test case
User debugs application
Resources permitting, user scales-up application to a larger number of
processors and larger problem sets
The goal is to have these MP applications ready for production when NERSC's
first MP computer system arrives in the latter part of 1995.
9 proposals have been awarded allocations on 4 parallel platforms
MPP Access Program, Round 1 (Sept 94 - Sept 95)
PI Project
CM-5 at LANL
------------
Banerjee Direct Simulation of Turbulence-Surface Interaction
Vinals Numerical Studies of Non-equilibrium Processes in
Condensed Matter Physics and Materials Science
Paragon at ORNL
---------------
Brown Combustion Research
Cotton The Parallelization of an Atmospheric Simulation Model
Herrmannsfeldt Accelerator Design and Analysis in 3D
Watson Parallel Mathematical Software
T3D at LLNL
-----------
Depristo Large-Scale Molecular Dynamics with Explicit
Density Functionals
Dory High-Resolution Plasma Fluid Turbulence Calculations
on the T3D
Stevens Benchmarking Comparison of Computational Chemistry Codes
with MPPs
KSR1 at ORNL
------------
Watson Parallel Mathematical Software
Proposals for Round 2 are due Jan 20, 1995
Only for allocations on the 256 processor LLNL T3D
Allocation begins March 1995 and ends Sept 30, 1995
Evaluation criteria and instructions for proposals in following:
-December issue of the Buffer
-/afs/nersc.gov/u/ccc/mpp/Public/mppaccess.ps (postscript)
-/afs/nersc.gov/u/ccc/mpp/Public/mppaccess.text (ascii text)
-NERSC World Wide Web page http://www.nersc.gov
-nersc.Parallel.Processing and nersc.PI.info news groups
Allocation decisions made by OSC
PIs responsible for short project status report
The MPP Workshop will prepare NERSC researchers for the arrival of the PEP
--------------------------------------------------------------------------
system in late 1995
-------------------
3-week summer workshop in June
20-25 participants
Access to LLNL T3D or pre-PEP system
classroom instruction + exercises
+ guest lectures + personal project --> MPP application
The workshop will consist of classes on basic concepts
in parallel processing... (week of June 11-17)
-MPP architecture overview
-Programming models overview
- MPI, PVM, and HPF
-Operating Environment
-Tool Use
-Exercises illustrating new concepts
..lectures on advanced topics... (week of June 18-24)
-Approaches to parallel programming
-Parallel languages and libraries
-Frameworks, templates
-Quantum chromodynamics
-Computational fluid dynamics
-Molecular dynamics
-Plasma physics
-Climate modeling
-High-end graphics for high-end computing
..and work on personal projects (weeks of June 18-24 and June 25-30)
-NERSC staff available to assist in projects, both during and after workshop
-Attendees will make enough headway on project to continue development
after workshop
-Attendees will maintain access to LLNL T3D or pre-PEP system until
arrival of PEP system
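The workshop's programming-models material (MPI, PVM, HPF) centers on the message-passing model: processes share no memory and cooperate by explicit sends and receives. A conceptual sketch of that model, using Python's standard multiprocessing module as a stand-in (real codes would use MPI_Send/MPI_Recv or pvm_send/pvm_recv, not this API):

```python
# Conceptual illustration of message passing: each worker "rank" computes
# a partial sum of its chunk of the data and sends it to rank 0, which
# gathers and reduces. This mirrors the decompose/compute/gather pattern
# taught for MPI and PVM codes; the Python API here is only a stand-in.
from multiprocessing import Process, Pipe

def worker(conn, rank, data):
    # Compute a local partial result and send it to the gathering process.
    conn.send((rank, sum(data)))
    conn.close()

if __name__ == "__main__":
    full = list(range(8))            # problem data, decomposed by rank
    parts = [full[:4], full[4:]]
    pipes, procs = [], []
    for rank, chunk in enumerate(parts):
        parent, child = Pipe()
        p = Process(target=worker, args=(child, rank, chunk))
        p.start()
        pipes.append(parent)
        procs.append(p)
    total = sum(conn.recv()[1] for conn in pipes)  # "rank 0" gathers
    for p in procs:
        p.join()
    print(total)  # 28, i.e., sum(range(8))
```

The same structure scales to many ranks; in MPI the gather-and-reduce step would typically be a single MPI_Reduce call.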
Do the benefits of this workshop outweigh the costs?
-time, travel
-boot-strapped onto parallel machine
-parallel development experience
-immersion, free from distractions
-building bridges with staff
-Livermore night life (!)
---------------------------------------------------------------------------
Break
---------------------------------------------------------------------------
Follow-up on Throughput on C90 - Bruce Griffing
------------------------------
NQS Throughput
--------------
At the last ERSUG meeting some sites voiced concern that their NQS jobs were
not progressing through NQS in a timely manner.
NERSC agreed to analyze PNL's throughput and report back.
Response Team Members
---------------------
Bruce Curtis
Bruce Griffing
Moe Jette
Bruce Kelly
Alan Riddle
Clark Streeter
Some Observations
----------------
We had to identify performance metrics and gather the data that would let us
do the analysis:
Velocity (CPU time/wall time once a job begins execution)
Wait time (time between submission and initial execution)
Held time (time between when a job is checkpointed because its
allocation was depleted and when a new allocation is infused;
this affects velocity)
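These metrics can be computed mechanically from per-job timestamps; a minimal sketch (the field names and example numbers are illustrative assumptions, not the actual UNICOS/NQS log layout):

```python
# Minimal sketch of the throughput metrics defined above. The inputs
# (submit, start, end, cpu_seconds, held_seconds) are assumed fields
# for illustration, not the real UNICOS/NQS log format.

def wait_time(submit, start):
    """Time between submission and initial execution."""
    return start - submit

def velocity(cpu_seconds, start, end, held_seconds=0.0):
    """CPU time / wall time once the job begins execution.
    Held time (checkpointed while awaiting a new allocation) inflates
    wall time; subtracting it out shows how holds depress velocity."""
    wall = (end - start) - held_seconds
    return cpu_seconds / wall if wall > 0 else 0.0

# Hypothetical job: submitted at t=0 s, started at t=600 s, finished at
# t=4600 s, used 3200 CPU-seconds, and was held for 400 s mid-run.
print(wait_time(0, 600))                          # 600
print(round(velocity(3200, 600, 4600, 400), 2))   # 0.89
```

On a multi-CPU machine like the C90, velocity can exceed 1.0 (the "13 CPUs" figures in the SPP section are effectively this metric aggregated over parallel jobs).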
It is extremely difficult to reconstruct the C90 environment at any moment
in time using the information logged by UNICOS and NQS.
We went through many iterations refining NQSTAT because of the many edge
cases we encountered in the data. The public version that you can run is
improved as a result.
Additional Observations
-----------------------
Just before the holidays we received a summary of some PNL jobs spanning
a three month period. The summary included items such as time of submission,
wait and run times, and a brief completion status.
We have just begun to analyze what can be known about the failing cases.
In failing cases where lack of disk space was involved we can't tell from
the logs which file system(s) were involved. People should be using
/usr/tmp for cases where there is a requirement for large amounts of disk
space and/or long-running jobs.
Some Conclusions
----------------
The NQS data is very noisy and post-analysis is very labor intensive.
In a class of large jobs, PNL's velocities were lower than those of other
comparable jobs.
NQS is not discriminating against PNL jobs.
We haven't completely finished the analysis, but at this stage it appears
that Gaussian and Crystal suffer lower velocities.
We are analyzing a Gaussian test case. We will do the same with other test
cases as they are made available to us. The goal is to improve defaults
and make recommendations to users and/or developers.
A Recommendation
----------------
It is apparent that it is very difficult to reconstruct the facts months
later, so getting information to us as quickly as possible is essential. Please
contact us promptly if you suspect a problem. For some problems, being able
to see them in real time makes resolution much, much easier.
Then we can tell whether it is a systematic problem, or a problem with a user
script or technique that could be fixed or improved.
ERDP - Energy Research Decision Packages for FY96
-------------------------------------------------
The ERDP X-Windows and text mode applications for requesting allocations
will open for business on April 3, 1995.
The process will close on August 18, 1995.
-----------------------------------------------------------------------
Update on Distributed Computing and DCCC - Roy Whitney
----------------------------------------
DCCC Vision
Mission
Task Forces and Working Groups
Bartering
History
Charter and Membership
Connections to Other Groups
Near Term Goals
DCCC Mission
Develop a distributed computing foundation with the goal of establishing
an infrastructure and environment capable of supporting the anticipated
Distributed Information and Computing Environment (DICE) needs of DOE
program collaborators on a national basis with a production quality
level of support.
Key Distribution Task Force
Bill Johnston (LBL) Chair
The KDTF will examine issues related to the deployment of secure keys to be
used by e-mail technologies such as PEM and PGP and authentication services
such as digital signature, and if appropriate, recommend strategies for such
deployment. The KDTF will also be charged to be sure that their efforts are
compatible with those of the IETF and Federal inter-agency key distribution
task forces.
Joint task force with the ESCC (ESnet Site Coordinating Committee)
kdtf@es.net
Distributed Computing Environment Working Group
Barry Howard (NERSC) Chair
The DCEWG will examine and identify the recommended appropriate elements
of a distributed computing environment, including such components as
OSF/DCE, the Common Open Software Environment (COSE), the Common Object
Request Broker Architecture (CORBA) and Load Sharing. The DCEWG will also
be responsible for recommending strategies and pilots for implementing
these components.
dcewg@es.net
AFS/DFS Task Force
Task Force reports to the DCEWG
Troy Thompson (PNL) Chair
The ADFSTF will develop plans for the implementation of DFS in a WAN
environment and for the migration of existing ESnet AFS to DFS. The
group may choose to implement a DFS pilot project.
adfstf@es.net
Distributed System Management Working Group
John Volmer (ANL) Chair
The DSMWG will develop strategies, tools and pilot projects for
effectively providing systems management to distributed heterogeneous
systems. This group will also interact with the DCEWG for the effective
systems management of DCEWG layer tools.
dsmwg@es.net
Application Working Group
Dick Kouzes (PNL) Chair
The AWG will develop strategies, tools and pilot projects for
collaboratory use in areas such as the following:
National Information Infrastructure (NII) focused projects
Information services including data storage and retrieval,
project documentation, and multi-media lab notebook
Distributed collaboratory tools including multi-media communications
and software development
Collaboration social organization issues including effective standard
operating procedures
awg@es.net
DCCC Architecture Task Force
Arthurine Breckenridge (SNL) Chair
The ATF is to recommend a high level architecture for a distributed
collaboration environment which will eventually provide production-level
support of research efforts in DOE.
The architecture should be developed in such a manner that it complements,
and possibly helps to define, the DOE NII activities.
It should also address the non-technical (social, political, and
budgetary) issues to facilitate the establishment of such an environment.
atf@es.net
Real-Time Working Group
Suggested by Tommy Thomas (DOE)
Considerable Interest
Connect to IEEE Computer Applications in Nuclear and Plasma Sciences
(CANPS) Committee
Connection to super-lab Project
super-lab --> LANL-LLNL-SNL
Defense Programs Activity
Hank Shay (LLNL) is facilitating coordination with DCCC
Major supercomputing effort
Goal is to minimize duplication of efforts and maximize utility of all
distributed computing efforts
Bartering
Examples which could serve as pieces for EPICS*-like collaborations:
Authentication Services (ANL, LANL, NERSC, PNL, Sandia)
Electronic Places and Caves (ANL)
Mass Storage (LBL, NERSC/LLNL)
MBONE Video & High Speed Information Retrieval (LBL)
Network Monitoring (NERSC, SLAC)
On-line Data Acquisition System (CEBAF)
Systems Management (ANL, FNAL)
The DCCC will initiate a survey of the laboratories for potential pieces
to put into this collaboration.
* Experimental Physics and Industrial Control System
History
Started with need for DCCC and ended up with CCIRDA
CCIRDA --> Coordinating Committee for Informatics Research, Development,
and Application
ESSC and the Chairs of EXERSUG & SCIE agree to support a first DCCC meeting
First meeting: September 22-23, 1994 at CEBAF
Roy Whitney (CEBAF) elected DCCC Chair
December 1994: ESSC agrees to charter the DCCC
Charter and Membership
Group to propose Charter:
Steve Davis (PPPL)
Jim Leighton (NERSC)
Sandy Merola (LBL)
Roy Whitney (CEBAF)
Membership by participation
Need to involve computer scientists from academic and commercial areas
DICE Consortium
Distributed Information and Computing Environment (DICE) Consortium
Method for integrating
DOE Facilities
Universities
Commercial Interests
Tool for distributing results of DCCC, ESCC and possibly other DOE
groups work
Connections to Other Groups
ESnet Steering Committee - Parent
ERSUG & EXERSUG - Close and continuous exchange
Participation in Task Forces and Working Groups
SCIE - Close and continuous exchange
ESnet Site Coordinating Committee - High bandwidth exchange
super-lab - Joint efforts
DICE Consortium - universities and commercial interests
Near Term Goals
Next Meeting February 16-17, 1995 - General Atomics
Productive Task Forces and Working Groups
Active Bartering initiated
DICE Consortium initiated
Quality Exchanges with other Groups
DCCC Summary
DOE laboratories and collaborators must work together to prosper.
ERSUG/EXERSUG, ESSC & SCIE; DCCC, ESCC & DICE Consortium; and CCIRDA
are excellent examples of how this can be accomplished.
We have the tools and the talent.
-------------------------------------------------------------------------
Open Discussion - Exersug involvement with DCCC
---------------
It was determined that the ExERSUG chair should attend at least one
meeting per year of the DCCC. It was also determined that the ExERSUG
chair should attend the ESSC meetings as an observer.
-------------------------------------------------------------------------
End of Day - Adjourn
Evening Session Thursday Jan 12, 1995
Dinner at Mexican Restaurant followed by closed ExERSUG-NERSC-OSC
meeting at Tower Inn.
Rick Kendall elected new Vice-Chair/Secretary representing DOE Office
of Basic Energy Sciences
Adjourn for the evening about 9 PM
Morning Session Friday Jan 13, 1995
Science Talk by Host Institution - Thom Dunning, Jr.
--------------------------------
Hanford Site is contaminated - remedial action needed - large program
EMSL (Environmental Molecular Sciences Laboratory) program - new building
crown ethers - extract ions
cytochrome P450 - detoxifier
1 GHz NMR facility coming - remote users (distributed computing)
-----------------------------------------------------------------------
Visit to PNL Computing Facilities (host: Rick Kendall)
-----------------------------------------------------------------------
Open Discussion - Bill McCurdy (NERSC)
---------------
Invitation to Discussion - Bill McCurdy
A) The case for SMPs
B) The role of the current capability platforms
C) Dial-up services for NERSC customers (SLIP, XRemote, PPP)
D)Other topics - suggestions welcome
-How to improve attendance from ERSUG community
-Next ERSUG meeting
-ExERSUG membership
-Search for ExERSUG Vice-Chair
SMPs - Symmetric Multiprocessors
- F-machine replacement (several SMPs)
- migration environment to MPPs
- easier migration
- MPP software environment hard to deal with
- not permanent problem
- benchmarking needed for SMPs
- SMP loaner from SGI
Next Meeting: ERSUG 6/13-14
MPP Workshop 6/19-23 and 26-30
SPP Workshop 6/7-9
visualization workshop 6/12
-----------------------------------------------------------------------
Proposal: NERSC will discontinue providing Dialup Service to customers
- Barry Howard (NERSC)
Current service provides TELNET access via 800 number to NERSC and Internet
Future proposed service would require use of commercial access service
For the User:
Only TELNET access provided
Growing number of X applications require dialup support for SLIP, PPP,
or XRemote (NCD) protocols
For NERSC:
Cost - 800 phone service cost is increasing ($10K/month currently)
Anticipate huge increase if X access supported
Security:
- Some versions of XRemote don't have display access restrictions
(Xhost, Xauthority)
No longer a unique service
Commercial Internet Access Providers
-Full range of services offered, including TELNET, PPP, SLIP
-Wide variety of providers; some national and many local
-State of the art equipment
-Additional providers coming on line each month (InterNIC: 130 existing)
-Reasonable costs
Provider (SLIP/PPP)        Area Covered  Setup Cost  10hrs/month  40hrs/month
California Online          California    20.         25.          25.
Portal Info Network        nationwide    20.         30.          58.
Performance Systems Int'l  nationwide    200.        29.          51.
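The cost argument above can be made concrete with a quick back-of-envelope
calculation. The figures come from the slide ($10K/month for the 800 service)
and the provider table (10hrs/month tier); the break-even framing is our own
illustration, not part of the original presentation.

```python
# Back-of-envelope: how many individual commercial dialup accounts could
# be funded for the cost of NERSC's 800-number service?

NERSC_800_MONTHLY = 10_000  # dollars/month, figure quoted in the slide

# Monthly rates (dollars) from the provider table, 10 hrs/month tier
providers = {
    "California Online": 25,
    "Portal Info Network": 30,
    "Performance Systems Int'l": 29,
}

# Number of users each provider could serve for the same $10K/month
break_even = {name: NERSC_800_MONTHLY // rate for name, rate in providers.items()}
print(break_even)
# e.g. California Online covers 400 users for the price of the 800 service
```

Even setting aside the SLIP/PPP/XRemote and security issues, the per-user
economics favor commercial providers by a wide margin.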
What NERSC proposes to provide
-No upgrade or expansion of current service
-Provide information on how to find list of commercial Internet providers
and what criteria to apply when selecting one
References:
PC World, Jan 95
PC Magazine, 11 Oct 94
MacUser, Sept 94
MacUser, Dec 94
-Provide assistance to sites interested in providing local dialup service
-For more information: Barry Howard, howard@nersc.gov
Neal Mackanic, mackanic@nersc.gov
-----------------------------------------------------------------------
End of meeting - adjourn approximately noon.