Full Comments
What does NERSC do well?
* support: high marks in many categories, including providing many channels for getting knowledge, providing fast, knowledgeable, and friendly responses, and providing many channels (email, help desk, phone) for getting in touch with support
* compile software that I need
* balance needs of many types of users
* in most cases, uptime
Individual user/project attention.
The module system is very good, offers a lot of good and up-to-date software. MPI performance is very good, and performance variability between runs is very small.
Lots of computers with lots of uptime.
NERSC provides the high-level computing resources that our science needs and provides professional service to keep it going and keep users running smoothly.
Seriously capable hardware, configured well, administered well, great support, great classes learning about new infrastructure, easy access, great communication with the users. Seriously accommodating.
Excellent user support, communication. Keep up the good work.
NERSC is extremely well organized. I am never surprised about, e.g., new software, removed deprecated software, uptime or downtime, or anything else. The NERSC staff communicate very clearly with users and have been very helpful and timely in responding when I have problems (which is rare).
Overall I think system availability and maintenance is excellent. There is a huge amount of work being done on these machines with minimal disruption.
Managing systems and making sure they are available
overall it is one of the best US centers
The consultants are amazingly responsive. I am thoroughly impressed by response times and general helpfulness.
Rapid response to issue or questions.
NERSC provides state-of-the-art supercomputing resources to do forefront science. NERSC support staff are essential to making optimal use of these resources. Training provided by NERSC over the web is instrumental in making more and more use of these outstanding supercomputing resources.
Of all the HPC systems I have used (Stampede, Lonestar), Hopper has been the most satisfying experience in terms of user experience. I must admit that the learning curve to familiarize myself with the environment was a little steep in the beginning, but once everything was set up, it was smooth sailing all the way!
The cluster uptime, updates from admins, and availability of support are top-notch. Bugs submitted are responded to quickly, and usually resolved in short order. Dirac and its staff are currently my favorite cluster.
consultation on many issues
In 20 years of scientific computing (and I'm only 33) I've been to centers all over the world. I'm currently actively using 5 centers on three continents. NERSC is by far the best, from coaching beginners to data management to uptime to just the overall professional appeal. Keep up the good work!
Support.
I'm not aware of anything that NERSC doesn't do well. The documentation NERSC provides for using the HPC systems from compiling code to running batch jobs is excellent. The code development toolchains provide the latest versions of compilers, eliminating the need to work around bugs in the compiler that have been resolved in newer versions. This is an area where NERSC really sets itself apart from other computing facilities I use (e.g., LLNL Computing). The recent training event for Edison was well organized, and as a remote attendee I took a lot of information away from the event that will help me better utilize the system. As I said, I've not had an experience with NERSC where something wasn't done very well.
Helpful quick responses to consult questions. Good system availability and reliability. Good availability of needed libraries.
NERSC is very good at supporting a wide range of users and applications in a production setting. I find unannounced outages to be extremely rare.
You guys are great!
Excellent resource for large jobs.
Providing the best machines in the world for us to perform various calculations.
Support is excellent. Whenever I have a technical issue, it is usually resolved within 2 days.
All systems function exceptionally well for my purposes.
overall
Consulting. Resource availability. Data integrity.
The HPC environment is relatively stable.
The NERSC facility should be the standard that all computing facilities strive for. The communication, consultation, and reliability are the attributes that come to my mind first.
Great systems. Well maintained.
NERSC is great in that they can be used for commodity computing for medium to large sized jobs that are hard to run at home institutions--as long as one is very patient and quick turn-around is not required. Their websites are also very good for providing the information needed to use their resources and know the current status of jobs, etc.
just about everything
addressing problems in reasonable time and trying to find a solution.
I very much appreciate the ease of access to high performance computer systems. Only one password is needed, and I can log in both from home and from work.
consulting and providing necessary computing/storage resources
Overall I'm very satisfied with NERSC; the machines perform well and there is hardly any unexpected downtime. Modules are easily accessible and generally up to date.
Streamlined access to powerful computational resources that make my computational chemistry calculations run very fast.
NERSC makes it easy to get started and to run serial as well as parallel jobs. The people who answer the phones and give advice are superb.
In my view NERSC is the best supercomputing center in the US.
- Computing systems (Hopper/Edison) are overall easy to use and stable.
- There is extensive documentation on how to compile and run jobs (much better than in other supercomputing centers).
- There's a good variety of installed programs and compilers.
- The queuing system is good (in particular, it's good to have a debug and interactive queue).
- Debugging programs is way better than in any other supercomputing center.
- The local scratch file systems are good and their performance is more or less stable (also, much more stable than in other centers).
- The security policy is good (no hassles with cryptocard, OTP, etc.).
Provide HPC systems, software, and consulting, as well as education opportunities for computing.
Computations, storage, computing environment, access.
1. hardware
2. service
Doug and Kjiersten are very good consultants and very helpful. I appreciate the many presentations we have and I feel that I have learned a great deal.
provide access to a large range of systems
easy to run interactive / trivially parallel / massively parallel jobs
archival HPSS
databases (mongodb)
support from consult is great
1. Interactive management of allocation time and resource availability.
2. Online information and tutorials are well documented. Very easy to get started.
Like I've written in prior years, I think NERSC continues to be the model organization for how to manage large supercomputer resources. I can't think of another organization that manages supercomputers that I use that does any aspect of it better. The website is extremely well put together, the actual performance on the machines is fantastic, and the level of user support is just downright ridiculously awesome. To be honest, NERSC's quality would have to drop by, say, 50% in each of these categories to even have a rival. The above is partly due to the fact that other existing organizations fall below the standard that I think should be expected of them, but also due to the fact that NERSC exceeds what I think should be reasonably expected of a well-functioning organization.
Providing massively parallel computational resources.
NERSC is my favorite supercomputing center.
Great supercomputing resource that "just works" most of the time. When support is required (not often), it is excellent.
Provide excellent HPC systems and services to scientific users.
Provide reliable, considerate, and fast service.
Attention to our needs by the consultants is excellent. They do an excellent job of helping solve our technical issues as well as providing alternate solutions when our needs run into conflict with NERSC security and other policies.
Answering questions, taking occasional special requests, machine uptime
NERSC is the most professional and well-run computing organization I have ever dealt with. The level of user support is exceedingly high. NERSC allows a wide range of people to collaborate, with shared space for files and a good mix of hardware for running jobs.
Almost all jobs executed very well. Thank you.
Mainly the genepool section of NERSC. Have used other sections infrequently.
Responds rapidly and usefully to tickets and some special requests.
Very happy with hardware configurations, plenty of compute resources (on genepool). Filesystem performance is excellent, especially considering the size and complexity of the configuration. Great job with the phase-out of /house. The workshops/tutorials were very helpful. NERSC (esp. Kjiersten) did a great job of making this all go smoothly. Great job by Doug and Kjiersten in helping JGI start to develop a positive informatics culture.
Figure out what I really need from my description of what I think I want
Kjiersten and Doug are very responsive to our questions/comments.
NERSC consultants are the BEST! Timeliness and quality of help is just great. The machine uptime and reliability is also very good. Documentation is good. Notices of machine/software upgrades, planned machine downtimes, web seminars, workshops, are all very good and timely.
People I interact with all seem to be of the highest quality.
A great place to develop HPC code. It is feasible to run simulations that require only moderate numbers of CPUs per job (which may still require large amounts of compute time, because of the length and number of the jobs).
Consultants work well
Security policy that still allows ssh access is great. Keep it up. Support for science gateways is good. Traditional HPC with massively parallel jobs is a strength. HPSS is pretty good, though it still needs a programmatic interface (e.g. python library) rather than just script interfaces. Globus!
Create long surveys. Doug J. and Kristen F. are good with helping with our problems quickly.
I am very satisfied with NERSC user support, as well as the online tutorials and help pages. I especially like using NERSC's NX service. NX is fast and efficient for working from home. The new NX server version allows copying and pasting of text, which is appreciated (the older version did not allow copy and paste).
For the computations we are performing, NERSC does very well. It would be good if we could get more jobs through the queue faster, as opposed to having to wait for others, but I am sure that this is common to everyone. My collaboration is exceptionally happy with NERSC and we view it as a critical resource to our nuclear physics program.
Wonderful flexibility for adapting to the needs of projects on the fly (e.g. soft quotas, modifications of disk and archival allocations). Also implementation and full support of major data portal.
I am continually impressed with the help desk and their ability and willingness to help NERSC users. NERSC is my favorite of all supercomputers I am and have been associated with. Keep up the good work!
provide easy access to (massively) parallel computing for a large and broad user-base; in particular, easy to add users (students, new collaborators) and get them started.
Plans for future usage, growth and efficiency.
Everything! Whatever I needed over the last 20+ years NERSC has provided in a timely, responsible, and professional manner.
Most things that I need. The only (serious) problem to me is the slow response of basic file commands, especially on Hopper, and especially everything involving X11. NX would probably be a good solution to this, but in spite of help from both NERSC support and local ORNL support, this does not work through the ORNL firewall for me.
I run most of my VASP and Turbomole jobs on NERSC (mostly Hopper, occasionally Carver). Hopper is vital for my project. I appreciate the professional performance of NERSC as a whole and of Hopper particularly.
NERSC is very efficient in keeping their computers up for the most part and managing the batch queue. It also provides a scratch and project directory to allow fast I/O and data sharing with other team members, respectively. Back up and retrieval of data on the HPSS is fast.
shorter queue time for longer jobs.
massively parallel computing. My jobs often need 70k cores and more. Edison and Hopper are great for them.
The state-of-the-art hardware and software.
Excellent computational resource and great supporting service.
Iwona Sakrejda is very responsive (prompt) to requests, helpful, knowledgeable and dedicated.
I think the website is very well organized with lots of useful information. The system I use (mostly Hopper and sometimes Edison) has been very useful and reliable. Also technical support is very good when needed.
Keeps their systems up and running, manages the queue.
So far I'm most impressed by the website (excellent beginner documentation), the friendly and helpful attitude of the consultants, and the frequent appearance of improvements.
Very proactive in addressing potential issues and announcing system maintenance.
all necessary things for me
NERSC is a very important resource for my research. I really appreciate that the allocation team at NERSC responds very promptly to our requests.
Technical support is extremely high-quality and prompt.
The high-speed and reliable CPU resources at NERSC greatly facilitate our computational research. The spontaneous, or even proactive, response from your allocation team to our CPU needs is highly appreciated.
Mostly it is good. The only thing is the machines. Carver and Hopper have become slow now, and the queue waiting time is long for a big calculation. So most of the time we can only use Edison, but Edison is always down.
The systems have good reliability. There is good and frequent communication about downtimes, changes, and new features. The helpdesk has always been very helpful.
Provide HPC resources, and inform users when resources will be temporarily unavailable.
Storage, data transfer speeds, ease of use
NERSC provides multiple systems for running jobs (Carver, Edison, Hopper) that I take advantage of. Realizing that Edison is often down for maintenance, NERSC has not imposed a charge factor on jobs for this machine, which is a great plus. The queue system is set up well, so that less important jobs can be placed in the low priority queues while other, more important jobs take the forefront.
Maintain existing computer clusters (e.g. Hopper) with long uptimes
Your allocation procedures, especially the quarterly adjustments. I find your queue structures to be well-balanced. I have also found your support services to be responsive and helpful.
Updating, reliability, availability.
Consulting
Excellent technical support. Generous distribution of allocation hours.
Provides reliable access to HPC computing. Through the project file system, it provides an adequate live file system which is accessible to the compute engines and provides sufficient online storage. Also it allows users to ask for additional resources during the allocation year without having to provide an additional ERCAP request.
I would especially like to thank the allocation team, which has been very pro-active at meeting all of our CPU needs.
Managing computation resources for a large number of users.
NERSC provides high-performance computing in an accessible way and are willing to help new users.
Keep users informed, help users if they have problems
NERSC does very well in providing massive parallel computing environment and data storage.
Provide reliable computing and data resources.
Manage and run these crazy systems.
NERSC has a pretty good Linux environment, and I can both work easily and submit jobs on it easily.
NERSC is simply swift in every capacity that I have encountered. My only experience is with Carver and it has been all positive. Jobs enter the queue quickly, maintenance is performed quickly and I can quickly move data to, from and around the environment.
NERSC does very well at:
(a) reliably providing HPC resources with high availability and, importantly, communicating status updates to users at times when resources are not available; and
(b) providing extensive documentation re: best practices (compilation, I/O tuning, performance monitoring, etc.) for production systems.
The NERSC allocation team was very efficient in responding to our CPU needs.
Offers a reliable HPC environment with which to conduct research.
Easy access, sensible structure for the systems, good tutorials.
The I/O speed improvements on global scratch. Hopper is an excellent Massively Parallel computer; its resources are most adequate for my research. NERSC also does very well with keeping the users up to date with messages upon login, emails, etc. The emails to inform users of shutdowns/delays/upgrades/etc are sent out well in advance.
I have been using NERSC (MFECC) for 33 years. It is far and away the best run computer center on the planet.
Uptime is excellent. On the occasions when I've interacted with someone from the support staff they have been very helpful.
A great website, great uptime--great notice of upcoming downtime
availability
performance
queue time for debugging
NERSC is very good at supporting scientific research, not just trying to run simple computer codes fast.
Provide big computers for my calculations
NERSC has great support when the machines are going wrong, or for some reason a code is not running. Really top notch in this department. The debug and interactive queues are great features not seen at other places. Very important for code development. The website is very helpful in understanding different coding models, how to get the most out of your code on NERSC machines, etc. NERSC technical advice is always very good too. Whenever I'm having a problem with compiling something or running a job, I have always received good advice from NERSC.
Keeping users informed of changes, tutorials, staying up with latest computing technologies.
Large scale parallel computing
Pretty much everything. It succeeds in its aim of being the most user-friendly computing facility.
They ask me if they can help; everywhere else I am told what not to do.
Variety of resources and flexible access to them. It allows a reasonable number of simultaneous queued jobs. Allocation award process is effective.
technical support updates regarding the status of the systems and updates in software
The machines are efficient in running parallel computing jobs. The computational resources are mostly up and running and the communication is excellent. The ability to request and obtain a supplemental allocation when the allocation is nearing expiration is a major positive point that smooths out potential hurdles and is greatly appreciated by researchers who are concerned by unnecessary interruptions.
I LOVE the broad choice of compilers, rapid response to issues, and excellent documentation. NERSC also does a great job with versioning of software executables and libraries.
High availability, very important for me. I can connect every day, anytime. Machines at NERSC are also stable enough to trust results. Performance (computing but also services) is very good.
Provides excellent and reliable HPC platforms and relevant software. NERSC's understanding and response to users' requests and needs is equally excellent.
Outstanding management (in its most comprehensive sense) with a central focus on science.
module system, libraries, etc. are always working and updated
Support DOE specific science
I have seen a marked improvement in the communication of changes by NERSC over the last few years, mostly due to the efforts of our 2 technical support people, Doug and Kirsten. They are very proactive in communicating changes and providing training in a timely manner to adapt to the changes.
Provides a diverse set of HPC resources.
Accessible and usable HPC system access
I like how the computing environment is set up for scientists. There are many of the scientific packages pre-installed, allowing me to quickly move to science.
computer systems work well, help desk and consultants are amazing (fast and always useful). Collaboration with NERSC staff to make our science gateway work is going great.
high speed of computing
spontaneous response to our CPU allocation request
Provides easy to use computational resources
good customer service, ample warning, stable systems, knowledgeable staff. Overall, they are an excellent, competent resource!
Almost everything, especially help and support.
NERSC is the best supercomputing center in the US that I know in terms of capability, resources, service, and response.
Without NERSC computational resources, it would have been impossible for me to have three important papers published. In 2013, I had a paper on BiFeO3 domain walls published in Phys. Rev. Lett., which is well known for having the toughest standards for accepting papers. In my own opinion, Carver is the best one in my experience.
NERSC serves many users very efficiently, and is very user friendly to all of its users. The transitions to new computers are quite easy, usually without requiring new programming techniques, so users can concentrate on science instead of programming.
Provide world-class systems and support, with high reliability and a clear focus on delivery of HPC for science.
NERSC provides a broad spectrum of software applications to its users. NERSC specifically provides important parallel software (compilers, parallel libraries, etc.) for its users, enabling them to compile very efficient code for computation.
I really appreciate the consultants. Being able to ask and get quick answers to questions is worth a lot. With the exception of Edison, the NERSC machines seem to be available as much as anyone could possibly expect.
I have been fortunate to have used NERSC facilities since its inception and its earlier facility directed by my friend and colleague Prof. Bill Lester, whom I have known for about 40 years. NERSC is undoubtedly one of the best (if not the best) supercomputing facilities in the world, which supports the research efforts of thousands of scientists from all over the world working in all areas of Biological, Chemical, Mathematical, Physical sciences, etc. I personally find the NERSC facilities to be sine qua non for my research in Computational Relativistic Quantum Chemistry and Physics, and some of the novel and unique results we have obtained have been written up in Chemical Engineering News, etc. Especially I sincerely express my thanks to Prof. Ted Barnes, Manager, NP division, DOE, and Ms. Francesca Verdier, NERSC Allocation Manager, for their generous support and encouragement of our research, especially during my serious medical complications arising from cataract surgeries. Without the help and support of Prof. Barnes and Ms. Verdier I would not have been able to accomplish much in my research. The excellent advice and help by the various consultants, especially our long-term guide Mr. David Turner, deserves my grateful thanks. I am deeply indebted to the various dedicated personnel in the Account and Allocation and Password departments, especially our "old" friend Mark Hare. I am sure there are in addition numerous dedicated personnel who make sure each and every day that the NERSC facility remains on top of the world supercomputing facilities, and to all these we express our heartiest congratulations and heartfelt gratitude for the work best done. Need any more be said? Wishing you all the best in FY 2014.
In general NERSC does all jobs very well.
Very simply, NERSC provides the resources and support we need to do our science. Of the computer centers we deal with, it is the best run, least burdensome, and offers the fewest number of surprises.
In general, my jobs get scheduled and start running quickly. Notification system for announcing system changes, maintenance dates etc. has improved greatly since last year.
1) I find it impressive how reliable hopper and edison are, given their complexity and the wide variety of users they have. 2) Whenever I have a problem, the consultants are quick and knowledgeable.
High concurrency runs
NERSC has consistently demonstrated itself as a leader in supporting and helping users in various ways. I do not say this lightly, and I have evidence. This Tuesday, I received a letter from the Editor of Physical Review A. They plan to post one of the figures from our recent papers on their webpage. We never had this chance before. NERSC makes a big difference in our research.
Great speed and great service! I really appreciate a quick and helpful support staff.
Provide ample resources to suit a variety of computational needs.
great platform for my code (includes machines, software, compilers etc)
NERSC does a very good job of providing very powerful computational platforms and an efficient queue system. I appreciate it. NX on NERSC is the best remote desktop system I have ever used! It is extremely convenient and fast.
Keeps the machines up and running, and documents how they're supposed to work.
NERSC provides access to very powerful computing systems that span a wide range of architectures. The latter bit is actually most important- some of our codes run best as massively parallel MPI, which others want powerful cores or GPUs. On NERSC we can run all types and pick systems which best fit the type of computation we are looking to perform. I also really enjoy access to live queue and management through the web interface, both desktop and mobile.
As a user, I am very satisfied with the whole NERSC facility.
Online customer support, scratch data storage
Priority to medium, big and xbig jobs.
NERSC gives us a good computing resource to get the physics results we want.
Customer service was excellent in all respects: timeliness and accuracy of answers, availability, and a helpful attitude. I enjoyed interacting with the customer assistance center.
Consultants and their help are AWESOME. User friendly system is top notch.
NERSC is doing well for me
Thanks for doing well on user experience and support!
The service at NERSC is excellent. Every help desk ticket I have opened has been solved within a couple of days. The amount of Up time for all the systems seems very good compared to other clusters I use.
1. Lots of software and programming environments.
2. Good consultant services, webpages, and plenty of training.
3. Powerful computers.
Good computing resources, excellent technical support.
NERSC has been the main avenue for my computational work. I am truly impressed by the systems as well as the services.
NERSC does extremely well in designing its batch queue system and providing necessary technical service on various issues.
NERSC does an excellent job of managing Hopper (and I presume its other machines)! While there are occasional glitches, NERSC has always managed them in a timely and very helpful manner.
Professional Systems&Support team
NERSC is an invaluable resource for serving the general DOE scientific community. It is a resource for those that do not have access to any other MPP computing.
1) NERSC has a very good support team. 2) The software is regularly updated.
NERSC provides easy to use high performance computing resources. The queue structure and charging policies are equitable and allow for jobs to run through the queue relatively quickly.
NERSC staff are very responsive in accommodating users' needs. They have contributed a lot to the user-friendliness of NERSC machines.
Documentation is excellent. Responses to our questions and concerns are most excellent and very friendly.
Web pages, on account and run information, on system status, on usage tips, etc., are excellent compared to the other centers I work with. User support and consultants are very responsive and conscientious (but I find this to be true at all of the centers).
Everything I needed, NERSC did exceptionally well. Thank you very much!
Timely announcements.
NERSC provides useful computational resources that are important to my research. More than 50% of my research was done on NERSC. Also, I have gotten a lot of good help from NERSC. Whenever I had questions or problems, I could get a quick response from NERSC. That makes me feel that I am important to NERSC (although I know that actually I am not that important). Thank you very much for your great support of me and my work.
Overall yes.
keep running the servers. Get back to me quickly when I have a problem.
NERSC maintains a highly reliable cluster (Hopper) on which I can run a variety of molecular dynamics simulations important to my Ph.D. research, complete with a very informative website on the systems available as well as guides and tutorials, and MODs and e-mails informing me of any necessary maintenance, system shutdowns, etc.
Overall performance of parallel computing is impressive to me and file I/O performance is also outstanding.
The queue time is a little bit longer than what I expected. If the queue time could be shortened, it would be much better.
I am very happy with the HPC systems at NERSC, the web pages, and for the most part, job turn-around. Keep up the good work!
providing exceptional computational resources that are accessible and stable.
To serve a variety of smaller users; a variety of usable software.
user service, software up to date, flexible addition of cpu time resources when needed and available
To be honest, not very much.
Customer service is generally excellent. It can improve for the Dirac cluster.
I appreciated NERSC's help implementing yorick and mpy (= parallel yorick), which is LLNL-developed software to analyze simulation results.
You provided help when I asked for it. Thanks!
Reliability. Providing information.
User Services is outstanding.
Consulting
Help with sorting out batch queue length for jobs
Keeping users informed of the status of the system
Long term storage
Access to current and emerging technologies
Excellent array of scientific libraries and software
Helpful webpages
Webinars to keep users up to date
Provide significant computing resources
Support all our Python-based tools
Allow interactive command-line work
Overall I am pleased with the operation on Hopper in the year 2013. I have some issues with how it will be run in 2014, please see my comments below. The technical support is quite good when needed and very responsive. The overall system performance of Hopper and HPSS is good.
From my perspective, NERSC did an excellent job of managing Edison.
NERSC makes access to high-performance computing resources very easy for DOE researchers and collaborators. There is no comparison.
NERSC has very good uptime and handles a huge number and variety of jobs.
what it is mainly intended for - running jobs
It provides very reliable service.
Integration of services, modules and resources
I consider NERSC to be the adults in the room when it comes to HPC. It is also clear that NERSC management understands the need for providing data ingest/export capabilities and data analysis capabilities in addition to traditional HPC services. I think that this is a significant strength that NERSC would do well to maintain. I am admittedly somewhat biased - I used to work for NERSC. However, based on dealing with NERSC as well as other HPC centers I can say that NERSC's reputation for technical excellence is well-grounded in reality. Also, the "other" capabilities such as networking and security that NERSC also does well are a significant asset to the center, and allow NERSC to be the excellent resource that it is.
NERSC provides the baseline parallel computing resources for my computational research needs. NERSC's computing services and reliability are essential to my research and they have yet to let me down.
There are a large number of users running a large number of jobs, and through it all NERSC keeps things running smoothly. The support staff is quick to address any issues, and overall things do run well in my opinion.
Range of scientific software provided. Performance of the software and the system. Stability of the system. Rapid and knowledgeable response to user inquiries.
Management of the machines, (i.e. uptimes and informing the user when there is a need for downtime - or why there was an unexpected downtime.)
The overall performance, especially in computing source.
There was nothing in my experience that did not run well on NERSC
Pretty much everything. If you could increase retention time in scratch and global scratch and increase home and project space, it would be perfect.
NERSC sets up, maintains, and provides access to high-performance, >100k core, systems. In my experience, these systems have performed reliably and well. In addition, the response from NERSC staff to problems and issues has been excellent.
NERSC does very well in providing highly competent technical consulting, and they are very prompt. All the computers are up-to-date, and they are designed for large-scale jobs, which require supercomputers (and not clusters at universities).
I have been very impressed with NERSC's provision of resources and the pro-active way they have given real access even to new, essentially NDA resources. I think this is very useful to the community to help both to port code, and also to have an idea of the capacities and capabilities of the system. I was also impressed with the Globus Online interface to HPSS, tho originally I may have hit some bumps there.
I would like to thank the NERSC team for stabilizing both genepool and our storage to a level where we can spend time thinking about science rather than the infrastructure. Specifically, Douglas Jacobsen and Kjiersten Fagnan have put in a lot of effort to maintain open communication so that users can get more involved and informed. It is also obvious that the team can now more proactively manage the system. Thank you and keep up the excellent work!
Timely reply/resolution of consult tickets. Good training.
customer interaction
Support staff is top notch
Good customer service
keep systems up & running.JGI support team is helpful, responsive, and a good resource.
Of the national computing resources, NERSC is the most useful to me because the queue system and resources give me access to a lot of nodes for long enough to do correct ensemble sampling.
Great service and technical support. Very fast state-of-art machines.
Support is AMAZING. Systems are rock solid, and software "just works."
Does NERSC provide the full range of systems and services you need to meet your scientific goals? If not, what else do you need?
I have one problem with NERSC that is shared by almost all users that I speak with in the computational chemistry community. It is very difficult to get small high-urgency or quick-turnaround jobs done in time at NERSC. It is easy to measure how efficiently NERSC is using its computers - are processors being used 99% of the time or "idling around" waiting for something, and NERSC optimizes these measures heavily. What is missing is the efficiency of the scientist - if the scientist is "idling around" waiting for a job to finish, that can also be very expensive and completely derail scientific projects more than an idle processor. I'm still not sure NERSC fully appreciates the cost of causing the scientists to "idle", and it is difficult to quantify this in a report. But many people I know wouldn't even accept free NERSC time because they simply can't wait so long to get back a result. They'd rather pay the money to buy their own computer cluster which gives them results more quickly than accept free computer time and lose their human time waiting. Of course not everyone can be at the front of the line all the time, and that's not my suggestion. But there needs to be some better procedure to prioritize certain jobs that are important and deprioritize ones that can wait. The "priority" queue is meant to tackle this problem but honestly I think it is a poor solution and I don't think it's always used as intended. Anyway, many urgent jobs don't have the correct shape to fit in the queue. I think NERSC could do better on priority jobs and attract more users that typically buy $300K research clusters into shifting their computing to NERSC.
Yes
Yes.
Yes. It would be helpful to have more Mathematica licenses though.
I would like NERSC nodes to have Kepler-generation GPGPUs and/or Xeon Phi boards.
I'd like to see queues exceeding 96 hr walltime on NERSC. Also, plans for data storage should be better defined.
We understand NERSC aims to support high-end, capability computing and even offers discounts for jobs requiring a large fraction of the machine. I have tried running jobs that would qualify for discounts and encountered very long queue waits. Fortunately, we can configure our jobs so they run on fewer processors. Then we get excellent throughput.
Absolutely. Computing at NERSC is fantastic.
Yes.
NERSC provides what I need.
Mostly. It would be great to have access to latest hardware technologies (as testbeds) such as new/latest GPUs, and other processors.
yes
Yes.
Yes.
Yes it does. Short, basic tutorials on using heterogeneous systems (GPUs, co-Phis) would definitely be helpful.
Yes, but this is a tricky question. To some extent we shape our scientific problems to best make use of the resources available. If, for example, 10x the GPUs were available tomorrow we could change the scale and scope of our questions to take advantage of them. NERSC is great now, and I look forward to taking advantage of your excellent resources in the future when they inevitably improve.
yes, provides full range
Sort of. I think that DOE lacks a high throughput environment that's general purpose, something like the XSEDE condor pool. We have OSG but (unless I'm mistaken) it's only for high energy?
Yes.
Yes, NERSC systems allow me to accomplish my research objectives.
The main thing I would want, if I could have my wish, and I already had a pony, would be a queue system wherein the total number of processes, not the total number of jobs, is restricted. Especially on Carver. That would mean I don't have to go through the rigamarole of putting several instances of mpirun in a single job script, i.e., putting multiple parallel or serial jobs in a batch, on Carver. Also, when jobs crash, that means that if I have a batch of parallel programs running within, say, the maximum number of two jobs on the 504 hour queue on Carver -- say I have 8 parallel programs running within each of two Carver jobs -- when one of those crashes, I can't restart it until the other 7 jobs have finished (or crashed too). Which sucks if it crashes when I am 200 hours into it. (Implementing a restart capability would be great, but this is nontrivial and has not been accomplished for a key program.) So if you could just limit the total number of nodes within a queue, without regard to the number of jobs, that would be great!
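As an illustration of the workaround described in the comment above (bundling several mpirun instances into one job script), here is a minimal, hypothetical Python sketch that launches several independent MPI runs inside a single batch allocation and waits for all of them; the executable name, working directories, and rank counts are assumptions made purely for illustration, not NERSC-documented practice.

#!/usr/bin/env python
# Hypothetical sketch: start several independent mpirun instances inside one
# batch job and wait for them all, mirroring the bundling described above.
import subprocess

# Each entry: (working directory, number of MPI ranks); illustrative values only.
runs = [("case_%02d" % i, 8) for i in range(8)]

procs = []
for workdir, nranks in runs:
    # Start each parallel program in the background, like `mpirun ... &` in a shell script.
    procs.append(subprocess.Popen(
        ["mpirun", "-np", str(nranks), "./my_solver"],  # hypothetical executable
        cwd=workdir))

# Wait for every instance; if one crashes, nothing new can start until the
# whole batch job ends, which is the limitation the commenter points out.
exit_codes = [p.wait() for p in procs]
print("exit codes:", exit_codes)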
I don't use all of NERSC services.
Everything is there, except there is a tendency not to consider serial programming anymore. Some of us are still using serial programming in our research; even if it is getting outdated, NERSC should consider the serial programming community also in future choices. We still need those processors with large memory to do our job.
My software consists of a main program plus a large set of dynamic libraries. The program produces C++ source code at runtime which is compiled into more dynamic libraries, which then are linked to the program using dlopen calls. NERSC is not ideally equipped to handle this type of program. Loading libraries (they can be as large as 2GB) comes at severe performance penalties. If this could be improved it would be a great help.
- Ability to have a per-project data drop-in location, to be able to securely receive data and to be able to easily set up jobs to process data as it is received
- Ability to process data as a shared project user
- Support for advanced file management beyond just Unix permissions, at least for select project locations
- Increase options for web-based data sharing and analysis
Pretty much
Yes.
It is unfortunate that all of NERSC's production systems rely upon poorly designed network interfaces (OFED and uGNI) that make it difficult to scale applications with nontrivial communication patterns. NERSC really needs to push vendors to come up with network interfaces that do not require O(N) or worse metadata or force the user to make rock-vs-hard place decisions regarding memory-gobbling static connectivity vs. on-demand connections and crappy latency in fundamental operations such as MPI_Alltoall.
Yes.
It has always been my fortune that NERSC has granted and assisted my requests for new innovative methods to achieve our scientific goals.
NERSC covers my needs so far pretty well.
The turn-around time to run jobs is so slow that it hurts scientific productivity. My post-doc spends most of his week waiting for jobs to get through the queue on hopper.
yes
It would have been nice to have Matlab or Octave installed on Edison.
yes!
GPUs would be nice
Yes
Yes, the suite of systems and services at NERSC is pretty exhaustive and more than what I typically need.
Yes.
yes--NERSC is full service
When I wanted to use Mathematica I found the number of licenses quite small and found a different way instead of waiting for a free license.
NERSC is great, thank you for the support!
more real-application examples to learn from
Yes.
As far as I know, it does.
Yes.
Yes.
Yes
Yes.
Business continuity and disaster recovery needs. Would also be useful to validate systems by having simulated disaster recovery (such as restore data sources, dbs etc).
While the I/O bandwidth is pretty good, I wonder if it could be improved by having some local disk on the compute nodes. Just a guess. I could be wrong.
Yes.
Probably, some increase in the amount of RAM for Carver.
Virtual memory in excess of physical memory on large-memory nodes.
No. My group (JGI Portal Group) have been asked to use NX as a replacement for the Linux desktop machines we had before and that are no longer supported. NX's performance is ok, and usable, but the lack of a full set of tools for web development means it is hard to use. Firefox doesn't get updated very often. Chrome is not even installed. Netbeans can only use Chrome for debugging. This is a big problem for us. The JGI may save money in terms of support by going to NERSC, but the developers clearly suffer.
No. I need more disk space that files aren't going to be removed from.
yes
NERSC needs to continue to expand its offerings for data-intensive high-throughput computing. This is different than just raw I/O bandwidth and different than just having a bunch of CPUs. E.g.:
* project space optimized for a larger number of smaller files
* an allocations process for getting 100s of TB
* serial queues or methods for processing many thousands of jobs, not just hundreds
* Carver is currently the best NERSC general purpose resource for this and I am concerned that it is going away, leaving only more targeted offerings like PDSF, Genepool, and non-data-intensive HPC offerings like Edison and Hopper.
Could we please get a shorter path to /project/projectdirs/NAME ? Like /proj/NAME maybe or even /p/NAME?
Yes.
Yes.
Yes, the resources at NERSC are critical in carrying out my BES-funded research.
Yes
So far all I need is access to VASP and Turbomole on Hopper and Carver and I am satisfied with it.
Yes it does. I would like to be able to submit background daemons that run for days, and this is not possible in the current setup. It would allow me to perform high-throughput calculations quite efficiently on NERSC machines, improving my scientific output. The daemon is a shell script that manages runs and submits new runs when needed without human intervention.
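To make concrete what this commenter means by a run-managing daemon, here is a minimal Python sketch, assuming a PBS-style queue with qsub/qstat and an arbitrary cap on queued jobs; the job-script names, the cap, and the polling interval are illustrative assumptions only.

#!/usr/bin/env python
# Hypothetical sketch of a run-managing daemon: keep a fixed number of jobs
# queued, submitting new ones as earlier ones finish, without human intervention.
import getpass
import subprocess
import time

MAX_QUEUED = 4                                        # assumed cap on queued/running jobs
pending = ["job_%03d.pbs" % i for i in range(100)]    # hypothetical job scripts

def jobs_in_queue(user):
    # Count this user's jobs as reported by qstat (assumed to be available).
    out = subprocess.check_output(["qstat", "-u", user]).decode()
    return sum(1 for line in out.splitlines() if user in line)

user = getpass.getuser()
while pending:
    while pending and jobs_in_queue(user) < MAX_QUEUED:
        subprocess.check_call(["qsub", pending.pop(0)])   # submit the next run
    time.sleep(600)                                       # poll every 10 minutes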
Yes.
The current NERSC clusters are not suitable to execute any long time, large memory DFT calculations using hybrid functional or random phase approximation (RPA). Especially in the latter case, even for small system (less than 5 atoms), the calculation is usually killed due to insufficient memory (I always see a bunch of core files in my working folder). Sometimes, the calculation could finish, but the number simply is wrong.
Yes.
More slots for jobs (nodes) on PDSF.
yes.
A programming environment that is easier to use and get the user's own software running.
Overall, everything we need is readily available.
yes
Yes. NERSC is very useful.
To make better use of NERSC resources, more long-term storage (larger project directories) would be my first priority. Currently I am dependent on other systems for providing 100-TB-scale disk storage.
I would like more, newer Intel-based GPUs to be available.
Yes.
Yes.
I would really like to use GPU resources if more current hardware was available with longer run times
NERSC has provided everything I've needed so far.
Yes, more GPUs might be useful if NERSC does mean to begin emphasizing GPU use. As far as available computing resources and storage, however, NERSC seems to have all that I require.
Mostly. I have asked in the past that the Pathscale compiler on Hopper be retained, even if it is not supported anymore. Upgrading to the new glibc made linking Pathscale objects into executables give library errors, but they can be fixed with relative ease by providing alternative libraries (which I did by myself, since NERSC seemed not very receptive to keeping Pathscale working). A good reason for keeping as many (fortran) compilers as possible is that code that compiles and runs with a large number of compilers is much less likely to be buggy. Also, for me, running CHARMM with Pathscale on Hopper gives the same speed per core as running GNU or Intel on Edison, i.e. Pathscale would be a factor of 2 cheaper (this only applies to certain codes).
Mostly yes for computation. Not for visualization.
Emacs and related software are not up to date on carver.
No, we need large disk space under the project directory to store data to be shared by multiple users and not purged for an extended period of time, such as a year.
Yes.
Many of my codes do not scale to a large number of processors. Including more queues for a small number of processors (< 256) would be helpful.
For my particular research, it would be easier if NERSC had an updated version of Ferret, but this has not been a major issue.
Mostly everything we need. It would be very useful if we could set up database and web access to some of the data we generated.
Yes, it does for the projects I have worked on so far.
Yes, but with the caveat that guidance re: target architectures and supported programming models for future many-core hardware will be needed in the near future.
A GUI would be pretty cool.
Probably, I just don't always know what I need or could benefit from.
Pretty much. I could always use more time.
Yes.
Overall, NERSC definitely provides everything that I need to achieve my scientific goals. Hands-down it is the best user facility I've used. That being said, here are a few features I'd be interested in:
1) More compute resources. Queue time seems to have gone up a lot over the last 2 years. Maybe I'm imagining it, but that's what it seems like.
2) More I/O resources. I/O is becoming increasingly important and can dominate jobs. It also frustrates me when my jobs' run time is variable because of the variability of I/O resources. I have many jobs that don't finish because of this, wasting the resources.
3) On a related and more practical note (since I doubt you can just increase both compute and memory resources without getting a lot more money): are there any schedulers that could take into account the current loading of the I/O resources and update the wall-time to reflect that, i.e. if the I/O resources are heavily loaded, then the wall-time would be increased? That would be a great feature. I could see it also working by the user specifying a minimum and maximum wall time based on I/O performance.
4) An online tool, or bash tool, that tells me which files of mine are in danger of being deleted. I am currently always paranoid about this happening. It's hard to do my scientific work and be a data manager. A tool like this would make this much simpler. I understand you may be concerned that then everyone would just "touch" their files to keep them from getting purged and then we'd have a death spiral of shorter and shorter data sweep times. However, I think people would just save the data they really needed if they had such a tool. I think the reason people touch their files now is that they're worried about losing the important ones and there's no way for them to keep track of all of their files and do their work. If they could see which ones might be deleted, I think they'd be more selective. If there were problems, you could implement a stricter policy about what happens if a user is touching their files to save them. Either way, this would save people a lot of time and is a rational solution, I think.
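Regarding item 4 above, a purge-risk report along those lines could be quite simple; the following is a minimal Python sketch, assuming (purely for illustration) a 12-week purge window based on last access time, which may not match NERSC's actual purge policy.

#!/usr/bin/env python
# Hypothetical sketch of a purge-risk report: list files not accessed within
# an assumed purge window. Window length and the use of atime are assumptions.
import os
import sys
import time

PURGE_WINDOW_DAYS = 84    # assumed 12-week purge window
WARN_MARGIN_DAYS = 7      # warn a week before the assumed cutoff

def at_risk(root):
    cutoff = time.time() - (PURGE_WINDOW_DAYS - WARN_MARGIN_DAYS) * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    yield path
            except OSError:
                pass  # file vanished or unreadable; skip it

if __name__ == "__main__":
    for path in at_risk(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)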
Yes, I can't think of anything not covered.
Yes
Yes.
Yes, NERSC provides a full range of services I need.
yes NERSC does provide
The computational resources provided by NERSC are essential for important work supported by federal grants. The performance is excellent, and kept very high by the staff. We are grateful. Nevertheless, the very nature of the cutting edge problems addressed computationally produces a justifiable pressure for increased capabilities. The magnitude of the compute problems we face is constantly growing and so does the expectation for improved resources that would enable their solution. As other computer centers (e.g., on Teragrid, Oakridge, etc.) have advanced in their capabilities and compute power, the hope is that NERSC will not be left behind.
Yes it does.
Yes. For over a decade the increase of the computing power at NERSC has been mostly in step with the needs of my science projects and I count on this to remain being the case in the future.
Mostly.
Yes
I think this is not a NERSC problem. The NERSC team provides good currently available tools. Unfortunately, the state of the art is tedious at best. When good solutions are found, I expect NERSC will supply them swiftly.
It does provide
Need better support for JGI webservices. For many developers at JGI, part of our role is to install or create webservices that support lab processes and for pipeline control. Currently, getting a webservice up and running is a difficult process and requires extensive intervention by the NERSC/JGI admins. A better solution would be to use virtual machines, and give developers more control over the configuration of the VMs.
I think there needs to be more resources and infrastructure set up for data-parallel tasks. I break my work up into many single-core jobs and submit them to the serial queue on carver. I think there needs to be more resources available for this kind of work. Also, sometimes I just want to submit a few < 10 jobs to make sure my code is running properly. If there is a huge backlog in the serial queue, there can be long wait times, and it can take several hours to find a small bug. It would be great if there were processors reserved, to allow for quick turnaround for debugging purposes.
visualization tools
We can always benefit from even more computing power
yes.
Mostly. Our throughput is mainly limited by:
1) queue wait time
2) /scratch IO performance (esp. on edison, where we see an order of magnitude variability in run times due to IO slowdowns on /scratch)
Also, uptime on edison has been somewhat limiting.
Yes, I need more allocations to meet my scientific goals and for sound planning. It is not adequate for my research group now, and we have to slow down or abandon some innovative and ambitious new initiatives. Most of our special requests or applications got rejected.
The only thing that I am unsatisfied with is the home directory quota, which is somewhat too low for my use.
Mostly, yes. However AORSA is limited by memory per core, so more memory per core would allow more physics.
Yes, I feel my needs are well met by NERSC.
More and faster persistent disk space!
Yes it does.
YES!
I think that the queue policy is not without problems. NERSC is oriented more to large-scale problems. There are a lot of small clusters (inside universities and labs) where up to 100 cores are available. So small tasks may have some restrictions on a massively parallel system such as Edison.
Training to escape from 1970s/80s/90s modes of running and to enter the world of workflow tools, databases etc.
Yes, thank you!
YES!
YES!
Yes!
I would be very happy to see a system like DaVinci back on the floor. Even though my use of DaVinci was intermittent, it was great to have access to a multi core analysis system when the situation called for it. When Carver is brought down we will be performing our analysis on the Cray systems and they seem less suited for such.
I found that I forgot to mention one thing: the validity of the command "showstart". Most of the time, the waiting time shown by "showstart" is incorrect. It is very disappointing when I find my jobs are still queued after the time "showstart" told me.
I would like to mention that the wait times for small - medium sized jobs is large, sometimes a week ... which decreases productivity.
Generally, yes.
1) The mobile interface needs to start including Edison jobs. I mostly run on Edison and the mobile site is useless to me if I can't actually control anything I am running. MOTD should also be easier to access on mobile.
2) Dirac needs some more documentation. We were having issues compiling some of our GPU code to run on Dirac at full capability, and that was due to CUDA driver mismatch. It would be nice to see a list of tested software and its performance/scaling.
3) The home filesystem is slow. There needs to be a way to prevent users from running I/O jobs on it because otherwise everyone else starts suffering from delayed performance. I use plugins with my Vim and ran into >30s opening/closing times.
4) Zsh and Fish support on Cray systems. I am so used to running Zsh 5 on my desktop and our private cluster, that going back to Bash on NERSC is a bit jarring (I want my autocomplete!). Zsh is installed and supported, but it's a very old version and Cray doesn't seem to support it well.
yes
Yes
I am very satisfied with NERSC. What I need for scaling my science problem is the same architecture as Edison but with 8x the core count.
A testbed platform having novel computing architectures (GPGPUs, Xeon Phi) is the next step for me. The DoD is actively exploring these new computing technologies.
NERSC does a fantastic job. This keeps us wanting to do more at NERSC. One new capability would be to serve multiple hundreds of terabytes via a fast full-file access like globus.org and also ability to allow subsetting over high-bandwidth connections with opendap of even more data than we are already distributing via the Science Gateway.
Yes
Yes.
Mostly good. It is just that sometimes the waiting time is bitterly long, and I have yet to find a good way to estimate the shortest total time (waiting + running).
More GPU support would be good. For example, the XK6 systems look very nice for my applications.
Yes. My only wish is that the total allocation could be increased and that there be an additional opportunity for an allocation cycle. A more structured mid-year allocation would be ideal to supplement the year-long allocation.
Yes
Yes, very much so!
More tools for runtime-analysis of MPI programs (tracing, memory). For example I didn't manage to make use of valgrind and didn't find another memory-tracker.
My jobs would perform better on systems with more memory/processor, up to 8 GB/processor. We would also benefit from shorter queue waits. Could the queue wait be proportional to the time requested, so that, on average, a 2 hour job would wait 12 times less than a 24 hour job?
Yes
Although NERSC provides enough systems for my ongoing research, it would be beneficial to have a larger variety of GPU architectures, including NVIDIA Kepler.
Yes, NERSC systems are sufficient for me to meet my scientific goals.
I use NERSC primarily to archive science data on HPSS from a remote location in Chicago. Currently HPSS supports access via Globus Online, but it does not work in the event of any errors in the transport layer that result in re-transmission of partial files, because append-to-file operations are not supported. This negates a large part of the benefit of using Globus Online for very large data transfers. The recommended work-around is to transfer data first to global scratch and then from there to HPSS. This makes the process much more complex, especially because the default quota on global scratch is smaller than the typical transfer size.
Yes. There is a program called Amsterdam Density Functional. It would be nice to have this program as well.
yes?
Yes.
Although not an easy task, I/O disk performance is still the biggest bottleneck users feel in my area.
NERSC provides most, if not all, of the systems and services I need.
NERSC never gave me a reply or answer to my question through the NERSC web page Q&A. Please answer my question first. I am still waiting.
Yes, my overall experience is excellent.
yes.
Decrease waiting time in queues.
The ability to run large ensembles of long running serial jobs.
It would be very useful if NERSC could provide Intel Inspector and Intel VTune. I understand there are technical difficulties installing these tools on Edison, but it would be extremely useful if they could be installed on a small cluster. If the next big machine at NERSC is based on Intel MIC, I think, based on my own experience, it will be crucial for users to have access to these debugging and profiling tools to help port codes to many-core architectures.
yes
NERSC does provide a broad range of systems although system management can sometimes be lacking. For example, Babbage has still not been properly configured after a year of waiting.
I wish there were higher disk quotas and less fuss about limitations, but maybe I'm spoiled by working mostly on LC.
Yes
Yes
In theory, yes. But for the types of jobs we are running (e.g., on the scale of 30 minutes on 8 cores), the queue times were too long.
Yes, they do.
Yes, met all expectations and needs.
It would be great if you could provide a dedicated system for users to set up individual or ensembles of virtual instances that could do things like serve web pages, run databases, etc., and give the users a lot more power to set those things up directly, like a mini EC2 for DOE scientists (with reasonable restrictions) and maybe better hardware options. The DOE community is full of people who can use Edison, Hopper, and Carver, but there are also many people who could use the above and are currently spending DOE funds on it elsewhere; that duplication could end if NERSC were able to put another machine on the floor that could do this. I understand this is a difficult thing for NERSC to propose, but no other DOE computing center is so ideally placed to do it. Services provided by such machines would be essential to building a real exascale data science facility.
One thing that would be immensely useful for our needs are possibilities for faster I/O for applications that are particularly I/O heavy.
We have no current plans to migrate our code to GPUs, so please don't go to GPU only architecture.
more allocation time
Yes. It works well.
I am quite happy with the services for my current needs
One small comment on the survey - I did the survey in two sittings, and it would have been helpful if the sections I had already completed were marked in some way (e.g. change the button color). That's a small nit though....
yes, very reliable and available.
Yes it does.
Yes.
YES!
Everything is excellent.
Yes.
In addition to existing systems and services, I need prompt interactive visualization, as described in a previous section. It would also be helpful to have a more mature GPU cluster.
Yes
* Better support for embarrassingly parallel jobs on systems such as Carver.
* More frequent updates of basic tools like svn.
Need more availability of experts to help solve problems installing specialized scientific code at NERSC. Need some easy way to store data. Overall I am satisfied. Thanks!
Yes.
More practical training on coding, computing, and best practices for tasks like those we do at JGI. E.g.:
* Python training
* R training
* best practices for using HPC resources for running serial and embarrassingly parallel jobs (i.e., NOT MPI, OpenMP, etc.)
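As a concrete illustration of the kind of non-MPI, embarrassingly parallel pattern this request refers to, a minimal Python sketch follows. The task function process_sample and the sample list are hypothetical placeholders, not an existing JGI workflow; the same structure applies to any set of independent serial tasks run within a single node allocation.

    # Minimal sketch of an embarrassingly parallel workload without MPI/OpenMP:
    # run one independent serial task per input using a local process pool.
    from multiprocessing import Pool

    def process_sample(sample_id):
        # Stand-in for a serial task (e.g., analyzing one input file per call).
        return sample_id, sample_id ** 2

    if __name__ == "__main__":
        samples = range(100)                 # hypothetical list of independent work items
        with Pool(processes=8) as pool:      # roughly one worker per core on the node
            for sample, result in pool.imap_unordered(process_sample, samples):
                print(f"sample {sample} -> {result}")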
I am writing an application for Syn Bio that will need to use BLAST against the current nr and nt databases. It would be great if these could be in a central place and maintained by NERSC. Currently anyone using these databases has to maintain their own copies. About a year ago I was told that there were plans to change this in the "future". I am just developing now, so I have a copy in my scratch space, but when I am ready to deploy, I will need a better solution.

Also, Syn Bio needs to deploy a couple of web apps that need access to Genepool (for BLAST primarily). These are apps that were developed at JBEI, and we are deploying JGI-specific versions. They have been written in Java/servlets, thus run through Tomcat/Apache, and use PostgreSQL as their backend database. Also, because of certain technologies they use, they must run as the "root" web app (NOT as the superuser! just the root directory in the tomcat/webapps directory! the Tomcat terminology leads to misunderstandings with admins all the time). I opened a ticket with the server admins about getting this set up 4-5 months ago, and am still waiting to hear back. We can wait a little longer; that is not the issue. However, I think this is an area where the JGI-NERSC relationship needs some work, or we need to have a JGI server which can meet these needs.
NERSC is not structured to provide high-reliability, high-stability web hosting services. Scheduled maintenances and too-frequent changes in the file system environment and supported OS versions do not meet our users' expectations for the availability of a public web resource. I don't know that NERSC should modify its practices to accommodate this, as it is outside its core mission of "bleeding edge" medium-scale supercomputing. Perhaps web hosting / colocation should be handled outside NERSC.
Yes.
If there is anything important to you that is not covered in this survey, please tell us about it here.
I just wanted to say that I'm really happy with the shift from Hopper to Edison. My code is MPI only and the limited memory/core on Hopper made it really difficult to run the problem sizes I needed to run. It's very difficult to commit the necessary 6 months required to rewrite one's code to target a single fleeting architecture choice. With Edison's increased memory/core, I'm now able to make forward progress on scientific problems for which I'd been fighting with this limitation on Hopper.
NERSC is THE best!
My primary complaint is the front-end performance associated with GPFS. My workflow can be disrupted for 30 seconds at a time, and if it's consistently bad I might as well give up working for the day.
It would be extremely useful to have longer data retention on scratch, and availability of space to store large data which are used often (not archival).
My data on GSCRATCH was purged well before 12 weeks. Please ensure this does not happen again.
I am not fond at all of the 'fairshare' variable in the batch scheduler on Edison. My jobs (which unfortunately come under OFES) take forever to run. In particular, jobs submitted on large numbers of processors on Edison take a few weeks to actually run, and sometimes do not run successfully through no fault of my own. Whether that is due to this 'fairshare' business and/or some additional adverse setting of the scheduler needs to be ascertained. A clearer description of the scheduling on Edison would be most helpful.

I am most satisfied and pleased with the webinars put together by Richard Gerber and his staff. I (remotely) enjoyed the GPU lectures because they made it possible for me to learn about GPUs other than by hearsay. The lectures about optimization of performance on Edison, especially the profiling part, have had a direct and important impact on tuning my code for Edison. The support material made directly available in a directory on Edison was extremely useful in that regard.

I also very much appreciate the NX server. It gave me back the X11 capability that I lost a few years back due to hardware and software changes on my home computer. The NX server is essential to my efficient use of the profiling tools at NERSC. Most importantly, the interfaces are very easy to use and are also much faster than native X (I tested that). That type of user-centric software development is much appreciated and, I think, much needed to make more and more efficient use of the outstanding NERSC facilities.
If possible, purge times could be longer.
Modules often do not contain key information needed to use them, e.g., matplotlib needs python loaded; siesta needs a user agreement to have been signed. This info should be in the module itself.

Compilers are quite slow on Hopper. Running a configure script can take many times as long as on another machine. This is unhelpful for doing code development.
I truly believe that the team of people you have is amazing, and they work extremely well together. The key to NERSC's success, I believe, is having motivated, smart staff who communicate well. Bravo!
The problem I have with NERSC is the long queueing times as well as quotas. In other words, I can't use NERSC for all of my needs because that would lock me out of NERSC at some point.In addition, I had some issues with server's memory management policy. Some of my jobs had bugs that kept allocating memory. That ended up crashing nodes, whereas I was expecting memory quotas (max allocations) to be enforced.
Queue wait times are a critical issue that drives us away from NERSC. Because of the long waits, we tend to use NERSC only as a last-resort resource.
Overall I am very satisfied with NERSC. I have one minor request (which I know has come up before): a more accurate version of showstart to predict when a job will run would be excellent for planning purposes.
- Hopper and Edison have been quite overloaded lately, and it takes quite a while to run a large job there.
- The *home* filesystem has been quite unstable. In many cases, the whole home directory is completely unusable (i.e., you can't ls or edit any file!).
- The policy for removing old files from SCRATCH could be improved. We could at least get notifications when our files are about to be deleted. This might be asking too much, but having an auto-backup mechanism to the tape archive would be very nice.
The more storage space I have, the less time I need to spend processing my data. Also, to meet storage requirements, I end up throwing away data that may prove useful in the future. From the perspective of a grant budget I would imagine that storage is cheaper than my time. HPSS is not ideal because there is a delay accessing data. Perhaps NERSC could allow people to buy more storage, expand their project directories? If it was a one time upfront fee, it could save a lot of time for me. Maybe you could even allow people to sell back their storage at a lower price if they ever no longer need the space.
This past year has seen the introduction of Edison and of /global/scratch2, neither of which went smoothly. Currently Edison is still reserved for testing on alternate days. The switch to GSCRATCH2 caused me several headaches: I migrated my data in a timely manner, as requested in NERSC's announcements, only to have my data become inaccessible when GSCRATCH2 was down (while, ironically, GSCRATCH remained available).

This past year also saw the introduction of a Google calendar for NERSC's planned and unplanned outages. While there have been times when the information did not exactly match the MOTD or email announcements, the calendar has been of great benefit to me. It has allowed me to plan my work with much less effort than extracting the same information from the MOTD page or the status list emails. However, I am not certain this service has been widely advertised, as most of my colleagues have been unfamiliar with it when I've mentioned it to them.
Survey too long and detailed.
A minor comment: VASP version 5.2 is not available on Edison. It is much more robust than VASP 5.3.3, which is currently the only version installed on Edison. Could you please install VASP 5.2 on Edison? Thanks.
No, the survey was well done also. Big surprise =)
The RStudio implementation that NERSC offers is very useful. It would be helpful to have the option of having several RStudio analysis windows open at the same time; currently it is only possible to have one instance of RStudio up at a given time.
There's a high degree of inconsistency on Genepool - depending on whether you ssh or qlogin, and depending on which node you are on, various commands like 'ulimit' give varying and unpredictable output. This makes it incredibly difficult to develop, test, and deploy software in a way that will assure it can run on any node. Furthermore, forking and spawning processes is impossible to do safely when UGE may nondeterministically kill anything that temporarily exceeds the ulimit, even if it uses a trivial amount of physical memory. As a result, it is difficult to utilize system parallelism - if you know a set of processes will complete successfully when run serially, but UGE may kill them if you try any forking or piping, then the serial version has to go into production pipelines no matter how bad the performance may be.Also, Genepool's file systems are extremely unreliable. Sometimes projectb may yield 500MB/s, and sometimes it may take 30 seconds to read a 1kb file. Sometimes 'ls', 'cd', or 'tab'-autocomplete may just hang indefinitely. These low points where it becomes unusable for anywhere from a few minutes to several hours are crippling. Whether this is caused by a problem at NERSC or the actions of users at JGI, it needs to be resolved.
Regarding system software upgrades: I understand the need to do this, but I always dread it, as my code(s) often have problems afterward. While the consultants are always great at overcoming those types of problems, it takes time, and I may end up missing compute time for "target dates", e.g., quarterly account revisions, end-of-year reallocations, etc. So it would be nice if such updates were made after target dates have passed rather than in the 2- or 3-week period before, especially so for the end of the ERCAP year. Last week's software updates are the perfect example: right now I'm in the position of being ready to execute some large grid runs but am dead in the water because my code is not working after the updates.
Sometimes access to basic functions like ls, more, and vi is very slow. It would be good if they became faster in home directories.
We need the equivalent of /project optimized for installing collaboration code. Collaboration production code shouldn't live in some postdoc's home directory, and /project/projectdirs/ is optimized for large streaming files, not for installing code and managing modules.
I have mixed feelings about the new 'fair share' policy on Edison.In particular during 2013, I noticed several users from our own group who used large amounts of CPU time because it was 'free' without even knowing that this would impact the queue time of other jobs in our own group as well as in the larger science domain. I suspect that once we're being charged for Edison, this will become less of a problem.I am also not sure how well this 'fair share' policy was announced (though I am aware that a lot of users do not carefully read all email announcements...)
One thing I find frustrating as a user is the complete system shutdown in the middle of the week (I think it's Tuesdays; the quarterly shutdown?). Would it be possible to schedule such "global" downtimes on a long weekend, or even Friday-Saturday? I ask because at our facility many people rely on the system, and it requires them to ramp down and then back up. And if it does need to remain in the middle of the week, could you explain to us why, so we understand the situation? Thanks.
I cannot think of anything more. Keep up the good work!
I would like to be able to submit background daemons that run for days, which is not possible in the current setup. It would allow me to perform high-throughput calculations quite efficiently on NERSC machines, improving my scientific output. The daemon is a shell script that manages runs and submits new runs when needed, without human intervention.
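To make the request above concrete, here is a minimal Python sketch of the kind of run-managing daemon described: a loop that keeps a fixed number of jobs in the queue and submits new ones as earlier ones finish. The qsub/qstat command lines, the user name, and the job script names are assumptions for illustration only; as the comment notes, a long-running process like this is exactly what the current setup does not allow.

    # Sketch of a run-managing loop: keep up to MAX_IN_FLIGHT jobs queued or
    # running, submitting the next pending job script whenever a slot opens.
    import subprocess
    import time

    MAX_IN_FLIGHT = 4
    PENDING = ["run_a.sh", "run_b.sh", "run_c.sh", "run_d.sh"]  # hypothetical job scripts

    def jobs_in_queue(user="myuser"):
        # Count this user's queued/running jobs (assumes qstat lists one job per line).
        out = subprocess.run(["qstat", "-u", user], capture_output=True, text=True).stdout
        return sum(1 for line in out.splitlines() if user in line)

    while PENDING:
        while PENDING and jobs_in_queue() < MAX_IN_FLIGHT:
            subprocess.run(["qsub", PENDING.pop(0)], check=True)  # submit the next run
        time.sleep(300)  # poll every five minutes rather than spinning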
Debug jobs sometimes do not get a chance to run during the night.
no
I hate the module system! It's a huge pain in the neck to use for someone not familiar with it. I just spent several days trying to compile a custom version of LAMMPS, which takes 15 minutes to do on my desktop! I had two issues. One was the bizarre Cray proprietary software, which I will forever hate no matter how fast it is, because my time trying to get it to work is much more precious than squeezing out a few extra timesteps per second. The second was that when I did manage to successfully compile code in some module environment (e.g., using gcc and openmpi in CCM), often when trying to execute the binary on the debug queue, it would fail to find the libraries it previously linked to. I had completely given up hope of being able to run my software on Hopper, but a more experienced colleague was able to find out what was wrong with my Makefile.
nothing
Currently HPSS is entirely local; any chance of remote backup services?
I would like it if the low priority queues allowed for longer clock time, or possibly the addition of a low_long queue. Many of the jobs I run essentially hit a wall in terms of scaling (e.g., adding more cores actually takes longer than using fewer), thus the only way to collect the necessary data is to run longer. During times of high utilization, jobs can take a week or longer to finally run in the low priority queue, making it difficult to acquire the necessary data while also efficiently using the award allocation.
It would also be nice to have more queues with >48 hours max walltime, maybe at least the low-priority ones. As always, shorter queue times would be nice.
I would hope that there will not be a complete focus on GPUs and massively parallel jobs in the future, since there are still many useful things that can be done without them.
The instructions for writing proposals are very vague. Some help on what a "good" proposal should contain would be appreciated. When writing proposals it is also not clear how much detail is wanted in each section. If we attach PDFs for detailed parts of proposals, how detailed/long should the attached documents be? Is there a way to get feedback on the review of our submitted proposals to improve the next round?
System performance with the jobs I run can vary tremendously. I assume this is due to changes in I/O speed, but it makes it difficult to anticipate the correct amount of walltime needed for a job to complete. It is somewhat common for a normally short (1/2 hour to 1 hour) job to take 2-3 times longer than expected. I am referring to actual run time, not queue time. Metadata and module access operations can also become frustratingly slow sometimes. When running interactively, simple python import statements that normally complete in a few seconds or less will sometimes take up to 5 minutes to finish. It is also frustrating to have to sometimes wait 20 seconds to 'ls' a directory.
There is one feature I would love: an email when I fill my data quota! I've had it happen that suddenly my jobs don't work, and I realize I inadvertently went over quota. One email when I hit 90% and another at 100% would be really helpful!
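The requested behavior amounts to a simple threshold check; a minimal Python sketch follows. The usage and limit figures would have to come from whatever quota-reporting mechanism the site provides, so they are passed in directly here, and the 90%/100% thresholds are the ones suggested in the comment above.

    # Sketch of the requested quota warning: report when usage crosses 90% or 100%.
    def quota_warning(used_gb, limit_gb):
        frac = used_gb / limit_gb
        if frac >= 1.0:
            return "quota exceeded: writes and new jobs may start failing"
        if frac >= 0.9:
            return f"warning: {frac:.0%} of quota used"
        return None

    print(quota_warning(38.0, 40.0))  # hypothetical usage -> "warning: 95% of quota used"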
Would like to see larger-scale access to GPUs or other accelerators. Have not had much success with access via NX, I think due to firewalls I'm behind at ORNL, but it is difficult to diagnose from here.
No, thank you very much for all.
I was given an account on the NERSC systems so as to be able to share profiling data on how parallel applications use the Cray network. I do not use the systems themselves.
The element of crowd-sourcing seems somewhat missing at NERSC. If NERSC had a social-networking portal informing about the presence of other NERSC-users in the area there could be local meetings and exchange of ideas in a very decentralized fashion. User guides and other notices on NERSC are a great resource but people tend to learn significantly more by interacting with others. Something to think about...
NO
Queue times for jobs are really long. Sometimes I need to wait a week to run a job. Could you please look at the queue times and make them as short as possible?
I am enormously grateful for the exceptional level of service (particularly addressing issues of throughput) you have provided to me, both historically and especially this year.
There are three areas that I feel need improvement:

1) Filesystem delays. It always happens: I am logged on to a login node trying to perform interactive work, editing input files, saving input files, copying input files to get ready for a long parallel batch run, and then someone else on the same login node is doing extensive I/O work, such as copying files or running a series of short, very I/O-intensive jobs, that prevents everyone else from doing work. And it happens at 2:00 am. There is no recourse, no one to tell. By the time morning arrives, that person is gone without a trace. Help. How about doing more monitoring?

2) NX client/server. How much fluff, i.e., how many steps, does one need to log on to the NERSC systems? Please make this more direct: a one-step, single-command process like "nx-client carver" and I am logged on. That's it.

3) Please consider installing several login nodes that are large SMP machines capable of parallel processing using either MPI or OpenMP. I think this would be an excellent experiment to try. These nodes could be specific to only those codes that are capable of running in parallel but are run interactively. You would be surprised at the pent-up demand for this.
My one small gripe is that it sometimes takes 3-4 hours for a new ticket to be assigned to the consultant. If the issue is urgent, this delay is problematic. I have tried contacting consultants with my questions directly, with rather mixed results, so now I either enter a ticket and brace myself for a long wait or do not bother with a ticket and attempt to get my answers through other means.
I would like to suggest to add questions concerning the allocation time.
Keep up the best work you are doing.
1) Notifications about downtime. Edison uptime in November and December was abysmal; it's very hard to run long-term jobs when uptime is limited to two days at most. Plus, the downtimes were far in excess of the planned windows. I can't complain, since Edison was free and in 'beta', but it did make work difficult, especially since Hopper ended up being oversubscribed.

2) There should be an easier way to manage data on $SCRATCH and $GSCRATCH. Maybe some sort of history or date-sorted tree, so that we can tell which data is going to be purged first. Even better would be daily email digests notifying users that they have <24h before their data (and do say which data) is going to be purged. Keeping track of our large data sets was notoriously difficult and even led to some incidents.

3) HPSS SRU distribution across users should be controllable by project managers. It's annoying to track down individual users when their HPSS uploads suddenly saturate the entire repo's allowance. Project managers should be able to modify the % charged to each repo and even purge some data if it's taking up too much space.
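Item 2 above is essentially a date-sorted view of scratch contents with purge warnings. A minimal Python sketch of such a view follows; the scratch path and the 12-week purge window are assumptions, and a real digest would have to apply whatever purge policy is actually in force.

    # Sketch of a date-sorted scratch listing that flags files nearing an assumed
    # 12-week purge window (oldest files, i.e. closest to purge, are printed first).
    import os
    import time

    SCRATCH = os.environ.get("SCRATCH", "/tmp")   # hypothetical scratch path
    PURGE_WINDOW = 12 * 7 * 24 * 3600             # assumed 12-week purge policy, in seconds

    files = []
    for root, _, names in os.walk(SCRATCH):
        for name in names:
            path = os.path.join(root, name)
            try:
                files.append((os.path.getmtime(path), path))
            except OSError:
                pass  # file vanished or is unreadable; skip it

    now = time.time()
    for mtime, path in sorted(files):
        flag = "PURGE <24h" if now - mtime > PURGE_WINDOW - 24 * 3600 else ""
        print(f"{time.strftime('%Y-%m-%d', time.localtime(mtime))}  {flag:10s} {path}")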
The one set of benchmarks that I was not able to collect was GAMESS running under the Intel programming environment and using large pages. A bug in the Cray software (which is being worked on) prevented me from making these runs. Otherwise, there was a wealth of programming environments (Intel, native Cray, and GNU) under which I was able to run HYCOM and GAMESS. The downtimes were sometimes lengthy, but I understand that this was the shakedown period for Edison.
I suggest having an "I did not know about this" option. I did not realize that NERSC had a mobile app until I answered the question as "I do not use this".
No
It would be good to know what the long-term plan is regarding science gateways support and NEWT.
If all the installed software, including the commercial packages, could be used by us even if we don't have a license ourselves, that would be great.
There should be more feedback in the allocation process. It's currently a black box; if there were some feedback, we would know what the allocation committee is looking for and could fine-tune our applications accordingly.
Sometimes things fall between the cracks. For example, on Edison, and now on the present system on Hopper, SuperLU_DIST does not work from within PETSc. This affects us because we use a block Jacobi preconditioner based on SuperLU_DIST. We just get a segmentation fault. NERSC thinks it is PETSc's problem; PETSc thinks it is a SuperLU_DIST problem. It has not been solved for months.
I am part of a number of projects that (attempt to) use multiple programming environments (PGI, Cray, GNU) and multiple systems (Hopper, Edison, ...), or each of the projects focuses on a different infrastructure. Maintaining build logic that works for all of these is very time-consuming, and it breaks often when environments or libraries are upgraded. I do not know what can be done, but this is my primary irritation working at NERSC. Things are not necessarily better elsewhere, but the number of options supported at NERSC, and project decisions to support more than one of these, makes this problematic. (Reasons for using multiple programming environments include: GNU is preferred for C++, PGI for Fortran (and CUDA Fortran), Cray for Co-array Fortran, etc.)
NA
The NERSC HPC resources are critical for my job of running turbulence simulations for fusion energy research. Our simulations (or ensembles of our simulations), while using a large number of cores (~1,000s to ~10,000s), often don't need system-size core counts (>100,000) to produce high-quality physics. In fact, I would argue that it's a large number of modest-core-count jobs that routinely produce the highest-quality physics. It is crucial to have facilities at NERSC that allow us to run this range of core counts with reasonable turn-around, and I have been very happy with NERSC in this regard. All I ask is that you please continue to avoid getting caught up in an early transition to GPU systems, or in batch queues that penalize this range of core counts. I understand you have to push the forefront of high-end computer science, but to accomplish actual physics research, for the majority of the time, all that's required is high availability and rapid turn-around of many modest-core-count runs, not "Oh wow, look how many cores my code uses" jobs. Thank you!
none.
Batch queues longer than 24 hours on Edison and Hopper would be of help
I have a couple of issues that have arisen recently. First of all, let me say that I have been overall quite happy with NERSC over the last year. However, I have issues with the changes that are occurring with the start of the 2014 allocation year.

My first problem is with the elimination of the charge reduction for large (above about 17k processors) jobs on Hopper. I have used the large queues on Hopper extensively in my work and have benefited from the reduced charge factor in these queues. If the allocations do not increase accordingly, the elimination of the reduced charge factor greatly hinders my ability to do cutting-edge research at NERSC and would perhaps motivate me to work more with Titan at Oak Ridge for my large jobs. In fact, I am actually unable to finish my current work on Hopper because of this change: I had planned to use about 6 M hours (with the old queue policy), and I will now require approximately 10 M hours to complete this work, which I will not have access to (I am a postdoc, not a PI). Furthermore, the fact that Edison will only have a 0.75 reduction for large jobs is perplexing to me (given that Hopper's was 0.6), as is the fact that Edison will effectively become a premium queue (since it is going to charge double what Hopper does). I am sure that I do not understand the inner workings of the supercomputer queues and policies, but quite frankly, it seems odd to take a new, cutting-edge computing resource (which is about 2x as fast as Hopper on most jobs I have seen) and eliminate the benefit of running on it by doubling the charge factor.

A more detailed issue: I believe that I used the third-largest amount of computing time at NERSC this year (about 40 M hours), so I have a good bit of experience watching the queue. I have concerns about the effect of "thruput" queue jobs on larger (REG_MED and REG_BIG queue) jobs. At the end of the year, a few users submitted hundreds of thruput jobs, which (I cannot quantify this exactly) significantly slowed down how quickly large jobs went through the queue. It appears to have little effect on REG_SMALL jobs, but it appears to extend the wait time for large jobs by probably a factor of 3 (relative to when there are no thruput jobs). Once again, I cannot quantify this; it is just an observation from experience, and it seems like this should not be the case.
I very much appreciate it when I can identify a person or group at NERSC and build a direct relationship with them to help me get my projects done. There are some very excellent people in the NERSC organization.
The home directory quota should be increased to, say, 100 GB.
Nothing comes to mind
I would like to see the Data Transfer Node capability significantly expanded. Data ingest/export at the scale of tens to thousands of terabytes is becoming steadily more important, and increasing data transfer performance typically takes time (i.e. just-in-time deployment for additional DTNs is probably not the right strategy).
very thorough survey
I have found the modules to be confusing. The module names don't always match the website instructions. So far it is easiest to simply use the default programming environment on Hopper. However, on Edison I found my code was not performing well, so I have avoided running jobs there.
Having something like NX for mobile devices would be great.
Overall, metadata servers seem to be a bottleneck for the disk systems I frequently use (global homes and /projectb). In particular, global homes becomes almost unusable at least once per day; for example, opening and closing files in vim can take minutes, which makes using the system extremely frustrating.

The queue configuration on Genepool has changed substantially over the past 18 months, and while it is substantially improved in terms of utilization, coherency, and usability, it is still not possible to simply submit a job to the cluster and expect it to run on appropriate resources. Determining the qsub parameters needed to get a job to run requires detailed knowledge of the hardware and cluster configuration, which ideally would not be the case. I would like to be able to submit a job and have it run. I have started trying to use task farmer (taskfarmermq), and while I can see huge potential in this software, it is currently not reliable. I would love to see this become a fully supported service suitable for production work.

I have been very happy with our onsite NERSC consultants. I think they bring tremendous value to JGI and have been a force for massive positive change in our hardware and software infrastructure and our ongoing productive working relationship with NERSC. Below, I briefly highlight a few examples. The modifications to the module system and the migration of our software onto the module system have been very valuable. NERSC-led efforts to support software repositories tied to the module system have also been very successful. The procmon system appears to have great potential for helping us optimize our use of hardware and identify targets for software and workflow improvements, but it has not yet lived up to this potential. The migration off of our 2 PB Isilon system was a huge effort which certainly would not have succeeded without the massive support of our NERSC consultants, and that effort enabled and justified developing an incredibly useful interface to the tape archive. I hope in the coming year we can finally begin using a collection of canary jobs to more proactively monitor system usability and performance, and that we can use data collected by procmon to begin tuning our software and hardware.
In the past we have asked about setting up a separate filesystem for software (see NERSC Consulting incident number 40343). This question has lingered unanswered for months.
Queue times on Hopper and Edison make them much less valuable than they would otherwise be. It's bad when you have to wait several hours to run a job that takes less than 1 hour and uses only a handful of nodes.
It would be great if batch allocation could give an option for better controlling physical placement of jobs across nodes, to avoid performance variation. Of course, I realize that algorithms will have to become more nonblocking in order to reduce the effects of performance variation.
Using applications already optimized for GPU use.