Comments
What does NERSC do best? How does NERSC distinguish itself from other computing centers you have used?
In their comments:
- 65 users mentioned ease of use, good consulting, good staff support and communications;
- 50 users mentioned computational resources or HPC resources for science;
- 20 mentioned good software support;
- 15 mentioned queue management or job turnaround;
- 15 mentioned overall satisfaction with NERSC;
- 14 mentioned good documentation and web services;
- 9 mentioned data services (HPSS, large disk space, data management);
- 8 mentioned good networking, access and security.
Their responses have been grouped for display as follows:
- NERSC's hardware and services are good / is overall a good center
- Provides good machines and cycles
- Good support services and staff
- Good web documentation
- Good software / easy to use environment
- Good networking and security
- Other comments
What can NERSC do to make you more productive?
- Improve Franklin stability and performance / less down time: 40 comments
- Provide faster turnaround / more computing resources / architecture suggestions: 37 comments
- Data Storage suggestions: 16 comments
- Job scheduling suggestions: 16 comments
- Software suggestions: 13 comments
- Allocations suggestions: 11 comments
- More or Better Services: 10 comments
- PDSF suggestions: 9 comments
- Network suggestions: 4 comments
If there is anything important to you that is not covered in this survey, please tell us about it
- Areas not covered by the survey: 6 comments
- Additional feedback - Franklin: 6 comments
- Additional feedback - allocations: 4 comments
- Additional feedback - other: 7 comments
What does NERSC do best? How does NERSC distinguish itself from other computing centers you have used? (132 respondents)
- NERSC's hardware and services are good / is overall a good center
The software and hardware is top notch
Very easy to use. The excellent website is very helpful as a new user. Ability to run different jobsizes, not only 2048*x as on the BG/P. In an ideal world I'd only run at NERSC!
Excellent support, hardware and software geared toward scientific applications.
Organization is top notch. Queuing is excellent.
Nersc is good at communicating with its users, provides large amounts of resources, and is generally one of the most professional centers I've used.
EVERYTHING !!! From the computing centers that I have used NERSC is clearly a leader.
NERSC have very good machines and very good staff, both make the difference from other computing centers
Overall the service is very good.
keep doing!
Availability of HPC resources, and application software management and performance optimization.
pdsf interactive with supported software
NERSC tends to be more attuned to the scientific community than other computer centers. Although it has taken years of complaining to achieve, NERSC is better at providing 'permanent' disk storage on its systems than other places.
NERSC researches & supports high-performance networking & data storage in addition to pure number-crunching.
NERSC's strengths are quick responses from consulting, quick network connection (to Berkeley campus), and no onerous security procedures like SecureID tokens.
The machines are very well-run and well documented. There is a wealth of chemistry software available and compiling our own is easy; the support is great. Allocations are both fair and simple, and we are given plenty of hours to support our projects. The large pool of memory and CPUs per node on Bassi makes it a great machine for the software we use.
Franklin is a superior machine, with lots of cycles for its users. That is, given you have time on the machine, the wait queue is reasonable.
The consultant staff is almost always available during their stated time frame, is courteous and evidently aims to please. In my opinion, this is very important for the success of the institution.
Provides a stable long-term environment with hassle-free continuation of the allocation from year to year.
Writing as the PI of a moderate sized repo, NERSC provides a vital computational resource with lightweight admin/management overhead: we are able to get on with our science. User support is very good compared to other centers.
Enable massively-parallel computing with easy-to-learn, transparent procedures.
NERSC's documentation is very good and the consultants are very helpful. A nice thing about NERSC is that they provide a number of machines of different scale with a relatively uniform environment which can be accessed from a global allocation. This gives NERSC a large degree of flexibility compared to other computational facilities.
As a user of PDSF, I have at NERSC all the resources to analyze the STAR data in a speedy and reliable way, knowing that NERSC keep the latest version of data analysis software like ROOT. Thank you for the support.
Speed, both in terms of computing performances and in terms of technical support
Fair and balancing queuing on a robust platform (bassi), and the support for technical questions is good.
Customer support is the best. And NERSC has much more resources for access than other computing centers.
NERSC has very reliable hardware, excellent administration, and a high throughput. Consultants there have helped me very much with projects and problems and responded with thoughtful messages for me and my problem, as opposed to terse or cryptic pointers to information elsewhere. The HPSS staff helped me set up one of the earliest data sharing archives in 1998, now part of a larger national effort toward Science Gateways. (see: http://www.lbl.gov/cs/Archive/news052609b.html) This archive has a venerable place in the lattice community and is known throughout the community as "The NERSC Archive". In fact until recently, the lingua franca for exchanging lattice QCD data was "NERSC format", a protocol developed for the archive at NERSC.
Resources and software are superior.
I mostly used franklin for my computing. Franklin was stable most of the time except that period when it changed dual core to quad core. I think nersc has done a great job to keep the supercomputers stable 24x7 which is very important to increase our production. Also nersc consulting support is great in comparison with other computing centers.
I have been using NERSC facilities for over a decade and I acknowledge gratefully that NERSC facility is sine qua non for my research in the investigation of Physics and Chemistry of Superheavy elements. The Relativistic coupled-Cluster calculations carried out by us at NERSC for the atomic and molecular systems of the superheavy elements (SHE) Rutherfordium (Z=104) through Eka-plutonium element 126 are well nigh impossible to perform at any other computing facility. This is due to extraordinary demands not only on CPU but also on disk storage and Memory requirements.
We have published some of our recent results on the various SHE and this has been possible only due to the untiring efforts, help and advice of David Turner and most generous grants of additional CPU times by Dr. Sid Coon and currently by Dr. Ted Barnes. Ms. Francesca Verdier has been a tower of strength and always willing and ready to iron out when we ran into problems. Last but not least I am most grateful to my Principal Investigator and distinguished colleague Prof. Walter Loveland who has most generously supported my theoretical research in the SHE. It is impossible for me to pay my debt to Prof. Loveland except by expressing once again my sincerest thanks to him for his guidance, advice and encouragement throughout our research supported by the US DOE Division of Theory of Nuclear Physics.
In conclusion, I express my sincerest thanks especially to all those mentioned above and other very kind and helpful men and women who have made NERSC a most user-friendly place to work in. I look forward to continue using the state-of the art second to none NERSC Supercomputing facility for my research for many years.
NERSC generally provides a reliable computing environment with expert consultants. The hardware is more reliable than NCCS and the consultants are more informed.
Provide the state-of-the-art computing facilities and necessary scientific softwares for the purpose of conducting frontier research.
- Provides good machines and cycles
Top of the line production cycle provider in a high performance supercomputing environment
Unbelievably fast and sincere maintenance of systems dedicated to scientific users.
access to a range of systems (Bassi, Jacquard, Franklin) suitable for relatively small jobs (a handful of cores) to large jobs that need (tens of) thousands of cores.
I had a very pleasant computing experience at NERSC, especially on Franklin. I admire how well and reliably I can run both small and large (several thousand procs) jobs on Franklin. A good thing is, that it is convenient to run also smaller jobs (8-128 procs) which is advantageous for development and testing or for the running of lots of small jobs each with very good parallel performance. Also the available time a job can spend in the queue varies on a reasonably large scale. There is practically no limit on the number of jobs I can submit for consecutive executing each taking a relative short time, utilizing temporarily available processors.
NERSC provides resources that would not otherwise be available.
The NERSC machines are more reliable in terms of uptime.
Size of the clusters.
I like HPSS.
Providing me with the computational resources I need. NERSC is the best managed supercomputing center I know.
I am mostly satisfied with NERSC. Please keep on running the servers well.
NERSC has the most powerful computers I have access to; therefore my research works can't be done without NERSC.
I mostly use PDSF. There, the focus is on data analysis/production, and the computing emphasis reflects this: availability and uptime, which (in my opinion) are excellent.
NERSC provides exceptional computing power and remote data storage. However, these resources still (over the last year) have not reached an acceptable level of reliability. I have not used other computing centers.
Providing large amount of computer power difficult to find elsewhere in a relatively stable fashion. I think the queues work very efficiently, at least compared to other systems I've used.
It's quicker than other computer I have used.
With NERSC I have access to larger machines (franklin) than anywhere else.
This is the only computing center I use. I am pleased with the resources I can use, although uptime on Franklin can be an issue.
Excellent management of the Franklin computing system along with rapid turnaround on medium to large jobs. Scratch files are saved longer than on comparable computers elsewhere.
the connect with pdsf and disk space seem best.
I like Bassi most that is very good for my shared memory parallel jobs with somewhat MPI.
The best thing is the power of clustering in terms of numbers of processors, resources, ...
allow me to run jobs that would be impossible to run on a local machine
convenience of getting an allocation if one works for DOE
NERSC provides accessible large-scale (>2000 core) machines.
The software I am using is well optimized to help fast calculations for my project. It is also much faster than local resource available so that I can get results soon.
Short queue times! Teragrid queues are at least 3+ days. I've never waited more than 12 hours for a job to start on Jacquard.
NERSC has high quality machines and plenty of option for interactive debugging and development.
a fantastic system!!!
I have been very pleased with the queue times on Franklin.
For me, the main distinguishing feature is that big jobs (thousands of processors) go through the queue much faster at NERSC than at other centers.
Support and turnaround times for 'medium size' MPP jobs of a few hundred to a few thousand cpus. (So far, I have just used the few hundred). Since understanding the physics requires parameter scans, this is much more useful than one very large job. Also, since I run highly nonlinear fluid-based simulations, the time step is closely related to the spatial resolution and the medium resolution at this size job runs in a reasonable wall clock time. A large job would require proportionately more time to cover the same simulated interval (changing from a few weeks for a fairly complete run to several months). This is not really affordable. Running several smaller, faster jobs that are designed to be compared against each other also means that software bugs and other problems are more quickly recognized and solved. This is important for the continuing improvement of MPP computer systems. This is a very important computational service that NERSC should continue to support.
- Good support services and staff
Very competent and timely user support.
Very helpful, knowledgeable support staff and consultants.
NERSC consulting is the most responsive of any computing center I have used.
NERSC is very responsive to both individual questions and problems and system issues. I get the feeling that there is a team of people trying very hard to keep the computers up and running and the users able to use them.
The best technical and consulting support !
Consulting. Advice. And software updates.
user support
The quality of the technical staff is outstanding. They are competent, professional, and they can answer questions ranging from the trivial to the complex.
Local resource. In general very adaptive to specific needs.
NERSC is user-friendly, its web-site is good (though not great), its staff is very knowledgeable.
NERSC is a great example for user support and outreach.
Getting users started! it can take months on other systems.
Very good at providing access to HPC. Very helpful staff.
Far and away it is the people that work for NERSC and the service they provide, from data analytics to the help desk and everything in between.
It seems like the account support is very helpful and quick to respond.
NERSC is doing a super job on supporting the users. It is this user-friendly environment that keeps me with them all the time. I should add that their action is all for the users, even if that means more complications for them. I appreciate what they are doing.
The support team at NERSC is great, far better than other computing centers I have used!
Mostly satisfied. By its quality of service.
The consultants are very friendly and very helpful.
I feel like the response time is very quick and professional. The fact that it's in the same timezone probably helps on the quick response.
The support is the best that I've experienced, absolutely fantastic. Keep up the good work.
Consulting support! The best among all HPC centers that I know.
better consulting services than others. Generally easier to use.
Resource Management. Advance messages about any updates or downtime of the system. MOTD is important and useful. Can easily find the status of all system in one click.
I am grateful that it is easy to get accounts for new users quickly, even if they are not U.S. citizens.
I have always found that NERSC responds very quickly to all requests for assistance, including help desk requests and also requests to Francesca Verdier for information on how to get additional allocations.
Serves a range of users.
NERSC is doing excellent jobs on account support!
NERSC has been very responsive to comments.
NERSC is very user friendly and the staff is excellent, in striking contrast to most other computing centers.
NERSC excels in support, and in active engagement with users. I have not only received responses to my questions but have been called by technical staff actively looking for ways to streamline our computing process which has been very helpful. We have a very productive collaboration with the visualization staff.
MOTD
Consulting
Keeping users informed
More than one batch queue
NERSC is excellent at responding to service requests and being flexible about dealing with problems. They are better at communicating with users than other centers.
NERSC provides quick feedback on issues, regarding information from a team of experts.
Nersc is very good at responding when I email them with concerns or questions.
good user support, good on-line documents
Your technical support staff is really on the ball!
good technical support
good user support
service/consulting is helpful and prompt;
appears to be efficient at solving and dealing problems (e.g., system failures)
- Good web documentation
The information on the website and the reliability of that information.
Great tutorials and user guides on the web pages
A nice and clear website.
A strong and quick response team of support.
Very good documentation of systems and available software. Important information readily available on single web page that also contains links to the original documentation.
Things seem to be well organized in NERSC. The web interface is very user-friendly and well maintained.
Web page is nice compared to other computer centers I have used.
user friendly web site service and comprehensive information.
efficient use and allocation of computer resource
NERSC provides excellent information on its website on how to use its resources. Further, whenever I've called for help, the staff have been fantastic at helping me track down problems. Both of these features help NERSC to distinguish itself from other computing facilities I've used.
- Good software / easy to use environment
NERSC has great tech support and supports their software well. Other computing centers don't install anything and leave you to suffer through building MPI libraries and ScaLAPACK and all kinds of painful things like that. NERSC always has that nasty-to-build, nasty-to-install stuff already built and installed for you, which is very helpful. They are also impressively available at all hours of the day and night for account issues.
NERSC does an excellent job providing state-of-the-art popular applications softwares for most common uses, such as in quantum-chemistry and materials simulations.
NERSC also does a good job communicating upgrades/problems with users.
Account support is very customer-oriented.
Software support
Disk space management
Consultant support
Nearby. Good for code development.
NERSC has consistently been a more stable place for development than other computing centers I use. Unfortunately, NERSC is a victim of their own success because then a lot of people try to use the resources, which results in slow turn-around.
Good maintenance about software.
more useful software and STAR environment.
Well-compiled quantum mechanics codes, stable math libraries.
Programs can be easily compiled.
NERSC is a mature system and is relatively easy to use.
- Good networking and security
It is the convenience of access that makes NERSC distinguished from other centers. Still maintaining the overall security, NERSC provides excellent points of access that makes users more comfortable with their use of computing facilities. An excellent institute indeed is NERSC.
The non-firewalled network configuration at NERSC is extremely valuable. I can always use scp on my laptop to get results from PDSF disks. Compare this to e.g. the BNL cluster. User home or data disks are not visible from the "gateway" nodes that are the only externally accessible ones. If a laptop is also behind a firewall there is no easy way to get data from BNL to the laptop.
Ease of login without a SecureID or equivalent makes using NERSC machines much more enjoyable. It also greatly simplifies data transfers when the home institution (PNNL) has very tight security that can get in the way if both sides have very tight security (such as when doing transfers to/from NCAR).
Network.
NERSC is much easier to use than other centers, in particular because of the absence of key fobs.
Relatively open and easy to use. No crazy security hoops to jump through, which is nice.
- Other comments
not spelling. it's spelling is undistinguished.
i use only one other large computing center, at LLNL, and that not enough to draw a meaningful comparison.
I am ok with current status.
One of the things that NERSC has been doing extremely well is the emphasis on the scientific aspects of the research projects the center supports.
NERSC has been the best computing center I have used. However the I/O issues on Franklin and the fact that the fortran compilers on Franklin are not fully F95 compliant makes life difficult.
It's the only one I have!
In the past I have found franklin to be unreliable (crashes). In addition, before I stopped using Franklin, my jobs would sit in queues for days. I think that queues/allocation should be such that most jobs begin within 1 day.
The pending time is a little bit longer than other computing centers.
What can NERSC do to make you more productive? (113 respondents)
- Improve Franklin stability and performance / less down time: 40 comments
improve stability / up time:
Less problems with franklin. Speedier resolution when it goes offline.
Improve stability in Franklin.
If FRANKLIN could be more stable and require much less frequent hardware maintenance, my efficiency would be much more improved.
more stable system and ...
[A batch queues with less QC (< 64 nodes, <256 processors) and larger Max Wallclock (3~7 days).] Of course, firstly, the system should be stable enough.
Improve Franklin's runtime! It's incredibly unreliable, and there is at least a shutdown per week...it might be a very fast machine, but you can't trust it, cause it goes down unexpectedly so often...The bad functioning of franklin has seriously affected the performance of my work, and of many other users of franklin that I've talked to.
More system stable.
Less downtime.
More stability of the Franklin system
Franklin uptime may be improved.
Improve Franklin uptime, ...
keep improving machine stability and decreasing down time.
Less downtime and ...
The beginning of the year had a lot of down time that got in the way of productivity.
Job failure rates on franklin have been crippling. I know you're doing what you can to mitigate this, but I'm still seeing very high failure rates. ...
Improved reliability of leadership machines (this seems to have improved lately). ...
It would be nice, if there are less down times of computers.
Make Franklin more stable, [increase memory per processor.]
A few months ago I would have said "Fix Franklin please!!" but this has been done since then and Franklin is a LOT more stable. Thanks...
Franklin up-time has been a bit a stumbling block, but that's obviously not a NERSC-only problem.
Stable machine up time
Continue `hardening' Franklin (I probably did not have to write this.)
The only problem I have is that Franklin was often down when I needed to use it, but that has gotten better.
... better uptimes ...
[Increase the memory size on franklin] and improve her stability
Franklin stability has been largely improved, which is most critical to the productivity.
For any users needing multiple processors, Franklin is the only system. The instability, both planned and unplanned downtimes, of Franklin is *incredibly* frustrating. Add in the 24 hour run time limit, it is amazing that anyone can get any work done.
1) Stop jobs from crashing (maybe you already did this) ...
avoid system and hardware crashes
Avoid node failures!
Franklin is a terrible computer. I often have jobs die and the solution is to resubmit them with no changes. ...
... Better MPP computational reliability. Although it has improved since the worst levels earlier this year, I still regularly have jobs fail periodically for unknown and irreproducible reasons. I write restart files very frequently, particularly for large size jobs, which probably not very efficient even with parallel io. This is roughly 4x more than in seaborg/bassi days (every 500 times steps versus every 2000), while 6-8 hr wallclock jobs now run 4000-6000 time steps, at higher resolution instead of 2000-4000.
improve stability, I/O, and performance:
Improve uptime and file I/O performance of Franklin, [and make these top priorities for the next supercomputer procurement.]
It would be nice to see higher stability and scalability
Fix the I/O issues on Franklin ...
... The login nodes are very underpowered, I had issues in April with two htars overloading the node. I have often found myself waiting for an 'ls' to complete. I put htars into batch scripts because they will exceed the interactive time limit.
scheduled downtime issues:
Keep scheduled maintenance to a minimum. It's nice that Franklin is getting more stable finally.
... much less frequent hardware maintenance
... and having maintenance on monday instead. thank you
NERSC is about to take out Bassi and Jacquard, but Franklin is most of the time on maintenance; so the only reliable computer that will be left is Davinci. Can you do something to fix Franklin maintenance schedule, the maintenance frequency is too high....it happens too often, and this is not good for the long run.
- Provide faster turnaround / more computing resources / architecture suggestions: 37 comments
Improve queue turnaround times:
Shorten the queue time on Bassi
... Larger number of jobs in queue [Bassi user]
The main problem was the long waiting queue time esp. on bassi, faster turn around time in queue would increase our productivity.
Reduce the waiting time of the scheduled jobs. [Bassi user]
The time wait on queue is too long. ...[Bassi user]
the chief limit for me is allocation and batch wait time. i do not see how you can make improvements here. [Bassi user]
PLEASE!!!! Change the Queue system on Bassi. It is not only slow, but I can't put enough jobs into the queue to make working there at all useful. I much prefer the system on Franklin, which allows me to run more jobs more quickly.
... Faster queue throughput is always appreciated! [Franklin / Bassi user]
Shorter queues [Jacquard / Franklin user]
have more machines of different specialties to reduce the queue (waiting) time [Franklin / Jacquard / DaVinci user]
As the number of users inevitably increases, I hope that the queuing time goes inversely proportional with the increasing user number counterintuitively. [Franklin / Jacquard / Bassi user]
decrease the queue time per job. [Franklin user]
... and somehow reduce queue wait time for the average user on Franklin.
Fix the batch and queue system. The queues in the past have been absurdly long..forcing me to use the debug queue over and over and limiting what I can run at NERSC. [Franklin user]
... faster turnaround ... [Franklin user]
... 2) Decrease pending job time [Franklin user]
Architecture suggestions:
Highly reliable, very stable, high performance architectures like Bassi and Jacquard.
Provide more resources that have 95+% uptime.
[Improve uptime and file I/O performance of Franklin,] and make these top priorities for the next supercomputer procurement.
Keep BASSI.
Most of our codes will port seamlessly to Franklin, but decommissioning Bassi will inevitably hit our projects hard.
have more machines of different specialties to reduce the queue (waiting) time
The majority of our cpu cycles are spent on ab initio electronic structure calculations. In principle Jacquard and Franklin would be very attractive systems for us to run on. Unfortunately, these applications are very I/O intensive. The global scratch space on these clusters makes running these electronic structure codes on them very inefficient. We have attempted to run these codes (primarily Molpro) in parallel across more than one node on Franklin and Jacquard, and this has proved to be extremely inefficient on Franklin. Trying to do this on Jacquard crashed the compute nodes. This makes running big jobs at NERSC largely counterproductive.
When purchasing new systems, there are obviously many factors to consider. I believe that more weight should be given to continuity of architecture and OS. For example, the transition from Seaborg to Bassi was almost seamless for me, whereas the transition from Bassi to Franklin is causing a large drop in productivity, i.e., porting codes and learning how to work with the new system. I estimate my productivity has dropped by 50% for 6 months. To be clear, this is NOT a problem with Franklin, but rather the cost of porting, and learning how to work on a different architecture.
At the moment, my group has shifted most of our supercomputing to NASA Ames, where the available systems (Columbia, Pleiades, and Schirra) and the visualization hardware and staff are better suited to our needs. I hope that NERSC will upgrade to more powerful systems like these soon.
Get some data processing machines [and tools] that actually work
NERSC needs a large vector processor machine to go with the Cray XT multi-core machine
provide more memory:
Larger memory quota for HOME directory. ... [Jacquard user]
Increase the per-core memory of the machines.
Put more memory per core on large-scale machines (>8 GB/core). ...
... , increase memory per processor. [Franklin / Bassi user]
Increase the memory size on franklin ...
provide more cycles:
Enhance the computing power to meet the constrained the needs of high performance computation.
... Have a bigger computer!!! [Franklin user]
Build more Franklin type machines. ...
Get more computers.
keep Franklin running, more computer hardware
- Data Storage suggestions: 16 comments
more quota / more disk capacity / better stability:
Larger disk quota on scratch directory. ... [Bassi user]
More quota. [Bassi / Franklin user]
... Sometimes I need TBs-of disk space, quick availability of extended disk spaces for limited time is good (when needed). ... [Franklin user]
More Scratch Space. ... [Franklin user]
Increased disk capacity.
[Larger memory quota for HOME directory.] Stable SCRATCH system. [Jacquard user]
more stable scratch file systems [Franklin / Jacquard user]
Save scratch files still longer [Franklin user]
Franklin access to NGF:
Make more permanent disk space available on Franklin. It needs something like the project disk space to be visible to the compute nodes. ...
Get the compute nodes on Franklin to see NGF or get a new box.
Improved Franklin/NGF integration. [Better remote download services]
... I am very much looking forward to universal home directories and to having franklin:/project accessible during batch job runs.
Improve HPSS interfaces:
... better interface to the mass storage system [Franklin user]
The slowing of network access to NERSC may be understandable as the result of increased usage, but the denial of service attack by archival storage that has recently interfered with my work is not so readily explained.
Apparently, archive now refuses service for any more than one hsi session (to my UID). This eliminates (as though by design) the option of uploads to archive from multiple UIUC machines. This also potentially eliminates NERSC archival usability for access to outputs from our projects, whether or not generated on NERSC.
The failure of hsi on data transfer nodes dtn0[1,2].nersc.gov is an additional unpleasant surprise from NERSC. The Web pages indicate that this should work, but it does not.
... Data storage tools. htar does not work on longer file names, which are the easiest way to transparently index different simulations. The old pipeline commands from hsi to tar no longer seem to work to read files out of hpss. The entire tar file is read out to disk, so I have had to cut down the size of tar files and try to avoid accessing some of the larger old files. Different computers (eg, Franklin and davinci) have problems with each other's hsi/tar utilities, so the file has to be read out on the computer it was stored on, then transferred. This requires both computers to be up simultaneously. ...
- Job scheduling suggestions: 16 comments
more support for mid range jobs / longer wall times:
... The policies need to be changed to be more friendly to the user whose jobs use 10's or 100's of processors, and stop making those of us who can't allocate 1000's of processors to a single job feel like second-class users. It should be at least as easy to run 100 50 CPU jobs as one 5000 CPU job. The current queue structure makes it difficult if not impossible for some of us to use our allocations. [Franklin / Bassi user]
A batch queues with less QC (< 64 nodes, <256 processors) and larger Max Wallclock (3~7 days). .... [Franklin user]
It would be nice if a subset of nodes allowed wallclock times up to 3 or 4 days. [Franklin / Jacquard user]
Since Franklin is now stabilized, longer wall-time limits for queues will attract more jobs.
[Put more memory per core on large-scale machines (>8 GB/core).] Increase allowed wall clock times to 48 or 96 hours.
... Add in the 24 hour run time limit, it is amazing that anyone can get any work done. [Franklin user]
Many of the cases I simulate have to run for a longer time (several days) but do not use a tremendous amount of nodes (say 500). I wonder whether it is feasible to have a queue for such long-time runs. [Franklin user]
More simultaneous aprun commands. [Franklin user]
more interactive / debug support:
It can be difficult to get interactive time for tests & debugging, particularly if more than a few nodes are needed. The 30 minute limit is fine, but more nodes should be available. ... [Franklin / Bassi user]
... Also, it would be useful to be able to run longer visualization jobs without copying large data sets from one system's /scratch to another. This would be for running visualization code that can't be run on compute nodes; for instance, some python packages require shared libraries. [Franklin user]
... Continue efforts to allow large-memory and long-running serial analysis tasks without undue load on launch nodes (e.g. IDL). [Franklin user]
Enable longer interactive jobs on Franklin login nodes. Some compile jobs require more than 60 minutes, making building a large code base -- or diagnosing problems with the build process -- difficult. ...
The login nodes are very underpowered, ... I put htars into batch scripts because they will exceed the interactive time limit. [Franklin user]
better job information:
it would be useful if it was easier to see why a job crashed. I find the output tends to be a little terse. [Franklin user]
Queue wait times are not always consistent. I suggest that an estimated wait time be given after a job is queued, either on the website queue list or with the qstat command. Also, it would make my life easier if the job list invoked by the qstat command on franklin showed the number of cores for each job. Right now that column is blank. [Franklin / Jacquard user]
some more featured job/queue monitors [Franklin user]
- Software suggestions: 13 comments
-
... and more support for various computational chemistry codes. [Franklin user]
NERSC does an excellent job in adding new software as it becomes available. It is important to continue doing so.
I would like for NERSC to add Gaussview 4. [Bassi / Franklin / Jacquard user]
Install more development tools, like Git and Valgrind.
I would like to be able to use numpy, python and sm all together at NERSC. [DaVinci user]
... Better remote download services
I'd like to see a fully developed gfortran environment. This would be compatible with the linux, open software systems many of us use, and I think could be more stable and responsive than what is available from some of the for-profits. gcc is at the heart of a great deal of our OSs, so it seems like gfortran might satisfy our scientific computing needs. [Franklin user]
The whole compiling paradigm is not as productive as it could be (as it is in other computing centers I have used). Compilers themselves are great but module loading should be easier and more effective. [Bassi / Jacquard user]
Continue and resolve work with HDF5 developers on parallel I/O issues with flexible domains. ...
My only complaint so far is the lack of distributed version control software, such as Mercurial: http://www.selenic.com/mercurial/wiki/ It's pretty ridiculous that the best repository option you have is subversion. [Jacquard user]
Get some data processing [machines and] tools that actually work
... and install the Intel Compilers on Franklin
(All of these are also more general limitations of MPP computation.)
Better interactive MPP debugging tools for MPP codes !!!! The only really usable method on Franklin is print statements, since much of the code development requires testing on multiple processors. DDT, especially recent versions (last year or so), is not very informative. Totalview was impossible over the web. (NX tunneling works very well on davinci to speed up interactive GUIs - consider installing it on the MPP computers).
Better visualization and data analysis for larger MPP jobs. The tools I use now are at their limits. I have a lot more data that I am unable to digest for presentation. For example, I would like to make movies of a number of quantities from my simulations, but I would have to extract and match the frames by hand from many different files, one for each time slice. ...
- Allocations suggestions: 11 comments
-
need a larger allocation:
the chief limit for me is allocation and ...
More allocation and more effective use of the allocation.
Increase my allocation... Seriously, you're very good.
Larger allocation?
Please allocate more CPU time to me.
provide larger allocations
The only thing NERSC could do is give me unlimited time, but I know that is impossible, and I'm very satisfied with NERSC.
Allocate more time!
Faster allocation: Currently, we do not have enough available computational time on NERSC, and we need a new allocation. NERSC provides excellent computational resources, and we will be happy to use them as soon as the allocation process is successfully finished.
improve allocation management:
Get me a grant :^). More flexibility in the allocation use would be nice. Sometimes we front-load our research and other times it's towards the end. We are punished for not using the resource at a constant rate and sometimes research just doesn't work that way.
for long term users with proven productivity make the allocation process easier. If people are producing peer reviewed papers using NERSC resources, make it QUICKER to get allocations when possible.
- More or Better Services: 10 comments
-
improved web and communications services:
Up to date help pages!
It is becoming significantly less important now, but NERSC could have done much better at easing the learning curve of using the systems. I could have accomplished a great deal more already if I had known exactly how each system worked. Make sure all the information pages are up to date, and include comprehensive information, not just random tidbits.
NERSC could improve user's manuals.
Keep doing what you are doing. I'm particularly interested in the development of the Science Gateways.
It would be good if the NIM website and the www.nersc.gov website did not require separate login to go between them. ...
... The search tool for the web site does not work very well.
They should allow the users (1) to upload their published papers on-line and (2) to have an annual user conference to communicate with each other and explain their work to the general public.
1. Make this survey shorter!
2. Send announcements (e.g. for maintenance) per email with an attached iCal or ics file such that it can easily be imported in a calendar program. Can you create an online calendar to which one can subscribe with common calendar programs?
more consulting help:
More willing to provide consulting help that requires more than 5 minutes of a consultant's attention. We are all pretty good with computers, so the problems that plague us may take several or many hours to resolve and your help is much appreciated. [Bassi / Franklin user]
We can always use man power to improve the performance and scaling of our codes.
- PDSF suggestions: 9 comments
-
Better support for ATLAS jobs.
Increased I/O performance to central discs.
Increased Network I/O performance.
Proper integration in OSG/LCG GRID.
standardize the pdsf OS
The main problem I have is not being able to run any batch jobs when usage is heavy. It seems that when both STAR and ATLAS are running jobs, it is impossible for my group's jobs to be run. This means that our jobs may sit in the queue for days at a time without any progress. It is very frustrating for our work to be brought to a complete standstill when other groups are using the system, especially since our needs are very small in comparison.
Shorter procurement cycles at PDSF.
Make sure the hard disk where the data are saved is safe.
The interactive session to the PDSF is usually extremely slow. I'm not sure this is due to the network connectivity since it has also been seen when I've connected from the LBNL site. The speed of the connection is even slower than when I've connected to other places, like RCF at BNL. It would be very helpful for our data analysis at the PDSF if the slowness of the interactive session were improved.
extend hard disk storage capability
larger disk space
faster and smarter NERSC: My first hope is to make pdsf more efficient. Because I am an Asian user, I hope I can run more jobs during our night. When I get up in the morning, I can deal with the results. Could our server align the jobs by the world time zone? On the other hand, I feel the data transfer rate is not fast enough for me when I transfer big files from the US to China. So my other hope is for it to be faster some day. Anyway, I wish NERSC to keep going dynamically.
- Network suggestions: 4 comments
-
... increased file transfer speed for the case of very large files between NERSC and ANL would also be a nice feature, though I usually extract from large files what I need and transfer only that.
Help users overcome the impact of high-latency network connections for terminal settings. Home connections, hotel connections, etc. all become close to unusable because of latency.
... Improved bandwidth between NERSC and other computational facilities (esp. DOE facilities).
Increase download speed. [North Carolina State University user]
- No suggestions: 2 comments
-
Continue doing what they have been doing.
Not much.
If there is anything important to you that is not covered in this survey, please tell us about it. 23 respondents
- Areas not covered by the survey: 6 comments
-
pdsf website
More survey questions about allocation time would be important to users like us.
No survey on the individual software satisfaction.
Runtimes on Franklin vary a lot for the same job (this is after the upgrade).
I think it would be wise to ask the users about what they would like to see in the next procurements, from the next gen viz machine to replace davinci to the big iron.
Changes in service over the last year, five years. What did you do well before that you don't do well now? What are you doing well now that was a problem before? New acquisitions and the transition from old to new can be addressed in detail. This is a big computer science type issue that physical scientists need help with.
- Additional feedback - Franklin: 6 comments
-
I've been using NERSC for 12 years, and this is the first time the whole Scratch file system was lost!!! And you've lost it on both Franklin and Jacquard! What are you doing there? It will take me a lot of time to recover all lost data and code upgrades.
Franklin has vastly improved over the last year, I hope the stability gains continue.
Franklin is the only one I am using now. It is not always stable. I don't know why.
I do not know where to vent my frustration with the poor performing franklin login nodes. The login nodes are relatively slow compared to basic workstations as well as being highly used, which of course makes them even slower. Very difficult to compile C++ code and do other basic tasks... Of course, some of this login node slowness, but not all, is likely due to lustre. And I didn't see where I could report that lustre has been difficult to work with. Losing all of my data on /scratch was particularly painful. The amount of space (and especially inodes) given to users is simply too small. I realize users can request more space (and I do), but I don't feel that the work I'm doing is particularly special with regard to disk space. It just seems unbalanced to have such a powerful machine and such a small amount of space to work with. I wish there was an easier way to give/take files to users on the machine. Creating a /project directory is too much overhead for simple give/takes.
Congratulations for Franklin and the general maintenance of NERSC systems!
There was a period that franklin was quite unstable. I am satisfied with franklin except for this problem.
- Additional feedback - allocations: 4 comments
-
It is very important to renew allocation time (get more resources).
It would be much better to accept applications for large allocations quarterly. With the annual application currently used, one has to guess what funding will be in place to be able to use the allocation and plan ahead. Then, if the funding does not match up with expectations, one is left with a lot of left over cycles.
About the allocations process - I am a junior faculty member at a University, and I would like to comment about the allocation reductions. I realize that it is important to have a program whereby unused or underused hours should be reallocated. However, as a junior faculty member who is establishing a research group, a good portion of my computational resources will be used over the summer months, at least until my students get established in research. As such, I am finding that I am becoming susceptible to the first and second quarter allocation reductions. The current system negatively impacts junior faculty members disproportionately. I am not sure what to suggest to make it better, though perhaps some lenience could be given to junior faculty members as their research groups are established. Thanks!
Could I get more CPU time from other PIs who have a lot of surplus during the first half year? This CPU time could be restricted to Franklin so that such an arrangement would not affect others.
- Additional feedback - other: 7 comments
-
The ticket system is designed to support individual users, but fails badly when there is a group-wide issue. One should be able to make it possible for others to add comments on one's ticket, but currently there is no way to even make a ticket visible to other users.
I am running NCAR climate models, and I guess there are other people who do that. I wish there is a web page (I think there used to be a web page but I cannot find it any more) so that we can get some help from it.
Thanks for such a fantastic resource (people and systems)!!! Mike Barad
NERSC is the BEST!
PDSF is pricing itself out of the market.
Many of the services are used by others in my group, I am a low level user so my answers may not be the most informative for some categories.
We do almost all of our post-processing using NCL which does not work well on davinci at all right now. This one fact renders NERSC practically useless to me.