
NERSC (Perlmutter)

This page describes setup for running calculations at NERSC's Perlmutter HPC.


Last updated 8 months ago

To get started with calculations at NERSC:

  1. Ask Kristin whether you will be running at NERSC and, if so, which account/repository to charge.

  2. Request a NERSC account through the NERSC homepage (Google “NERSC account request”).

  3. A NERSC Liaison or PI Proxy will validate your account and assign you an initial allocation of computing hours.

  4. At this point, you should be able to log in, check CPU-hour balances, etc. through NERSC's Iris portal (https://iris.nersc.gov/) and the My NERSC portal (https://my.nersc.gov/).

  5. In order to log in and run jobs on the various machines at NERSC, review the NERSC documentation.

  6. In order to load and submit scripts for various codes (VASP, ABINIT, Quantum Espresso), NERSC has lots of information to help; see e.g. the NERSC VASP documentation (https://docs.nersc.gov/applications/vasp/).

     * Note that for commercial codes such as VASP, there is an online form that allows you to enter your VASP license information, which NERSC will confirm before granting you access. Log in to the NERSC Help Desk (https://help.nersc.gov/), select "Open Request", and fill out the "VASP License Confirmation Request" form.

  7. Please make a folder inside your project directory and submit all your jobs there, as your home folder has only about 40 GB of space. For example, for the matgen project, your work folder path should be something like the following:

    /global/cfs/projectdirs/matgen/YOUR_NERSC_USERNAME

  8. You can also request a MongoDB database for your project to be hosted on NERSC; see the NERSC database documentation (https://docs.nersc.gov/services/databases/) for instructions. Patrick Huck can also help you get set up and provide you with a preconfigured database suited for running Materials Project-style workflows.

  9. (Optional) Set up a conda environment.
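The project-directory step above can be sketched in a couple of shell commands ("matgen" is the example project from step 7; substitute the account Kristin assigns you):

```shell
# "matgen" is the example project from step 7; substitute your own repo.
PROJECT=matgen
WORKDIR="/global/cfs/projectdirs/${PROJECT}/$(whoami)"
echo "$WORKDIR"
# On Perlmutter, create it with: mkdir -p "$WORKDIR"
```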

Running Jobs on NERSC

This tutorial provides a brief overview of setting yourself up to run jobs on NERSC. If any information is unclear or missing, feel free to edit this document or contact Kara Fong.

Setting up a NERSC account:

Contact the group’s NERSC Liaison (currently Rohith Srinivaas Mohanakrishnan and Howard Li; see the Group Jobs list at https://materialsproject.gitbook.io/persson-group-handbook/group-resources/group-jobs). They will help you create an account and allocate you computational resources. You will then receive an email with instructions to fill out the Appropriate Use Policy form, set up your password, etc.

Logging on (Setup):

You must use the SSH protocol to connect to NERSC. Make sure you have SSH installed on your local computer (you can check this by typing which ssh). You will also need to set up multi-factor authentication with NERSC, which allows you to generate one-time passwords (OTPs); you append an OTP to the end of your password each time you log on to a NERSC cluster.

We also advise you to configure NERSC's sshproxy script so that you only have to authenticate with an OTP once per 24 hours (helpful if you are scp-ing things). To do this:

  1. Download the sshproxy.sh script from NERSC to your home folder

  2. At the terminal type ./sshproxy.sh -u <nersc_username>

  3. Enter your password and OTP

You should now be able to log in without authenticating for 24 hours!

Logging on to Perlmutter

You can ssh into Perlmutter by running the following command in the terminal (perlmutter-p1.nersc.gov also works as a login address):

ssh <your_username>@perlmutter.nersc.gov

To save typing, you can also set up an alias:

alias perlmutter="ssh <your_username>@perlmutter.nersc.gov"
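Equivalently, you can put a host entry in your SSH config so that a plain `ssh perlmutter` works; a sketch (the IdentityFile path assumes sshproxy's default key location, ~/.ssh/nersc; adjust it and the User line for your own setup):

```shell
# Add a Perlmutter entry to your SSH config.
# Assumptions: sshproxy wrote its key to ~/.ssh/nersc (its default), and
# "your_nersc_username" is a placeholder to replace.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host perlmutter
    HostName perlmutter.nersc.gov
    User your_nersc_username
    IdentityFile ~/.ssh/nersc
EOF
```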

Transferring files to/from NERSC:

For small files, you can use SCP (secure copy). To get a file from NERSC, use:

scp user_name@dtn01.nersc.gov:/remote/path/myfile.txt /local/path

To send a file to NERSC, use:

scp /local/path/myfile.txt user_name@dtn01.nersc.gov:/remote/path

To move a larger quantity of data using a friendlier interface, use Globus Online.

You can also "mount" NERSC's filesystem in VSCode by following the Remote-SSH guide at https://code.visualstudio.com/docs/remote/ssh

Running and monitoring jobs:

The following instructions are for running on Perlmutter.

Most jobs are run in batch mode, in which you prepare a shell script telling the batch system how to run the job (number of nodes, time the job will run, etc.). NERSC’s batch system software is called SLURM. Below is a simple batch script example, copied from the NERSC website:

#!/bin/bash -l

#SBATCH -N 2          #Use 2 nodes
#SBATCH -t 00:30:00   #Set 30 minute time limit
#SBATCH -q regular    #Submit to the regular QOS
#SBATCH -L scratch    #Job requires $SCRATCH file system
#SBATCH -C cpu        #Use cpu nodes

srun -n 32 -c 4 ./my_executable

Here, the first line specifies which shell to use (in this case bash). The keyword #SBATCH starts a directive line (see NERSC's sbatch documentation for a full description of the options you can specify). The srun command starts execution of the code.

To submit your batch script, use sbatch myscript.sl in the directory containing the script file.

Below are some useful commands to control and monitor your jobs:

sqs -u username (Lists jobs for your account)
scancel job_id     (Cancels a job from the queue)

For Perlmutter GPU, the job scripts will look similar:

#!/bin/bash -l

#SBATCH -N 2          #Use 2 nodes
#SBATCH -t 00:30:00   #Set 30 minute time limit
#SBATCH -q regular    #Submit to the regular QOS
#SBATCH -C gpu        #Use GPU nodes

srun -n 8 -c 32 --cpu-bind=cores -G 8 ./my_executable

For more options in the executable, please refer to NERSC documentation. To work with the high-throughput infrastructure, please refer to "Fireworks & Atomate" in this handbook.

Choosing a QOS (quality of service):

You specify which queue to use in your batch file. Use the debug queue for small, short test runs, and the regular queue for production runs.
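As an illustration, a minimal debug-QOS script for a small test run might look like the following (my_executable and the task/core counts are placeholders; debug-queue limits are NERSC policy and may change):

```shell
# Write a small debug-QOS batch script (placeholders throughout).
cat > debug_test.sl <<'EOF'
#!/bin/bash -l

#SBATCH -N 1          #Use 1 node
#SBATCH -t 00:10:00   #Keep debug jobs short
#SBATCH -q debug      #Submit to the debug QOS
#SBATCH -C cpu        #Use cpu nodes

srun -n 16 -c 8 ./my_executable
EOF
# Submit with: sbatch debug_test.sl
```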

Automatic job submission on NERSC: scrontab

In order to automatically manage job submission at NERSC, you can use scrontab (https://docs.nersc.gov/jobs/workflow/scrontab/). Jobs are then submitted periodically even when you are not signed in to any NERSC system, which can reduce your queue turnaround from several days to a few hours. This is possible because of the way jobs are managed in atomate/FireWorks; please make sure you are comfortable submitting individual jobs via atomate before reading this section.

In atomate, if you set --maxloop 3 for rocket_launch in your my_qadapter.yaml, a running job on NERSC will stop after three one-minute checks find no READY FireWorks in your LaunchPad, so that computing resources are not wasted. Conversely, if you do have FireWorks in the READY state and your scrontab has been submitting jobs for a few days, jobs submitted days ago that finally start running will pull those READY FireWorks and start RUNNING them, reducing the turnaround from a few days to a few hours. To set up scrontab:

  1. Log in to Perlmutter (scrontab entries are managed by SLURM, so any login node works).

  2. Type scrontab -e to open your scrontab in an editor.

  3. Add the following line, which runs SCRIPT.sh every two hours and appends its output to a log file (the first cron field is minutes and only goes up to 59, so "every 120 minutes" is written as 0 */2 * * *):

    0 */2 * * * /bin/bash -l PATH_TO_SCRIPT.sh >> PATH_TO_LOGFILE
  4. Set up your SCRIPT.sh like the following (as a suggestion, you can simply put this file, and the log file which keeps a record of submission states, in your home folder):

    source activate YOUR_PRODUCTION_CONDA_ENVIRONMENT
    export FW_CONFIG_FILE=PATH_TO_CONFIG_DIR/FW_config.yaml
    cd PATH_TO_YOUR_PRODUCTION_FOLDER
    qlaunch --fill_mode rapidfire -m 1000 --nlaunches 1

The last line is what actually submits jobs from inside your production folder, with the settings you set in your FW_config.yaml file; see the atomate documentation for more info.

Please make sure to put PATH_TO_YOUR_PRODUCTION_FOLDER under /global/cfs/projectdirs/, which has much more space than your home folder and is also backed up. Keep an eye on how close you are to the disk-space and file-count limits by checking https://my.nersc.gov/ periodically.

Running Jupyter Notebooks on Perlmutter

Jupyter notebooks are quickly becoming an indispensable tool for doing computational science. In some cases, you might want (or need) to harness NERSC computing power inside a Jupyter notebook. To do this, you can use NERSC's JupyterHub at https://jupyter.nersc.gov/. These notebooks run on dedicated Jupyter nodes on Perlmutter and can also submit jobs to the batch queues (see https://docs.nersc.gov/services/jupyter/ for details). All of your files and the project directory will be accessible from JupyterHub, but your conda environments won't be available until you do some configuration.

To make a conda environment accessible from JupyterHub, activate the environment and set up an IPython kernel for it by installing the ipykernel package. More info can be found in the NERSC Jupyter documentation (https://docs.nersc.gov/services/jupyter/).
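The kernel setup can be sketched as follows, assuming a hypothetical environment named "prod" (substitute your own):

```shell
# Register a conda env as a Jupyter kernel so it appears on JupyterHub.
# "prod" is a hypothetical env name -- substitute your own.
# On Perlmutter you would first run: conda activate prod
pip install ipykernel
python3 -m ipykernel install --user --name prod --display-name "Python (prod)"
```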

Automatic Job Packing with FireWorks

DISCLAIMER: Only use job packing if you have trouble with typical job submission. The following tip is not 100% guaranteed to work, and is based on limited, subjective experience on Cori. Talk to Alex Dunn (ardunn@lbl.gov) for help if you have trouble.

The Cori queue system can be unreasonably slow when submitting many (e.g., hundreds, thousands) of small (e.g., single node or 2 nodes) jobs with qos-normal priority on Haswell. In practice, we have found that the Cori job scheduler will give your jobs low throughput if you have many jobs in queue, and you will often only be able to run 5-30 jobs at a time, while the rest wait in queue for far longer than originally expected (e.g., weeks). While there is no easy way to increase your queue submission rate (AFAIK), you can use FireWorks job-packing to “trick” Cori’s SLURM scheduler into running many jobs in serial on many parallel compute nodes with a single queue submission, vastly increasing throughput.

You can use job packing with the “multi” option to rlaunch. This command launches N parallel python processes on the Cori scheduling node, each of which runs a job using M compute nodes.

The steps to job packing are:

  1. Edit your my_qadapter.yaml file to reserve N * M nodes for each submission. For example, if each of your jobs takes M = 2 nodes and you want an N = 10x speedup, reserve 20 nodes per queue submission.

  2. Change your rlaunch command to:

rlaunch -c /your/config multi N

To have each FireWorks process run as many jobs as possible in serial before the walltime, use the --nlaunches 0 option. To prevent FireWorks from submitting jobs with little walltime left (causing jobs to frequently get stuck as “RUNNING”), set the --timeout option. Make sure --timeout is set so that even a long running job submitted at the end of your allocation will not run over your walltime limit. Your my_qadapter.yaml should then have something similar to the following lines:

rocket_launch: rlaunch -c /your/config multi 10 --nlaunches 0 --timeout 169200
nodes: 20

Typically, setting N <= 10 will give you a good N-times speedup with no problems. There are no guarantees, however, when N > 10-20. Use N > 50 at your own risk!
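As a worked example of choosing --timeout (numbers assumed, matching the snippet above): with a 48-hour walltime request and no single job expected to run longer than about one hour, you want rlaunch to stop pulling new jobs one hour before the walltime expires:

```shell
# Compute a safe --timeout: walltime minus the longest single job,
# converted to seconds. Both numbers are assumptions for illustration.
WALLTIME_HRS=48
LONGEST_JOB_HRS=1
TIMEOUT=$(( (WALLTIME_HRS - LONGEST_JOB_HRS) * 3600 ))
echo "$TIMEOUT"    # the value passed to --timeout
```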

Jobs on Premium QOS

By default, premium QOS access is turned off for everyone in the group. When there is a scientific emergency (for example, you need to complete a calculation ASAP for a meeting with collaborators the next day), the premium QOS can be used. In such cases, please contact Howard (hli98@lbl.gov or on Slack) to request premium QOS access. Access will then be turned off automatically after three weeks, or sooner once the emergency has been dealt with.

