Frequently encountered blockers

Find solutions to frequently encountered issues here

SLURM is rejecting my jobs because my home directory is full

e.g. "home directory over quota" ,

We are limited to 40 GB of files in our home directories. This error indicates you have to many files.

Steps to diagnose:

  1. Run showquota or myquota to see your file system space usage. If 'home' is 40GB or greater, that's your issue.

  2. Run du -sh * in your home directory and look for any large directories.

  3. Run du -sh .[^.]* in your home directory and look for any large dot directories. Common culprits are a large .cache directory and a large .conda directory.

  • Alternatively, you can use the ncdu (NCurses Disk Usage) tool via Shifter to view your home directory usage with an ncurses GUI:

    shifterimg pull bytesco/ncdu
    shifter --entrypoint --image=bytesco/ncdu

What to do if Conda env dirs are too large?

  • Quick solution: conda clean --all

  • Permanent solution: change your conda env directory to a project directory (see the Conda Environments page of this handbook).

What to do if .cache is too large?

  • Quick solution: Delete it! rm -rf .cache

    • Is this safe to do? Yes (probably), but you may want to move a backup of the directory to a project directory if you're worried about it.
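
If you want a single sorted view of what is using the most space, a one-liner like the following can help (a sketch; it assumes GNU sort with the -h flag, which is available on NERSC systems):

    du -sh ~/* ~/.[^.]* 2>/dev/null | sort -h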

    NERSC (Perlmutter)

    This page describes setup for running calculations at NERSC's Perlmutter HPC.

    To get started with calculations at NERSC:

    1. Ask Kristin about whether you will be running at NERSC and, if so, under what account / repository to charge.

    2. Request a NERSC account through the NERSC homepage (Google “NERSC account request”).

3. A NERSC Liaison or PI Proxy will validate your account and assign you an initial allocation of computing hours.

4. At this point, you should be able to log in, check CPU-hour balances, etc. through the “NERSC NIM” and “My NERSC” portals.

    5. In order to log in and run jobs on the various machines at NERSC, review the NERSC documentation.

6. In order to load and submit scripts for various codes (VASP, ABINIT, Quantum Espresso), NERSC has lots of information to help. Try Google, e.g. “NERSC VASP” (https://docs.nersc.gov/applications/vasp/).

  * Note that for commercial codes such as VASP, there is an online form that allows you to enter your VASP license, which NERSC will confirm before granting you access. Log in to https://help.nersc.gov/, select "Open Request", and fill out the "VASP License Confirmation Request" form.

    7. Please make a folder inside your project directory and submit all your jobs there, as your home folder has only about 40GB of space. For example, for matgen project, your work folder path should be something like the following:

      /global/cfs/projectdirs/matgen/YOUR_NERSC_USERNAME

8. You can also request a Mongo database for your project to be hosted on NERSC. See https://docs.nersc.gov/services/databases/ for instructions. Patrick Huck can also help you get set up and provide you with a preconfigured database suited for running Materials Project style workflows.

9. (Optional) Set up a conda environment (see the Conda Environments page of this handbook).

    Running Jobs on NERSC

    This tutorial provides a brief overview of setting yourself up to run jobs on NERSC. If any information is unclear or missing, feel free to edit this document or contact Kara Fong.

Setting up a NERSC account:

Contact the group’s NERSC Liaison (currently Rohith Srinivaas Mohanakrishnan and Howard Li; see the Group Jobs list at https://materialsproject.gitbook.io/persson-group-handbook/group-resources/group-jobs). They will help you create an account and allocate you computational resources. You will then receive an email with instructions to fill out the Appropriate Use Policy form, set up your password, etc.

Once your account is set up, you can manage it through NERSC's Iris portal (https://iris.nersc.gov/).

Logging on (Setup):

You must use the SSH protocol to connect to NERSC. Make sure you have SSH installed on your local computer (you can check this by typing which ssh). You will also need to set up multi-factor authentication (MFA) with NERSC. This will allow you to generate "one-time passwords" (OTPs). You will need to append an OTP to the end of your NIM password each time you log on to a NERSC cluster.

We also advise you to configure the NERSC sshproxy script so that you only have to log into NERSC with an OTP once per 24 hours (helpful if you are scp-ing things). To do this:

1. Download the sshproxy.sh script from NERSC to your home folder

    2. At the terminal type ./sshproxy.sh -u <nersc_username>

    3. Enter your password and OTP

    You should now be able to log in without authenticating for 24 hours!
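
Optionally, you can point SSH at the key that sshproxy.sh generates so you don't have to pass it by hand during those 24 hours. This is a sketch; it assumes the script wrote the key to its default location, ~/.ssh/nersc:

    # add to ~/.ssh/config on your local machine
    Host perlmutter
        HostName perlmutter.nersc.gov
        User <your_username>
        IdentityFile ~/.ssh/nersc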

Logging on to Perlmutter

You can set up an alias for Perlmutter (see the alias example at the end of this page), or you can ssh into Perlmutter by running the following command in the terminal:

    ssh perlmutter-p1.nersc.gov

Transferring files to/from NERSC:

For small files, you can use scp (secure copy). Example commands for getting a file from NERSC and for sending a file to NERSC (via the data transfer node dtn01.nersc.gov) are listed at the end of this page.

To move a larger quantity of data using a friendlier interface, use Globus Online.

    You can also "mount" NERSC's filesystem in VSCode following the [guide here]()

    Running and monitoring jobs:

    The following instructions are for running on Perlmutter.

Most jobs are run in batch mode, in which you prepare a shell script telling the batch system how to run the job (number of nodes, time the job will run, etc.). NERSC’s batch system software is called SLURM. A simple batch script example, copied from the NERSC website, is shown in the code listing at the end of this page.

In that script, the first line specifies which shell to use (in this case bash). The keyword #SBATCH starts a directive line (see the NERSC documentation for a full description of the sbatch options you can specify). The srun command starts execution of the code.

    To submit your batch script, use sbatch myscript.sl in the directory containing the script file.

    Below are some useful commands to control and monitor your jobs:

    For Perlmutter GPU, the job scripts will look similar:

    For more options in the executable, please refer to NERSC documentation. To work with the high-throughput infrastructure, please refer to "Fireworks & Atomate" in this handbook.

    Choosing a QOS (quality of service):

    You specify which queue to use in your batch file. Use the debug queue for small, short test runs, and the regular queue for production runs.
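
For a quick test run, you might change only the QOS and time-limit directives in the batch script shown at the end of this page, e.g. (a sketch; NERSC's debug QOS allows only short walltimes):

    #SBATCH -q debug      #Submit to the debug QOS for small, short test runs
    #SBATCH -t 00:10:00   #Set a 10 minute time limit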

Automatic job submission on NERSC: scrontab

In order to automatically manage job submission at NERSC, you can use scrontab (https://docs.nersc.gov/jobs/workflow/scrontab/). You can submit jobs periodically even when you are not signed in to any NERSC systems, and perhaps reduce the queue time from 5-10 days to a few hours. This is possible because of the way jobs are managed in atomate/fireworks. Please make sure you feel comfortable submitting individual jobs via atomate before reading this section.

In atomate, when you use --maxloop 3 (for example) for rocket_launch in your my_qadapter.yaml, FireWorks checks once a minute and, after 3 tries with no READY jobs available in your LaunchPad, stops the running job on NERSC to avoid wasting computing resources. On the other hand, if you have Fireworks in the READY state and your scrontab entry has been in place for a few days, the jobs you submitted a few days ago will pull any READY Fireworks once they start running on NERSC and begin RUNNING them, reducing the turnaround from a few days to a few hours. So how do you set it up? Please follow these instructions:

1. ssh to the node where you want to set up the scrontab; pick one that is easy to remember, such as cori01 or edison01 (to log in to a specific node, run e.g. “ssh cori01” after you log in to the system).

2. Type and enter: scrontab -e

3. Set up the following command in the opened vi editor. It runs the SCRIPT.sh file every 120 minutes of every day of every week of every month of every year (i.e. */120 * * * *):

4. Set up your SCRIPT.sh like the following (as a suggestion, you can simply put this file, and the log file which keeps a record of submission states, in your home folder):

Running Jupyter Notebooks on Perlmutter

Jupyter notebooks are quickly becoming an indispensable tool for doing computational science. In some cases, you might want to (or need to) harness NERSC computing power inside of a Jupyter notebook. To do this, you can use NERSC’s Jupyterhub system at https://jupyter.nersc.gov/. These notebooks run on dedicated Jupyter nodes on Perlmutter and can also submit jobs to the batch queues (see https://docs.nersc.gov/services/jupyter/ for details). All of your files and the project directory will be accessible from the Jupyterhub, but your conda envs won’t be available until you do some configuration.

To set up a conda environment so it is accessible from the Jupyterhub, activate the environment and set up an IPython kernel. To do this, run the command “pip install ipykernel”. More info can be found at http://bit.ly/2yoKAzB.
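
A minimal sketch of registering an environment as a Jupyter kernel (the environment name my_env is just an example):

    conda activate my_env
    pip install ipykernel
    python -m ipykernel install --user --name my_env --display-name "my_env"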

Automatic Job Packing with FireWorks

DISCLAIMER: Only use job packing if you have trouble with typical job submission. The following tip is not 100% guaranteed to work, and is based on limited, subjective experience on Cori. Talk to Alex Dunn (ardunn@lbl.gov) for help if you have trouble.

    The Cori queue system can be unreasonably slow when submitting many (e.g., hundreds, thousands) of small (e.g., single node or 2 nodes) jobs with qos-normal priority on Haswell. In practice, we have found that the Cori job scheduler will give your jobs low throughput if you have many jobs in queue, and you will often only be able to run 5-30 jobs at a time, while the rest wait in queue for far longer than originally expected (e.g., weeks). While there is no easy way to increase your queue submission rate (AFAIK), you can use FireWorks job-packing to “trick” Cori’s SLURM scheduler into running many jobs in serial on many parallel compute nodes with a single queue submission, vastly increasing throughput.

You can use job packing with the “multi” option to rlaunch. This command launches N parallel python processes on the Cori scheduling node, each of which runs a job using M compute nodes.

The steps to job packing are:

1. Edit your my_qadapter.yaml file to reserve N * M nodes for each submission. For example, if each of your jobs takes M = 2 nodes and you want an N = 10x speedup, reserve 20 nodes per queue submission.

2. Change your rlaunch command to:

    To have each FireWorks process run as many jobs as possible in serial before the walltime, use the --nlaunches 0 option. To prevent FireWorks from submitting jobs with little walltime left (causing jobs to frequently get stuck as “RUNNING”), set the --timeout option. Make sure --timeout is set so that even a long running job submitted at the end of your allocation will not run over your walltime limit. Your my_qadapter.yaml should then have something similar to the following lines:

    Typically, setting N <= 10 will give you a good N-times speedup with no problems. There are no guarantees, however, when N > 10-20. Use N > 50 at your own risk!
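
As a worked example of the arithmetic behind those settings (assuming M = 2 nodes per job, N = 10 packed jobs, and a 48-hour walltime request), the my_qadapter.yaml lines shown at the end of this page correspond to:

    nodes: 20   # N * M = 10 * 2 nodes per queue submission
    rocket_launch: rlaunch -c /your/config multi 10 --nlaunches 0 --timeout 169200   # 169200 s = 47 h, leaving 1 h of margin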

Jobs on Premium QOS

By default, premium QOS access is turned off for everyone in the group. When there is a scientific emergency (for example, you need to complete a calculation ASAP for a meeting with collaborators the next day), premium QOS can be used. In such cases, please contact Howard (hli98@lbl.gov or on Slack) to request premium QOS access. The access will then be turned off automatically after three weeks or after the emergency has been dealt with.

The last line of this 3-line file is what actually submits your job inside your production folder, with the settings that you set in your FW_config.yaml file. See the atomate documentation for more info.

• Please make sure to set your PRODUCTION_FOLDER under /global/project/projectdirs/, which has much more space than your home folder and is also backed up. Keep an eye on how close you are to the disk space and file count limits by checking https://my.nersc.gov/ periodically.

    alias perlmutter="ssh <your_username>@perlmutter.nersc.gov"
    scp user_name@dtn01.nersc.gov:/remote/path/myfile.txt /local/path
    scp /local/path/myfile.txt user_name@dtn01.nersc.gov:/remote/path
    #!/bin/bash -l
    
    #SBATCH -N 2          #Use 2 nodes
    #SBATCH -t 00:30:00   #Set 30 minute time limit
    #SBATCH -q regular    #Submit to the regular QOS
    #SBATCH -L scratch    #Job requires $SCRATCH file system
    #SBATCH -C cpu        #Use cpu nodes
    
    srun -n 32 -c 4 ./my_executable
    sqs -u username (Lists jobs for your account)
    scancel job_id     (Cancels a job from the queue)
    #!/bin/bash -l
    
    #SBATCH -N 2          #Use 2 nodes
    #SBATCH -t 00:30:00   #Set 30 minute time limit
    #SBATCH -q regular    #Submit to the regular QOS
    #SBATCH -C GPU        #Use GPU
    
    srun -n 8 -c 32 --cpu-bind=cores -G 8 ./my_executable
    */120 * * * * /bin/bash -l PATH_TO_SCRIPT.sh >> PATH_TO_LOGFILE
    rlaunch -c /your/config multi N
    rocket_launch: rlaunch -c /your/config multi 10 --nlaunches 0 --timeout 169200
    nodes: 20
    source activate YOUR_PRODUCTION_CONDA_ENVIRONMENT
    cd PATH_TO_YOUR_PRODUCTION_FOLDER
    FW_CONFIG_FILE=PATH_TO_CONFIG_DIR/FW_config.yaml qlaunch --fill_mode rapidfire -m 1000 --nlaunches 1

    Conda Environments

    Using Conda environments on NERSC systems

What are conda environments?

Conda envs are a great way to manage package/library versions. Frequently we need a specific configuration, and the package versions one project needs can conflict with another project's needs. Conda envs allow us to create separate "environments" where you are free to install any package version you like without it affecting anything outside of the environment.

Reminder: Put conda on PATH

By default, conda is not on PATH, and you may get an error when trying to call `conda ...`. Get it on PATH by running module load python.

Note: you can add that line to your ~/.bash_profile so you don't have to do this each time you log in.
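
A sketch of the line you might add to ~/.bash_profile (this assumes the NERSC-provided python module is what supplies conda for you):

    module load python   # puts conda on PATH at login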

Create Conda env

To create a new named conda environment, use the following command. In this example we create an environment named my_env with Python 3.8:

    conda create -n my_env python=3.8

Use a Conda env you've created

To enter your environment, activate it using its name:

    conda activate my_env

To list your available environments:

    conda env list

Change default conda settings (optional)

Change environment directory

By default, if we create a new environment, it will be stored in the $HOME directory (e.g. /global/homes/m/<username>/.conda/envs). Each of us has a quota of 40 GB for $HOME, and sometimes conda environments can get quite big, which can cause out-of-quota problems. So, let's change the default environment directory to avoid this.

    You should have access to /global/common/software/matgen/ (or /global/common/software/jcesr, depending on the account you have access to). Create a directory under your username (to store all your software), e.g.

Within your directory, create a directory to store conda environments (assuming we want to store them at .../<username>/conda/envs):

Then, configure conda to prepend the directory we just created to envs_dirs:

    This is all you need to do.

To check that it was successful, you can view your conda settings with:

You should find something like:

    Alternatively, you can open ~/.condarc to see all the changes you've made. You can even directly edit it to remove the changes or add new ones.

Change package directory

When you install a package, it will first be downloaded to $HOME (e.g. /global/homes/m/<username>/.conda/pkgs). You can change the default package storage directory as well:

Again, you may need to change matgen to the account you have access to, and, of course, change <username> to your username.

    $ module load python
    $ cd /global/common/software/matgen   # or "cd /global/common/software/jcesr"  
    $ mkdir <username>   # change <username> to your NERSC user name 
    $ chmod g-w <username>   # remove group write access to avoid others changing it 
    $ cd <username> 
    $ mkdir conda && mkdir conda/envs
    $ conda config --prepend envs_dirs /global/common/software/matgen/<username>/conda/envs
    $ conda config --show
    envs_dirs:
      - /global/common/software/matgen/<username>/conda/envs
      - /global/homes/m/<username>/.conda/envs
      - /global/common/software/nersc/pm-2021q4/sw/python/3.9-anaconda-2021.11/envs
    $ mkdir /global/common/software/matgen/<username>/conda/pkgs
    $ conda config --prepend pkgs_dirs /global/common/software/matgen/<username>/conda/pkgs
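
After both changes, your ~/.condarc should look roughly like this (a sketch assuming the matgen software directory):

    envs_dirs:
      - /global/common/software/matgen/<username>/conda/envs
    pkgs_dirs:
      - /global/common/software/matgen/<username>/conda/pkgs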

    BRC (Savio)

    This page describes how to get setup at the Berkeley Research Computing center (BRC) on Savio.

Berkeley Research Computing (BRC) hosts the Savio supercomputing cluster. Savio operates on a condo computing model, in which many PIs and researchers purchase nodes to add to the system. Nodes are accessible to all who have access to the system, though priority access is given to the contributors of the specific nodes. BRC provides three types of allocations, including: Condo - priority access for nodes contributed by the condo group; Faculty Computing Allowance (FCA) - limited computing time provided to each faculty member using Savio.

Setting up a BRC account

Please make sure you will actually be performing work on Savio before requesting an account. To get an account on Savio, navigate to the BRC portal, register an account, make sure to select the appropriate allocation, and wait for approval from the BRC team. Typically, most students and postdocs will be running on co_lsdi. For more information, visit Berkeley Research Computing (http://research-it.berkeley.edu/services/high-performance-computing).

After your account is made, you'll need to set up 2-factor authentication. This will allow you to generate "one-time passwords" (OTPs). You will need to append an OTP to the end of your password each time you log on to the cluster. We recommend using Google Authenticator, although any OTP manager will work.

Logging on (Setup):

You must use the SSH protocol to connect to BRC. Make sure you have SSH installed on your local computer (you can check this by typing which ssh). Make sure you have a directory named $HOME/.ssh on your local computer (if not, make it).

We also advise you to configure an SSH socket so that you only have to log into BRC with an OTP once per session (helpful if you are scp-ing things). To do this:

    1. Create the directory ~/.ssh/sockets if it doesn't already exist.

2. Open your ssh config file ~/.ssh/config (or create one if it doesn't exist) and add the following:
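
Putting the two steps together, a minimal sketch (the Host block matches the config lines listed at the end of this page):

    mkdir -p ~/.ssh/sockets
    # then add to ~/.ssh/config:
    Host *.brc.berkeley.edu
        ControlMaster auto
        ControlPath ~/.ssh/sockets/%r@%h-%p
        ControlPersist 600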

    After your account is made, you'll need to set up 2-factor authentication. We recommend using Google Authenticator, although any OTP manager will work.

    You should now be ready to log on!

Logging on to BRC

To access your shiny new Savio account, you'll want to SSH onto the system from a terminal.

    You will be prompted to enter your passphrase+OTP (e.g. <your_password><OTP> without any spaces). This will take you to your home directory. You may also find it useful to set up an alias for signing on to HPC resources. To do this, add the following line to your bash_profile:

    Now you will be able to initialize a SSH connection to Savio just by typing savio in the command line and pressing enter.

    Running on BRC

    Under the condo account co_lsdi, we have exclusive access to 28 KNL nodes. Additionally, we have the ability to run on other nodes at low priority mode.

    Accessing Software binaries

    Software within BRC is managed through modules. You can access precompiled, preinstalled software by loading the desired module.

    To view a list of currently installed programs, use the following command:

    To view the currently loaded modules use the command:

    Software modules can be removed by using either of the following two commands:

    Accessing In-House software packages

The Persson Group maintains its own versions of popular codes such as VASP, GAUSSIAN, QCHEM, and LAMMPS. To access these binaries, ensure that you have the proper licenses and permissions, then append the following line to the .bashrc file in your home directory:

Using Persson Group Owned KNL nodes

To run on the KNL nodes, use the following job script, replacing <executable> with the desired job executable. To run VASP after loading the proper module, use vasp_std, vasp_gam, or vasp_ncl.

Running on Haswell Nodes (Low Priority)

    To run on Haswell nodes, use the following slurm submission script:

    High Performance Computing (HPC)

    This section and its subpages describe how to set up and run calculations on various computing resources that our group has access to, as well as how to backup your data.

Overview

Our group’s main computing resources are:

  • NERSC (the LBNL supercomputing center, one of the biggest in the world)

  • Savio / Berkeley Research Computing (BRC)

  • Lawrencium / Berkeley Lab Research Computing (LRC)

  • Peregrine (the NREL supercomputing center)

  • Argonne Leadership Computing Facility (sometimes)

  • Oak Ridge Leadership Computing Facility (sometimes)

  • At any time, if you feel you are computing-limited, please contact Kristin so she can work with you on finding solutions.

Setup Guides (see subpages)

Setup guides for NERSC (Perlmutter), BRC (Savio), and LRC (Lawrencium) are on the subpages of this section.

Back-Up Data Frequently

Mongo DB

You should back up your Mongo DB data frequently. The Mongo DB that NERSC offers is not backed up automatically, so it's important to run regular backups during the course of your research. For Mongo DB you can:

• Get a free education license for Studio 3T, right-click your database, and click export, or

• use the “mongodump” command line tool (tutorials are available online); it’s a one-line command:

    module load mongodb; mongodump --host=<host e.g. mongo01.nersc.gov> --port=27017 --username=<username> --password="<password>" --db="<db_name>" --authenticationDatabase="<db_name (same as --db flag)>" --out="<path to directory>"
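
To restore from such a dump later, mongorestore takes essentially the same connection flags (a sketch; adjust the host, database name, and dump path to your setup):

    module load mongodb; mongorestore --host=<host e.g. mongo01.nersc.gov> --port=27017 --username=<username> --password="<password>" --authenticationDatabase="<db_name>" --db="<db_name>" <path to dump directory>/<db_name>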

NERSC Cori, BRC Savio, LRC Lawrencium

You should back up your "scratch" directory data frequently.

    Each cluster has different methods of deleting old data on your scratch directory:

• Savio deletes data older than 6 months

• Lawrencium does not back up the scratch directory but backs up other types of directories

• Cori has a purging policy on the scratch directory

For NERSC users, the data archive system HPSS is an excellent resource for storing your data long-term. There are command line tools for compressing a large directory and sending it to HPSS directly from your scratch space. Alternatively, you could also use Globus (listed below) to transfer your data, but note that you should compress your folders first rather than moving thousands of small files. This can be an alternative to moving data to the /project directory under the matgen nodes, because the latter uses up lots of inodes and NERSC will complain. The only downside is that others can't access your data in HPSS as easily.
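
On NERSC, htar is the usual command line tool for this. A minimal sketch that bundles a directory straight into an HPSS archive (the archive name and directory are placeholders):

    htar -cvf my_calcs_backup.tar ./my_calcs_directory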

    You can use one of the following popular tools for backing up raw data in our group:

• Rclone

• Globus

Both of these tools can help back up your raw calculation files to your Google Drive, Box, or even external hard drives.

    Additional resources

    Other Persson group members and the NERSC website are both excellent resources for getting additional help. If that fails, you can reach out to the NERSC Operations staff:

    • 1-800-666-3772 (or 1-510-486-8600)

    • Computer Operations = menu option 1 (24/7)

    • Account Support = menu option 2, accounts@nersc.gov

    • HPC Consulting = menu option 3, or consult@nersc.gov

    Special thanks to the original authors of this page: Kara Fong, Eric Sivonxay, and John Dagdelen

    Host *.brc.berkeley.edu
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 600
    ssh your_username@hpc.brc.berkeley.edu
    alias savio="ssh your_username@hpc.brc.berkeley.edu"
    module load <module_name>
    module avail
    module list
    module unload <module_name>
    module purge
    export MODULEPATH=${MODULEPATH}:/global/home/groups/co_lsdi/sl7/modfiles
    #!/bin/bash -l
    #SBATCH --nodes=1                 #Use 1 node
    #SBATCH --ntasks=64               #Use 64 tasks for the job
    #SBATCH --qos=lsdi_knl2_normal    #Set job to normal qos
    #SBATCH --time=01:00:00           #Set 1 hour time limit for job
    #SBATCH --partition=savio2_knl    #Submit to the KNL nodes owned by the Persson Group
    #SBATCH --account=co_lsdi         #Charge to co_lsdi account
    #SBATCH --job-name=savio2_job     #Name for the job
    
    mpirun --bind-to core <executable>
    #!/bin/bash -l
    #SBATCH --nodes=1                 #Use 1 node
    #SBATCH --ntasks-per-core=1       #Use 1 task per core on the node
    #SBATCH --qos=savio_lowprio       #Set job to low priority qos
    #SBATCH --time=01:00:00           #Set 1 hour time limit for job
    #SBATCH --partition=savio2        #Submit to the haswell nodes
    #SBATCH --account=co_lsdi         #Charge to co_lsdi account
    #SBATCH --job-name=savio2_job     #Name for the job
    
    mpirun --bind-to core <executable>

    LRC (Lawrencium)

    This page describes how to get set up to run calculations at LRC on the Lawrencium cluster.

Lawrence Berkeley National Laboratory's Laboratory Research Computing (LRC) hosts the Lawrencium supercomputing cluster. LRC operates on a condo computing model, in which many PIs and researchers purchase nodes to add to the system. Nodes are accessible to all who have access to the system, though priority access is given to the contributors of the specific nodes. LRC provides three types of allocations, including: Condo - priority access for nodes contributed by the condo group; PI Computing Allowance (PCA) - limited computing time provided to each PI using Lawrencium.

Persson Group ES1 GPU Node Specs (see the GPU Computing Node listing at the end of this page):

Setting up an LRC account

Please make sure you will actually be performing work on Lawrencium before requesting an account. To get an account on Lawrencium, navigate to the LRC user portal, register an account, and wait for approval from the LRC team. You will also need to agree to the LRC user agreement and set up an MFA (one-time password) token for your account.

    Before logging on (setup)

    You must use the SSH protocol to connect to Lawrencium. Make sure you have SSH installed on your local computer (you can check this by typing which ssh). Make sure you have a directory named $HOME/.ssh on your local computer (if not, make it).

    After your account is made, you'll need to set up 2-factor authentication. We recommend using Google Authenticator, although any OTP manager will work.

    You should now be ready to log on!

    Logging on to LRC

    To access your shiny new Lawrencium account, you'll want to SSH onto the system from a terminal.

    You will be prompted to enter your pin+OTP (e.g. <your_pin><OTP> without any spaces). This will take you to your home directory. You may also find it useful to set up an alias for signing on to HPC resources. To do this, add the following line to your bash_profile:

Now you will be able to initialize an SSH connection to Lawrencium just by typing lawrencium in the command line and pressing enter.

    Running on LRC

    Under the condo accounts condo_mp_cf1 (56 cf1 nodes) and condo_mp_es1 (1 gpu node), we have exclusive access to certain Lawrencium nodes. If you do not know which of these node groups you are supposed to be running on, you probably shouldn't be running on Lawrencium. Additionally, we have the ability to run on ES1 GPU nodes at low priority mode (es_lowprio).

    Accessing Software binaries

    Software within LRC is managed through modules. You can access precompiled, preinstalled software by loading the desired module.

    To view a list of currently installed programs, use the following command:

    To view the currently loaded modules use the command:

    Software modules can be removed by using either of the following two commands:

Using Persson Group Owned LRC nodes

To run on the nodes, use the following job script, replacing <executable> with the desired job executable.

Using Non-Persson Owned LRC Nodes

In addition to using Persson-owned nodes (lower wait times, limited capacity), you can also submit directly to the main LRC queue. For jobs that aren't on a time crunch and don't need a fast turnaround, this can be a great option because it will not saturate our condo nodes. All of the instructions are identical to the above, except that the account should be set to pc_mp instead of lr_mp.
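
In practice that means changing only the account directive in the job scripts shown at the end of this page, e.g.:

    #SBATCH --account=pc_mp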

Interactive Jobs on the Group GPU Condo Node

To run an interactive session on the GPU node, use the following two commands to provision and log in to the node:

    salloc --time=24:00:00 --nodes=1 -p es1 --gres=gpu:2 --cpus-per-task=8 --qos=condo_mp_es1 --account=lr_mp
    srun --pty -u bash -i

    NREL (Kestrel)

    This page describes how to get setup at the National Renewable Energy Laboratory (NREL) on Kestrel.

If you encounter an error with NREL, there is most likely documentation on https://nrel.github.io/HPC/Documentation/Systems/Kestrel/.

Setting up an NREL account

Visit https://www.nrel.gov/hpc/user-accounts.html to get started on making a user account. You will need to set up MFA before proceeding.

    Send an email to HPC-Help@nrel.gov to get added on the correct projects. Ask Kristin or the NREL coordinators of the group to see which projects you belong to.

    Logging on

    For the purposes of this example, assume your NREL account username is: johndoe

    Kestrel login:

    Once in, the command line will prompt you to type in your account password+OTP.

General NREL HPC login:

Use this for logging in to the NREL HPC systems. From there, you can connect to any other NREL HPC system like Swift, Eagle, etc.

    Conda Environment Setup

    I personally would recommend setting up miniconda instead of using their default conda modules. If you would like to use their default conda module, execute module load conda and proceed to the conda install steps.

Miniconda

You can find these install instructions at https://docs.anaconda.com/miniconda/#quick-command-line-install.

    You should now see a (base) or (miniconda) on the lefthand side of your command prompt.

    Setting up your conda environment

    Warning: This is different for every user. Please check with your mentor before proceeding.

    The following conda setup is designed for VASP users who will be using atomate2 with fireworks.

Please refer to the atomate setup guide in this handbook (https://materialsproject.gitbook.io/persson-group-handbook/computing/atomate-setup/setup#set-up-an-environment) for setting up the fireworks yaml files and the fw_config directory.

In addition, also set up these two additional files:

    atomate2.yaml

jobflow.yaml

Please use the trajectory store version of jobflow.yaml (the second JOB_STORE example at the end of this page) if you need to run AIMD/MD type calculations with atomate2.
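
Where these two files live depends on your setup. A common pattern (an assumption, not something this page specifies) is to keep them in a config directory and point atomate2/jobflow at them with environment variables:

    export ATOMATE2_CONFIG_FILE=/path/to/config/atomate2.yaml   # assumed env var name used by atomate2
    export JOBFLOW_CONFIG_FILE=/path/to/config/jobflow.yaml     # assumed env var name used by jobflow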

GPU Computing Node

    Processors: Dual-socket, 4-core, 2.6 GHz Intel 4112 processors (8 cores/node)
    Memory: 192GB (8 x 8GB) 2400 MHz DDR4 RDIMMs
    Interconnect: 56 Gb/s Mellanox ConnectX5 EDR InfiniBand interconnect
    GPU: 2 ea. NVIDIA Tesla V100 accelerator boards
    Hard Drive: 500GB SSD (local swap and log files)

    ssh your_username@lrc-login.lbl.gov
    alias lawrencium="ssh <your_username>@lrc-login.lbl.gov"
    module load <module_name>
    module avail
    module list
    module unload <module_name>
    module purge
    #!/bin/bash
    # Job name:
    #SBATCH --job-name=<job_name>
    #
    # Partition:
    #SBATCH --partition=cf1
    #
    # QoS:
    #SBATCH --qos=condo_mp_cf1
    #
    # Account:
    #SBATCH --account=lr_mp
    #
    # Nodes (IF YOU CHANGE THIS YOU MUST CHANGE ntasks too!!!):
    #SBATCH --nodes=1
    #
    # Processors (MUST BE 64xNUM_NODES ALWAYS!!!):
    #SBATCH --ntasks=64
    #
    # Wall clock limit:
    #SBATCH --time=24:00:00
    
    ## Run command
    
    module load vasp/6.prerelease-vdw
    export OMP_PROC_BIND=true
    export OMP_PLACES=threads
    export OMP_NUM_THREADS=1 # NEVER CHANGE THIS!!
    
    mpirun --bind-to core <executable>
    #!/bin/bash
    # Job name:
    #SBATCH --job-name=<job_name>
    #
    # Partition:
    #SBATCH --partition=es1
    #
    # QoS:
    #SBATCH --qos=condo_mp_es1
    #
    # Account:
    #SBATCH --account=lr_mp
    #
    # GPUs:
    #SBATCH --gres=gpu:2
    #
    # CPU cores:
    #SBATCH --cpus-per-task=8
    #
    # Constraints:
    #SBATCH --constraint=es1_v100
    #
    # Wall clock limit:
    #SBATCH --time=24:00:00
    
    export CUDA_VISIBLE_DEVICES=0,1
    module load cuda/10.0
    #!/bin/bash
    # Job name:
    #SBATCH --job-name=<job_name>
    #
    # Partition:
    #SBATCH --partition=es1
    #
    # QoS:
    #SBATCH --qos=es_lowprio
    #
    # Account:
    #SBATCH --account=lr_mp
    #
    # GPUs:
    #SBATCH --gres=gpu:2
    #
    # CPU cores:
    #SBATCH --cpus-per-task=8
    #
    # Constraints:
    #SBATCH --constraint=es1_v100
    #
    # Wall clock limit:
    #SBATCH --time=24:00:00
    
    export CUDA_VISIBLE_DEVICES=0,1
    module load cuda/10.0
    ssh johndoe@kestrel.nrel.gov
    ssh johndoe@hpcsh.nrel.gov
    cd ~/ #ensures you start off in your home dir
    
    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm ~/miniconda3/miniconda.sh
    
    ~/miniconda3/bin/conda init bash
    ~/miniconda3/bin/conda init zsh
    
    source ~/.bashrc
    conda create -n cms python=3.9 pandas seaborn numpy scipy matplotlib
    conda activate cms 
    
    pip install git+https://github.com/materialsproject/atomate2.git #replace with your desired way to downloading atomate2
    pip install fireworks
    pip install pydantic==2.4.2 #to avoid some pydantic bugs with fireworks...
    VASP_CMD: srun -n 104 -c 1 vasp_std 
    VASP_GAMMA_CMD: srun -n 104 -c 1 vasp_gam
    VASP_CMD: srun -n 104 -c 1 --cpu-bind=cores --gpu-bind=single:1 -G 4 vasp_std
    VASP_GAMMA_CMD: srun -n 104 -c 1 --cpu-bind=cores --gpu-bind=single:1 -G 4 vasp_gam
    VASP_CMD: srun -n 104 -c 2 vasp_std 
    VASP_GAMMA_CMD: srun -n 104 -c 2 vasp_gam
    JOB_STORE:
      docs_store:
        type: MongoStore
        database: johndoe_general
        host: mongodb03.nersc.gov
        port: 27017
        username: johndoe_general_admin
        password: ***PASSWORD***
        collection_name: kestrel_outputs
      additional_stores:
        data:
          type: GridFSStore
          database: johndoe_general
          host: mongodb03.nersc.gov
          port: 27017
          username: johndoe_general_admin
          password: ***PASSWORD***
          collection_name: kestrel_outputs_blobs
    JOB_STORE:
      docs_store:
        type: MongoStore
        database: johndoe_general
        host: mongodb03.nersc.gov
        port: 27017
        username: johndoe_general_admin
        password: ***PASSWORD***
        collection_name: kestrel_outputs
      additional_stores:
        data:
          type: GridFSStore
          database: johndoe_general
          host: mongodb03.nersc.gov
          port: 27017
          username: johndoe_general_admin
          password: ***PASSWORD***
          collection_name: kestrel_outputs_blobs
        trajectory:
          type: GridFSStore
          database: johndoe_general
          host: mongodb03.nersc.gov
          port: 27017
          username: johndoe_general_admin
          password: ***PASSWORD***
          collection_name: kestrel_trajectory_blobs