This page describes the setup procedure for using fireworks and atomate on high-performance computing systems. These instructions are intended for first-time users.
Please see the following subsections on fireworks/atomate setup and usage:
This page is intended to help you get set up the first time you use fireworks and atomate so you can learn how these tools work. This setup is not ideal for most research applications. Please refer to Fireworks: Advanced Usage if you are looking for practical advice and tips on how to run calculations once you understand the basics of fireworks.
Note: These instructions are based on the original version of atomate. atomate2 is a newer code that is still under development. These research codes are under continuous improvement, so if something here isn't working, please ask! It is likely the documentation needs to be updated.
Create the conda environment:
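A minimal sketch of the command; the python version shown is only a suggestion:
conda create --name cms python=3.10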
Note: cms is an abbreviation for "computational materials science," but feel free to pick your own environment name!
Activate the environment and install the base libraries:
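A sketch of what this might look like; the exact set of base libraries is group-specific, so fireworks and pymatgen are only a reasonable minimum:
conda activate cms
pip install fireworks pymatgen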
To install atomate2 on the environment, use one of the following:
(For Most People) If you are just using atomate and fireworks, with no plans to develop code/workflows, use the first option in the sketch below:
For developers:
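A sketch of both options, assuming a standard pip-based install; the repository URL is the public atomate2 repository and should be confirmed with your group:
# Option 1 (most people): install the released package
pip install atomate2
# Option 2 (developers): clone the source and install in editable mode
git clone https://github.com/materialsproject/atomate2.git
cd atomate2
pip install -e .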
Note: You will want to follow the steps in this section on both your local computer and the supercomputer where you will be running your calculations (e.g. making a fw_config folder in your home directory that contains the 3 files FW_config.yaml, db.json, and my_launchpad.yaml) so you can easily interface with fireworks from both machines.
Make a directory in your home directory called fw_config:
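For example:
mkdir ~/fw_config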
Now create 3 files (FW_config.yaml, db.json, and my_launchpad.yaml) with the following contents. Make sure you replace YOUR_USERNAME with your username; on NERSC you will also need to find your home folder, which typically sits under a higher-level folder named after the FIRST LETTER of your username. Note: you can view your filesystem online. Also, consider setting up access to your NERSC filesystem locally on your computer: check out "Mounting NERSC's file system locally" here: #running-jobs-on-nersc
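A heavily abridged sketch of what these files typically look like; every capitalized value (MONGODB_HOST, DB_NAME, the usernames and passwords) is a placeholder you must replace with your own database details:
FW_config.yaml:
CONFIG_FILE_DIR: /HOME_DIRECTORY_PATH/fw_config
QUEUE_UPDATE_INTERVAL: 5
ECHO_TEST: "cms fw_config activated"
db.json:
{
    "host": "MONGODB_HOST",
    "port": 27017,
    "database": "DB_NAME",
    "collection": "tasks",
    "admin_user": "ADMIN_USERNAME",
    "admin_password": "ADMIN_PASSWORD",
    "readonly_user": "READONLY_USERNAME",
    "readonly_password": "READONLY_PASSWORD",
    "aliases": {}
}
my_launchpad.yaml:
host: MONGODB_HOST
port: 27017
name: DB_NAME
username: ADMIN_USERNAME
password: ADMIN_PASSWORD
strm_lvl: INFO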
Available queues, partitions, and qos descriptions can be found at the following links:
Note: specifying singleshot in the queue adapter will limit each reserved job to running only one firework (even if other fireworks are ready and could run with your remaining wall time). This can be changed to rapidfire but this may result in lost runs (fireworks that do not complete because they run out of wall time).
Append the following lines to the .bashrc or .bashrc.ext file, which is located in your home directory, e.g. /global/homes/s/sivonxay:
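A sketch of the lines to append, assuming the fw_config folder created above:
export FW_CONFIG_FILE=~/fw_config/FW_config.yaml
alias cdconfig='cd ~/fw_config'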
Note: the line alias cdconfig='cd ~/fw_config' is optional but recommended so you can more easily jump to the directory containing your configuration files by typing cdconfig into the command line.
Go to materialsproject.org and get an API key. Make sure your cms python environment (from the environment setup above) is activated, then run the following commands in terminal.
Running these commands will create a .pmgrc.yaml file in your home directory (if it doesn't exist already) containing these configuration settings.
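A sketch of the pymatgen configuration commands; YOUR_API_KEY and the pseudopotential path are placeholders, and the second line is only needed if you keep a local VASP pseudopotential library:
pmg config --add PMG_MAPI_KEY YOUR_API_KEY
pmg config --add PMG_VASP_PSP_DIR /path/to/your/psp_dir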
The Materials Project Workshop has some great sessions introducing how to use fireworks and atomate. It might be helpful to watch the YouTube recordings before trying to launch your own calculations!
2019 MP Workshop: Atomate Basics
2021 MP Workshop: Automating DFT
First, let's confirm that your fireworks configuration is set up correctly and points to the right MongoDB database for your launchpad. Check your default fireworks configuration file with echo.
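For example, using the FW_CONFIG_FILE variable set in your .bashrc above:
echo $FW_CONFIG_FILE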
Copy the file path printed out and check the contents of the specified FW_config.yaml file using cat to find your CONFIG_FILE_DIR.
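For example, reusing the same variable rather than typing the path by hand:
cat $FW_CONFIG_FILE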
You should see something like this printed out. Copy the file path specified by CONFIG_FILE_DIR.
CONFIG_FILE_DIR: /HOME_DIRECTORY_PATH/fw_config
QUEUE_UPDATE_INTERVAL: 5
ECHO_TEST: "cms fw_config activated"
Print out the contents of the my_launchpad.yaml file in your CONFIG_FILE_DIR using cat.
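For example, with the CONFIG_FILE_DIR value shown above:
cat /HOME_DIRECTORY_PATH/fw_config/my_launchpad.yaml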
If you do not see something that resembles your my_launchpad.yaml file written earlier, there may be something wrong with your fireworks configuration that needs to be resolved before moving forward.
BE EXTREMELY CAREFUL WHEN RUNNING THIS RESET COMMAND. It will wipe all existing entries in your fireworks database in the fireworks, workflows, and launches collections. You must activate your python environment where fireworks is installed before running this command in terminal.
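The command in question is fireworks' launchpad reset; run it only if you truly intend to wipe the database:
lpad reset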
The following code blocks are python code that should be run in jupyter or via python from the terminal. Make sure you have the environment set up above activated. The computer running the code should have access to mongodb07.nersc.gov; this can be disregarded when running directly on NERSC or when connected to the LBNL intranet. For computers outside of LBNL, a VPN will need to be used.
Run the following python code in a jupyter notebook or by creating a file named make_workflow.py and running the command python make_workflow.py in terminal:
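A minimal sketch of what make_workflow.py might contain, assuming the atomate wf_static preset and an hcp Lu structure built from approximate lattice parameters; your group's actual script may instead pull the structure from the Materials Project:
from fireworks import LaunchPad
from pymatgen.core import Lattice, Structure
from atomate.vasp.workflows.presets.core import wf_static

# Build a hexagonal (hcp) Lu structure from approximate lattice parameters
structure = Structure.from_spacegroup(
    "P6_3/mmc", Lattice.hexagonal(3.51, 5.55), ["Lu"], [[1/3, 2/3, 1/4]]
)

# Create a static-energy workflow and add it to the launchpad
# defined by your my_launchpad.yaml (found via FW_CONFIG_FILE)
wf = wf_static(structure)
lpad = LaunchPad.auto_load()
lpad.add_wf(wf)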
This adds a static energy calculation for Hexagonal Lu to your job database.
To check if this workflow has been added to your launch pad, run this command in terminal:
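For example, the launchpad query command:
lpad get_wflows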
You can also check if this workflow has been added successfully by looking at your MongoDB fireworks collection or the launchpad webgui.
Use the qlaunch command to reserve jobs with SLURM's scheduler. qlaunch has 3 modes: singleshot, rapidfire, and multi. Singleshot is used to launch one job, rapidfire is used to launch multiple jobs in quick succession, and multi creates one job with multiple fireworks runs. You'll probably want to use rapidfire (where it is important to add the --nlaunches flag to specify how many fireworks to run). Here is example code for launching one job from the command line.
Note: Make sure you are in the directory you want your calculations to be run from before you run these commands; you may want to create a new folder for this. Do not run these commands from your home directory.
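A minimal sketch for a single job; add the -r flag if your group uses fireworks' reservation mode (a site-specific choice):
qlaunch singleshot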
Status of running fireworks can be determined by using the command:
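For example, to list fireworks in the RUNNING state:
lpad get_fws -s RUNNING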
If a firework has failed or fizzled, you can rerun it by returning its state to READY so it can be launched again.
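For example (FW_ID is a placeholder for the id of the fizzled firework):
lpad rerun_fws -i FW_ID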
Many people prefer to view their jobs in the web GUI using the following command, run in terminal on their local computer with the python environment activated. Note this requires having your FW_CONFIG_FILE variable set up to point correctly to your MongoDB database through the my_launchpad.yaml file; otherwise, you must run this command in a folder containing your my_launchpad.yaml file.
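The command in question is fireworks' built-in web GUI launcher:
lpad webgui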
If you have not yet made a local copy of your my_launchpad.yaml file, you can download it.
Download and install Robo 3T from the following link
For security reasons, if you want to connect to the database, you will either:
Have to connect via the NERSC network, e.g. enter your details in the "SSH" tab and connect via a data transfer node such as dtn01.nersc.gov.
Have to connect via the LBL network, e.g. by being on-site or using the LBL VPN. In this case you will have to add -ext to your database address, for example mongodb03-ext.nersc.gov.
Special thanks to the original authors of this page: Eric Sivonxay, Julian Self, Ann Rutt, and Oxana Andriuc
This page builds upon the Fireworks + Atomate Setup page to provide practical tips and advice for more advanced use cases, which may be helpful once you start running a larger number of calculations.
Note: atomate2 is a newer code that is still under development. These research codes are under continuous improvement, so if something here isn't working, please ask! It is likely the documentation needs to be updated.
For more information on best practices for running VASP on multiple nodes (i.e. how to set vasp_cmd in my_fworker.yaml based on the number of nodes requested in my_qadapter.yaml), see the NERSC VASP training.
Follow this flowchart to decide how many nodes to request, the number of parallel processes to use, and the number of cores to use for each parallel process. The rule of thumb to remember is that you should use all of the cores for every node that you request. For example, if there are 256 cores per node, you request N nodes, n processes, and c cores/process, then the following relationship must hold:
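n * c = 256 * N
In other words, the total core count of your parallel processes (n processes times c cores per process) must equal the total number of cores on the N nodes you requested.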
Specify the number of nodes in my_qadapter.yaml and specify the matching vasp_cmd in my_fworker.yaml within your configuration folder.
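As a sketch, for a hypothetical 4-node job on 256-core nodes; all values here, including the walltime and srun flags, are placeholders to adapt to your system:
my_qadapter.yaml (excerpt):
nodes: 4
walltime: '12:00:00'
my_fworker.yaml (excerpt):
name: 4N_fworker
category: ''
query: '{}'
env:
    vasp_cmd: srun -n 512 -c 2 --cpu-bind=cores vasp_std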
In research you may need the ability to run different types of calculations, where you will want to easily switch between various setups. This section builds upon the previous content to extend it to setting up and running fireworks with multiple configurations.
For demonstration purposes, here we will illustrate how to set up two fireworks configurations for switching between running calculations with 1 node vs. 4 nodes. We will make a few other assumptions based on common best practices:
You have applied tags ("1N" or "4N") to specify which fireworks should be run with the 1N vs. 4N firework configuration
You have created separate python environments (cms_1N and cms_4N) for running each type of calculation
These basic steps can be adapted for making many more different custom fireworks configurations for every type of calculation you need!
Based on the Getting Started Fireworks Configuration section, you should have a directory for your fireworks configuration files named fw_config in your home directory.
The db.json file in your fw_config directory set up in the previous section can be used as is, assuming you want all your results to be saved in the same database for your various fireworks configurations.
We will need to create a directory for each firework configuration with the specific files that will outline the settings for that configuration (e.g. FW_config.yaml, my_fworker.yaml, my_launchpad.yaml, my_qadapter.yaml). We will have a directory named 1N for the single-node configuration and another directory named 4N for the four-node configuration. So the fw_config directory in your home directory should contain the following:
db.json (file)
1N (directory)
4N (directory)
Now within the 1N and 4N directories you will need to create the following firework configuration files with the specific settings you want for each fireworks configuration. Note that given the variations between different supercomputers, specifics such as file paths, VASP commands, etc. have been omitted here, and the format for NERSC has been used. If needed, see the Getting Started Fireworks Configuration section for examples of appropriate firework configuration files (FW_config.yaml, my_fworker.yaml, my_launchpad.yaml, my_qadapter.yaml) for each specific supercomputer system.
Now that you have your firework configuration files set up, we need the ability to specify which firework configuration we want to use when launching queue jobs. This is done based on the value of the bash environment variable FW_CONFIG_FILE. You can print the value of this variable by typing the following command:
echo $FW_CONFIG_FILE
Note that in the Getting Started Fireworks Configuration instructions, you were instructed to modify your .bashrc or .bashrc.ext file located in your home directory to automatically activate a fireworks configuration by default. Here you should modify that file so you can type a command to activate a specific fireworks configuration, such as 1N_fw_config or 4N_fw_config, by adding the following lines:
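A sketch of what those lines might look like, assuming the 1N and 4N directories above and the cms_1N / cms_4N conda environments; the exact paths are placeholders:
alias 1N_fw_config='conda activate cms_1N; export FW_CONFIG_FILE=~/fw_config/1N/FW_config.yaml'
alias 4N_fw_config='conda activate cms_4N; export FW_CONFIG_FILE=~/fw_config/4N/FW_config.yaml'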
Now you should be ready to use either firework configuration to launch jobs into the supercomputer queue! Here are some examples of what you might do...
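For instance, a hypothetical session launching single-node jobs might look like this; the run directory and flag values are placeholders:
1N_fw_config
cd /path/to/your/1N_run_directory
qlaunch rapidfire -m 10 --nlaunches 10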
We recommend using tmux or screen when submitting jobs so that queue submission is preserved when the main ssh session is terminated. This allows one to keep the queue saturated with jobs. For example, with tmux:
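A minimal example; the session name qlaunch_session is arbitrary:
tmux new -s qlaunch_session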
To exit the tmux session, press the “control” and “b” keys together, followed by “d”.
To re-enter the tmux session:
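For example, using the session name from above:
tmux attach -t qlaunch_session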
For more tmux commands, see the tmux cheatsheet.
Use the qlaunch command to reserve jobs with SLURM's scheduler. qlaunch has 3 modes: singleshot, rapidfire, and multi. You will want to use rapidfire. Some useful flags to set are:
-m to specify the maximum number of jobs in the queue at any given time
--nlaunches to specify how many fireworks to run
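For example, to keep up to 50 jobs in the queue and keep launching indefinitely:
qlaunch rapidfire --nlaunches infinite -m 50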
When you are running many fireworks in rapidfire mode, jobs may hang and not complete. This is referred to as "lost runs" in fireworks. To check for lost runs, use the command:
To rerun these jobs, add the --rerun flag to the command.
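For example:
lpad detect_lostruns
lpad detect_lostruns --rerun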
You can easily reset all fizzled fireworks back to the ready state to be run again with this command in terminal.
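For example:
lpad rerun_fws -s FIZZLED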
When you are running many fireworks, sometimes a calculation that fizzled the first time will successfully complete if you just rerun it. You can check the error message for why a firework fizzled in the MongoDB launches collection, via the following path in the document:
action -> stored_data -> _exception -> _stacktrace
Checking the error can help you decide whether simply rerunning the firework may work.
If you have hundreds or thousands of calculations to run, you may be interested in "job packing" to take advantage of the big job discount offered by NERSC or improve your queue wait times by reducing the number of jobs you submit to slurm (e.g. 1x 1024 node job will be scheduled faster than 1024x single node jobs). The group has python scripts that will automatically distribute running fireworks when multiple nodes are requested for a job. Consult Jason Munro for access to these github job packing scripts and an introduction if you are interested in setting up job packing for running calculations.