Fireworks: Advanced Usage

This page builds upon the Fireworks + Atomate Setup page to provide practical tips and advice for more advanced use cases, which may be helpful once you start running a larger number of calculations.

Note: atomate2 is a relatively new code that is still under active development. Like most research codes, it changes frequently, so if something here isn't working, please ask! The documentation most likely needs to be updated.

Running VASP Best Practices:

For more information on best practices for running VASP on multiple nodes (i.e., how to set vasp_cmd in my_fworker.yaml based on the number of nodes requested in my_qadapter.yaml), see the NERSC VASP training.

Choose the appropriate number of nodes, processes, and cores/process:

Follow this flowchart to decide how many nodes to request, how many parallel processes to use, and how many cores to assign to each process. The rule of thumb is to use all of the cores on every node you request. For example, if each node has 256 cores and you request $N$ nodes, $n$ processes, and $c$ cores per process, then the following relationship must hold:

$$256N = nc$$

Specify $N$ in my_qadapter.yaml, and specify $n$ and $c$ in my_fworker.yaml (through the vasp_cmd setting) within your configuration folder.
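As a quick sanity check on this relationship, the shell function below verifies that a proposed layout fills every requested core and prints a candidate launch line. It is a sketch under assumptions: check_layout is a hypothetical helper name, the 256 cores/node default is only the example value above (check your own system), `srun -n <processes> -c <cores>` is standard SLURM syntax, and `vasp_std` stands in for whatever VASP executable your vasp_cmd actually calls.

```shell
# Sketch: check that N nodes x cores_per_node == n processes x c cores/process.
# check_layout is a hypothetical helper; the 256 cores/node default is the
# example value from the text, not a property of any specific machine.
check_layout() {
  local nodes="$1" procs="$2" cores="$3" cores_per_node="${4:-256}"
  local requested=$(( cores_per_node * nodes ))
  local used=$(( procs * cores ))
  if [ "$requested" -ne "$used" ]; then
    echo "mismatch: ${requested} cores requested but ${procs} x ${cores} = ${used} used" >&2
    return 1
  fi
  # Candidate vasp_cmd value for my_fworker.yaml (vasp_std is an assumed executable name).
  echo "srun -n ${procs} -c ${cores} vasp_std"
}
```

For example, `check_layout 1 64 4` prints `srun -n 64 -c 4 vasp_std`, `check_layout 4 512 2` also passes (4 × 256 = 512 × 2 = 1024), and `check_layout 1 64 2` reports a mismatch because only half of the node's cores would be used.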

Multiple Fireworks Configurations:

In research you may need to run different types of calculations and switch easily between set-ups. This section extends the previous content to running fireworks with multiple configurations.

For demonstration purposes, here we will illustrate how to set up two fireworks configurations for switching between running calculations on 1 node vs. 4 nodes. We will make a few other assumptions based on common best practices:

  • You have applied tags ("1N" or "4N") to specify which fireworks should be run with the 1N vs. 4N firework configuration

  • You have created separate python environments (cms_1N and cms_4N) for running each type of calculation

These basic steps can be adapted to create custom fireworks configurations for every type of calculation you need!

Fireworks Config Directory

Based on the Getting Started Fireworks Configuration section, you should have a directory for your fireworks configuration files named fw_config in your home directory.

The db.json file you set up in your fw_config directory in the previous section can be used as is, assuming you want the results from all of your fireworks configurations saved in the same database.

We will need to create a directory for each firework configuration with the specific files that will outline the settings for that configuration (e.g. FW_config.yaml, my_fworker.yaml, my_launchpad.yaml, my_qadapter.yaml). We will have a directory named 1N for the single node configuration and another directory named 4N for the four node configuration. So the fw_config directory in your home directory should contain the following:

  • db.json (file)

  • 1N (directory)

  • 4N (directory)
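If you prefer to create this layout from the command line, a minimal sketch follows. It assumes your configuration lives at ~/fw_config (FW_BASE is a hypothetical override variable) and only creates the two configuration directories; db.json is expected to exist already from the Getting Started setup.

```shell
# Sketch: create the per-configuration directories next to the shared db.json.
# FW_BASE is a hypothetical variable; it defaults to ~/fw_config as in the text.
FW_BASE="${FW_BASE:-$HOME/fw_config}"

for cfg in 1N 4N; do
  mkdir -p "$FW_BASE/$cfg"   # -p: no error if the directory already exists
done

# db.json is shared by both configurations and should already be here.
if [ ! -f "$FW_BASE/db.json" ]; then
  echo "warning: $FW_BASE/db.json not found; see the Getting Started setup" >&2
fi

ls -1 "$FW_BASE"
```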

Now, within the 1N and 4N directories, create the following fireworks configuration files with the settings you want for each configuration. Note that, because supercomputers vary, specifics such as file paths and VASP commands have been omitted here, and the NERSC format has been used. If needed, see the Getting Started Fireworks Configuration section for examples of the appropriate configuration files (FW_config.yaml, my_fworker.yaml, my_launchpad.yaml, my_qadapter.yaml) for each specific supercomputer.

1N Configuration Files

1N/FW_config.yaml:

CONFIG_FILE_DIR: /<home_directory_path>/fw_config/1N/
ECHO_TEST: "1N fw_config activated!"

4N Configuration Files

4N/FW_config.yaml:

CONFIG_FILE_DIR: /<home_directory_path>/fw_config/4N/
ECHO_TEST: "4N fw_config activated!"
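Because the two FW_config.yaml files differ only in the configuration name, they can be generated with a short loop. This is a sketch assuming the same ~/fw_config location (again via the hypothetical FW_BASE variable). It writes the absolute path of your fw_config directory where the text shows /<home_directory_path>/fw_config, and it overwrites any existing FW_config.yaml in those directories.

```shell
# Sketch: write FW_config.yaml for each configuration with the two keys shown above.
# FW_BASE is a hypothetical variable; it defaults to ~/fw_config.
FW_BASE="${FW_BASE:-$HOME/fw_config}"

for cfg in 1N 4N; do
  mkdir -p "$FW_BASE/$cfg"
  # Unquoted heredoc delimiter, so $FW_BASE and $cfg are expanded.
  # This overwrites any existing FW_config.yaml in the directory.
  cat > "$FW_BASE/$cfg/FW_config.yaml" <<EOF
CONFIG_FILE_DIR: $FW_BASE/$cfg/
ECHO_TEST: "$cfg fw_config activated!"
EOF
done
```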

Switching Between Firework Configurations

Now that your firework configuration files are set up, you need a way to specify which configuration to use when launching queue jobs. This is controlled by the value of the bash environment variable FW_CONFIG_FILE. You can print the value of this variable with the following command:

echo $FW_CONFIG_FILE

Note that the Getting Started Fireworks Configuration instructions had you modify the .bashrc or .bashrc.ext file in your home directory to activate a fireworks configuration automatically. Modify that file again by adding the following lines, so that you can activate a specific fireworks configuration by typing a command such as 1N_fw_config or 4N_fw_config (open a new shell or re-source the file for the change to take effect):

alias 1N_fw_config='export FW_CONFIG_FILE="/<home_directory_path>/fw_config/1N/FW_config.yaml"'
alias 4N_fw_config='export FW_CONFIG_FILE="/<home_directory_path>/fw_config/4N/FW_config.yaml"'
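Bash only expands aliases in interactive shells, so if you also want to switch configurations from scripts, a shell function is an alternative. The sketch below (fw_use and FW_BASE are hypothetical names) exports FW_CONFIG_FILE only if the target FW_config.yaml actually exists, which catches typos in the configuration name.

```shell
# Sketch: a function alternative to the 1N_fw_config / 4N_fw_config aliases.
# fw_use and FW_BASE are hypothetical names; FW_BASE defaults to ~/fw_config.
fw_use() {
  local cfg="$1"
  local base="${FW_BASE:-$HOME/fw_config}"
  local file="$base/$cfg/FW_config.yaml"
  if [ -z "$cfg" ] || [ ! -f "$file" ]; then
    echo "fw_use: no FW_config.yaml found for '$cfg' under $base" >&2
    return 1
  fi
  export FW_CONFIG_FILE="$file"
  echo "FW_CONFIG_FILE=$FW_CONFIG_FILE"
}
```

With this in your .bashrc, `fw_use 4N` switches to the four-node configuration and leaves FW_CONFIG_FILE unchanged if the directory or file is missing.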

Now you should be ready to use either firework configuration to launch jobs into the supercomputer queue! For example, to submit 10 jobs with the 1N configuration:

conda activate cms_1N
1N_fw_config
echo $FW_CONFIG_FILE
qlaunch rapidfire --nlaunches 10
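Since each configuration is paired with its own Python environment, it is easy to activate one and forget the other. A small guard like the one below can be run before qlaunch. It assumes the environment naming used on this page (cms_<tag>, where the tag matches the configuration directory name) and relies on CONDA_DEFAULT_ENV, which conda sets on activation; check_fw_env is a hypothetical name.

```shell
# Sketch: confirm the active conda env (cms_<tag>) matches the active fw_config (<tag>).
# Assumes the cms_1N / cms_4N naming from this page; check_fw_env is a hypothetical name.
check_fw_env() {
  local env="${CONDA_DEFAULT_ENV:-}"
  local env_tag="${env#cms_}"                    # cms_1N -> 1N
  local cfg_dir
  cfg_dir="$(dirname "${FW_CONFIG_FILE:-}")"     # .../fw_config/1N
  local cfg_tag="${cfg_dir##*/}"                 # 1N
  if [ -z "${FW_CONFIG_FILE:-}" ] || [ "$env_tag" != "$cfg_tag" ]; then
    echo "mismatch: env '${env:-none}' vs config '${FW_CONFIG_FILE:-unset}'" >&2
    return 1
  fi
  echo "ok: $env with $FW_CONFIG_FILE"
}
```

Chaining it, as in `check_fw_env && qlaunch rapidfire --nlaunches 10`, means nothing is submitted when the two are out of sync.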

Continuous Job Submission:

tmux

We recommend using tmux or screen when submitting jobs, so that queue submission continues after your main SSH session ends. This lets you keep the queue saturated with jobs. For example, with tmux:

module load tmux
tmux new -s background_launcher
conda activate cms
mkdir FireworksTest
cd FireworksTest

To detach from the tmux session while leaving it running, press the “control” and “b” keys together, followed by “d”.

To re-enter the tmux session:

tmux attach -t background_launcher

For more tmux commands, see the tmux cheatsheet.

qlaunch commands

Use the qlaunch command to submit jobs to the SLURM scheduler. qlaunch has three modes: singleshot, rapidfire, and multi. You will usually want rapidfire. Some useful flags are:

  • -m to specify the maximum number of jobs in the queue at any given time

  • --nlaunches to specify how many queue jobs to submit (infinite keeps qlaunch running and submitting as fireworks become ready)

qlaunch rapidfire --nlaunches infinite -m 50
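The tmux and qlaunch steps above can be combined so the launcher starts in a detached session with a single command. This is a sketch, not the group's standard script: start_launcher is a hypothetical name, and it passes the current PATH and FW_CONFIG_FILE explicitly because a tmux server that is already running does not copy arbitrary variables from your current shell into new sessions. Run it after activating your conda environment and fireworks configuration.

```shell
# Sketch: start `qlaunch rapidfire` in a detached tmux session that survives logout.
# start_launcher is a hypothetical helper; requires bash (printf %q).
# Run it after conda activate and after selecting a fireworks configuration.
start_launcher() {
  local session="${1:-background_launcher}"
  local maxjobs="${2:-50}"
  if [ -z "${FW_CONFIG_FILE:-}" ]; then
    echo "start_launcher: FW_CONFIG_FILE is not set" >&2
    return 1
  fi
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "session '$session' already exists; use: tmux attach -t $session" >&2
    return 1
  fi
  # Pass PATH and FW_CONFIG_FILE explicitly; printf %q quotes them for the shell.
  local cmd
  cmd="env PATH=$(printf '%q' "$PATH") FW_CONFIG_FILE=$(printf '%q' "$FW_CONFIG_FILE")"
  cmd="$cmd qlaunch rapidfire --nlaunches infinite -m $maxjobs"
  tmux new-session -d -s "$session" "$cmd"
}
```

After `start_launcher background_launcher 50`, you can check on it with `tmux attach -t background_launcher` as before. If qlaunch itself exits, the session closes with it.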

Tips for High-Throughput Computing

Fireworks Left in Running State

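When you are running many fireworks in rapidfire mode, some jobs may die (e.g., by hitting the walltime or crashing) before they can update the database, leaving their fireworks stuck in the RUNNING state. FireWorks calls these "lost runs". To check for lost runs, use the command: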

lpad detect_lostruns

To rerun these jobs, add the --rerun flag to the command.

lpad detect_lostruns --rerun
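For long campaigns you can run this check periodically, for example in the same tmux session as your launcher. A minimal loop is sketched below; the function name, the one-hour default interval, and the optional round limit are illustrative choices, not FireWorks settings. A firework is only flagged as lost after it has stopped pinging the database for a while (lpad detect_lostruns has its own expiration threshold; see `lpad detect_lostruns -h`), so a healthy long-running calculation should not be rerun.

```shell
# Sketch: periodically rerun lost runs. rerun_lost_runs_every is a hypothetical name.
# Arguments: interval in seconds (default 3600) and number of rounds (0 = forever).
rerun_lost_runs_every() {
  local interval="${1:-3600}" rounds="${2:-0}" i=0
  while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
    echo "[$(date '+%F %T')] checking for lost runs"
    lpad detect_lostruns --rerun
    i=$(( i + 1 ))
    # Skip the final sleep once a finite number of rounds has been completed.
    if [ "$rounds" -ne 0 ] && [ "$i" -ge "$rounds" ]; then break; fi
    sleep "$interval"
  done
}
```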

Fizzled Fireworks

You can easily reset all fizzled fireworks back to the READY state, so they will be run again, with the following terminal command:

lpad rerun_fws -s FIZZLED

When you are running many fireworks, a calculation that fizzled the first time will sometimes complete successfully if you simply rerun it. To see why a firework fizzled, look up its document in the MongoDB launches collection and follow this path:

action -> stored_data -> _exception -> _stacktrace

Checking the error can help you decide whether simply rerunning the firework is likely to work.
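To pull these stack traces for all fizzled launches at once, you can query the launches collection directly, for example with the MongoDB shell (mongosh). The sketch below assumes mongosh is installed and that MONGO_URI is a connection string you assemble yourself from the host, database name, and credentials in your my_launchpad.yaml (the database name must be part of the URI); show_fizzled_traces is a hypothetical helper. It filters on each launch's state and prints only the fw_id and the stack-trace path given above.

```shell
# Sketch: print fw_id and stack trace for every FIZZLED launch.
# show_fizzled_traces is a hypothetical helper; MONGO_URI is an assumed
# connection string built from the settings in my_launchpad.yaml.
show_fizzled_traces() {
  local uri="${1:-${MONGO_URI:-}}"
  if [ -z "$uri" ]; then
    echo "usage: show_fizzled_traces <mongodb-connection-string>" >&2
    return 1
  fi
  # Single quotes keep the JavaScript intact; mongosh runs it against the URI's database.
  mongosh "$uri" --quiet --eval '
    db.launches.find(
      { state: "FIZZLED" },
      { _id: 0, fw_id: 1, "action.stored_data._exception._stacktrace": 1 }
    ).forEach(doc => printjson(doc));
  '
}
```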

Job Packing

If you have hundreds or thousands of calculations to run, you may be interested in "job packing" to take advantage of the big job discount offered by NERSC, or to improve your queue wait times by reducing the number of jobs you submit to SLURM (e.g., one 1024-node job will be scheduled faster than 1024 single-node jobs). The group has Python scripts that automatically distribute fireworks across the nodes when multiple nodes are requested for a job. Contact Jason Munro for access to these job packing scripts on GitHub and for an introduction if you are interested in setting up job packing for your calculations.
