This page describes the setup procedure for using fireworks and atomate on high-performance computing systems. These instructions are intended for first-time users.
Please see the following subsections on fireworks/atomate setup and usage:
This page is intended to help you get set up the first time you use fireworks and atomate so you can learn how these tools work. This setup is not ideal for most research applications. Please refer to Fireworks: Advanced Usage if you are looking for practical advice and tips on how to run calculations once you understand the basics of fireworks.
Note: These instructions are based on the original version of atomate. atomate2 is a newer code that is still under development. These research codes are under continuous improvement, so if something here isn't working, please ask! It is likely the documentation needs to be updated.
Create the conda environment:
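A minimal sketch of the command; the python version shown is only a suggestion:
conda create --name cms python=3.10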
Note: cms is an abbreviation for "computational materials science," but feel free to pick your own environment name!
Activate the environment and install the base libraries:
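A sketch of what this might look like; the exact set of base libraries is group-specific, so fireworks and pymatgen are only a reasonable minimum:
conda activate cms
pip install fireworks pymatgen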
To install atomate2 on the environment, use one of the following:
(For Most People) If you are just using atomate and fireworks, with no plans to develop code/workflows, use the first option in the sketch below:
For developers:
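A sketch of both options, assuming a standard pip-based install; the repository URL is the public atomate2 repository and should be confirmed with your group:
# Option 1 (most people): install the released package
pip install atomate2
# Option 2 (developers): clone the source and install in editable mode
git clone https://github.com/materialsproject/atomate2.git
cd atomate2
pip install -e .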
Note: You will want to follow the steps in this section on both your local computer and the supercomputer where you will be running your calculations (e.g. making a fw_config folder in your home directory that contains the 3 files FW_config.yaml, db.json, and my_launchpad.yaml) so you can easily interface with fireworks from both machines.
Make a directory in your home directory called fw_config:
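For example:
mkdir ~/fw_config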
Now create 3 files (FW_config.yaml, db.json, and my_launchpad.yaml) with the following contents. Make sure you replace YOUR_USERNAME with your username; on NERSC you will also need to find your home folder, which typically sits under a higher-level folder named after the FIRST LETTER of your username. Note: you can view your filesystem online. Also, consider setting up access to your NERSC filesystem locally on your computer: check out "Mounting NERSC's file system locally" here: #running-jobs-on-nersc
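A heavily abridged sketch of what these files typically look like; every capitalized value (MONGODB_HOST, DB_NAME, the usernames and passwords) is a placeholder you must replace with your own database details:
FW_config.yaml:
CONFIG_FILE_DIR: /HOME_DIRECTORY_PATH/fw_config
QUEUE_UPDATE_INTERVAL: 5
ECHO_TEST: "cms fw_config activated"
db.json:
{
    "host": "MONGODB_HOST",
    "port": 27017,
    "database": "DB_NAME",
    "collection": "tasks",
    "admin_user": "ADMIN_USERNAME",
    "admin_password": "ADMIN_PASSWORD",
    "readonly_user": "READONLY_USERNAME",
    "readonly_password": "READONLY_PASSWORD",
    "aliases": {}
}
my_launchpad.yaml:
host: MONGODB_HOST
port: 27017
name: DB_NAME
username: ADMIN_USERNAME
password: ADMIN_PASSWORD
strm_lvl: INFO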
Available queues, partitions, and qos descriptions can be found at the following links:
Note: specifying singleshot in the queue adapter will limit each reserved job to running only one firework (even if other fireworks are ready and could run with your remaining wall time). This can be changed to rapidfire but this may result in lost runs (fireworks that do not complete because they run out of wall time).
Append the following lines to the .bashrc or .bashrc.ext file, which is located in your home directory, e.g. /global/homes/s/sivonxay:
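A sketch of the lines to append, assuming the fw_config folder created above:
export FW_CONFIG_FILE=~/fw_config/FW_config.yaml
alias cdconfig='cd ~/fw_config'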
Note: the line alias cdconfig='cd ~/fw_config' is optional but recommended so you can more easily jump to the directory containing your configuration files by typing cdconfig into the command line.
Go to materialsproject.org and get an API key. Make sure your cms python environment (from the environment setup above) is activated, then run the following commands in terminal.
Running these commands will create a .pmgrc.yaml file in your home directory (if it doesn't exist already) containing these configuration settings.
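A sketch of the pymatgen configuration commands; YOUR_API_KEY and the pseudopotential path are placeholders, and the second line is only needed if you keep a local VASP pseudopotential library:
pmg config --add PMG_MAPI_KEY YOUR_API_KEY
pmg config --add PMG_VASP_PSP_DIR /path/to/your/psp_dir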
The Materials Project Workshop has some great sessions introducing how to use fireworks and atomate. It might be helpful to watch the YouTube recordings before trying to launch your own calculations!
2019 MP Workshop: Atomate Basics
2021 MP Workshop: Automating DFT
First, let's confirm that your fireworks configuration is set up correctly and points to the right MongoDB database for your launchpad. Check your default fireworks configuration file with echo.
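For example, using the FW_CONFIG_FILE variable set in your .bashrc above:
echo $FW_CONFIG_FILE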
Copy the file path printed out and check the contents of the specified FW_config.yaml file using cat to find your CONFIG_FILE_DIR.
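For example, reusing the same variable rather than typing the path by hand:
cat $FW_CONFIG_FILE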
You should see something like this printed out. Copy the file path specified by CONFIG_FILE_DIR.
CONFIG_FILE_DIR: /HOME_DIRECTORY_PATH/fw_config
QUEUE_UPDATE_INTERVAL: 5
ECHO_TEST: "cms fw_config activated"
Print out the contents of the my_launchpad.yaml file in your CONFIG_FILE_DIR using cat.
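For example, with the CONFIG_FILE_DIR value shown above:
cat /HOME_DIRECTORY_PATH/fw_config/my_launchpad.yaml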
If you do not see something that resembles your my_launchpad.yaml file written earlier, there may be something wrong with your fireworks configuration that needs to be resolved before moving forward.
BE EXTREMELY CAREFUL WHEN RUNNING THIS RESET COMMAND. It will wipe all existing entries in your fireworks database in the fireworks, workflows, and launches collections. You must activate your python environment where fireworks is installed before running this command in terminal.
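The command in question is fireworks' launchpad reset; run it only if you truly intend to wipe the database:
lpad reset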
The following code blocks are python code that should be run in jupyter or via python from the terminal. Make sure you have the environment set up above activated. The computer running the code should have access to mongodb07.nersc.gov; this can be disregarded when running directly on NERSC or when connected to the LBNL intranet. For computers outside of LBNL, a VPN will need to be used.
Run the following python code in a jupyter notebook or by creating a file named make_workflow.py and running the command python make_workflow.py in terminal:
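A minimal sketch of what make_workflow.py might contain, assuming the atomate wf_static preset and an hcp Lu structure built from approximate lattice parameters; your group's actual script may instead pull the structure from the Materials Project:
from fireworks import LaunchPad
from pymatgen.core import Lattice, Structure
from atomate.vasp.workflows.presets.core import wf_static

# Build a hexagonal (hcp) Lu structure from approximate lattice parameters
structure = Structure.from_spacegroup(
    "P6_3/mmc", Lattice.hexagonal(3.51, 5.55), ["Lu"], [[1/3, 2/3, 1/4]]
)

# Create a static-energy workflow and add it to the launchpad
# defined by your my_launchpad.yaml (found via FW_CONFIG_FILE)
wf = wf_static(structure)
lpad = LaunchPad.auto_load()
lpad.add_wf(wf)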
This adds a static energy calculation for Hexagonal Lu to your job database.
To check if this workflow has been added to your launch pad, run this command in terminal:
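For example, the launchpad query command:
lpad get_wflows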
You can also check if this workflow has been added successfully by looking at your MongoDB fireworks collection or the launchpad webgui.
Use the qlaunch command to reserve jobs with SLURM's scheduler. qlaunch has 3 modes: singleshot, rapidfire, and multi. Singleshot is used to launch one job, rapidfire is used to launch multiple jobs in quick succession, and multi creates one job with multiple fireworks runs. You'll probably want to use rapidfire (where it is important to add the --nlaunches flag to specify how many fireworks to run). Here is example code for launching one job from the command line.
Note: Make sure you are in the directory you want your calculations to be run from before you run these commands; you may want to create a new folder for this. Do not run these commands from your home directory.
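A minimal sketch for a single job; add the -r flag if your group uses fireworks' reservation mode (a site-specific choice):
qlaunch singleshot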
Status of running fireworks can be determined by using the command:
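For example, to list fireworks in the RUNNING state:
lpad get_fws -s RUNNING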
If a firework has failed or fizzled, you can rerun it by returning its state to READY so it can be launched again.
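For example (FW_ID is a placeholder for the id of the fizzled firework):
lpad rerun_fws -i FW_ID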
Many people prefer to view their jobs in the web GUI using the following command, run in terminal on their local computer with the python environment activated. Note this requires having your FW_CONFIG_FILE variable set up to point correctly to your MongoDB database through the my_launchpad.yaml file; otherwise, you must run this command in a folder containing your my_launchpad.yaml file.
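The command in question is fireworks' built-in web GUI launcher:
lpad webgui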
If you have not yet made a local copy of your my_launchpad.yaml file, you can download it.
Download and install Robo 3T from the following link
For security reasons, if you want to connect to the database, you will either:
Have to connect via the NERSC network, e.g. enter your details in the "SSH" tab and connect via a data transfer node such as dtn01.nersc.gov.
Have to connect via the LBL network, e.g. by being on-site or using the LBL VPN. In this case you will have to add -ext to your database address, for example mongodb03-ext.nersc.gov.
Special thanks to the original authors of this page: Eric Sivonxay, Julian Self, Ann Rutt, and Oxana Andriuc
This page builds upon the Fireworks + Atomate Setup page to provide practical tips and advice for more advanced use cases, which may be helpful once you start running a larger number of calculations.
Note: atomate2 is a newer code that is still under development. These research codes are under continuous improvement, so if something here isn't working, please ask! It is likely the documentation needs to be updated.
For more information on best practices for running VASP on multiple nodes (i.e. how to set vasp_cmd in my_fworker.yaml based on the number of nodes requested in my_qadapter.yaml), see the NERSC VASP training.
Follow this flowchart to decide how many nodes to request, the number of parallel processes to use, and the number of cores to use for each parallel process. The rule of thumb to remember is that you should use all of the cores for every node that you request. For example, if there are 256 cores per node, you request N nodes, n processes, and c cores/process, then the following relationship must hold:
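n * c = 256 * N
In other words, the total core count of your parallel processes (n processes times c cores per process) must equal the total number of cores on the N nodes you requested.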
Specify the number of nodes in my_qadapter.yaml and specify the matching vasp_cmd in my_fworker.yaml within your configuration folder.
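As a sketch, for a hypothetical 4-node job on 256-core nodes; all values here, including the walltime and srun flags, are placeholders to adapt to your system:
my_qadapter.yaml (excerpt):
nodes: 4
walltime: '12:00:00'
my_fworker.yaml (excerpt):
name: 4N_fworker
category: ''
query: '{}'
env:
    vasp_cmd: srun -n 512 -c 2 --cpu-bind=cores vasp_std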
In research you may need the ability to run different types of calculations, where you will want to easily switch between various setups. This section builds upon the previous content to extend it to setting up and running fireworks with multiple configurations.
For demonstration purposes, here we will illustrate how to set up two fireworks configurations for switching between running calculations with 1 node vs. 4 nodes. We will make a few other assumptions based on common best practices:
You have applied tags ("1N" or "4N") to specify which fireworks should be run with the 1N vs. 4N firework configuration
You have created separate python environments (cms_1N and cms_4N) for running each type of calculation
These basic steps can be adapted for making many more different custom fireworks configurations for every type of calculation you need!
Based on the Getting Started Fireworks Configuration section, you should have a directory for your fireworks configuration files named fw_config in your home directory.
The db.json file in your fw_config directory set up in the previous section can be used as is, assuming you want all your results to be saved in the same database for your various fireworks configurations.
We will need to create a directory for each firework configuration with the specific files that will outline the settings for that configuration (e.g. FW_config.yaml, my_fworker.yaml, my_launchpad.yaml, my_qadapter.yaml). We will have a directory named 1N for the single-node configuration and another directory named 4N for the four-node configuration. So the fw_config directory in your home directory should contain the following:
db.json (file)
1N (directory)
4N (directory)
Now within the 1N and 4N directories you will need to create the following firework configuration files with the specific settings you want for each fireworks configuration. Note that given the variations between different supercomputers, specifics such as file paths, VASP commands, etc. have been omitted here, and the format for NERSC has been used. If needed, see the Getting Started Fireworks Configuration section for examples of appropriate firework configuration files (FW_config.yaml, my_fworker.yaml, my_launchpad.yaml, my_qadapter.yaml) for each specific supercomputer system.
Now that you have your firework configuration files set up, we need the ability to specify which firework configuration we want to use when launching queue jobs. This is done based on the value of the bash environment variable FW_CONFIG_FILE. You can print the value of this variable by typing the following command:
echo $FW_CONFIG_FILE
Note that in the Getting Started Fireworks Configuration instructions, you were instructed to modify your .bashrc or .bashrc.ext file located in your home directory to automatically activate a fireworks configuration by default. Here you should modify that file so you can type a command to activate a specific fireworks configuration, such as 1N_fw_config or 4N_fw_config, by adding the following lines:
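A sketch of what those lines might look like, assuming the 1N and 4N directories above and the cms_1N / cms_4N conda environments; the exact paths are placeholders:
alias 1N_fw_config='conda activate cms_1N; export FW_CONFIG_FILE=~/fw_config/1N/FW_config.yaml'
alias 4N_fw_config='conda activate cms_4N; export FW_CONFIG_FILE=~/fw_config/4N/FW_config.yaml'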
Now you should be ready to use either firework configuration to launch jobs into the supercomputer queue! Here are some examples of what you might do...
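For instance, a hypothetical session launching single-node jobs might look like this; the run directory and flag values are placeholders:
1N_fw_config
cd /path/to/your/1N_run_directory
qlaunch rapidfire -m 10 --nlaunches 10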
We recommend using tmux or screen when submitting jobs so that queue submission is preserved when the main ssh session is terminated. This allows one to keep the queue saturated with jobs. For example, with tmux:
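A minimal example; the session name qlaunch_session is arbitrary:
tmux new -s qlaunch_session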
To exit the tmux session, press the “control” and “b” keys together, followed by “d”.
To re-enter the tmux session:
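For example, using the session name from above:
tmux attach -t qlaunch_session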
For more tmux commands, see the tmux cheatsheet.
Use the qlaunch command to reserve jobs with SLURM's scheduler. qlaunch has 3 modes: singleshot, rapidfire, and multi. You will want to use rapidfire. Some useful flags to set are:
-m to specify the maximum number of jobs in the queue at any given time
--nlaunches to specify how many fireworks to run
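For example, to keep up to 50 jobs in the queue and keep launching indefinitely:
qlaunch rapidfire --nlaunches infinite -m 50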
When you are running many fireworks in rapidfire mode, jobs may hang and not complete. This is referred to as "lost runs" in fireworks. To check for lost runs, use the command:
To rerun these jobs, add the --rerun flag to the command.
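For example:
lpad detect_lostruns
lpad detect_lostruns --rerun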
You can easily reset all fizzled fireworks back to the ready state to be run again with this command in terminal.
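For example:
lpad rerun_fws -s FIZZLED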
When you are running many fireworks, sometimes a calculation that fizzled the first time will successfully complete if you just rerun it. You can check the error message for why a firework fizzled in the MongoDB launches collection, via the following path in the document:
action -> stored_data -> _exception -> _stacktrace
Checking the error can help you decide whether simply rerunning the firework may work.
If you have hundreds or thousands of calculations to run, you may be interested in "job packing" to take advantage of the big job discount offered by NERSC or improve your queue wait times by reducing the number of jobs you submit to slurm (e.g. 1x 1024 node job will be scheduled faster than 1024x single node jobs). The group has python scripts that will automatically distribute running fireworks when multiple nodes are requested for a job. Consult Jason Munro for access to these github job packing scripts and an introduction if you are interested in setting up job packing for running calculations.