4. Substeps
The essence of the workflow is to break up large experiments into several sub-steps, which makes the handling more robust and easier.
First, the experiment is split into several much shorter simulations; a multi-decade climate experiment, for example, typically consists of many simulations, each covering a period of one month.
Second, the workflow further splits each simulation into sub-steps, namely the pre-processing, the actual simulation, the post-processing, and the finishing, to increase performance and to modularise the workflow as much as possible.
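To make this structure concrete, the following sketch shows one possible way to chain such an experiment on an HPC system: every monthly simulation is submitted as four dependent batch jobs, one per sub-step, so each sub-step waits for the previous one. The script names, dates, and the use of Slurm job dependencies are illustrative assumptions, not a prescription of the workflow implementation.

    import subprocess

    def submit(script, date, depends_on=None):
        # Submit a batch script with sbatch and return its job ID
        # (--parsable makes sbatch print only the ID).
        cmd = ["sbatch", "--parsable"]
        if depends_on is not None:
            cmd.append(f"--dependency=afterok:{depends_on}")
        cmd += [script, date]  # pass the simulation date to the job script
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()

    # Hypothetical two-month experiment, each month split into the four sub-steps.
    last_job = None
    for date in ["19500101", "19500201"]:
        for step in ["prepro.sh", "simulation.sh", "postpro.sh", "finishing.sh"]:
            last_job = submit(step, date, depends_on=last_job)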
4.1. Pre-processing
The pre-processing (prepro) sub-step is intended to encompass all tasks that need to be performed prior to the actual simulation. It therefore usually includes the processing of the forcing data, which is highly individual. An example of this could be the resampling of the raw forcing data to the computational grid of the component model used.
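As an illustration of such a task, the sketch below resamples raw forcing data onto the computational grid of a component model by linear interpolation with xarray; the file names, coordinate names, and interpolation method are assumptions and would depend on the actual forcing data and model.

    import xarray as xr

    # Hypothetical raw forcing file and target grid description.
    forcing = xr.open_dataset("raw_forcing_19500101.nc")
    target = xr.open_dataset("model_grid.nc")

    # Interpolate all forcing variables onto the target latitudes and longitudes
    # (assumes both data sets use coordinates named "lat" and "lon").
    remapped = forcing.interp(lat=target["lat"], lon=target["lon"], method="linear")
    remapped.to_netcdf("forcing_19500101_remapped.nc")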
4.2. Simulation
The simulation sub-step is the core step of the workflow. Scripts in this step set up the run directory (rundir), run the simulation, clean up by moving the simulation results to the simulation results directory (simres), and log the exact workflow used to enable reproduction.
In detail, an identifiable, individual run directory is created for each simulation as a subdirectory of rundir/ to allow the user to run multiple simulations in parallel. Usually this subdirectory is named after the current simulation date, so if the monthly simulation for January 1950 is run, the directory rundir/19500101/ is created.
Then all the necessary files are copied to the run directory, such as the model executables, namelists, auxiliary files, restart files, and (some) static files. During the simulation, the raw model output of the model components is also dumped into this run directory.
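A minimal sketch of this set-up stage is given below; the directory layout and file names are assumptions chosen for illustration.

    import shutil
    from pathlib import Path

    date = "19500101"
    run_dir = Path("rundir") / date
    run_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical input files; a real set-up would stage the model executables,
    # namelists, auxiliary, restart, and (some) static files here.
    for name in ["model.exe", "namelist.input", "restart_19500101.nc"]:
        shutil.copy(Path("inputs") / name, run_dir / name)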
The namelists are adapted to the current simulation by replacing predefined flags with the correct values. One example is the simulation date, which varies from simulation to simulation and which the workflow sets accordingly.
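The flag replacement itself can be as simple as a textual substitution in a template namelist, as in the following sketch; the placeholder names and values are assumptions.

    from pathlib import Path

    # Hypothetical placeholder flags in the template namelist.
    replacements = {"__START_DATE__": "19500101", "__END_DATE__": "19500201"}

    namelist = Path("rundir/19500101/namelist.input")
    text = namelist.read_text()
    for flag, value in replacements.items():
        text = text.replace(flag, value)
    namelist.write_text(text)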
Once everything is set up, the simulation is started.
After the simulation has finished, the raw model output is moved from the run directory to the simulation results directory (simres). Again, a subdirectory is created with the same name as the run directory. This is done to keep only the simulation results and to avoid storing, for example, large, redundant static files.
Alongside the simulation results, some log files are created to keep track of the exact workflow used, by logging the repository used, the commit, and a git diff of the unstaged changes. Furthermore, the restart files are copied to a dedicated restart directory to allow the next simulation to start correctly.
Finally, the run directory is removed, as it is no longer needed, to keep the workflow directory structure clean.
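The following sketch summarises these clean-up tasks; the file patterns and directory names are assumptions, and the git calls only illustrate what is logged (remote, commit, and diff of unstaged changes).

    import shutil
    import subprocess
    from pathlib import Path

    date = "19500101"
    run_dir, simres_dir = Path("rundir") / date, Path("simres") / date
    simres_dir.mkdir(parents=True, exist_ok=True)

    # Move the raw model output (hypothetical naming) to the results directory.
    for f in run_dir.glob("*_output_*.nc"):
        shutil.move(str(f), str(simres_dir / f.name))

    # Log the exact workflow state; assumes this runs inside the workflow repository.
    def git(*args):
        return subprocess.run(["git", *args], capture_output=True, text=True).stdout

    (simres_dir / "workflow_log.txt").write_text(
        git("remote", "-v") + git("rev-parse", "HEAD") + git("diff")
    )

    # Copy restart files (hypothetical naming) for the next simulation, then
    # remove the run directory, which is no longer needed.
    restart_dir = Path("restarts")
    restart_dir.mkdir(exist_ok=True)
    for f in run_dir.glob("restart_*.nc"):
        shutil.copy(f, restart_dir / f.name)
    shutil.rmtree(run_dir)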
4.3. Post-processing
The post-processing (postpro) sub-step is used to provide higher-level products than the raw model data. Examples include calculating variables that are not a direct output of the model (e.g. discharge for ParFlow), aggregating model output over defined time periods (e.g. writing monthly files), adding experiment-specific metadata (e.g. following the CF conventions), storing the output in a specific data structure (e.g. CMORized), or generating quick in situ quality-check plots for monitoring. Some of these tasks are essentially mandatory, such as adding specific metadata to keep track of what is stored in the files and to make the data easily available to others. Other tasks are performed to save time: because the post-processing is performed immediately after the simulation and runs as a separate Slurm job, time-consuming calculations can be done here without degrading the performance of the entire experiment, compared to running simulation and post-processing in one job, or running the post-processing only after the entire experiment has finished.
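As one illustration of such a product, the sketch below aggregates the raw output of a single month into a monthly mean and attaches CF-style attributes; the file pattern, variable name, and metadata values are assumptions.

    from pathlib import Path
    import xarray as xr

    # Open all raw output files of one simulation (hypothetical file pattern);
    # assumes the data sets share a "time" coordinate.
    ds = xr.open_mfdataset("simres/19500101/*_output_*.nc", combine="by_coords")

    # Monthly mean as an example of a higher-level product.
    monthly = ds.resample(time="1MS").mean()

    # Attach CF-style metadata to a hypothetical 2 m temperature variable.
    monthly["T_2M"].attrs.update(standard_name="air_temperature", units="K")

    out_dir = Path("postpro/19500101")
    out_dir.mkdir(parents=True, exist_ok=True)
    monthly.to_netcdf(out_dir / "T_2M_195001.nc")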
4.4. Finishing
The finishing sub-step is somewhat optional. In principle, all sub-steps are
self-contained and, for example, clean up temporarily generated data at the end
of the respective sub-step. However, sometimes it is useful to execute some
tasks after all other sub-steps; in this workflow, those tasks are the checksum
calculation and the compression.
When all the previous tasks have been completed, the checksum of the generated
data is calculated and stored in a file within the same directory. This allows
the user to verify that the data is not corrupted. The task is performed after
all other sub-steps simply for performance reasons, e.g. so that the next
simulation can start without waiting for the checksum to be calculated.
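A minimal sketch of this task, assuming SHA-256 checksums and the directory layout used above, could look as follows; the resulting file uses the sha256sum output format, so the data can later be verified with sha256sum -c.

    import hashlib
    from pathlib import Path

    simres_dir = Path("simres/19500101")
    lines = []
    for f in sorted(simres_dir.glob("*.nc")):
        # Hash each result file and record it in "<checksum>  <filename>" format.
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        lines.append(f"{digest}  {f.name}")

    # Store the checksums in a file within the same directory.
    (simres_dir / "checksums.sha256").write_text("\n".join(lines) + "\n")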
The compression task simply compresses the data in simres/ to prepare them for
archiving, as they are no longer needed uncompressed once the post-processing
is done.
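One possible implementation, assuming each simulation result directory is packed into a single compressed archive, is sketched below; the actual workflow may compress the data differently.

    import tarfile

    # Pack one simulation result directory into a gzip-compressed tar archive
    # (hypothetical paths), preparing it for archiving.
    with tarfile.open("simres/19500101.tar.gz", "w:gz") as tar:
        tar.add("simres/19500101", arcname="19500101")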