3. Directory Structure
As trivial and minor as a proper directory structure may sound, it is very
important.
As the name suggests, a directory structure structures the workflow. Directory
names dictate where to find certain files and where to store simulation results,
forcing data, etc. This helps the maintainer to develop the source code, and
also helps the user to get started and become familiar with the workflow.
The directory structure of this workflow is essentially flat (only ctrl/ has subdirectories of its own) and contains the following directories:
TSMP_WorkflowStarter/
|
|---- ctrl
|      |---- namelist
|      |---- env
|---- forcing
|---- geo
|---- monitoring
|---- postpro
|---- rundir
|---- simres
|---- src
There may be other subdirectories, but those are not part of the mentioned directory structure and may vary from setup to setup.
All of the directories are named strictly after their purpose. With some experience of the workflow, these names become intuitive. Each directory is described in detail below.
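The layout above can be made explicit with a few lines of shell. This is only a minimal sketch for recreating the directories by hand; the variable WORKFLOW_ROOT and its default value are assumptions made for illustration and are not part of the workflow.

```shell
#!/bin/sh
# Sketch: create the directory layout listed above.
# WORKFLOW_ROOT is a hypothetical variable; it defaults to a directory
# named TSMP_WorkflowStarter below the current working directory.
WORKFLOW_ROOT="${WORKFLOW_ROOT:-./TSMP_WorkflowStarter}"

# ctrl/ is the only directory with subdirectories of its own.
for dir in ctrl/namelist ctrl/env forcing geo monitoring postpro rundir simres src; do
    mkdir -p "${WORKFLOW_ROOT}/${dir}"
done
```

Because mkdir -p does not complain about existing directories, running the sketch a second time leaves an existing layout untouched.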
3.1. ctrl/
ctrl/ (control) contains all the scripts needed to control the
workflow, as well as scripts written specifically for this workflow, such as
post-processing scripts.
3.1.1. ctrl/namelist/
ctrl/namelist/ contains the namelists for the individual component
models used within the workflow. As these namelists control the model
behaviour and are specific to the workflow, this is a subdirectory of ctrl/.
3.1.2. ctrl/env/
ctrl/env/ contains the environment files used. Since each simulation
depends on a specific set of programs (e.g. python) and libraries (e.g. netCDF)
in specific versions, this information has to be provided to the user of the
workflow. Environment files list these dependencies and ensure that the
required environment is set up correctly.
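The "ensure" part can be illustrated with a small check. This is a minimal sketch, assuming the dependencies are plain executables on the PATH; the function name check_programs and the program names in the usage line are hypothetical and do not reflect the actual dependency list or the contents of the real environment files.

```shell
#!/bin/sh
# Sketch: verify that the programs a simulation depends on are available.
# check_programs is a hypothetical helper, not part of the workflow.
check_programs() {
    missing=0
    for prog in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing program: $prog" >&2
            missing=1
        fi
    done
    # return 0 if every program was found, 1 otherwise
    return "$missing"
}
```

A call such as check_programs python3 ncdump || exit 1 would then stop a script early if one of the (illustrative) programs cannot be found. Note that this only checks presence, not versions.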
3.2. forcing/
forcing/ is a directory containing any forcing files needed. This could
be an atmospheric forcing dataset driving the land surface model CLM or lateral
boundary conditions needed by the atmospheric model COSMO.
3.3. geo/
geo/ contains files required by the component models that define the model
domain. This could be topographic data, land cover data, soil properties, grids
defining the spatial extent of the domain and many more. Often this data is
referred to as "static files", but as some of the required data sets, such as
the land cover, may change over time, "static" could be misleading, hence the
name geo/.
3.4. monitoring/
monitoring/ contains the output of the monitoring functions.
Monitoring simulations is crucial for ensuring accurate and reliable
results. Various factors can affect the simulation outcomes, ranging from
simulation interruptions and crashes to subtle corruptions of the results.
Manually reviewing simulation results periodically can be extremely
time-consuming, especially considering the large size of simulation outputs and
the potentially lengthy duration of simulations, which may run for several
months. To address this challenge, a monitoring functionality has been
incorporated into this workflow.
The monitoring functionality automatically generates summary plots at regular
intervals, providing a concise overview of the simulation progress. A detailed
description of the implementation can be found in the
Simulation Monitoring section.
These monitoring plots are stored in the monitoring/ directory, allowing users
to conveniently monitor the simulation directly by browsing through them. It is
also conceivable that
one could upload these plots to a web server, enabling even more accessible
monitoring of the simulation. Scripts providing this functionality have been
intentionally designed with simplicity and robustness in mind. While they may
not generate publication-ready plots, they serve the purpose of providing
essential information about the simulation. Users should bring a basic
understanding of the simulation results to effectively utilize these scripts.
3.5. postpro/
postpro/ contains the post-processed simulation results. The
post-processing step is specific to each simulation and can range
from simple aggregation of simulation results (e.g. to monthly files) to the
calculation of further diagnostics derived from the original simulation results.
3.6. rundir/
rundir/ is the directory in which the actual simulation runs. In order to
run a simulation, you need a directory where everything is put together, i.e.
static files, executables for individual component models, namelists, etc. for
that particular simulation. Most of the time the actual run directory is a
subdirectory of rundir/, automatically created by the workflow, allowing you
to run multiple simulations in parallel.
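The idea of one run directory per simulation can be sketched as follows. The variable BASE_RUNDIR, the function make_rundir and the sim_ naming scheme are assumptions made for illustration; the workflow's own naming may differ.

```shell
#!/bin/sh
# Sketch: one run directory per simulation below rundir/.
# BASE_RUNDIR and the "sim_<label>" naming scheme are hypothetical.
BASE_RUNDIR="${BASE_RUNDIR:-./rundir}"

make_rundir() {
    # $1: a unique label for the simulation, e.g. its start date
    run_dir="${BASE_RUNDIR}/sim_$1"
    mkdir -p "$run_dir" || return 1
    # print the path so the caller can capture it
    echo "$run_dir"
}
```

Two simulations started with different labels, e.g. make_rundir 2020010100 and make_rundir 2020020100, get separate directories and therefore do not overwrite each other's namelists or output while running in parallel.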
3.7. simres/
simres/ contains the raw (not post-processed) simulation
results. In addition, log files are stored with each set of simulation
results, containing information about which workflow was used to generate those
simulation results. If the workflow is used correctly, these log files will
contain all the information needed to reproduce the simulation results.
3.8. src/
src/ contains source code used within the workflow. The most
prominent of these is the cloned and built TSMP,
but other external code is also placed here.
3.9. export_paths.sh
Although not directly part of the directory structure, the export_paths.sh
script located in ctrl/ is an important reason why this structure is used. This
script is one of the core pieces of code in this workflow, and allows you to run
the workflow from any location, and even change the location during runtime.
export_paths.sh is loaded at the beginning of each simulation and exports the
absolute paths to the main directories (the ones above) in environment
variables. Each script within this workflow in turn uses these environment
variables to refer to other directories and scripts. This avoids the problem of
using hard-coded paths, and gives the user full flexibility in where the
simulation is run.
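The mechanism can be sketched as follows. This is not the actual content of export_paths.sh; the function name export_workflow_paths and all variable names (WF_ROOT, WF_CTRL, ...) are hypothetical and chosen for illustration only.

```shell
#!/bin/sh
# Sketch: export absolute paths to the main directories.
# All names below are hypothetical and need not match the real script.
export_workflow_paths() {
    # $1: root directory of the workflow; resolve it to an absolute path
    WF_ROOT="$(cd "$1" && pwd)" || return 1
    export WF_ROOT
    export WF_CTRL="${WF_ROOT}/ctrl"
    export WF_NAMELIST="${WF_ROOT}/ctrl/namelist"
    export WF_ENV="${WF_ROOT}/ctrl/env"
    export WF_FORCING="${WF_ROOT}/forcing"
    export WF_GEO="${WF_ROOT}/geo"
    export WF_MONITORING="${WF_ROOT}/monitoring"
    export WF_POSTPRO="${WF_ROOT}/postpro"
    export WF_RUNDIR="${WF_ROOT}/rundir"
    export WF_SIMRES="${WF_ROOT}/simres"
    export WF_SRC="${WF_ROOT}/src"
}
```

A script that has called export_workflow_paths /path/to/workflow can then refer to "${WF_SIMRES}" or "${WF_CTRL}" instead of a relative or hard-coded path, so moving the whole workflow only changes the single argument passed to the function.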