6. Auxiliary Scripts
Sometimes you need to perform a task manually within the workflow. This may be
because the workflow cannot perform the task automatically, such as migrating
data to the archive, or for some other reason. As dealing with the data
generated within the workflow is always a task of dealing with large datasets,
this can be very time-consuming.
In many of the simulations that have been run using this workflow as a
template, some of these tasks have recurred and we have written scripts to
make them easier to perform. Below is a brief summary of the existing scripts
to provide an overview.
6.1. aux_MigrateFromScratch.sh
Since the workflow runs on compute nodes, it is not possible to automatically
migrate simulation results to the tape archive, because the archive can only
be accessed via login nodes at JSC.
So aux_MigrateFromScratch.sh
migrates provided directories to a given target.
Migrating in this context means:
Pack source directory in a tar-ball
Put tar-ball to target location
Delete source directory
Link new created tar-ball to source location
Usage:
# in the current shell
bash aux_MigrateFromScratch.sh PATH/TO/TARGET PATH/TO/SOURCES/WILDCARDS/ARE/POSSIBL*
# in the background
nohup bash aux_MigrateFromScratch.sh PATH/TO/TARGET PATH/TO/SOURCES/WILDCARDS/ARE/POSSIBL* &
6.2. aux_UnTarManyTars.sh
Complementary to migrating data to the archive, you can also extract data from
the archive back to $SCRATCH
again. As the migration stores the data in
tar-balls, you need to untar / unpack them.
So aux_UnTarManyTars.sh
unpacks the provided tar-balls to a given target
location.
Usage:
# in the current shell
bash aux_UnTarManyTars.sh PATH/TO/TARGET PATH/TO/TARBALLS/WILDCARDS/ARE/POSSIBL*
# in the background
nohup bash aux_UnTarManyTars.sh PATH/TO/TARGET PATH/TO/TARBALLS/WILDCARDS/ARE/POSSIBL* &
6.3. aux_restageTape.sh
If the data was moved to the archive a long time ago, the related tape file
may already be detached, physically unplugged, from the filesystem. This is
common with tape archives, and the related tape must first be reactivated, so
plugged back into the filesystem, in order to access the data. This process is
started automatically when a file on that related tape is requested, but it
can take some time. So if you need data that is no longer available on
spinning disk, use aux_restageTape.sh
to restage this data.
Usage:
nohup bash aux_restageTape.sh PATH/TO/DATA/WILDCARDS/ARE/POSSIBL* &
6.4. aux_gzip.sh and aux_gunzip.sh
You may need to compress or uncompress data within the workflow. An example
would be if you need to do some extra post-processing, but the data in
simres
is already compressed by the workflow. In this case you can use
gzip
or gunzip
, but usually it’s a lot of data to process, which will
take a long time.
So aux_gunzip.sh
and aux_gzip.sh
does provide an auxiliary script to
run gzip
and gunzip
on a computenode, using all available CPUs, thus
increasing compression speed drastically.
Usage:
# compressing
sbatch ./aux_gzip.s TARGET/FILES/WILDCARDS/ARE/POSSIBL*
# uncompressing
sbatch ./aux_gunzip.s TARGET/FILES/WILDCARDS/ARE/POSSIBL*
Even if the above two scripts are quiet easy, they can be very powerfull. For
example, can you imagine what the below call of aux_gunzip.sh
is doing?
sbatch aux_gunzip.sh ${BASE_ROOT}/simres/ProductionV1/*/{clm,cosmo,parflow}/{*/*,*}
6.5. aux_sha512sum.sh
You may need to (re)calculate the checksum for some or many of the data in
the workflow. Then aux_sha512sum.sh
simply allows you to run this calculation
on a computenode to speed things up.
Usage:
sbatch ./aux_sha512sum.sh TARGET/FILES/WILDCARDS/ARE/POSSIBL*