toolBox.py

toolBox - submodule of SLOTH

sloth.toolBox.calc_catchment(slopex, slopey, x, y)[source]

Calculate the catchment area associated with a given outlet pixel using x- and y-slope data.

This function implements a simple yet efficient algorithm to determine the catchment area. It starts with a list of pixels that initially contains only the outlet pixel(s). It iteratively processes the pixels in the list, marking them as part of the catchment and identifying the surrounding pixels that drain into the current pixel and are not already included in the catchment. Found pixels are added to the list, and the algorithm continues until all pixels belonging to the catchment area are discovered.

Parameters:
  • slopex (ndarray) – 2D slopes in x-direction

  • slopey (ndarray) – 2D slopes in y-direction

  • x (int) – index in x-direction to calculate the catchment from

  • y (int) – index in y-direction to calculate the catchment from

Returns:

catchment – 2D ndarray of the same size as slopex/y. 0 = not part of catchment; 1 = part of catchment

Return type:

ndarray
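The iterative search described above amounts to a breadth-first search. The helper below is a minimal, generic sketch (not the sloth implementation): the hypothetical predicate `drains_into(src, dst)` stands in for the sign checks on `slopex`/`slopey` that decide whether pixel `src` drains into pixel `dst`.

```python
from collections import deque
import numpy as np

def catchment_bfs(drains_into, shape, outlet):
    """Generic catchment search: start at the outlet, repeatedly pull a
    pixel off the queue and enqueue every neighbour that drains into it."""
    catchment = np.zeros(shape, dtype=int)
    catchment[outlet] = 1            # the outlet itself belongs to the catchment
    todo = deque([outlet])
    while todo:
        cur = todo.popleft()
        y, x = cur
        # inspect the four direct neighbours
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < shape[0] and 0 <= nx < shape[1]
                    and not catchment[ny, nx]
                    and drains_into((ny, nx), cur)):
                catchment[ny, nx] = 1    # newly found pixel is part of the catchment
                todo.append((ny, nx))    # ...and has to be processed as well
    return catchment

# toy rule: every pixel drains towards larger x (to the right)
drains_right = lambda src, dst: dst[0] == src[0] and dst[1] == src[1] + 1
mask = catchment_bfs(drains_right, (1, 5), (0, 4))   # outlet at the right edge
```

With this toy rule, every pixel in the row upstream of the outlet ends up in the catchment, so `mask` is all ones.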

sloth.toolBox.fill(data, invalid=None, transkargs={})[source]

Fill invalid data points with nearest neighbor interpolation.

This function replaces invalid data points in the input array (data) with the value of the nearest valid data point. Invalid data points are indicated by the invalid array or, if not provided, by NaN (Not a Number) values in data.

Parameters:
  • data (ndarray) – An array containing the data to be filled.

  • invalid (ndarray, optional) – A binary array of the same shape as data. Data values are replaced where invalid is True. If not provided, invalid data points are identified using np.isnan(data) (default).

  • transkargs (dict, optional) – Additional arguments to pass to the distance_transform_edt() function. Default is an empty dictionary.

Returns:

A filled ndarray, where invalid data points have been replaced by the value of the nearest valid data point.

Return type:

ndarray

Notes

This function uses nearest neighbor interpolation to fill invalid data points in the input array. It calculates the Euclidean distance transform of the invalid array to find the indices of the nearest valid data points. The original data array is then updated with the values from the nearest valid data points.

If the shapes of data and invalid are not equal, an error is raised, as filling is not possible.

See also: https://stackoverflow.com/questions/5551286/filling-gaps-in-a-numpy-array
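The mechanism in the Notes can be sketched with `scipy.ndimage.distance_transform_edt` and its `return_indices` option. This is only a minimal sketch (it omits the `transkargs` pass-through of the real function):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_nearest(data, invalid=None):
    """Replace invalid points with the value of the nearest valid point."""
    if invalid is None:
        invalid = np.isnan(data)        # default: NaNs are invalid
    if invalid.shape != data.shape:
        raise ValueError("data and invalid must have the same shape")
    # for every pixel, get the indices of the nearest non-invalid pixel
    ind = distance_transform_edt(invalid,
                                 return_distances=False,
                                 return_indices=True)
    return data[tuple(ind)]
```

For example, `fill_nearest(np.array([1., np.nan, np.nan, 4.]))` gives `[1., 1., 4., 4.]`, since each gap is filled from its nearest valid neighbor.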

Example

>>> import numpy as np
>>> data = np.array([1.0, 2.0, np.nan, 4.0, np.nan])
>>> fill(data)
array([1., 2., 4., 4., 4.])
>>> invalid = np.isnan(data)
>>> fill(data, invalid)
array([1., 2., 4., 4., 4.])
sloth.toolBox.find_nearest_Index_2D(point, coord2D)[source]

Find the nearest index in a 2D array.

Given a 2D array coord2D and a specific point value, this function returns the index of the 2D array that holds the closest values to the point.

Parameters:
  • point (scalar) – The value for which to find the nearest index in the 2D array.

  • coord2D (ndarray) – The 2D array in which to search for the nearest index.

Returns:

The index in the first and second dimensions of coord2D that holds the closest values to the point.

Return type:

(int, int)

Notes

This function calculates the absolute differences between coord2D and point using np.abs(). It then determines the index of the minimum value in the differences array using np.argmin(). The resulting flattened index is converted into a tuple of indices representing the original shape of coord2D using np.unravel_index().
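The three NumPy calls from the Notes fit in a few lines; a minimal sketch:

```python
import numpy as np

def find_nearest_index_2d(point, coord2d):
    """Return the (row, col) index of the value in coord2d closest to point."""
    dist = np.abs(coord2d - point)     # element-wise distance to point
    flat_idx = np.argmin(dist)         # index into the flattened array
    return np.unravel_index(flat_idx, coord2d.shape)
```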

Examples

>>> import numpy as np
>>> arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> find_nearest_Index_2D(5, arr)
(1, 1)
>>> arr = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
>>> find_nearest_Index_2D(0.25, arr)
(0, 1)
sloth.toolBox.get_intervalSlice(dates, sliceInterval='month')[source]

This function calculates interval slices of a given time series.

This function calculates interval slices of a given time series, aiming to mimic Python's default slice notation array[start:stop:step] while also enabling multi-dimensional slicing. The calculation takes the model's dump interval into account, so this function works for (nearly) every time resolution. The calculation is based on datetime objects and therefore correctly calculates the slice of each individual interval, even if the time series does not start or end at the first or last step of a given interval.

Use case: You have a time series of hourly data points and want to calculate the monthly mean. This function returns the slices needed to divide your data into monthly pieces.

Parameters:
  • dates (ndarray) – the time-series as datetime-objects.

  • sliceInterval (str) – defining the interval. Supported are: ‘day’, ‘month’

Returns:

Slices – A list with length of found intervals with related slices

Return type:

list

Notes

This function assumes: i) the passed time series covers at least one interval; ii) at least hourly steps / dump intervals; iii) no seconds in dates (model output is at least on full minutes).
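The monthly use case can be sketched as follows. This is only an illustration of the grouping idea (consecutive time steps grouped by year and month); the real function additionally honours the model dump interval and partial first/last intervals:

```python
import datetime

def monthly_slices(dates):
    """Return one slice per (year, month) group of a sorted datetime sequence."""
    keys = [(d.year, d.month) for d in dates]
    slices, start = [], 0
    for i in range(1, len(keys) + 1):
        # close the current group at the end or when the month changes
        if i == len(keys) or keys[i] != keys[start]:
            slices.append(slice(start, i))
            start = i
    return slices

# 48 hourly steps spanning the turn of the month January -> February
dates = [datetime.datetime(2023, 1, 31) + datetime.timedelta(hours=h)
         for h in range(48)]
monthly = monthly_slices(dates)   # [slice(0, 24), slice(24, 48)]
```

Applying the returned slices to the data array (e.g. `data[s].mean()` for each slice `s`) then yields the monthly means.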

sloth.toolBox.get_prudenceMask(lat2D, lon2D, prudName)[source]

Return a Prudence mask based on latitude and longitude values.

This function generates a boolean mask array where True represents masked areas and False represents non-masked areas. The mask is determined based on a set of latitude and longitude values and the name of the Prudence region.

Parameters:
  • lat2D (ndarray) – 2D array containing latitude information for each pixel.

  • lon2D (ndarray) – 2D array containing longitude information for each pixel.

  • prudName (str) – Short name of the Prudence region.

Returns:

prudMask – Boolean ndarray of the same shape as lat2D, indicating masked and non-masked areas. True = masked; False = not masked.

Return type:

ndarray

Notes

The Prudence mask is created based on specific latitude and longitude ranges for each Prudence region. The function checks the prudName parameter and generates the corresponding mask using NumPy’s np.where() function.

The available Prudence region names and their corresponding latitude and longitude ranges are as follows (True is masked, False is not masked):

  • ‘BI’: Latitude: <50.0 or >59.0, Longitude: <-10.0 or >2.0

  • ‘IP’: Latitude: <36.0 or >44.0, Longitude: <-10.0 or >3.0

  • ‘FR’: Latitude: <44.0 or >50.0, Longitude: <-5.0 or >5.0

  • ‘ME’: Latitude: <48.0 or >55.0, Longitude: <2.0 or >16.0

  • ‘SC’: Latitude: <55.0 or >70.0, Longitude: <5.0 or >30.0

  • ‘AL’: Latitude: <44.0 or >48.0, Longitude: <5.0 or >15.0

  • ‘MD’: Latitude: <36.0 or >44.0, Longitude: <3.0 or >25.0

  • ‘EA’: Latitude: <44.0 or >55.0, Longitude: <16.0 or >30.0

If an unsupported Prudence region name is provided, an error message is printed, and the program exits.

For reference see also: http://prudence.dmi.dk/public/publications/PSICC/Christensen&Christensen.pdf p.38
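As an illustration of the table above, the rule for a single region, e.g. ‘AL’ (Alps), could be written with np.where like this (a sketch, not the actual sloth code):

```python
import numpy as np

def prudence_mask_al(lat2D, lon2D):
    """Mask everything outside the 'AL' box (44-48 degN, 5-15 degE).
    True = masked, False = not masked."""
    return np.where((lat2D < 44.0) | (lat2D > 48.0) |
                    (lon2D < 5.0) | (lon2D > 15.0), True, False)
```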

sloth.toolBox.intersection_calculations(df_data, corners, area_of_interest, Name_area, crs_utm, nr_yr, nr_entries, save_dir)[source]

Calculate spatial mean values for a shapefile of interest.

This function calculates spatial mean values (e.g. for a specific region or province), taking into account what fraction of each gridpoint actually intersects the region of interest.

Parameters:
  • df_data (dataframe) – dataframe containing information about each gridpoint

  • corners (dataframe (can also be a netCDF or any other data type)) – containing information on the four longitudes and latitudes that surround each gridpoint in the source dataframe

  • area_of_interest (a shapefile or geodataframe) – shapefile of the area of interest

  • Name_area – the field name in the shapefile that the dissolve will be based on

  • crs_utm – projected coordinate reference system (UTM)

  • nr_yr (int) – number of years of interest

  • nr_entries (int) – number of hours, days, months, etc.

  • save_dir (path) – path for saving the output

sloth.toolBox.mapDataRange_lin(X, y_min=0, y_max=1, x_min=None, x_max=None, cutMinMax=False)[source]

Map a source data range linearly to a target data range.

Perform a linear mapping of a source data range (X) to a target data range (Y). The function calculates the mapping using the formula: y = (y_max - y_min) / (x_max - x_min) * (x - x_min) + y_min.

Parameters:
  • X (ndarray) – Source data range to remap.

  • y_min (scalar, optional) – Minimum value of the target data range (default is 0).

  • y_max (scalar, optional) – Maximum value of the target data range (default is 1).

  • x_min (scalar, optional) – Minimum value of the source data range. If not provided, it is calculated based on X.

  • x_max (scalar, optional) – Maximum value of the source data range. If not provided, it is calculated based on X.

  • cutMinMax (bool, optional) – If True, cut the mapped values outside the target data range to the minimum and maximum values.

Returns:

The target data range after linear mapping.

Return type:

ndarray

Notes

This function is useful for mapping a source data range onto a different target data range, for example to normalize data ranges and make them comparable with other ranges. As an intermediate step, both X and Y are first transformed into data ranges starting from zero (X’ and Y’), as those ranges can be easily mapped with

  1. y’ = y’_max / x’_max * x’

  2. substituting y’ = y - y_min and x’ = x - x_min

  3. this expands to y = (y_max-y_min) / (x_max-x_min) * (x-x_min) + y_min
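A minimal sketch of this mapping (the lowercase name map_data_range_lin is only used here; this is not the sloth implementation itself):

```python
import numpy as np

def map_data_range_lin(X, y_min=0, y_max=1, x_min=None, x_max=None, cutMinMax=False):
    """Linearly map the source range [x_min, x_max] onto [y_min, y_max]."""
    X = np.asarray(X, dtype=float)
    # derive the source range from the data if not given explicitly
    if x_min is None:
        x_min = X.min()
    if x_max is None:
        x_max = X.max()
    # y = (y_max - y_min) / (x_max - x_min) * (x - x_min) + y_min
    Y = (y_max - y_min) / (x_max - x_min) * (X - x_min) + y_min
    if cutMinMax:
        # clip values that fall outside the target range
        Y = np.clip(Y, y_min, y_max)
    return Y
```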

Examples

>>> X = np.array([0, 1, 2, 3, 4, 5])
>>> mapped = mapDataRange_lin(X, y_min=10, y_max=20)
>>> print(mapped)
[10. 12. 14. 16. 18. 20.]
>>> X = np.array([100, 200, 300, 400, 500])
>>> mapped = mapDataRange_lin(X, y_min=-1, y_max=1, x_min=100, x_max=500)
>>> print(mapped)
[-1.  -0.5  0.   0.5  1. ]
sloth.toolBox.plusOneMonth(currDate)[source]

Return the passed date incremented by one month.

This function provides a simple way to calculate a date that is one month ahead of the given currDate. Unlike the bash command-line tool date, which can handle this calculation easily, Python’s datetime objects are based on hours or seconds, requiring special treatment for adding one month due to varying month lengths. This function serves as a wrapper to handle this task in a clean manner, facilitating usage in other code snippets.

Parameters:

currDate (datetime) – An arbitrary date.

Returns:

The passed date incremented by one month.

Return type:

datetime

Notes

This function determines the number of days in the month of currDate using monthrange() from the calendar module. It then returns currDate incremented by a timedelta calculated as 24 * num_days hours. This ensures that the returned date correctly reflects one month ahead while accounting for varying month lengths.
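The mechanism from the Notes fits in two lines; a minimal sketch:

```python
import datetime
from calendar import monthrange

def plus_one_month(curr_date):
    """Return curr_date shifted forward by the length of its month."""
    # number of days in the month of curr_date
    num_days = monthrange(curr_date.year, curr_date.month)[1]
    return curr_date + datetime.timedelta(hours=24 * num_days)
```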

Examples

>>> import datetime
>>> currDate = datetime.datetime(2023, 5, 15)
>>> plusOneMonth(currDate)
datetime.datetime(2023, 6, 15, 0, 0)
>>> currDate = datetime.datetime(2023, 12, 31)
>>> plusOneMonth(currDate)
datetime.datetime(2024, 1, 31, 0, 0)
sloth.toolBox.spher_dist_v1(lon1, lat1, lon2, lat2, Rearth=6373)[source]

Calculate the spherical / haversine distance.

This function calculates the real distance (in [km]) between two points on earth given in lat/lon coordinates.

Parameters:
  • lon1 (ndarray) – Longitude of the first point in [rad]. Can have any dimension.

  • lat1 (ndarray) – Latitude of the first point in [rad]. Can have any dimension.

  • lon2 (ndarray) – Longitude of the second point in [rad]. Can have any dimension.

  • lat2 (ndarray) – Latitude of the second point in [rad]. Can have any dimension.

  • Rearth (int or float) – The earth radius in [km].

Returns:

The distance between the points in [km].

Return type:

ndarray

Notes

Source: https://www.kompf.de/gps/distcalc.html
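The haversine formula behind this function can be sketched as follows (inputs in radians, result in km):

```python
import numpy as np

def spher_dist(lon1, lat1, lon2, lat2, Rearth=6373):
    """Great-circle distance via the haversine formula."""
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # haversine of the central angle between the two points
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    return 2.0 * Rearth * np.arcsin(np.sqrt(a))
```

For instance, a quarter of the equator, `spher_dist(0.0, 0.0, np.pi / 2, 0.0)`, comes out as R * pi / 2, roughly 10010 km.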

sloth.toolBox.stampLLSM(data, invalid, LLSM, LLSMThreshold=2)[source]

Stamp an LLSM onto passed data.

Sometimes it is necessary to force different datasets to use the same Land-Lake-Sea-Mask (LLSM). One example is combining different datasets to force a model with. If the datasets are taken from different sources, most probably the LLSMs used differ too. Along the coastline, for example, this could lead to the situation that one dataset indicates water (ocean) while the other indicates land. This inconsistency has to be fixed, especially within the realm of coupled models.

To achieve this, this function

  1. masks invalid pixels within the passed dataset,

  2. interpolates the remaining data over the masked regions,

  3. sets pixels to the water value according to a passed LLSM.

This guarantees that each land pixel holds land information and each water pixel is marked as water.

Parameters:
  • data (ndarray) – A 2D ndarray holding the dataset to stamp the LLSM on

  • invalid (scalar) – A scalar with the value indicating invalid pixels

  • LLSM (ndarray) – An ndarray of the same shape as data holding the LLSM (Land=2, Lake=1, Sea=0)

Returns:

A (MASKED!) ndarray of the same shape as data, with stamped LLSM

Return type:

ndarray
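The three steps above could be sketched like this, reusing the nearest-neighbour fill idea from sloth.toolBox.fill. The water_value parameter is a hypothetical placeholder for whatever value marks water in the target dataset:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def stamp_llsm(data, invalid, llsm, llsm_threshold=2, water_value=0.0):
    """Sketch: mask invalid pixels, fill them from the nearest valid pixel,
    then force water pixels (LLSM below threshold) to water_value."""
    data = np.array(data, dtype=float)
    bad = (data == invalid)                   # 1. mask invalid pixels
    # 2. fill masked pixels with the value of the nearest valid pixel
    ind = distance_transform_edt(bad, return_distances=False, return_indices=True)
    data = data[tuple(ind)]
    water = llsm < llsm_threshold             # 3. stamp water according to LLSM
    data[water] = water_value
    return np.ma.masked_where(water, data)    # returned array is masked over water
```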

sloth.toolBox.trunc(values, decs=0)[source]

Truncate a passed float value or floating ndarray.

This function truncates a given float value or floating ndarray by a specified truncation precision (decimal digits).

Parameters:
  • values (ndarray) – A ndarray of any dimension (including scalar) containing float values to be truncated.

  • decs (int, optional) – An integer value defining the truncation precision. It specifies the number of decimal places to keep (default is 0).

Returns:

A ndarray of the same type as the input, with the specified number of decimal places truncated from the original values.

Return type:

ndarray

Examples

>>> import numpy as np
>>> values = np.array([2.12345, 3.45678, 4.56789])
>>> trunc(values, decs=3)
array([2.123, 3.456, 4.567])
>>> value = 2.98765
>>> trunc(value, decs=2)
2.98
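A common way to implement such a truncation is to scale, truncate, and rescale; a minimal sketch (not necessarily the sloth implementation):

```python
import numpy as np

def trunc(values, decs=0):
    """Truncate values to decs decimal places (no rounding)."""
    factor = 10 ** decs
    # shift the wanted digits left of the decimal point,
    # drop the remaining fraction with np.trunc, then shift back
    return np.trunc(values * factor) / factor
```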