Calculating the total subsurface storage anomalies using geopandas

In this example, the spatial mean of the monthly total subsurface storage (tss) is calculated for each federal state of Germany for every month from 2011 to 2022, taking into account which fraction (ratio) of each grid cell actually intersects the federal state.
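
The key idea behind this weighting can be sketched with shapely (using a hypothetical unit grid cell and state polygon, not the real DE05 grid): the weight of a grid cell is the fraction of its area that falls inside the federal state, and the spatial mean is the weighted average of the cell values.

from shapely.geometry import Polygon

# hypothetical 1x1 grid cell, half of which lies inside the state polygon
cell = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
state = Polygon([(0.5, -1), (2, -1), (2, 2), (0.5, 2)])

# weight = intersected area / cell area
weight = cell.intersection(state).area / cell.area
print(weight)  # 0.5 -> this cell contributes half weight to the state mean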

Before starting, the needed libraries must be imported.
Make sure geopandas and SLOTH are available to import in Python. If you have already added SLOTH to your PYTHONPATH, the path manipulation in the snippet below is not needed.

import os
import sys

# make SLOTH importable if it is not on the PYTHONPATH yet
cwd = os.getcwd()
sloth_path = f"{cwd}/../"
sys.path.append(sloth_path)

import geopandas as gpd
import xarray as xr
import pandas as pd
from shapely.geometry import Polygon
import sloth.toolBox

The paths needed for the anomaly calculation are prepared next.

path = '/p/scratch/pfgpude05/belleflamme1/sim/ADAPTER_DE05_ECMWF-HRES_detforecast__FZJ-IBG3-ParFlowCLM380D_v01bJuwelsGpuProdClimatologyTl_PRhourly/postpro/data/'
fname = 'tss_DE05_ECMWF-HRES_hindcast_r1i1p2_FZJ-IBG3-ParFlowCLM380_hgfadapter-h00-v3_1hr_*-*.nc'
fname_tssAVE = 'tssAve_DE05_ECMWF-HRES_hindcast_r1i1p2_FZJ-IBG3-ParFlowCLM380_hgfadapter-h00-v3_1hr_reference-year-2011-2021.nc'

In the following step, all files matching the wildcard in fname are combined into one dataset and the monthly mean total subsurface storage is calculated.

ds_tss = xr.open_mfdataset(path + fname, decode_times=True)
ds_tss_mon = ds_tss.resample({'time': '1MS'}).mean()  # calculate the monthly mean
df_tss_mon = ds_tss_mon.to_dataframe()
df_tss_mon = df_tss_mon.reset_index()  # the values here represent the monthly tss for the whole 2000*2000 grid
df_tss_mon_sorted = df_tss_mon.sort_values(by=['time'])
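
A small toy example (hypothetical data, not part of the workflow) illustrates why the anomaly loop further below can select individual months via 'YYYY-MM-01' timestamps: resampling with '1MS' stamps each monthly mean at the first day of the month.

import numpy as np

# toy hourly data covering three months
time = pd.date_range('2011-01-01', '2011-03-31', freq='1h')
toy = xr.Dataset({'tss': ('time', np.arange(time.size, dtype=float))},
                 coords={'time': time})
# the monthly means are labelled at the first of each month
print(toy.resample({'time': '1MS'}).mean().time.values)
# -> ['2011-01-01T00:00:00' '2011-02-01T00:00:00' '2011-03-01T00:00:00']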

For calculating the anomalies, the long-term mean total subsurface storage is needed; it is loaded and processed next.

ds_tssAVE = xr.open_mfdataset(path + fname_tssAVE, engine="netcdf4", decode_times=True)
ds_tssAVE = ds_tssAVE.drop('rotated_pole')  # drop the grid-mapping helper variable
ds_tssAVE_mon = ds_tssAVE.resample({'time': '1MS'}).mean()
df_tssAVE_mon = ds_tssAVE_mon.to_dataframe()
df_tssAVE_mon = df_tssAVE_mon.reset_index()
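
With both datasets prepared, the anomaly of a given month is simply the monthly mean minus the corresponding long-term monthly mean:

anomaly(year, month) = tss(year, month) - tssAve(month)

This is exactly what the loop further below computes, one month at a time.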

A dataframe is created with the lons and lats of the 2000*2000 grid (the domain that contains the total subsurface storage), so that the calculated anomalies can later be added to it column by column.

df_anomaly = pd.DataFrame(columns=['lon', 'lat'])
# the first 4000000 rows correspond to one time step of the 2000*2000 grid
df_anomaly['lon'] = df_tss_mon_sorted['lon'].iloc[0:4000000]
df_anomaly['lat'] = df_tss_mon_sorted['lat'].iloc[0:4000000]

The final step is to calculate the anomalies. The loop below iterates over the years 2011-2022 and the twelve months, computes the anomaly of every grid point for that month, and adds the values as a new column to df_anomaly, i.e. one column per month.

As shown below, the resulting dataframe is organised as follows:

  • each row represents a grid point in the domain

  • the columns contain lon and lat as well as the anomaly for each month from 2011 until 2022

df_anomaly
lon lat 2011_1 2011_2 2011_3 2011_4 2011_5 2011_6 2011_7 2011_8 ... 2022_3 2022_4 2022_5 2022_6 2022_7 2022_8 2022_9 2022_10 2022_11 2022_12
0 2.864156 44.254330 -0.455078 -0.175781 0.312500 -11.029297 -37.771484 -26.111328 16.777344 65.214844 ... -0.265625 4.953125 -23.765625 -55.781250 -37.515625 -91.335938 -103.468750 -117.330078 -125.214844 -57.529297
127800604 4.254391 48.737358 -0.806641 -0.367188 0.195312 -9.496094 -33.369141 -19.093750 20.230469 51.841797 ... -0.441406 4.152344 -19.548828 -45.017578 -23.996094 -75.730469 -86.578125 -95.554688 -96.341797 -27.708984
170586722 4.320573 51.805782 -0.154297 -0.035156 0.207031 -4.126953 -6.591797 3.384766 16.972656 15.625000 ... -0.224609 1.564453 -10.769531 -14.166016 -4.728516 -36.582031 -32.123047 -2.962891 -0.312500 -0.189453
646492778 15.528274 47.738449 -0.394531 -0.123047 0.328125 -10.277344 -37.744141 -24.435547 20.156250 64.380859 ... -0.312500 4.826172 -24.535156 -53.386719 -30.810547 -84.294922 -97.890625 -110.275391 -112.976562 -38.738281
22256262 2.113726 48.209255 -0.347656 -0.041016 0.251953 -10.488281 -37.468750 -23.828125 21.552734 63.205078 ... -0.408203 4.753906 -23.535156 -51.101562 -29.115234 -81.833984 -94.062500 -104.388672 -103.269531 -26.167969
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9990224 0.359184 52.314259 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
566925424 13.920066 46.913773 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
47866008 2.631417 48.333389 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
567278116 14.007371 45.914566 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
567809288 13.674760 49.967464 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

4000000 rows × 146 columns
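
(The 146 columns are lon and lat plus 12 years × 12 months = 144 monthly anomaly columns.)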

list_year = [2011, 2012, 2013, 2014,
             2015, 2016, 2017, 2018,
             2019, 2020, 2021, 2022]
list_month = [1, 2, 3, 4, 5, 6,
              7, 8, 9, 10, 11, 12]

print('calculating the anomalies')
for year in list_year:
    for month in list_month:
        print(month)
        # the monthly means are stamped at the first of the month, so
        # selecting between 'YYYY-MM-01' and itself picks exactly one month
        start_mon = pd.to_datetime(f'{year}-{month}-01')
        end_mon = start_mon
        # the long-term average file carries 2017 as its (dummy) year stamp
        start_mon_tssAVE = pd.to_datetime(f'2017-{month}-01')
        end_mon_tssAVE = start_mon_tssAVE
        df_tss_yr = df_tss_mon[df_tss_mon['time'].between(start_mon, end_mon)]
        df_tss_yr = df_tss_yr.reset_index()
        df_tssAVE_mon_yr = df_tssAVE_mon[df_tssAVE_mon['time'].between(start_mon_tssAVE, end_mon_tssAVE)]
        df_tssAVE_mon_yr = df_tssAVE_mon_yr.reset_index()
        # anomaly = monthly mean minus long-term monthly mean
        df_anom = df_tss_yr['tss'] - df_tssAVE_mon_yr['tssAve']
        df_anomaly[f'{year}_{month}'] = df_anom.values

df_anomaly.to_csv('/p/scratch/pfgpude05/hammoudeh1/sim/ADAPTER_DE05_ECMWF-HRES_detforecast__FZJ-IBG3-ParFlowCLM380D_v01bJuwelsGpuProdClimatologyTl_PRhourly/postpro/data/time_series/tss_anomalies_DE05_2021-20230823.csv')
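
As a quick optional consistency check, the dataframe should now have one row per grid point and one anomaly column per month (cf. the table above):

# 4000000 grid points; lon, lat, plus 12 years * 12 months of anomalies
assert df_anomaly.shape == (4000000, 2 + 12 * 12)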

After the anomalies are calculated, the paths for the data are prepared.
Data needed:

  • Data containing information on the total subsurface storage, used to calculate the monthly anomaly.

  • A dataframe (can also be a netCDF file or any other data type) containing information on the four longitudes and latitudes that surround each grid point in the source dataframe. This information is needed to create a polygon for each grid point.

  • A shapefile of Germany containing all the federal states.

Additional information is needed as well:

  • A projected coordinate reference system (here UTM zone 32N, EPSG:25832).

  • The number of years.

  • The number of entries per year (days, months, etc.; here 12 months).

  • A path to save the output.

df_data = df_anomaly
# alternatively, load the anomalies from the CSV written above:
#df_data = '/p/scratch/pfgpude05/hammoudeh1/sim/ADAPTER_DE05_ECMWF-HRES_detforecast__FZJ-IBG3-ParFlowCLM380D_v01bJuwelsGpuProdClimatologyTl_PRhourly/postpro/data/time_series/tss_anomalies_DE05_2021-20230823.csv'
corners = '/p/project/pfgpude05/hammoudeh1/geo_data/corner_grids_DE06/corner_grids_DE06.nc'
area_of_interest = '/p/project/pfgpude05/hammoudeh1/geo_data/nuts2500/2500_NUTS1.shp'
crs_utm = 'EPSG:25832'  # UTM zone 32N
nr_yr = 12       # number of years (2011-2022)
nr_entries = 12  # entries per year (monthly data)
save_dir = '/p/scratch/pfgpude05/hammoudeh1/sim/ADAPTER_DE05_ECMWF-HRES_detforecast__FZJ-IBG3-ParFlowCLM380D_v01bJuwelsGpuProdClimatologyTl_PRhourly/geo_data/anomalies_geopandas/NUTS3_tss_monthly_anomalies_DE06_ECMWF-HRES_hindcast_r1i1p2_FZJ-IBG3-ParFlowCLM380_hgfadapter-h00-v3_1hr/NUTS1_tss_monthly_anomalies_DE06_Parflow_SH_v1.shp'
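
The shapefile with the federal states can be inspected with geopandas; the variable name shapefile below matches the displayed table:

shapefile = gpd.read_file(area_of_interest)  # read the NUTS1 shapefile of Germany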
shapefile
NUTS_LEVEL NUTS_CODE NUTS_NAME geometry
0 1 DE1 Baden-Württemberg MULTIPOLYGON (((591826.754 5434806.578, 591502...
1 1 DE2 Bayern POLYGON ((759919.057 5493994.847, 759656.392 5...
2 1 DE3 Berlin POLYGON ((812675.236 5831469.065, 813573.397 5...
3 1 DE4 Brandenburg POLYGON ((785946.492 5753692.786, 785689.525 5...
4 1 DE5 Bremen MULTIPOLYGON (((471292.489 5932922.759, 470685...
5 1 DE6 Hamburg MULTIPOLYGON (((463107.577 5977587.614, 462620...
6 1 DE7 Hessen POLYGON ((492636.671 5483361.184, 491734.166 5...
7 1 DE8 Mecklenburg-Vorpommern MULTIPOLYGON (((713722.584 6019028.989, 714041...
8 1 DE9 Niedersachsen MULTIPOLYGON (((508518.071 5806884.353, 508505...
9 1 DEA Nordrhein-Westfalen POLYGON ((373588.503 5612322.662, 372409.785 5...
10 1 DEB Rheinland-Pfalz POLYGON ((420154.696 5433007.302, 418891.303 5...
11 1 DEC Saarland POLYGON ((361553.412 5446433.660, 360354.267 5...
12 1 DED Sachsen POLYGON ((753976.948 5653939.390, 753924.450 5...
13 1 DEE Sachsen-Anhalt POLYGON ((669178.456 5690646.781, 668139.009 5...
14 1 DEF Schleswig-Holstein MULTIPOLYGON (((427783.160 6003522.451, 427486...
15 1 DEG Thüringen POLYGON ((726352.625 5648413.044, 726600.494 5...

The ‘NUTS_NAME’ field from the table above will be used for dissolving the geometries in the end.
That is why this information is needed as an input to the tool.
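
Dissolving merges all grid-cell polygons that share the same value of this field into one geometry per federal state while aggregating the data columns. A minimal geopandas sketch of the operation (illustrative only, with a hypothetical intersected_gdf; not SLOTH's exact internals, which also apply the area weights):

# hypothetical: one geometry per federal state, averaging the value columns
dissolved = intersected_gdf.dissolve(by='NUTS_NAME', aggfunc='mean')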

Name_area = "NUTS_NAME"

The intersection_calculations tool from sloth.toolBox is now called with the inputs prepared above:

sloth.toolBox.intersection_calculations(df_data, corners, area_of_interest, Name_area, crs_utm, nr_yr, nr_entries, save_dir)

The result is shown in the table below:

dissolve_gdf.drop('index_left',axis='columns')
geometry lon lat 2011_1 2011_2 2011_3 2011_4 2011_5 2011_6 2011_7 ... 2022_7 2022_8 2022_9 2022_10 2022_11 2022_12 area NUTS_LEVEL area_intersected weight
NUTS_NAME
Baden-Württemberg MULTIPOLYGON (((400493.888 5268816.619, 401103... 9.045352 48.537956 0.999507 -2.988537 -13.584477 -21.015722 -57.277115 -63.289494 -43.851151 ... -64.984001 -93.216202 -89.057289 -77.876862 -81.391617 -76.319214 374678.085372 1.0 369487.918176 0.986148
Bayern POLYGON ((507442.465 5491666.064, 507374.615 5... 11.420963 48.933758 1.731518 -3.034703 -13.670825 -21.404711 -57.849911 -64.082687 -44.784664 ... -63.859665 -91.726974 -87.553711 -76.597458 -80.274864 -75.176254 375050.561093 1.0 370964.112033 0.989103
Berlin POLYGON ((780348.536 5813302.828, 779911.820 5... 13.403963 52.500927 25.705936 25.638256 14.032354 9.348076 -15.815358 -17.693853 1.582355 ... -85.105247 -108.459457 -101.509254 -95.753525 -108.704613 -102.388306 375838.643631 1.0 347783.234208 0.925356
Brandenburg POLYGON ((676049.159 5877864.309, 676109.609 5... 13.397862 52.468426 5.543557 0.937342 -10.016368 -17.526293 -52.535980 -59.144440 -40.211243 ... -63.198277 -90.098953 -85.329941 -75.727402 -80.210075 -75.104263 375848.489952 1.0 369595.519417 0.983361
Bremen MULTIPOLYGON (((468226.157 5893810.567, 467760... 8.736111 53.206532 -9.506310 -12.064992 -21.316515 -31.167387 -75.887642 -81.646461 -60.356339 ... -78.357590 -107.278130 -101.842323 -84.059784 -87.900513 -85.182060 374818.598036 1.0 326689.773857 0.871589
Hamburg MULTIPOLYGON (((551170.414 5928592.519, 550963... 10.002545 53.550182 25.570534 26.046181 13.929835 10.464141 -12.814467 -16.788189 1.120710 ... -84.439415 -104.581825 -96.698769 -92.634155 -105.355164 -98.748741 374803.502594 1.0 343649.951681 0.916876
Hessen POLYGON ((420136.833 5536098.325, 419637.142 5... 9.029756 50.597702 -1.622735 -6.414443 -17.024820 -24.803610 -62.581284 -69.298172 -49.687885 ... -65.227036 -94.198318 -89.954620 -78.154289 -81.152153 -76.200714 375041.337313 1.0 367454.244201 0.979770
Mecklenburg-Vorpommern MULTIPOLYGON (((617186.776 5910703.204, 616915... 12.542992 53.756886 3.585589 -0.735577 -11.536910 -19.055853 -54.574936 -60.570900 -41.148571 ... -66.532906 -94.190651 -89.620071 -79.108841 -83.113586 -77.816010 375282.656001 1.0 364617.468794 0.971585
Niedersachsen MULTIPOLYGON (((354207.146 5814329.405, 354033... 9.159881 52.761726 7.291996 3.948156 -6.924499 -13.796718 -47.395435 -53.088966 -33.938244 ... -66.691063 -93.021286 -88.040375 -78.674881 -84.412430 -79.208153 374952.406115 1.0 368999.595577 0.984124
Nordrhein-Westfalen POLYGON ((290655.096 5623782.625, 290656.794 5... 7.558647 51.475185 4.601259 0.266528 -10.279454 -17.371027 -52.570518 -58.947182 -39.851475 ... -64.477089 -91.286812 -86.429611 -76.440918 -80.905113 -75.741623 375154.023219 1.0 369483.374360 0.984885
Rheinland-Pfalz POLYGON ((301290.602 5531392.741, 301084.142 5... 7.450834 49.910480 6.048045 2.004749 -9.018032 -16.458361 -50.644306 -56.789558 -37.998516 ... -64.440720 -91.378166 -86.539139 -76.646225 -81.347626 -76.340233 375094.153727 1.0 368624.320980 0.982752
Saarland POLYGON ((315507.595 5481970.296, 314910.921 5... 6.951203 49.383770 2.946488 -1.885279 -12.328921 -20.652977 -58.255016 -63.524227 -44.533035 ... -61.877522 -90.718102 -88.003914 -76.803322 -79.527748 -73.550659 375104.627509 1.0 359668.655686 0.958849
Sachsen POLYGON ((712923.245 5582183.306, 712733.682 5... 13.345225 51.048813 4.797780 -0.519058 -11.254671 -18.747925 -54.531155 -61.287735 -42.006641 ... -61.797779 -88.396599 -83.679886 -74.015678 -78.097069 -72.758545 375951.445976 1.0 368478.593865 0.980122
Sachsen-Anhalt POLYGON ((615254.052 5726743.256, 615250.857 5... 11.702046 52.007923 -10.956268 -16.022284 -26.324970 -35.175495 -76.511673 -83.026962 -63.564831 ... -60.799084 -92.132446 -89.499664 -74.617950 -75.833855 -71.662788 375364.559161 1.0 369012.082595 0.983077
Schleswig-Holstein MULTIPOLYGON (((426389.821 6005095.954, 426461... 9.806491 54.186234 8.032061 4.532070 -6.483582 -13.468135 -46.871441 -52.392162 -33.390022 ... -66.364571 -92.719681 -87.642197 -78.375977 -83.884872 -78.847504 374599.191754 1.0 363606.684402 0.970651
Thüringen POLYGON ((562083.355 5611197.224, 562099.302 5... 11.030182 50.901054 11.136419 8.009848 -2.874439 -9.569095 -42.398903 -48.243431 -29.178293 ... -69.037308 -94.825653 -89.288139 -80.275635 -86.308487 -80.896896 375262.305978 1.0 366870.117407 0.977638

16 rows × 151 columns
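
The per-state anomalies written to save_dir can be visualised directly with geopandas. A minimal sketch, assuming the output shapefile from above and an installed matplotlib:

import matplotlib.pyplot as plt

gdf = gpd.read_file(save_dir)  # per-state anomalies written by intersection_calculations
gdf.plot(column='2022_7', legend=True, cmap='RdBu')  # e.g. the July 2022 anomaly
plt.title('tss anomaly, July 2022')
plt.show()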