Python scripts
==============

.. warning::

   **Outdated Documentation**

   This page is currently **not up to date** and reflects an older commit (e6f69605067650bd949dfd66ae139d4e2ffa02a0) of the project. 
   The content may be inaccurate or missing recent changes. 

   Updates are planned and will appear in future releases. 
   Please refer to the project repository for the latest information.

This chapter documents the scripts executed by the Snakemake workflow.

.. _aggfiles:

aggregateFiles.py
-----------------

Variables
~~~~~~~~~

* ``config`` (yaml): Snakemake configuration file
* ``domain`` (string): Snakemake wildcard for domain
* ``climate_variable`` (string): name of climate variable
* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``time_name`` (string): 'time' or determined from climate data file
* ``choose_best_x_percent`` (float): Percentage of locations to be used in aggregation
* ``climate_data_file`` (xarray dataset): Input climate data file

Functions
~~~~~~~~~

* ``load(ds)``:

  * Loads a chunked dataset with a timeout to catch dask crashes.
  * Args:

    * ``ds`` (xarray dataset): chunked dataset to be loaded

  * Returns:

    * xarray dataset: loaded, non-chunked dataset

* ``toDataFrame(merge_ds, climate_variable, sum=False)``:

  * Performs aggregation with timeout, because dask sometimes crashes.
  * Args:

    * ``merge_ds`` (xarray dataset): climate data file to be aggregated
    * ``climate_variable`` (string): name of climate variable
    * ``sum`` (bool, optional): If False, calculate mean for every geometry, else calculate sum (only for hydro inflow). Defaults to False.

  * Returns:

    * pandas dataframe: aggregated file

Returns
~~~~~~~

* pandas dataframe: aggregated file

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Open climate data file with appropriate chunking along the time dimension.
2. Adjust longitude values if needed.
3. Determine regions file based on configuration.
4. Read regions file as a GeoDataFrame.
5. Extract coordinates from climate data and create a DataFrame with all combinations of lon and lat.
6. Convert the DataFrame to a GeoDataFrame.
7. Find intersections between points and regions, drop duplicates, and convert to xarray.
8. For offshore wind, filter locations based on sea depth.
9. If only a share of the locations is to be aggregated, calculate full load hours and select the best x percent.
10. Convert the merged dataset to a DataFrame (sum for hydro inflow, mean for all other variables).
11. Drop unnecessary columns and set the column index.
12. Write the final DataFrame to a CSV file specified in the output.
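
Step 9 can be illustrated with a minimal sketch. It assumes that full load hours are the sum of the hourly capacity factors of a location and that the best locations are those with the highest full load hours. The function name ``select_best_locations`` and the plain dictionary input are hypothetical; the script itself works on xarray data.

.. code-block:: python

   import math


   def select_best_locations(capacity_factors, choose_best_x_percent):
       """Return the locations with the highest full load hours.

       capacity_factors: dict mapping a location to its hourly capacity factors.
       choose_best_x_percent: share of locations to keep, in percent (0-100).
       """
       # Assumption: full load hours = sum of hourly capacity factors.
       full_load_hours = {loc: sum(cf) for loc, cf in capacity_factors.items()}
       # Round the number of kept locations up and keep at least one.
       n_keep = max(1, math.ceil(len(full_load_hours) * choose_best_x_percent / 100))
       ranked = sorted(full_load_hours, key=full_load_hours.get, reverse=True)
       return ranked[:n_keep]

Only the selected locations would then enter the aggregation in step 10.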

.. _biasAdopt:

biasAdopt.py
------------


Constants
~~~~~~~~~

* ``number_of_quantiles`` (int): 100

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Performs bias adaption and writes the file to disk with a timeout, because dask sometimes crashes.
  * Args:

    * ``file`` (xarray dataset): climate file to be bias adapted
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: bias-adapted climate file

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Suppress the warning for encountering all NaN slices.
2. Load historic model data (``climate_model_hist_q``) and observed data (``climate_observed_hist_q``), reindexing the latter to match the former.
3. Load future model data (``climate_model_future``) and calculate quantiles.
4. ``climate_model_future_q`` contains the lower bounds of the quantiles; calculate the corresponding upper bounds and store them in ``climate_model_future_q_upper``.
5. Assign quantiles to ``climate_model_future`` data by comparing values with lower and upper bounds.
6. Ensure valid values and avoid invalid values like NaN or 0 in ``climate_model_hist_q``, ``climate_observed_hist_q``, and ``climate_model_future_q``.
7. Calculate the bias adaption factor by summing over all quantiles and multiply it with the model data.
8. Attempt to write the bias-adapted dataset to a NetCDF file, handling timeouts. If a timeout occurs, print an error message and exit with code 50.
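
The timeout handling in step 8 could look roughly like the following sketch. It uses ``concurrent.futures`` from the standard library as a stand-in; the actual script may rely on a different timeout mechanism. ``run_with_timeout``, ``slow_write`` and ``EXIT_CODE_TIMEOUT`` are hypothetical names, and only the exit code 50 is taken from the description above.

.. code-block:: python

   import concurrent.futures
   import time

   EXIT_CODE_TIMEOUT = 50  # exit code named in step 8


   def run_with_timeout(func, timeout_s, *args):
       """Run func(*args) and return (result, 0), or (None, 50) on timeout."""
       executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
       future = executor.submit(func, *args)
       try:
           return future.result(timeout=timeout_s), 0
       except concurrent.futures.TimeoutError:
           print("Writing exceeded the maximum time")
           return None, EXIT_CODE_TIMEOUT
       finally:
           # Do not block on the worker; it keeps running in the background.
           executor.shutdown(wait=False)


   def slow_write(seconds):
       """Stand-in for a dataset write that may take too long."""
       time.sleep(seconds)
       return "done"

A calling script would pass a non-zero code to ``sys.exit`` so that the workflow can react to the failed write.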

.. _buildBB:

build_backbone_input.py
-----------------------

Variables
~~~~~~~~~

* ``config`` (yaml): Snakemake configuration
* ``backbone_input_file`` (string): path to input file for backbone data
* ``solar_file`` (string): path to solar data file
* ``onwind_file`` (string): path to onshore wind data file
* ``offwind_file`` (string): path to offshore wind data file
* ``hydro_inflow_file`` (string): path to hydro inflow data file
* ``avai_tpp_file`` (string): path to available TPP (Thermal Power Plant) data file
* ``path_jrc`` (string): path to JRC (Joint Research Centre) data
* ``solar_cf`` (pandas dataframe): Solar capacity factors
* ``onwind_cf`` (pandas dataframe): Onshore wind capacity factors
* ``offwind_cf`` (pandas dataframe): Offshore wind capacity factors
* ``hydro_inflow`` (pandas dataframe): Hydro inflow data
* ``jrc_list`` (pandas dataframe): Cooling types data from JRC
* ``costChange`` (pandas dataframe): Investment and efficiency changes for different cooling types
* ``unit_df`` (pandas dataframe): Unit data
* ``utAvailabilityLimits`` (pandas dataframe): Unit availability limits
* ``effLevelGroupUnit`` (pandas dataframe): Efficiency level group unit data
* ``p_gnu_io`` (pandas dataframe): Input/output parameters for grid-node-unit (gnu) combinations
* ``p_unit`` (pandas dataframe): Power unit data
* ``gnugroup`` (pandas dataframe): Group data for grid-node-unit (gnu) combinations
* ``ts_unit`` (pandas dataframe): Time series unit data
* ``flow`` (pandas dataframe): Dataframe containing different energy flow types
* ``flowUnit`` (pandas dataframe): Dataframe linking flow types to units
* ``ts_cf`` (pandas dataframe): Time series capacity factors
* ``capacity_hydro`` (pandas dataframe): Hydro and run-of-river capacity factors
* ``ror_cf`` (pandas dataframe): Run-of-river capacity factors adjusted by capacity factors
* ``ts_influx`` (pandas dataframe): Time series data for hydro inflow and storage
* ``demand`` (pandas dataframe): Demand data
* ``temperature`` (pandas dataframe): Temperature data (if output includes 'heat')
* ``ts_node`` (pandas dataframe): Time series data for nodes
* ``changed_sheets`` (list): List of sheets that have been modified

Returns
~~~~~~~

* excel file: modified backbone excel file

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Modifies backbone input Excel file with updated data in the specified sheets.

.. _buildCsp:

build_csp_profile.py
--------------------

Variables
~~~~~~~~~

* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``irradiance`` (xarray dataset): Opened irradiance dataset from input file
* ``rsds_name`` (string): 'rsds' for cordex, taken from the configuration for era5
* ``config`` (yaml): Snakemake configuration

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: available irradiance for csp

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Open the irradiance dataset from the input file.
2. Adapt variable naming based on the data source (ERA5 or cordex).
3. If the input data is ERA5, convert irradiance from J/m^2 to W/m^2.
4. Rename the irradiance variable to 'csp'.
5. Define a function ``writeToNetCdf`` to write the dataset to a NetCDF file with a timeout.
6. Attempt to write the irradiance dataset to the output file. If a timeout occurs, print an error message and exit.
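
The unit conversion in step 3 can be sketched as follows, assuming the ERA5 irradiance is an energy accumulated over one hour, so that dividing by 3600 seconds yields the mean power flux. The helper name ``accumulated_to_mean_flux`` and the list input are hypothetical; the script applies the conversion to an xarray dataset.

.. code-block:: python

   def accumulated_to_mean_flux(energy_j_per_m2, accumulation_seconds=3600.0):
       """Convert accumulated energy (J/m^2) to mean power flux (W/m^2)."""
       # Assumption: each value was accumulated over accumulation_seconds.
       return [value / accumulation_seconds for value in energy_j_per_m2]

For example, 3.6 MJ/m^2 accumulated over one hour corresponds to a mean irradiance of 1000 W/m^2.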

.. _buildDemand:

build_demand_profile.py
-----------------------


Variables
~~~~~~~~~

* ``config`` (yaml): Snakemake configuration
* ``reference_year`` (int): Year from the configuration for demand
* ``climate_data_path`` (string): Path to aggregated climate data
* ``regression_file_path`` (string): Path to regression file
* ``demand_basic_file`` (string): Path to historical demand data
* ``regions_file_path`` (string): Path to regions file based on custom bus map or domain
* ``demand_basic`` (pandas dataframe): DataFrame containing historical demand data for the reference year
* ``regression_file`` (pandas dataframe): DataFrame from the regression file
* ``temperature`` (pandas dataframe): DataFrame containing daily temperatures from climate data
* ``scaling_factors`` (pandas dataframe): Empty DataFrame for country and scaling factor
* ``regions_file`` (geopandas dataframe): GeoDataFrame of regions
* ``nuts3`` (geopandas dataframe): GeoDataFrame of NUTS3 regions
* ``moreCountriesThanOne`` (list): List of country codes for nodes that contain more than one country
* ``overlay`` (geopandas dataframe): GeoDataFrame of overlay areas between bus and NUTS3 regions
* ``perNode`` (geopandas dataframe): Aggregated data per bus node (population, GDP, country)
* ``perCountry`` (geopandas dataframe): Aggregated data per country (population, GDP, country)
* ``mergeDf`` (geopandas dataframe): Merged DataFrame of perNode and perCountry
* ``demand_final`` (pandas dataframe): DataFrame for final demand data

Constants
~~~~~~~~~

* ``europe2Letter`` (list): List of European country codes (2-letter)
* ``europe3Letter`` (list): List of European country codes (3-letter)
* ``threeToTwo`` (dict): Dictionary mapping 3-letter country codes to 2-letter country codes

Returns
~~~~~~~

* pandas dataframe: demand data per node in regions file

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Data Preparation**

   * Filter ``demand_basic`` for the reference year
   * Read ``regression_file``, ``temperature``, ``regions_file``, and ``nuts3``
   * Aggregate demand for nodes with multiple countries
   * Calculate overlay areas and merge data for nodes and countries

2. **Calculate Scaling Factors**

   * Calculate scaling factors for nodes based on GDP and population

3. **Adjust Demand**

   * Adjust historical demand based on temperature and regression coefficients
   * Scale demand using the calculated factors and merge with ``demand_final``

4. **Output**

   * Save the final demand data to a CSV file specified in ``snakemake.output[0]``

.. _buildHydro:

build_hydro_profile.py
----------------------

Variables
~~~~~~~~~

* ``power plant_database`` (pandas dataframe): information on power plants in Europe (capacity, location etc.)
* ``runoff_file`` (xarray dataset): river runoff file

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: hydro inflow

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Read ``power plant_database`` CSV file.
2. Open the runoff as ``mrro``.
3. Identify the nearest point to each power plant in the ``mrro`` file.
4. Convert ``power plant_database`` to xarray, grouping by longitude and latitude.
5. Merge ``mrro`` and ``power plant_database`` into one dataset.
6. Set negative values in the dataset to 0.
7. Calculate hourly hydro capacity based on actual runoff, historic runoff, and installed capacity.
8. Cap the capacity if it exceeds the installed capacity.
9. Set capacity values less than 0 to 0.
10. Convert capacity to a dataset named 'hydro'.
11. Write the dataset to a NetCDF file, with a timeout mechanism for parallel computing. If a timeout occurs, print an error message and exit.
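
Steps 6 to 9 can be pictured with the following sketch. The exact formula used by the script is not documented here, so the linear scaling of installed capacity by the ratio of actual to historic average runoff is an assumption, and ``hydro_capacity`` is a hypothetical helper operating on single values instead of xarray data.

.. code-block:: python

   def hydro_capacity(runoff, average_runoff, installed_capacity):
       """Scale installed capacity by runoff relative to its historic average."""
       runoff = max(runoff, 0.0)  # step 6: negative runoff is set to 0
       if average_runoff <= 0.0:
           return 0.0
       # Assumption: capacity scales linearly with the runoff ratio.
       capacity = installed_capacity * runoff / average_runoff
       # Steps 8 and 9: keep the result within [0, installed_capacity].
       return min(max(capacity, 0.0), installed_capacity)

With a historic average runoff of 10 and an installed capacity of 100, a runoff of 5 gives 50, while a runoff of 30 is capped at 100.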

.. _buildSolar:

build_solar_profile.py
----------------------

Variables
~~~~~~~~~

* ``config`` (yaml):  Snakemake configuration
* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``irradiance`` (xarray dataset): irradiance in climate model
* ``temperature`` (xarray dataset): temperature in climate model
* ``tas_name`` (string): 'tas' for cordex, different for era5
* ``rsds_name`` (string): 'rsds' for cordex, different for era5
* ``wind_name`` (string): 'sfcWind' for cordex, different for era5
* ``GStc``, ``TStc``, ``c1``, ``c2``, ``c3``, ``c4``, ``beta``, ``gamma`` (float): specifications of PV cell
* ``Tcell`` (xarray dataset): temperature of cell
* ``cf_solar`` (xarray dataset): calculated solar capacity factor

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: calculated solar capacity factor

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Open the irradiance dataset.
2. Adapt variable names depending on whether the dataset contains observed data.
3. Convert irradiance values from J/m^2 to W/m^2 if using observed data.
4. Calculate solar capacity factor based on the selected option (1, 2, or 3).
5. Ensure solar capacity factor values are between 0 and 1.
6. Write the solar capacity factor data to a NetCDF file, handling timeouts.

.. _buildTopo:

build_topo.py
-------------

Variables
~~~~~~~~~

* ``config`` (yaml): Contains configuration settings.
* ``x_min``, ``y_min``, ``x_max``, ``y_max`` (float): Define the minimum and maximum longitude and latitude values for the specified domain.
* ``x_size``, ``y_size``, ``x_step``, ``y_step`` (float): Define the size and step values for the grid.

Returns
~~~~~~~

* xarray dataset: specification of grid

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Calculates grid specifications for the ``cdo`` tool.
2. Writes the grid specifications to a file.
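
A grid specification of this kind could be assembled as in the sketch below. The keys follow the text format of cdo grid description files for regular lon/lat grids; how the script derives the grid size is not documented here, so including both domain edges as grid points is an assumption, and ``build_grid_description`` is a hypothetical name.

.. code-block:: python

   def build_grid_description(x_min, x_max, y_min, y_max, x_step, y_step):
       """Return a cdo grid description for a regular lon/lat grid."""
       # Assumption: both domain edges are included as grid points.
       x_size = round((x_max - x_min) / x_step) + 1
       y_size = round((y_max - y_min) / y_step) + 1
       lines = [
           "gridtype = lonlat",
           f"xsize    = {x_size}",
           f"ysize    = {y_size}",
           f"xfirst   = {x_min}",
           f"xinc     = {x_step}",
           f"yfirst   = {y_min}",
           f"yinc     = {y_step}",
       ]
       return "\n".join(lines) + "\n"

The returned text can be written to a file and passed to a cdo remapping operator as the target grid.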

.. _buildCL:

build_tppCL_profile.py
----------------------

Variables
~~~~~~~~~

* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``plantType`` (string): type of plant to calculate availability for
* ``plantTypeData`` (string): replace plantType for some plantTypes without data: 'Coal' for biomass, 'CCGT' for H2, otherwise same as ``plantType``
* ``tas_name`` (string): 'tas' or configured based on ``bias_adaption`` for ERA5 or cordex data
* ``temp_const`` (float): temperature constant for specific ``plantTypeData``
* ``T_health`` (float): health temperature for specific ``plantTypeData``

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: availability for chosen power plant type

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Open temperature data from input
2. Calculate availability for closed-loop cooled thermal power plants based on temperature and constants
3. Rename availability variable based on ``plantType``
4. Write availability to NetCDF output
5. Handle timeout errors and print message if exceeded maximum time

.. _buildOT:

build_tppOT_profile.py
----------------------

Variables
~~~~~~~~~

* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``plantType`` (string): type of plant (biomass, h2, etc.)
* ``year`` (int): year extracted from the input
* ``temperature`` (xarray dataset): Dataset containing temperature data from climate model
* ``mrro`` (xarray dataset): Dataset containing river runoff data
* ``quantileFlows`` (xarray dataset): Dataset containing quantiles for historic river runoff


Constants
~~~~~~~~~

* ``plantAvailabilityDict`` (dict): Dictionary mapping quantiles to plant availability values
* ``tStreamMin`` (float): 273 (minimum stream temperature in K)
* ``tStreamMax`` (float): 303.4 (maximum stream temperature in K)
* ``lambdaStream`` (float): 0.14 (constant for exponential function)
* ``tStreamIn`` (float): 289.5 (temperature at inflection point in K)
* ``lastDayNumber`` (int): number of the last day of the year (363, or 364 for leap years)

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: availability for chosen power plant type

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Calculate ``temperatureStream`` based on the provided formula
2. Adjust ``time`` based on leap years and remove duplicates
3. Modify ``mrro`` for non-observed output by adding missing timesteps
4. Calculate ``waterAvailability`` based on quantile flows and plant availability
5. Rename and adjust ``temperatureStream`` based on the plant type
6. Extract configuration parameters for temperature calculations
7. Calculate ``availability`` based on temperature conditions and water availability
8. Handle NaN values in the availability calculation
9. Write the processed ``availability`` data to a NetCDF file at the specified output path
10. If a timeout occurs, print an error message and exit with code 50.
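
The formula behind step 1 is not reproduced on this page. The constants listed above (a minimum, a maximum, an exponential constant and an inflection point) are consistent with a logistic relation between air and stream temperature, so the sketch below assumes that form. It is an illustration only and may differ from the script; ``stream_temperature`` is a hypothetical name.

.. code-block:: python

   import math

   T_STREAM_MIN = 273.0   # tStreamMin, minimum stream temperature in K
   T_STREAM_MAX = 303.4   # tStreamMax, maximum stream temperature in K
   LAMBDA_STREAM = 0.14   # lambdaStream, constant of the exponential term
   T_STREAM_IN = 289.5    # tStreamIn, temperature at the inflection point in K


   def stream_temperature(air_temperature_k):
       """Assumed logistic relation between air and stream temperature (K)."""
       spread = T_STREAM_MAX - T_STREAM_MIN
       exponent = LAMBDA_STREAM * (T_STREAM_IN - air_temperature_k)
       return T_STREAM_MIN + spread / (1.0 + math.exp(exponent))

Under this assumption the stream temperature stays between ``tStreamMin`` and ``tStreamMax`` and lies halfway between them at the inflection point.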

.. _buildWind:

build_wind_profile.py
---------------------

Variables
~~~~~~~~~

* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``wind_name`` (string): 'sfcWind'
* ``wind_speed``: calculated wind speed data
* ``wind_names_obs`` (dict): dictionary of wind variable names in era5 data
* ``filename`` (string): input climate data file
* ``filename_2`` (string): additional input climate data file (only for era5)
* ``popt`` (list): fit parameters for turbine curve

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

* ``f(x, a, b, c, d)``:

  * Fit function for the turbine curve
  * Args:

    * ``x`` (float): wind speed
    * ``a, b, c, d`` (float): fit constants determined in ``calculate_turbine_curve.py``

  * Returns:

    * float: capacity factor of turbine at wind speed x

Returns
~~~~~~~

* xarray dataset: calculated wind capacity factors

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Open climate data file(s)
2. Load fit parameters from a pickle file
3. Apply the fit function to wind speed data to obtain wind capacity factors
4. Adjust wind speed values to be within a specific range
5. Rename the wind variable
6. Write the processed wind speed data to a NetCDF file, with a timeout mechanism for parallel computing. If a timeout occurs, print an error message.

.. _calcQ:

calculate_quantiles.py
----------------------

Variables
~~~~~~~~~

* ``climate_variable`` (string): name of climate variable
* ``year_hist_start`` (int): Start year for historical data.
* ``year_hist_end`` (int): End year for historical data.
* ``years_hist`` (list): List of years in the historical period.
* ``lat_name`` (string): 'lat' for cordex or 'latitude' for era5 data
* ``lon_name`` (string): 'lon' for cordex or 'longitude' for era5 data
* ``time_name`` (string): Name of the time coordinate.
* ``quantiles`` (list): List of quantiles for empirical CDFs.
* ``maxTime`` (int): Maximum time for parallel computing.

Constants
~~~~~~~~~

* ``number_of_quantiles`` (int): 100

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes profile to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): file of calculated profile
    * ``path`` (string): path to write file to

Returns
~~~~~~~

* xarray dataset: quantiles for given climate variable in given time periods

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Load historical climate data for each year, handling specific cases for observed wind speed data.
2. Concatenate the historical climate data along the time dimension.
3. Adjust units for irradiance and longitude values.
4. Calculate quantiles for the historical data.
5. Write the quantiles to a NetCDF file, handling timeouts.
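
Step 4 can be pictured with the standard library, assuming the quantiles are evenly spaced cut points of the empirical distribution. The script itself computes them on xarray data for every grid cell; ``empirical_quantiles`` is a hypothetical helper that handles a single series.

.. code-block:: python

   import statistics

   number_of_quantiles = 100  # constant listed above


   def empirical_quantiles(values, n=number_of_quantiles):
       """Return the n - 1 cut points that split values into n equal bins."""
       # method="inclusive" treats the data as covering the full value range.
       return statistics.quantiles(values, n=n, method="inclusive")

For 100 quantiles this yields 99 cut points per series.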

.. _calcCurve:

calculate_turbine_curve.py
--------------------------

Variables
~~~~~~~~~

* ``config`` (yaml): Snakemake configuration
* ``hub_height`` (float): Height of the wind turbine hub
* ``v_in`` (float): Cut-in wind speed
* ``v_r`` (float): Rated wind speed
* ``v_out`` (float): Cut-out wind speed
* ``height`` (float): Height for sfcWind calculation
* ``cf_wind`` (dict): Dictionary to store wind capacity factors

Functions
~~~~~~~~~


* ``power_curve(v, v_in, v_r, v_out)``:

  * Calculates the capacity factor of a wind turbine with a standardized production function.
  * Args:

    * ``v`` (float): wind speed
    * ``v_in`` (float): cut-in velocity
    * ``v_r`` (float): rated velocity
    * ``v_out`` (float): cut-out velocity

  * Returns:

    * float: capacity factor of turbine at wind speed v

* ``f(x, a, b, c, d)``:

  * Fit function for the turbine curve
  * Args:

    * ``x`` (float): wind speed
    * ``a, b, c, d`` (float): fit constants determined by this script

  * Returns:

    * float: capacity factor of turbine at wind speed x

Returns
~~~~~~~

* dict: wind speeds with corresponding capacity factors

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Determine wind turbine parameters based on whether it is offshore or onshore.
2. Calculate wind capacity factors for different wind speeds and heights.
3. Fit a curve to the calculated capacity factors using ``curve_fit``.
4. Save the optimized parameters to a file specified in ``snakemake.output[0]``.
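
The standardized production function is not spelled out on this page. A common textbook form, used here purely as an assumption, is zero output below cut-in and above cut-out, a cubic ramp between cut-in and rated speed, and full output between rated and cut-out speed:

.. code-block:: python

   def power_curve(v, v_in, v_r, v_out):
       """Capacity factor (0 to 1) of a turbine at wind speed v."""
       if v < v_in or v > v_out:
           return 0.0  # below cut-in or above cut-out: no production
       if v >= v_r:
           return 1.0  # between rated and cut-out speed: full output
       # Assumed cubic ramp between cut-in and rated speed.
       return (v**3 - v_in**3) / (v_r**3 - v_in**3)

The values produced by such a function for a range of wind speeds are what ``curve_fit`` would then approximate with ``f(x, a, b, c, d)``.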

.. _demandReg:

demand_regression.py
--------------------

Variables
~~~~~~~~~

* ``yearStart`` (int): Start year for demand data.
* ``yearEnd`` (int): End year for demand data.
* ``years`` (list): List of years within the specified range.
* ``climate_model`` (string): name of climate model.
* ``rcp`` (string): name of RCP scenario.
* ``date_series`` (list): List of formatted date strings.
* ``regions_file_path`` (string): Path to the regions file.
* ``regions_file`` (geopandas dataframe): GeoDataFrame containing region information.
* ``temperature_historic`` (pandas dataframe): DataFrame storing historic temperature data.
* ``demand`` (pandas dataframe): DataFrame containing demand data.
* ``moreCountriesThanOne`` (list): List of country codes with multiple countries in a node.
* ``regression_parameters`` (pandas dataframe): DataFrame to store regression parameters.

Constants
~~~~~~~~~

* ``europe2Letter`` (list): List of European country codes (2-letter)
* ``europe3Letter`` (list): List of European country codes (3-letter)
* ``threeToTwo`` (dict): Dictionary mapping 3-letter country codes to 2-letter country codes

Functions
~~~~~~~~~

* ``fitfunc(x, a, b, c)``:

  * Quadratic fitting function
  * Args:

    * ``x`` (float): input data
    * ``a, b, c`` (float): fitting parameters

  * Returns:

    * float: value of the quadratic function

* ``polyfit(x, y, degree, coeffs)``:

  * Fits a quadratic function to the data
  * Args:

    * ``x`` (array): x values
    * ``y`` (array): y values
    * ``degree`` (int): degree of the polynomial
    * ``coeffs`` (array): coefficients

  * Returns:

    * array: coefficients of fit function

* ``daterange(date1, date2)``:

  * Calculates the date series between two dates
  * Args:

    * ``date1`` (date): start date
    * ``date2`` (date): end date

  * Yields:

    * list: all dates between start and end date

Returns
~~~~~~~

* pandas dataframe: regression parameter

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Generate date series.
2. Read the bus map to map node names to country codes.
3. Process historic temperature data and group by country.
4. Load historic demand data.
5. Aggregate demand for nodes with multiple countries.
6. Perform regression analysis between historic demand and temperatures for each country and save the results.
7. Save regression parameters to an output file.
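
Two of the helper functions listed above can be sketched as follows. The order of the coefficients in ``fitfunc`` and the inclusive end date in ``daterange`` are assumptions; the bodies are illustrations, not the script's code.

.. code-block:: python

   from datetime import date, timedelta


   def fitfunc(x, a, b, c):
       """Quadratic function; the coefficient order is an assumption."""
       return a * x**2 + b * x + c


   def daterange(date1, date2):
       """Yield every date from date1 to date2 (assumed inclusive)."""
       for offset in range((date2 - date1).days + 1):
           yield date1 + timedelta(days=offset)

A date series as in step 1 could then be built with ``[d.isoformat() for d in daterange(start, end)]``, where ``start`` and ``end`` are ``date`` objects.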

.. _downloadCordex:

download_cordexData.py
----------------------

Variables
~~~~~~~~~

* ``config`` (yaml): Snakemake configuration
* ``model`` (string): name of climate model
* ``rcp`` (string): name of RCP scenario
* ``climate_variable`` (string): name of climate variable
* ``toTemporal`` (string): desired temporal resolution of result (``h`` hourly or ``d`` daily)
* ``year`` (int): year to download
* ``timeFrequency`` (string): time frequency to download from ESGF
* ``ensemble``, ``institute``, ``RCMModel``, ``downscalingRealisation`` (string): additional parameters for download of cordex data
* ``outputPath`` (string): where data is downloaded to

Returns
~~~~~~~

* xarray dataset: downloaded cordex datafile

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Query ESGF data node based on specified parameters.
2. Create a wget script for downloading climate data for a specific year.
3. Modify the wget script for quietness and write it to a file.
4. Run the wget script to download the file.
5. Rename the downloaded file and handle download failures.
6. Remove the wget script and wget status file.

.. _downloadEra:

download_era5Data.py
--------------------

Variables
~~~~~~~~~

* ``c`` (cdsapi.Client): CDS API client object
* ``climate_variable`` (string): Name of climate variable
* ``year`` (int): Year of the data to download
* ``x_min``, ``x_max``, ``y_min``, ``y_max`` (float): Geographic boundaries
* ``climate_data_file`` (string): path to output file for the climate data.

Functions
~~~~~~~~~

* ``retrieveHourly(c, variable, year, x_min, x_max, y_min, y_max, climate_data_file)``:

  * Downloads hourly era5 files
  * Args:

    * ``c`` (cdsapi.Client): cdsapi Client
    * ``variable`` (string): name of variable
    * ``year`` (int): year of the data to download
    * ``x_min, x_max, y_min, y_max`` (float): Geographic boundaries
    * ``climate_data_file`` (string): path to output

* ``retrieveDaily(c, variable, year, x_min, x_max, y_min, y_max, climate_data_file)``:

  * Downloads daily era5 files
  * Args:
  
    * ``c`` (cdsapi.Client): cdsapi Client
    * ``variable`` (string): name of variable
    * ``year`` (int): year of the data to download
    * ``x_min, x_max, y_min, y_max`` (float): Geographic boundaries
    * ``climate_data_file`` (string): path to output

Returns
~~~~~~~

* xarray dataset: downloaded era5 datafile

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Retrieves configuration data from ``snakemake.config``.
2. Determines the geographic boundaries based on the ``domain`` wildcard.
3. Determines the climate variable based on the ``climate_variable_ERA5`` wildcard.
4. Calls the appropriate retrieval function (``retrieveHourly`` or ``retrieveDaily``) based on the climate variable.
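
How the geographic boundaries enter a download request can be sketched as below. The request keys follow the CDS API conventions for ERA5 reanalysis data, where the bounding box is given as north, west, south, east. ``build_era5_request`` is a hypothetical helper, and the exact request built by ``retrieveHourly`` may differ.

.. code-block:: python

   def build_era5_request(variable, year, x_min, x_max, y_min, y_max):
       """Assemble a request dictionary for one year of hourly ERA5 data."""
       return {
           "product_type": "reanalysis",
           "variable": variable,
           "year": str(year),
           "month": [f"{month:02d}" for month in range(1, 13)],
           "day": [f"{day:02d}" for day in range(1, 32)],
           "time": [f"{hour:02d}:00" for hour in range(24)],
           # The CDS API expects the box as [north, west, south, east].
           "area": [y_max, x_min, y_min, x_max],
       }

Such a dictionary would be passed to the ``retrieve`` method of the ``cdsapi.Client`` object together with a dataset name and ``climate_data_file`` as the target path.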

.. _calbHydro:

hydro_calibration.py
--------------------

Variables
~~~~~~~~~

* ``config`` (yaml): Snakemake configuration
* ``mrro_df`` (xarray dataset): river runoff dataset

Returns
~~~~~~~

* pandas dataframe: list of hydro power plants with installed capacity, average runoff and location

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Read hydro power plant database from "resources/jrc-hydro-power-plant-database.csv".
2. Filter for hydro dam (HDAM) and run-of-river (HROR) plants.
3. Initialize "average runoff" column in the power plant database.
4. For each climate data file, calculate the average runoff for each power plant location.
5. Adjust the average runoff based on the hydro factor and the number of climate data files.
6. Save the relevant data (lon, lat, average runoff, installed capacity) to a CSV file.

.. _process:

process_cordex_data.py
----------------------

Functions
~~~~~~~~~

* ``writeToNetCdf(file, path)``:

  * Writes file to path with timeout to catch crashing of dask
  * Args:

    * ``file`` (xarray dataset): processed file to be written
    * ``path`` (string): path to write file to

* ``convert_to_npdatetime(date)``:

  * Converts different time formats to numpy datetime. Raises an exception if the conversion fails.
  * Args:

    * ``date`` (misc): date to be converted

  * Raises:

    * Exception: is raised if the date type cannot be converted

  * Returns:

    * numpy datetime64: converted date

Returns
~~~~~~~

* xarray dataset: processed climate datafile

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Define the ``config`` variable using ``snakemake.config``.
2. Create the ``data_dir`` path based on the ``domain`` wildcard.
3. Extract ``model``, ``year``, ``toTemporal``, and ``climate_variable`` from ``snakemake.wildcards``.
4. Create the output directory for the remapped data if it does not exist yet.
5. Set the ``input_file`` and ``output_file`` paths, adjusting for Windows naming if necessary.
6. If the temporal resolution is hourly (``toTemporal == '1h'``):

   * Interpolate, remap, and invert latitudes of the input data to hourly resolution.
   * Open the output file, convert timesteps to datetime, adjust for 360-day models, and handle missing timesteps.
   * Write the processed data to the output file.

7. If the temporal resolution is daily (``toTemporal == 'd'``):

   * Interpolate, remap, and invert latitudes of the input data to daily resolution.
   * Open the output file, convert timesteps to datetime, adjust for 360-day models, and handle leap years.
   * Write the processed data to the output file.

.. _renameCordex:

rename_cordexData.py
--------------------

Variables
~~~~~~~~~

* ``config`` (yaml): snakemake.config
* ``model`` (string): name of climate model
* ``climate_variable`` (string): name of climate variable
* ``rcp`` (string): name of RCP scenario
* ``year`` (int): year
* ``toTemporal`` (string): desired temporal resolution of result (``h`` hourly or ``d`` daily)
* ``path`` (string): path where cordex data is stored
* ``toTemporalCordex1`` (string): "day" if ``toTemporal`` is "d", otherwise "1hr"
* ``toTemporalCordex2`` (string):  "day" if ``toTemporal`` is "d", otherwise "3hr"
* ``fileList`` (list): list of files in ``path``
* ``newPath`` (string): path for the renamed file

Returns
~~~~~~~

* renamed files

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Iterate through ``fileList``
2. If the file matches specific criteria based on ``model``, ``climate_variable``, ``rcp``, ``year``, and ``toTemporalCordex1`` or ``toTemporalCordex2``, rename the file to ``newPath`` with the specified format.

.. _retrieveGebco:

retrieveGebcoDataset.py
-----------------------

Returns
~~~~~~~

* xarray dataset: gebco dataset

Summary of script functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Downloads the GEBCO file with height data.
2. Unzips and saves it.