Python scripts#
Warning
Outdated Documentation
This page is currently not up to date and reflects an older commit (e6f69605067650bd949dfd66ae139d4e2ffa02a0) of the project. The content may be inaccurate or missing recent changes.
Updates are planned and will appear in future releases. Please refer to the project repository for the latest information.
This chapter documents the scripts executed via the snakemake routine.
aggregateFiles.py#
Variables#
config (yaml): Snakemake configuration file
domain (string): Snakemake wildcard for domain
climate_variable (string): name of climate variable
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
time_name (string): ‘time’ or determined from climate data file
choose_best_x_percent (float): Percentage of locations to be used in aggregation
climate_data_file (xarray dataset): Input climate data file
Functions#
load(ds): Loads a chunked dataset with a timeout to catch dask crashes.
Args:
ds (xarray dataset): chunked dataset to be loaded
Returns:
xarray dataset: loaded (no longer chunked) dataset
toDataFrame(merge_ds, climate_variable, sum=False): Performs the aggregation with a timeout, because dask sometimes crashes.
Args:
merge_ds (xarray dataset): climate data file to be aggregated
climate_variable (string): name of climate variable
sum (bool, optional): If False, calculate the mean for every geometry; otherwise calculate the sum (only for hydro inflow). Defaults to False.
Returns:
pandas dataframe: aggregated file
Returns#
pandas dataframe: aggregated file
Summary of script functionality#
Open climate data file with appropriate chunking along the time dimension.
Adjust longitude values if needed.
Determine regions file based on configuration.
Read regions file as a GeoDataFrame.
Extract coordinates from climate data and create a DataFrame with all combinations of lon and lat.
Convert the DataFrame to a GeoDataFrame.
Find intersections between points and regions, drop duplicates, and convert to xarray.
For offshore wind, filter locations based on sea depth.
If not all data shall be aggregated, calculate full load hours and select the best x percent.
Convert the merged dataset to a DataFrame (sum for hydro, mean for all other).
Drop unnecessary columns and set the column index.
Write the final DataFrame to a CSV file specified in the output.
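The selection of the best x percent of locations can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `select_best_locations`, the dictionary input, and the rounding rule (round up, keep at least one location) are assumptions.

```python
import math


def select_best_locations(full_load_hours, best_x_percent):
    """Return the ids of the best x percent of locations.

    full_load_hours: dict mapping a location id to its full load hours.
    best_x_percent: share of locations to keep, in percent (0 < x <= 100).
    """
    if not 0 < best_x_percent <= 100:
        raise ValueError("best_x_percent must be in (0, 100]")
    # Assumption: round the number of kept locations up and keep at least one.
    n_keep = max(1, math.ceil(len(full_load_hours) * best_x_percent / 100))
    # Rank locations from highest to lowest full load hours.
    ranked = sorted(full_load_hours, key=full_load_hours.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical example values.
flh = {"a": 1800.0, "b": 2500.0, "c": 900.0, "d": 2100.0}
best = select_best_locations(flh, 50)
```

Only the selected locations would then enter the mean (or sum) per region.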
biasAdopt.py#
Constants#
number_of_quantiles: 100
Functions#
writeToNetCdf(file, path): Performs bias adaption and writes the file to disk with a timeout, because dask sometimes crashes.
Args:
file (xarray dataset): climate file to be bias-adapted
path (string): path to write file to
Returns#
xarray dataset: bias-adapted climate file
Summary of script functionality#
Suppress the warning for encountering all NaN slices.
Load historic model data (climate_model_hist_q) and observed data (climate_observed_hist_q), reindexing the latter to match the former.
Load future model data (climate_model_future) and calculate quantiles.
climate_model_future_q contains the lower bounds of the quantiles; calculate the upper bounds in climate_model_future_q_upper.
Assign quantiles to the climate_model_future data by comparing values with the lower and upper bounds.
Ensure valid values and avoid invalid values such as NaN or 0 in climate_model_hist_q, climate_observed_hist_q, and climate_model_future_q.
Calculate the bias adaption factor by summing over all quantiles and multiply it with the model data.
Attempt to write the bias-adapted dataset to a NetCDF file, handling timeouts. If a timeout occurs, print an error message and exit with code 50.
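The idea of a quantile-wise bias adaption factor can be illustrated with a small pure-Python sketch. It uses a multiplicative empirical quantile mapping, which is an assumption about the method. The function names are hypothetical, and it looks up the quantile of each value directly rather than summing over quantile masks as the script is described to do.

```python
import bisect


def quantile_lower_bounds(values, n):
    """Lower bounds of n equally populated empirical quantile bins."""
    s = sorted(values)
    return [s[min(len(s) - 1, (i * len(s)) // n)] for i in range(n)]


def bias_adapt(model_future, model_hist_q, observed_hist_q, model_future_q):
    """Scale each future value by observed/model ratio of its quantile."""
    adapted = []
    for value in model_future:
        # Quantile bin whose lower bound is the largest one <= value.
        k = max(0, bisect.bisect_right(model_future_q, value) - 1)
        hist = model_hist_q[k]
        # Avoid division by zero; fall back to a neutral factor.
        factor = observed_hist_q[k] / hist if hist else 1.0
        adapted.append(value * factor)
    return adapted


# Hypothetical data: the model underestimates observations by a factor of 2.
n_q = 4
hist_q = quantile_lower_bounds([1.0, 2.0, 3.0, 4.0], n_q)
obs_q = quantile_lower_bounds([2.0, 4.0, 6.0, 8.0], n_q)
future = [1.0, 2.0, 3.0, 4.0]
future_q = quantile_lower_bounds(future, n_q)
result = bias_adapt(future, hist_q, obs_q, future_q)
```

In the script itself, number_of_quantiles is 100 and the operations run on xarray datasets rather than lists.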
build_backbone_input.py#
Variables#
config (yaml): Snakemake configuration
backbone_input_file (string): path to input file for backbone data
solar_file (string): path to solar data file
onwind_file (string): path to onshore wind data file
offwind_file (string): path to offshore wind data file
hydro_inflow_file (string): path to hydro inflow data file
avai_tpp_file (string): path to available TPP (Thermal Power Plant) data file
path_jrc (string): path to JRC (Joint Research Centre) data
solar_cf (pandas dataframe): Solar capacity factors
onwind_cf (pandas dataframe): Onshore wind capacity factors
offwind_cf (pandas dataframe): Offshore wind capacity factors
hydro_inflow (pandas dataframe): Hydro inflow data
jrc_list (pandas dataframe): Cooling types data from JRC
costChange (pandas dataframe): Investment and efficiency changes for different cooling types
unit_df (pandas dataframe): Unit data
utAvailabilityLimits (pandas dataframe): Unit availability limits
effLevelGroupUnit (pandas dataframe): Efficiency level group unit data
p_gnu_io (pandas dataframe): Power generation unit input-output data
p_unit (pandas dataframe): Power unit data
gnugroup (pandas dataframe): Group data for GNU (grid, node, unit)
ts_unit (pandas dataframe): Time series unit data
flow (pandas dataframe): Dataframe containing different energy flow types
flowUnit (pandas dataframe): Dataframe linking flow types to units
ts_cf (pandas dataframe): Time series capacity factors
capacity_hydro (pandas dataframe): Hydro and run-of-river capacity factors
ror_cf (pandas dataframe): Run-of-river capacity factors adjusted by capacity factors
ts_influx (pandas dataframe): Time series data for hydro inflow and storage
demand (pandas dataframe): Demand data
temperature (pandas dataframe): Temperature data (if output includes ‘heat’)
ts_node (pandas dataframe): Time series data for nodes
changed_sheets (list): List of sheets that have been modified
Returns#
excel file: modified backbone excel file
Summary of script functionality#
Modifies backbone input Excel file with updated data in the specified sheets.
build_csp_profile.py#
Variables#
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
irradiance (xarray dataset): Opened irradiance dataset from input file
rsds_name (string): ‘rsds’ for cordex, from config for era5
config (yaml): Snakemake configuration
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
Returns#
xarray dataset: available irradiance for csp
Summary of script functionality#
Open the irradiance dataset from the input file.
Adapt variable naming based on the data source (ERA5 or cordex).
If the input data is ERA5, convert irradiance from J/m^2 to W/m^2.
Rename the irradiance variable to ‘csp’.
Define a function writeToNetCdf to write the dataset to a NetCDF file with a timeout.
Attempt to write the irradiance dataset to the output file. If a timeout occurs, print an error message and exit.
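The unit conversion for ERA5 irradiance can be sketched as below. ERA5 radiation fields are energy accumulated over a time step (J/m^2), so dividing by the length of the accumulation period in seconds gives the mean power (W/m^2). The function name and the hourly accumulation period are assumptions made for illustration.

```python
SECONDS_PER_HOUR = 3600


def joule_to_watt_per_m2(accumulated, accumulation_seconds=SECONDS_PER_HOUR):
    """Convert accumulated energy (J/m^2) to mean power (W/m^2).

    Assumes every value was accumulated over the same period.
    """
    return [value / accumulation_seconds for value in accumulated]


# 3.6 MJ/m^2 accumulated over one hour corresponds to 1000 W/m^2.
irradiance_w = joule_to_watt_per_m2([0.0, 1_800_000.0, 3_600_000.0])
```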
build_demand_profile.py#
Variables#
config (yaml): Snakemake configuration
reference_year (int): Year from the configuration for demand
climate_data_path (string): Path to aggregated climate data
regression_file_path (string): Path to regression file
demand_basic_file (string): Path to historical demand data
regions_file_path (string): Path to regions file based on custom bus map or domain
demand_basic (pandas dataframe): DataFrame containing historical demand data for the reference year
regression_file (pandas dataframe): DataFrame from the regression file
temperature (pandas dataframe): DataFrame containing daily temperatures from climate data
scaling_factors (pandas dataframe): Empty DataFrame for country and scaling factor
regions_file (geopandas dataframe): GeoDataFrame of regions
nuts3 (geopandas dataframe): GeoDataFrame of NUTS3 regions
moreCountriesThanOne (list): List of country codes with multiple countries in a node
overlay (geopandas dataframe): GeoDataFrame of overlay areas between bus and NUTS3 regions
perNode (geopandas dataframe): Aggregated data per bus node (population, GDP, country)
perCountry (geopandas dataframe): Aggregated data per country (population, GDP, country)
mergeDf (geopandas dataframe): Merged DataFrame of perNode and perCountry
demand_final (pandas dataframe): DataFrame for final demand data
Constants#
europe2Letter (list): List of European country codes (2-letter)
europe3Letter (list): List of European country codes (3-letter)
threeToTwo: Dictionary mapping 3-letter country codes to 2-letter country codes
Returns#
pandas dataframe: demand data per node in regions file
Summary of script functionality#
Data Preparation
Filter demand_basic for the reference year
Read regression_file, temperature, regions_file, and nuts3
Aggregate demand for nodes with multiple countries
Calculate overlay areas and merge data for nodes and countries
Calculate Scaling Factors
Calculate scaling factors for nodes based on GDP and population
Adjust Demand
Adjust historical demand based on temperature and regression coefficients
Scale demand using the calculated factors and merge with demand_final
Output
Save the final demand data to a CSV file specified in snakemake.output[0]
build_hydro_profile.py#
Variables#
power plant_database (pandas dataframe): information on power plants in Europe (capacity, location etc.)
runoff_file (xarray dataset): river runoff file
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
Returns#
xarray dataset: hydro inflow
Summary of script functionality#
Read the power plant_database CSV file.
Open the runoff as mrro.
Identify the nearest point to each power plant in the mrro file.
Convert power plant_database to xarray, grouping by longitude and latitude.
Merge mrro and power plant_database into one dataset.
Set negative values in the dataset to 0.
Calculate hourly hydro capacity based on actual runoff, historic runoff, and installed capacity.
Cap the capacity if it exceeds the installed capacity.
Set capacity values less than 0 to 0.
Convert capacity to a dataset named ‘hydro’.
Write the dataset to a NetCDF file, with a timeout mechanism for parallel computing. If a timeout occurs, print an error message and exit.
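The capacity calculation and the two clipping steps might look like the sketch below. The linear scaling of installed capacity by the ratio of actual to historic average runoff is an assumption, since this page does not state the formula. The function and parameter names are hypothetical.

```python
def hydro_capacity(runoff, average_runoff, installed_capacity):
    """Available hydro capacity for one plant and one time step.

    Assumption: capacity scales linearly with runoff relative to the
    historic average runoff at the plant location.
    """
    if average_runoff <= 0:
        return 0.0
    # Negative runoff values are treated as 0.
    runoff = max(runoff, 0.0)
    capacity = installed_capacity * runoff / average_runoff
    # Cap at the installed capacity and never go below 0.
    return min(max(capacity, 0.0), installed_capacity)


half = hydro_capacity(0.5, 1.0, 100.0)    # below average runoff
capped = hydro_capacity(3.0, 1.0, 100.0)  # capped at installed capacity
floored = hydro_capacity(-1.0, 1.0, 100.0)
```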
build_solar_profile.py#
Variables#
config (yaml): Snakemake configuration
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
irradiance (xarray dataset): irradiance in climate model
temperature (xarray dataset): temperature in climate model
tas_name (string): tas for cordex, different for era5
rsds_name (string): rsds for cordex, different for era5
wind_name (string): sfcWind for cordex, different for era5
GStc, TStc, c1, c2, c3, c4, beta, gamma (float): specifications of PV cell
Tcell (xarray dataset): temperature of cell
cf_solar (xarray dataset): calculated solar capacity factor
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
Returns#
xarray dataset: calculated solar capacity factor
Summary of script functionality#
Open the irradiance dataset.
Adapt variable names depending on whether the dataset is observed data.
Convert irradiance values from J/m^2 to W/m^2 if using observed data.
Calculate solar capacity factor based on the selected option (1, 2, or 3).
Ensure solar capacity factor values are between 0 and 1.
Write the solar capacity factor data to a NetCDF file, handling timeouts.
build_topo.py#
Variables#
config (yaml): Contains configuration settings.
x_min, y_min, x_max, y_max (float): Define the minimum and maximum longitude and latitude values for the specified domain.
x_size, y_size, x_step, y_step (float): Define the size and step values for the grid.
Returns#
xarray dataset: specification of grid
Summary of script functionality#
Calculates the grid specification for the cdo tool.
Writes the grid specifications to a file.
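A regular longitude/latitude grid description in the text format read by cdo could be generated as sketched below. The keys (gridtype, xsize, ysize, xfirst, xinc, yfirst, yinc) follow cdo's grid description format. The function name, the example bounds, and the choice to include both end points in the size are assumptions.

```python
def grid_description(x_min, x_max, y_min, y_max, x_step, y_step):
    """Build a cdo 'lonlat' grid description as a string."""
    # Assumption: both boundaries are grid points, hence the "+ 1".
    x_size = int(round((x_max - x_min) / x_step)) + 1
    y_size = int(round((y_max - y_min) / y_step)) + 1
    lines = [
        "gridtype = lonlat",
        f"xsize    = {x_size}",
        f"ysize    = {y_size}",
        f"xfirst   = {x_min}",
        f"xinc     = {x_step}",
        f"yfirst   = {y_min}",
        f"yinc     = {y_step}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical European domain with 0.5 degree resolution.
description = grid_description(-10.0, 30.0, 35.0, 70.0, 0.5, 0.5)
```

The resulting text can be written to a file and passed to cdo remapping operators as the target grid.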
build_tppCL_profile.py#
Variables#
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
plantType (string): type of plant to calculate availability for
plantTypeData (string): replaces plantType for plant types without data: ‘Coal’ for biomass, ‘CCGT’ for H2, otherwise same as plantType
tas_name (string): ‘tas’ or configured based on bias_adaption for ERA5 or cordex data
temp_const (float): temperature constant for specific plantTypeData
T_health (float): health temperature for specific plantTypeData
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
Returns#
xarray dataset: availability for chosen power plant type
Summary of script functionality#
Open temperature data from input
Calculate availability for closed-loop cooled thermal power plants based on temperature and constants
Rename availability variable based on plantType
Write availability to NetCDF output
Handle timeout errors and print a message if the maximum time is exceeded
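The timeout handling that recurs in the writeToNetCdf functions can be approximated with the standard library as shown below. This is a sketch under assumptions: the helper names are hypothetical, the scripts may use a different mechanism, and the exit code 50 is taken from the descriptions of biasAdopt.py and build_tppOT_profile.py.

```python
import concurrent.futures
import sys
import time

TIMEOUT_EXIT_CODE = 50  # assumed; mentioned for some scripts in this chapter


def run_with_timeout(func, max_time, *args):
    """Run func(*args) in a worker thread and wait at most max_time seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        # Raises concurrent.futures.TimeoutError if the call takes too long.
        return future.result(timeout=max_time)
    finally:
        # Do not block here on a worker that is still running.
        executor.shutdown(wait=False)


def write_or_exit(func, max_time, *args):
    """Call a (hypothetical) write function; exit if it does not finish."""
    try:
        return run_with_timeout(func, max_time, *args)
    except concurrent.futures.TimeoutError:
        print("Timeout: writing did not finish in time")
        sys.exit(TIMEOUT_EXIT_CODE)


if __name__ == "__main__":
    # Stand-in for a write call that finishes in time.
    write_or_exit(time.sleep, 1.0, 0.01)
```

A worker thread cannot be killed from outside, so a call that hangs forever would still keep the interpreter alive at shutdown; a separate process would be needed to terminate it reliably.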
build_tppOT_profile.py#
Variables#
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
plantType (string): type of plant (biomass, h2, etc.)
year (int): year extracted from the input
temperature (xarray dataset): Dataset containing temperature data from climate model
mrro (xarray dataset): Dataset containing river runoff data
quantileFlows (xarray dataset): Dataset containing quantiles for historic river runoff
Constants#
plantAvailabilityDict (dict): Dictionary mapping quantiles to plant availability values
tStreamMin (float): 273 (minimum stream temperature in K)
tStreamMax (float): 303.4 (maximum stream temperature in K)
lambdaStream (float): 0.14 (constant for exponential function)
tStreamIn (float): 289.5 (temperature at inflection point in K)
lastDayNumber (int): last day of year 363 or 364 for leap years
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
Returns#
xarray dataset: availability for chosen power plant type
Summary of script functionality#
Calculate temperatureStream based on the provided formula
Adjust time based on leap years and remove duplicates
Modify mrro for non-observed output by adding missing timesteps
Calculate waterAvailability based on quantile flows and plant availability
Rename and adjust temperatureStream based on the plant type
Extract configuration parameters for temperature calculations
Calculate availability based on temperature conditions and water availability
Handle NaN values in the availability calculation
Write the processed availability data to a NetCDF file at the specified output path
If a timeout occurs, print an error message and exit with code 50.
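The formula for the stream temperature is not reproduced on this page. The constants listed above (a minimum, a maximum, an exponential constant, and an inflection point) are consistent with a logistic S-curve relating air temperature to stream temperature, so the sketch below assumes that form. Treat it as an illustration only; the function name is hypothetical.

```python
import math

T_STREAM_MIN = 273.0   # minimum stream temperature in K
T_STREAM_MAX = 303.4   # maximum stream temperature in K
LAMBDA_STREAM = 0.14   # constant for the exponential function
T_STREAM_IN = 289.5    # air temperature at the inflection point in K


def stream_temperature(t_air):
    """Assumed logistic relation between air and stream temperature (K)."""
    span = T_STREAM_MAX - T_STREAM_MIN
    return T_STREAM_MIN + span / (
        1.0 + math.exp(LAMBDA_STREAM * (T_STREAM_IN - t_air))
    )


# At the inflection point the curve is halfway between minimum and maximum.
t_mid = stream_temperature(T_STREAM_IN)
```

Under this assumption the stream temperature stays between tStreamMin and tStreamMax for any air temperature and increases monotonically with it.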
build_wind_profile.py#
Variables#
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
wind_name (string): ‘sfcWind’
wind_speed: calculated wind speed data
wind_names_obs (dict): dictionary of wind variable names in era5 data
filename (string): input climate data file
filename_2 (string): additional input climate data file (only for era5)
popt (list): fit parameters for turbine curve
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
f(x, a, b, c, d): fit function for turbine curve
Args:
x (float): wind speed
a, b, c, d (float): constants from the fit in calculate_turbine_curve.py
Returns:
float: capacity factor of turbine at wind speed x
Returns#
xarray dataset: calculated wind capacity factors
Summary of script functionality#
Open climate data file(s)
Load fit parameters from a pickle file
Apply the fit function to wind speed data to obtain wind speed capacity factors
Adjust wind speed values to be within a specific range
Rename the wind variable
Write the processed wind speed data to a NetCDF file, with a timeout mechanism for parallel computing. If a timeout occurs, print an error message.
calculate_quantiles.py#
Variables#
climate_variable (string): name of climate variable
year_hist_start (int): Start year for historical data.
year_hist_end (int): End year for historical data.
years_hist (list): List of years in the historical period.
lat_name (string): ‘lat’ for cordex or ‘latitude’ for era5 data
lon_name (string): ‘lon’ for cordex or ‘longitude’ for era5 data
time_name (string): Name of the time coordinate.
quantiles (list): List of quantiles for empirical CDFs.
maxTime (int): Maximum time for parallel computing.
Constants#
number_of_quantiles: 100
Functions#
writeToNetCdf(file, path): Writes profile to path with timeout to catch crashing of dask
Args:
file (xarray dataset): file of calculated profile
path (string): path to write file to
Returns#
xarray dataset: quantiles for given climate variable in given time periods
Summary of script functionality#
Load historical climate data for each year, handling specific cases for observed wind speed data.
Concatenate the historical climate data along the time dimension.
Adjust units for irradiance and longitude values.
Calculate quantiles for the historical data.
Write the quantiles to a NetCDF file, handling timeouts.
calculate_turbine_curve.py#
Variables#
config (yaml): Snakemake configuration
hub_height (float): Height of the wind turbine hub
v_in (float): Cut-in wind speed
v_r (float): Rated wind speed
v_out (float): Cut-out wind speed
height (float): Height for sfcWind calculation
cf_wind (dict): Dictionary to store wind capacity factors
Functions#
power_curve(v, v_in, v_r, v_out): Calculates the capacity factor of a wind turbine with a standardized production function.
Args:
v (float): wind speed
v_in (float): cut-in velocity
v_r (float): rated velocity
v_out (float): cut-out velocity
Returns:
float: capacity factor of turbine at wind speed v
f(x, a, b, c, d): fit function for turbine curve
Args:
x (float): wind speed
a, b, c, d (float): constants from the turbine curve fit
Returns:
float: capacity factor of turbine at wind speed x
Returns#
dict: wind speeds with corresponding capacity factors
Summary of script functionality#
Determine wind turbine parameters based on whether it is offshore or onshore.
Calculate wind capacity factors for different wind speeds and heights.
Fit a curve to the calculated capacity factors using curve_fit.
Save the optimized parameters to a file specified in snakemake.output[0].
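A standardized production function of the kind described for power_curve is commonly written piecewise: zero below cut-in and above cut-out, a ramp between cut-in and rated speed, and full output between rated and cut-out speed. The sketch below assumes a cubic ramp, which is a common textbook choice and not confirmed by this page. The example speeds are hypothetical.

```python
def power_curve(v, v_in, v_r, v_out):
    """Capacity factor (0..1) of a turbine at wind speed v.

    Assumption: cubic ramp between cut-in (v_in) and rated (v_r) speed.
    """
    if v < v_in or v > v_out:
        return 0.0  # turbine is not producing
    if v >= v_r:
        return 1.0  # rated output
    return (v**3 - v_in**3) / (v_r**3 - v_in**3)


# Hypothetical turbine: cut-in 3 m/s, rated 12 m/s, cut-out 25 m/s.
cf_wind = {v: power_curve(v, 3.0, 12.0, 25.0) for v in range(0, 31)}
```

A dictionary like cf_wind, mapping wind speeds to capacity factors, is the kind of data a smooth function f(x, a, b, c, d) could then be fitted to.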
demand_regression.py#
Variables#
yearStart (int): Start year for demand data.
yearEnd (int): End year for demand data.
years (list): List of years within the specified range.
climate_model (string): name of climate model.
rcp (string): name of RCP scenario.
date_series (list): List of formatted date strings.
regions_file_path (string): Path to the regions file.
regions_file (geopandas dataframe): GeoDataFrame containing region information.
temperature_historic (pandas dataframe): DataFrame storing historic temperature data.
demand (pandas dataframe): DataFrame containing demand data.
moreCountriesThanOne (list): List of country codes with multiple countries in a node.
regression_parameters (pandas dataframe): DataFrame to store regression parameters.
Constants#
europe2Letter (list): List of European country codes (2-letter)
europe3Letter (list): List of European country codes (3-letter)
threeToTwo: Dictionary mapping 3-letter country codes to 2-letter country codes
Functions#
fitfunc(x, a, b, c): quadratic fitting function
Args:
x (float): input data
a, b, c (float): fitting parameters
Returns:
float: value of the quadratic function at x
polyfit(x, y, degree, coeffs): fitting for quadratic function
Args:
x (array): x values
y (array): y values
coeffs (array): coefficients
Returns:
array: coefficients of fit function
daterange(date1, date2): calculate date series between two dates
Args:
date1 (date): start date
date2 (date): end date
Yields:
list: all dates between start and end date
Returns#
pandas dataframe: regression parameters
Summary of script functionality#
Generate date series.
Read the bus map to map node names to country codes.
Process historic temperature data and group by country.
Load historic demand data.
Aggregate demand for nodes with multiple countries.
Perform regression analysis between historic demand and temperatures for each country and save the results.
Save regression parameters to an output file.
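Two of the helper functions described for this script are small enough to sketch directly. The coefficient order in fitfunc and the inclusive end date in daterange are assumptions, and the date format string is only an example.

```python
from datetime import date, timedelta


def fitfunc(x, a, b, c):
    """Quadratic function; the order of a, b, c is assumed."""
    return a * x**2 + b * x + c


def daterange(date1, date2):
    """Yield every date from date1 to date2 (end date assumed inclusive)."""
    for offset in range((date2 - date1).days + 1):
        yield date1 + timedelta(days=offset)


# Formatted date strings across a leap day.
date_series = [
    d.strftime("%Y-%m-%d")
    for d in daterange(date(2020, 2, 28), date(2020, 3, 1))
]
```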
download_cordexData.py#
Variables#
config (yaml): Snakemake configuration
model (string): name of climate model
rcp (string): name of RCP scenario
climate_variable (string): name of climate variable
toTemporal (string): desired temporal resolution of result (h for hourly or d for daily)
year (int): year to download
timeFrequency (string): time frequency to download from esgf
ensemble, institute, RCMModel, downscalingRealisation (string): additional parameters for download of cordex data
outputPath (string): where data is downloaded to
Returns#
xarray dataset: downloaded cordex datafile
Summary of script functionality#
Query ESGF data node based on specified parameters.
Create a wget script for downloading climate data for a specific year.
Modify the wget script for quietness and write it to a file.
Run the wget script to download the file.
Rename the downloaded file and handle download failures.
Remove the wget script and wget status file.
download_era5Data.py#
Variables#
c: cdsapi.Client object
climate_variable (string): Name of climate variable
year (int): Year of the data to download
x_min, x_max, y_min, y_max (float): Geographic boundaries
climate_data_file (string): path to output file for the climate data.
Functions#
retrieveHourly(c, variable, year, x_min, x_max, y_min, y_max, climate_data_file): Downloads hourly era5 files
Args:
c (cdsapi.Client): cdsapi Client
variable (string): name of variable
year (int): year of the data to download
x_min, x_max, y_min, y_max (float): Geographic boundaries
climate_data_file (string): path to output
retrieveDaily(c, variable, year, x_min, x_max, y_min, y_max, climate_data_file): Downloads daily era5 files
Args:
c (cdsapi.Client): cdsapi Client
variable (string): name of variable
year (int): year of the data to download
x_min, x_max, y_min, y_max (float): Geographic boundaries
climate_data_file (string): path to output
Returns#
xarray dataset: downloaded era5 datafile
Summary of script functionality#
Retrieves configuration data from snakemake.config.
Determines the geographic boundaries based on the domain wildcard.
Determines the climate variable based on the climate_variable_ERA5 wildcard.
Calls the appropriate retrieval function (retrieveHourly or retrieveDaily) based on the climate variable.
hydro_calibration.py#
Variables#
config (yaml): Snakemake configuration
mrro_df (xarray dataset): river runoff dataset
Returns#
pandas dataframe: list of hydro power plants with installed capacity, average runoff, and location
Summary of script functionality#
Read hydro power plant database from “resources/jrc-hydro-power-plant-database.csv”.
Filter for hydro dam (HDAM) and run-of-river (HROR) plants.
Initialize “average runoff” column in the power plant database.
For each climate data file, calculate the average runoff for each power plant location.
Adjust the average runoff based on the hydro factor and the number of climate data files.
Save the relevant data (lon, lat, average runoff, installed capacity) to a CSV file.
process_cordex_data.py#
Functions#
writeToNetCdf(file, path): Writes file to path with timeout to catch crashing of dask
Args:
file (xarray dataset): processed file to be written
path (string): path to write file to
convert_to_npdatetime(date): Converts different time formats to numpy datetime. Raises an exception if the conversion fails.
Args:
date (misc): date to be converted
Raises:
Exception: is raised if the date type cannot be converted
Returns:
numpy datetime64: converted date
Returns#
xarray dataset: processed climate datafile
Summary of script functionality#
Define the config variable using snakemake.config.
Create the data_dir path based on the domain wildcard.
Extract model, year, toTemporal, and climate_variable from snakemake.wildcards.
Create the output directory for the remapped data if it does not exist.
Set the input_file and output_file paths, adjusting for Windows naming if necessary.
If the temporal resolution is hourly (toTemporal == '1h'):
Interpolate, remap, and invert latitudes of the input data to hourly resolution.
Open the output file, convert timesteps to datetime, adjust for 360-day models, and handle missing timesteps.
Write the processed data to the output file.
If the temporal resolution is daily (toTemporal == 'd'):
Interpolate, remap, and invert latitudes of the input data to daily resolution.
Open the output file, convert timesteps to datetime, adjust for 360-day models, and handle leap years.
Write the processed data to the output file.
rename_cordexData.py#
Variables#
config (yaml): snakemake.config
model (string): name of climate model
climate_variable (string): name of climate variable
rcp (string): name of RCP scenario
year (int): year
toTemporal (string): desired temporal resolution of result (h for hourly or d for daily)
path (string): path where cordex data is stored
toTemporalCordex1 (string): “day” if toTemporal is “d”, otherwise “1hr”
toTemporalCordex2 (string): “day” if toTemporal is “d”, otherwise “3hr”
fileList (list): list of files in path
newPath (string): path for the renamed file
Returns#
renamed files
Summary of script functionality#
Iterate through fileList
If the file matches specific criteria based on model, climate_variable, rcp, year, and toTemporalCordex1 or toTemporalCordex2, rename the file to newPath with the specified format.
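The matching step can be sketched as follows. The mapping of toTemporal to the two CORDEX frequency names follows the variable descriptions above. The substring-based matching rule, the function names, and the example file name are assumptions for illustration only.

```python
def cordex_frequency_names(to_temporal):
    """Return (toTemporalCordex1, toTemporalCordex2) for a resolution."""
    if to_temporal == "d":
        return "day", "day"
    return "1hr", "3hr"


def file_matches(file_name, model, climate_variable, rcp, year, to_temporal):
    """Assumed rule: all identifiers occur in the file name."""
    freq1, freq2 = cordex_frequency_names(to_temporal)
    identifiers = (model, climate_variable, rcp, str(year))
    has_identifiers = all(token in file_name for token in identifiers)
    return has_identifiers and (freq1 in file_name or freq2 in file_name)


# Hypothetical file name and model name.
name = "tas_EUR-11_EXAMPLE-MODEL_rcp85_3hr_2030.nc"
hourly_match = file_matches(name, "EXAMPLE-MODEL", "tas", "rcp85", 2030, "h")
daily_match = file_matches(name, "EXAMPLE-MODEL", "tas", "rcp85", 2030, "d")
```

Files that match would then be renamed to newPath, for example with os.rename.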
retrieveGebcoDataset.py#
Returns#
xarray dataset: gebco dataset
Summary of script functionality#
Downloads gebco file for height data.
Unzips and saves it.