Bin transformation and aggregation of climate data

The function staggregate_bin() aggregates climate data to the daily level (unless daily_agg = 'none'), splits the resulting values into bins, and aggregates the transformed values to the polygon level and the desired temporal scale.

Usage

staggregate_bin(
  data,
  overlay_weights,
  daily_agg,
  time_agg = "month",
  start_date = NA,
  time_interval = "1 hour",
  bin_breaks
)

Arguments

data: The raster brick with the data to be transformed and aggregated
overlay_weights: A table of weights which can be generated using the function overlay_weights()
daily_agg: How to aggregate hourly values to daily values prior to transformation. Options are 'sum', 'average', or 'none' ('none' will transform values without first aggregating to the daily level)
time_agg: The temporal scale to aggregate the data to. Options are 'hour', 'day', 'month', or 'year' ('hour' can be selected only if daily_agg = 'none')
start_date: The date (and time, if applicable) of the first layer in the raster, in a format compatible with lubridate::as_datetime(), e.g. "1991-10-29" or "1991-10-29 00:00:00". The default is NA because rasters usually already carry temporal information in their layer names, so it does not need to be supplied manually.
time_interval: The time interval between layers in the raster to be aggregated, in a format compatible with seq(), e.g. '1 day' or '3 months'. The default is '1 hour'. This argument is required if daily_agg is not 'none' or if start_date is not NA.
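bin_breaks: A numeric vector of bin boundaries to split the data by. Values below the first boundary and above the last fall into open-ended bins, so c(0, 2.5, 5, 7.5, 10) yields 6 bins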
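
The interplay of start_date, time_interval, and daily_agg can be illustrated with a minimal base R sketch. This is not stagg's implementation: the variable names n_layers, layer_times, hourly_values, and daily_avg are hypothetical, the values are made up, and a single grid cell stands in for a full raster. It only shows how one timestamp per layer can be built with seq() and how hourly values could then be averaged within each day.

```r
# Hypothetical inputs mirroring the start_date and time_interval arguments.
start_date <- as.POSIXct("2024-06-01 00:00:00", tz = "UTC")
time_interval <- "1 hour"
n_layers <- 48  # two days of hourly layers (made-up size)

# One timestamp per layer, spaced by the interval string that seq() understands.
layer_times <- seq(start_date, by = time_interval, length.out = n_layers)

# Made-up hourly values for a single grid cell: 10 on day one, 20 on day two.
hourly_values <- rep(c(10, 20), each = 24)

# Analogue of daily_agg = "average": mean of the hourly values in each day.
daily_avg <- tapply(hourly_values, as.Date(layer_times, tz = "UTC"), mean)
print(daily_avg)
```

With daily_agg = 'sum' the same grouping would use sum instead of mean, and with 'none' the hourly values would be passed on unchanged.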

Examples

bin_output <- staggregate_bin(
  data = terra::rast(temp_nj_jun_2024_era5) - 273.15, # Climate data to transform and
                                         # aggregate (subtracting 273.15
                                         # converts Kelvin to Celsius)
  overlay_weights = overlay_weights_nj, # Output from overlay_weights()
  daily_agg = "average", # Average hourly values to produce daily values
                         # before transformation
  time_agg = "month", # Sum the transformed daily values within each month
  start_date = "2024-06-01 00:00:00", # The start date of the supplied data,
                                      # only required if the layer name
                                      # format is not compatible with stagg
  time_interval = "1 hour", # The temporal interval of the supplied data,
                            # required if daily_agg is not "none" or if the
                            # start_date argument is not NA
  bin_breaks = c(0, 2.5, 5, 7.5, 10) # Draw 6 bins from ninf to 0, 0 to 2.5,
                                     # 2.5 to 5, 5 to 7.5, 7.5 to 10, 10 to
                                     # inf
  )
#> Rewriting the data's temporal metadata (layer names) to reflect a dataset starting on the supplied start date and with a temporal interval of 1 hour
#> Averaging over 24 layers per day to get daily values
#> Executing binning transformation
#> Aggregating by polygon and month

head(bin_output)
#>     year month poly_id bin_ninf_to_0 bin_0_to_2.5 bin_2.5_to_5 bin_5_to_7.5
#>    <num> <num>  <char>         <num>        <num>        <num>        <num>
#> 1:  2024     6     011             0            0            0            0
#> 2:  2024     6     033             0            0            0            0
#> 3:  2024     6     015             0            0            0            0
#> 4:  2024     6     009             0            0            0            0
#> 5:  2024     6     007             0            0            0            0
#> 6:  2024     6     041             0            0            0            0
#>    bin_7.5_to_10 bin_10_to_inf
#>            <num>         <num>
#> 1:             0            30
#> 2:             0            30
#> 3:             0            30
#> 4:             0            30
#> 5:             0            30
#> 6:             0            30
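
The binning step itself can be sketched in base R to show how a set of daily values maps onto bin columns like those above. This is an illustration under stated assumptions, not the code stagg runs: the daily temperatures are made up, the names daily_temp, edges, bin_id, bin_labels, and bin_counts are hypothetical, and bins are assumed to be closed on the left, which may differ from how stagg treats values that fall exactly on a boundary.

```r
# Hypothetical daily average temperatures (degrees Celsius) for one grid cell;
# these values are made up for illustration.
daily_temp <- c(-1.2, 0.4, 3.1, 6.8, 9.9, 12.5, 15.0)

bin_breaks <- c(0, 2.5, 5, 7.5, 10)

# Add open-ended outer edges so every value lands in some bin.
edges <- c(-Inf, bin_breaks, Inf)

# Assumption: bins are closed on the left, [lower, upper).
bin_id <- findInterval(daily_temp, edges)

# Labels that mimic the column names in the output above.
edge_names <- c("ninf", bin_breaks, "inf")
bin_labels <- paste0("bin_", head(edge_names, -1), "_to_", tail(edge_names, -1))

# Count the days falling into each bin, keeping empty bins in the result.
bin_counts <- table(factor(bin_labels[bin_id], levels = bin_labels))
print(bin_counts)
```

Read this way, a value of 30 in bin_10_to_inf with zeros elsewhere means that all 30 daily averages in June 2024 were 10 degrees or higher.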