Skip to contents

Introduction

This vignette demonstrates how to use the stagg package with data from climateR, as well as other data sources that do not have temporal metadata in a format that stagg accepts in the layer names.

First, we load the packages used in this vignette.

library(climateR) # Obtain climate data
library(tigris) # Polygons to request and aggregate climate data over
library(stagg) # Aggregate the climate data

climateR example: Working with GridMET Data

The climateR package provides access to data from various sources, such as GridMET or CHIRPS.

The raster layer names for data retrieved from climateR, like that of ERA5 data, are compatible with stagg and therefore can be used without any additional steps. Specifically, the layer name format compatible with stagg uses a character-date-time or character-date format following a string header, or the start date-time and temporal time steps of the raster stack must be specified in staggregate_*(). For example, ERA5 uses the format x2021.01.01.00.00.00 or x2021.01.01, and data retrieved using climateR uses the format variable_2021-01-01.

Load New Jersey Counties

First, we load the county boundaries for New Jersey using the tigris package.

nj_counties <- tigris::counties("NJ")

Retrieve GridMET Data

Next, we use the climateR package to retrieve precipitation data from GridMET for New Jersey counties for the year 2010.

gridmet_pr <- climateR::getGridMET(
  AOI = nj_counties,
  varname = "pr",
  startDate = "2010-01-01",
  endDate  = "2010-12-31"
)[[1]] |>  
  raster::stack()

We can print the layer names to ensure their compatibility with stagg.

names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "pr_2010.01.01" "pr_2010.01.02" "pr_2010.01.03"

Calculate Overlay Weights

We calculate overlay weights for the New Jersey counties using the retrieved GridMET data.

county_weights <- stagg::overlay_weights(
  polygons = nj_counties,       # Simple features object with the desired polygons
  polygon_id_col = "COUNTYFP",  # Column name containing polygon identifiers
  grid = gridmet_pr             # Raster layer with the same coordinate system and spatial resolution as the climate data
)
#> Checking for raster/polygon alignment
#> Extracting raster polygon overlap
#> Checking sum of weights within polygons
#> All weights sum to 1.

print(county_weights, digits = 4)
#>           x     y poly_id    w_area w_sum
#>       <num> <num>  <char>     <num> <num>
#>    1: 285.2 41.36     037 0.0007648     1
#>    2: 285.3 41.36     037 0.0029719     1
#>    3: 285.3 41.36     037 0.0038060     1
#>    4: 285.4 41.36     037 0.0001875     1
#>    5: 285.2 41.32     037 0.0022975     1
#>   ---                                    
#> 1891: 285.4 40.19     021 0.0095147     1
#> 1892: 285.3 40.15     021 0.0047411     1
#> 1893: 285.3 40.15     021 0.0010524     1
#> 1894: 285.4 40.15     021 0.0111197     1
#> 1895: 285.4 40.15     021 0.0116498     1

Perform Polynomial Aggregation

We perform polynomial aggregation on the GridMET data using the calculated overlay weights.

polynomial_output <- stagg::staggregate_polynomial(
  
  data = gridmet_pr,                # The GridMET data retrived from ClimateR
  
  overlay_weights = county_weights, # Weights for grid cells within each county 
                                    # in New Jersey
  
  daily_agg = "none",               # Because climateR provides data at the 
                                    # daily level already, it is not necessary
                                    # to aggregate it up. If it were finer and
                                    # we did want to aggregate it, we would 
                                    # choose "sum" since we would want the sum 
                                    # of precipitation over the time_agg of 
                                    # choice. 
  
  time_agg = "month",               # The temporal level to aggregate daily 
                                    # transformed values to. Current options are 
                                    # "hour", day", "month", and "year". Note 
                                    # that "hour" is only available if daily_agg
                                    # is set to "none"
  
  degree = 3                        # The highest order of the polynomial. Here 
                                    # this will create variable 3 columns: 
                                    # order_1, order_2, and order_3
  )
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month

polynomial_output
#>       year month poly_id   order_1   order_2    order_3
#>      <num> <num>  <char>     <num>     <num>      <num>
#>   1:  2010     1     037  56.75739 1327.2879  39806.128
#>   2:  2010     2     037  97.13325 2434.7100  80308.805
#>   3:  2010     3     037 214.54942 9825.3591 563559.235
#>   4:  2010     4     037  72.66265 1221.0374  22493.601
#>   5:  2010     5     037  77.19771 1183.7247  23434.094
#>  ---                                                   
#> 266:  2010     8     021  32.03729  276.7235   4083.636
#> 267:  2010     9     021 129.12420 4667.6112 265225.311
#> 268:  2010    10     021  60.02752  520.8648   5385.785
#> 269:  2010    11     021  60.80051  809.1176  13799.290
#> 270:  2010    12     021  73.83445 1502.6363  33897.406

Using data from any source using manual time specification

Other data types with layer names that do not follow the stagg naming convention can also be used with stagg by specifying the start date and time interval of the data when calling staggregate_*().

First, we rename all the layers to a format not compatible with stagg.

names(gridmet_pr) <- rep(NA, length(names(gridmet_pr)))
names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "layer.1" "layer.2" "layer.3"

We can still use staggregate in the same way as above and get the same output, but simply have to manually specify the start_date and time_interval of the data we are using.

polynomial_output <- stagg::staggregate_polynomial(
  
  data = gridmet_pr,                # The GridMET data retrived from ClimateR, 
                                    # but with NA layer names
  
  overlay_weights = county_weights, 
  
  daily_agg = "none",           
  
  time_agg = "month",       
  
  start_date = "2010-01-01",        # Our data starts on January 1, 2010
  
  time_interval = "1 day",          # Our data has a time interval of one day
  
  degree = 3
  )
#> Rewriting the data's temporal metadata (layer names) to reflect a dataset starting on the supplied start date and with a temporal interval of 1 day
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month

The new output is identical to the one obtained above.

polynomial_output
#>       year month poly_id   order_1   order_2    order_3
#>      <num> <num>  <char>     <num>     <num>      <num>
#>   1:  2010     1     037  56.75739 1327.2879  39806.128
#>   2:  2010     2     037  97.13325 2434.7100  80308.805
#>   3:  2010     3     037 214.54942 9825.3591 563559.235
#>   4:  2010     4     037  72.66265 1221.0374  22493.601
#>   5:  2010     5     037  77.19771 1183.7247  23434.094
#>  ---                                                   
#> 266:  2010     8     021  32.03729  276.7235   4083.636
#> 267:  2010     9     021 129.12420 4667.6112 265225.311
#> 268:  2010    10     021  60.02752  520.8648   5385.785
#> 269:  2010    11     021  60.80051  809.1176  13799.290
#> 270:  2010    12     021  73.83445 1502.6363  33897.406