Using data from climateR or other sources with stagg

Introduction

This vignette demonstrates how to use the stagg package with data from climateR, as well as other data sources that do not have temporal metadata in a format that stagg accepts in the layer names.

First, we load the packages used in this vignette.

library(climateR) # Obtain climate data
library(tigris) # Polygons to request and aggregate climate data over
library(stagg) # Aggregate the climate data

climateR example: Working with GridMET Data

The climateR package provides access to data from various sources, such as GridMET or CHIRPS.

The raster layer names for data retrieved from climateR, like that of ERA5 data, are compatible with stagg and therefore can be used without any additional steps. Specifically, the layer name format compatible with stagg uses a character-date-time or character-date format following a string header, or the start date-time and temporal time steps of the raster stack must be specified in staggregate_*(). For example, ERA5 uses the format x2021.01.01.00.00.00 or x2021.01.01, and data retrieved using climateR uses the format variable_2021-01-01.

Load New Jersey Counties

First, we load the county boundaries for New Jersey using the tigris package.

nj_counties <- tigris::counties("NJ")

Retrieve GridMET Data

Next, we use the climateR package to retrieve precipitation data from GridMET for New Jersey counties for the year 2010.

gridmet_pr <- climateR::getGridMET(
  AOI = nj_counties,
  varname = "pr",
  startDate = "2010-01-01",
  endDate  = "2010-12-31"
)[[1]] |>  
  raster::stack()

We can print the layer names to ensure their compatibility with stagg.

names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "pr_2010.01.01" "pr_2010.01.02" "pr_2010.01.03"

Calculate Overlay Weights

We calculate overlay weights for the New Jersey counties using the retrieved GridMET data.

county_weights <- stagg::overlay_weights(
  polygons = nj_counties,       # Simple features object with the desired polygons
  polygon_id_col = "COUNTYFP",  # Column name containing polygon identifiers
  grid = gridmet_pr             # Raster layer with the same coordinate system and spatial resolution as the climate data
)
#> Checking for raster/polygon alignment
#> Extracting raster polygon overlap
#> Checking sum of weights within polygons
#> All weights sum to 1.

print(county_weights, digits = 4)
#>           x     y poly_id    w_area w_sum
#>       <num> <num>  <char>     <num> <num>
#>    1: 285.2 41.36     037 0.0007648     1
#>    2: 285.3 41.36     037 0.0029719     1
#>    3: 285.3 41.36     037 0.0038060     1
#>    4: 285.4 41.36     037 0.0001875     1
#>    5: 285.2 41.32     037 0.0022975     1
#>   ---                                    
#> 1891: 285.4 40.19     021 0.0095147     1
#> 1892: 285.3 40.15     021 0.0047411     1
#> 1893: 285.3 40.15     021 0.0010524     1
#> 1894: 285.4 40.15     021 0.0111197     1
#> 1895: 285.4 40.15     021 0.0116498     1

Perform Polynomial Aggregation

We perform polynomial aggregation on the GridMET data using the calculated overlay weights.

polynomial_output <- stagg::staggregate_polynomial(
  
  data = gridmet_pr,                # The GridMET data retrived from ClimateR
  
  overlay_weights = county_weights, # Weights for grid cells within each county 
                                    # in New Jersey
  
  daily_agg = "none",               # Because climateR provides data at the 
                                    # daily level already, it is not necessary
                                    # to aggregate it up. If it were finer and
                                    # we did want to aggregate it, we would 
                                    # choose "sum" since we would want the sum 
                                    # of precipitation over the time_agg of 
                                    # choice. 
  
  time_agg = "month",               # The temporal level to aggregate daily 
                                    # transformed values to. Current options are 
                                    # "hour", day", "month", and "year". Note 
                                    # that "hour" is only available if daily_agg
                                    # is set to "none"
  
  degree = 3                        # The highest order of the polynomial. Here 
                                    # this will create variable 3 columns: 
                                    # order_1, order_2, and order_3
  )
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month

polynomial_output
#>       year month poly_id   order_1   order_2    order_3
#>      <num> <num>  <char>     <num>     <num>      <num>
#>   1:  2010     1     037  30.79827  698.5901  19485.618
#>   2:  2010     2     037  49.60716 1194.6476  36381.718
#>   3:  2010     3     037 107.00895 4598.5119 248442.534
#>   4:  2010     4     037  39.93862  678.2517  12457.194
#>   5:  2010     5     037  43.34992  703.7214  14583.161
#>  ---                                                   
#> 269:  2010     9     021  69.36754 3093.8891 208694.340
#> 270:  2010    10     021  31.27158  277.4310   2929.496
#> 271:  2010    11     021  30.83565  418.2693   7360.483
#> 272:  2010    12     021  36.97638  794.4837  19675.276
#> 273:    NA    NA     021        NA        NA         NA

Using data from any source using manual time specification

Other data types with layer names that do not follow the stagg naming convention can also be used with stagg by specifying the start date and time interval of the data when calling staggregate_*().

First, we rename all the layers to a format not compatible with stagg.

names(gridmet_pr) <- rep(NA, length(names(gridmet_pr)))
names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "layer.1" "layer.2" "layer.3"

We can still use staggregate in the same way as above and get the same output, but simply have to manually specify the start_date and time_interval of the data we are using.

polynomial_output <- stagg::staggregate_polynomial(
  
  data = gridmet_pr,                # The GridMET data retrived from ClimateR, 
                                    # but with NA layer names
  
  overlay_weights = county_weights, 
  
  daily_agg = "none",           
  
  time_agg = "month",       
  
  start_date = "2010-01-01",        # Our data starts on January 1, 2010
  
  time_interval = "1 day",          # Our data has a time interval of one day
  
  degree = 3
  )
#> Rewriting the data's temporal metadata (layer names) to reflect a dataset starting on the supplied start date and with a temporal interval of 1 day
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month

The new output is identical to the one obtained above.

polynomial_output
#>       year month poly_id   order_1   order_2    order_3
#>      <num> <num>  <char>     <num>     <num>      <num>
#>   1:  2010     1     037  30.79827  698.5901  19485.618
#>   2:  2010     2     037  49.60716 1194.6476  36381.718
#>   3:  2010     3     037 107.00895 4598.5119 248442.534
#>   4:  2010     4     037  39.93862  678.2517  12457.194
#>   5:  2010     5     037  43.34992  703.7214  14583.161
#>  ---                                                   
#> 269:  2010     9     021  69.36754 3093.8891 208694.340
#> 270:  2010    10     021  31.27158  277.4310   2929.496
#> 271:  2010    11     021  30.83565  418.2693   7360.483
#> 272:  2010    12     021  36.97638  794.4837  19675.276
#> 273:    NA    NA     021        NA        NA         NA

Tyler Liddell, Anna Boser, Sara Orofino, Tracey Mangin, and Tamma Carleton