Using data from climateR or other sources with stagg
Tyler Liddell, Anna Boser, Sara Orofino, Tracey Mangin, and Tamma Carleton
2024-08-20
Source:vignettes/data_sources.Rmd
data_sources.Rmd
Introduction
This vignette demonstrates how to use the stagg
package
with data from climateR
, as well as other data sources that
do not have temporal metadata in a format that stagg
accepts in the layer names.
First, we load the packages used in this vignette.
climateR example: Working with GridMET Data
The climateR
package provides access to data from
various sources, such as GridMET
or
CHIRPS
.
The raster layer names for data retrieved from climateR
,
like that of ERA5
data, are compatible with
stagg
and therefore can be used without any additional
steps. Specifically, the layer name format compatible with
stagg
uses a character-date-time
or
character-date format
following a string header, or the
start date-time and temporal time steps of the raster stack must be
specified in staggregate_*()
. For example,
ERA5
uses the format x2021.01.01.00.00.00
or x2021.01.01
, and data retrieved using
climateR
uses the format
variable_2021-01-01
.
Load New Jersey Counties
First, we load the county boundaries for New Jersey using the
tigris
package.
nj_counties <- tigris::counties("NJ")
Retrieve GridMET Data
Next, we use the climateR
package to retrieve
precipitation data from GridMET
for New Jersey counties for
the year 2010.
gridmet_pr <- climateR::getGridMET(
AOI = nj_counties,
varname = "pr",
startDate = "2010-01-01",
endDate = "2010-12-31"
)[[1]] |>
raster::stack()
We can print the layer names to ensure their compatibility with
stagg
.
names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "pr_2010.01.01" "pr_2010.01.02" "pr_2010.01.03"
Calculate Overlay Weights
We calculate overlay weights for the New Jersey counties using the retrieved GridMET data.
county_weights <- stagg::overlay_weights(
polygons = nj_counties, # Simple features object with the desired polygons
polygon_id_col = "COUNTYFP", # Column name containing polygon identifiers
grid = gridmet_pr # Raster layer with the same coordinate system and spatial resolution as the climate data
)
#> Checking for raster/polygon alignment
#> Extracting raster polygon overlap
#> Checking sum of weights within polygons
#> All weights sum to 1.
print(county_weights, digits = 4)
#> x y poly_id w_area w_sum
#> <num> <num> <char> <num> <num>
#> 1: 285.2 41.36 037 0.0007648 1
#> 2: 285.3 41.36 037 0.0029719 1
#> 3: 285.3 41.36 037 0.0038060 1
#> 4: 285.4 41.36 037 0.0001875 1
#> 5: 285.2 41.32 037 0.0022975 1
#> ---
#> 1891: 285.4 40.19 021 0.0095147 1
#> 1892: 285.3 40.15 021 0.0047411 1
#> 1893: 285.3 40.15 021 0.0010524 1
#> 1894: 285.4 40.15 021 0.0111197 1
#> 1895: 285.4 40.15 021 0.0116498 1
Perform Polynomial Aggregation
We perform polynomial aggregation on the GridMET data using the calculated overlay weights.
polynomial_output <- stagg::staggregate_polynomial(
data = gridmet_pr, # The GridMET data retrived from ClimateR
overlay_weights = county_weights, # Weights for grid cells within each county
# in New Jersey
daily_agg = "none", # Because climateR provides data at the
# daily level already, it is not necessary
# to aggregate it up. If it were finer and
# we did want to aggregate it, we would
# choose "sum" since we would want the sum
# of precipitation over the time_agg of
# choice.
time_agg = "month", # The temporal level to aggregate daily
# transformed values to. Current options are
# "hour", day", "month", and "year". Note
# that "hour" is only available if daily_agg
# is set to "none"
degree = 3 # The highest order of the polynomial. Here
# this will create variable 3 columns:
# order_1, order_2, and order_3
)
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month
polynomial_output
#> year month poly_id order_1 order_2 order_3
#> <num> <num> <char> <num> <num> <num>
#> 1: 2010 1 037 56.75739 1327.2879 39806.128
#> 2: 2010 2 037 97.13325 2434.7100 80308.805
#> 3: 2010 3 037 214.54942 9825.3591 563559.235
#> 4: 2010 4 037 72.66265 1221.0374 22493.601
#> 5: 2010 5 037 77.19771 1183.7247 23434.094
#> ---
#> 266: 2010 8 021 32.03729 276.7235 4083.636
#> 267: 2010 9 021 129.12420 4667.6112 265225.311
#> 268: 2010 10 021 60.02752 520.8648 5385.785
#> 269: 2010 11 021 60.80051 809.1176 13799.290
#> 270: 2010 12 021 73.83445 1502.6363 33897.406
Using data from any source using manual time specification
Other data types with layer names that do not follow the
stagg
naming convention can also be used with stagg by
specifying the start date and time interval of the data when calling
staggregate_*()
.
First, we rename all the layers to a format not compatible with
stagg
.
names(gridmet_pr) <- rep(NA, length(names(gridmet_pr)))
names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "layer.1" "layer.2" "layer.3"
We can still use staggregate in the same way as above and get the
same output, but simply have to manually specify the
start_date
and time_interval
of the data we
are using.
polynomial_output <- stagg::staggregate_polynomial(
data = gridmet_pr, # The GridMET data retrived from ClimateR,
# but with NA layer names
overlay_weights = county_weights,
daily_agg = "none",
time_agg = "month",
start_date = "2010-01-01", # Our data starts on January 1, 2010
time_interval = "1 day", # Our data has a time interval of one day
degree = 3
)
#> Rewriting the data's temporal metadata (layer names) to reflect a dataset starting on the supplied start date and with a temporal interval of 1 day
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month
The new output is identical to the one obtained above.
polynomial_output
#> year month poly_id order_1 order_2 order_3
#> <num> <num> <char> <num> <num> <num>
#> 1: 2010 1 037 56.75739 1327.2879 39806.128
#> 2: 2010 2 037 97.13325 2434.7100 80308.805
#> 3: 2010 3 037 214.54942 9825.3591 563559.235
#> 4: 2010 4 037 72.66265 1221.0374 22493.601
#> 5: 2010 5 037 77.19771 1183.7247 23434.094
#> ---
#> 266: 2010 8 021 32.03729 276.7235 4083.636
#> 267: 2010 9 021 129.12420 4667.6112 265225.311
#> 268: 2010 10 021 60.02752 520.8648 5385.785
#> 269: 2010 11 021 60.80051 809.1176 13799.290
#> 270: 2010 12 021 73.83445 1502.6363 33897.406