Using data from climateR or other sources with stagg
Tyler Liddell, Anna Boser, Sara Orofino, Tracey Mangin, and Tamma Carleton
2024-11-05
Source:vignettes/data_sources.Rmd
data_sources.Rmd
Introduction
This vignette demonstrates how to use the stagg
package
with data from climateR
, as well as other data sources that
do not have temporal metadata in a format that stagg
accepts in the layer names.
First, we load the packages used in this vignette.
climateR example: Working with GridMET Data
The climateR
package provides access to data from
various sources, such as GridMET
or
CHIRPS
.
The raster layer names for data retrieved from climateR
,
like that of ERA5
data, are compatible with
stagg
and therefore can be used without any additional
steps. Specifically, the layer name format compatible with
stagg
uses a character-date-time
or
character-date format
following a string header, or the
start date-time and temporal time steps of the raster stack must be
specified in staggregate_*()
. For example,
ERA5
uses the format x2021.01.01.00.00.00
or x2021.01.01
, and data retrieved using
climateR
uses the format
variable_2021-01-01
.
Load New Jersey Counties
First, we load the county boundaries for New Jersey using the
tigris
package.
nj_counties <- tigris::counties("NJ")
Retrieve GridMET Data
Next, we use the climateR
package to retrieve
precipitation data from GridMET
for New Jersey counties for
the year 2010.
gridmet_pr <- climateR::getGridMET(
AOI = nj_counties,
varname = "pr",
startDate = "2010-01-01",
endDate = "2010-12-31"
)[[1]] |>
raster::stack()
We can print the layer names to ensure their compatibility with
stagg
.
names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "pr_2010.01.01" "pr_2010.01.02" "pr_2010.01.03"
Calculate Overlay Weights
We calculate overlay weights for the New Jersey counties using the retrieved GridMET data.
county_weights <- stagg::overlay_weights(
polygons = nj_counties, # Simple features object with the desired polygons
polygon_id_col = "COUNTYFP", # Column name containing polygon identifiers
grid = gridmet_pr # Raster layer with the same coordinate system and spatial resolution as the climate data
)
#> Checking for raster/polygon alignment
#> Extracting raster polygon overlap
#> Checking sum of weights within polygons
#> All weights sum to 1.
print(county_weights, digits = 4)
#> x y poly_id w_area w_sum
#> <num> <num> <char> <num> <num>
#> 1: 285.2 41.36 037 0.0007648 1
#> 2: 285.3 41.36 037 0.0029719 1
#> 3: 285.3 41.36 037 0.0038060 1
#> 4: 285.4 41.36 037 0.0001875 1
#> 5: 285.2 41.32 037 0.0022975 1
#> ---
#> 1891: 285.4 40.19 021 0.0095147 1
#> 1892: 285.3 40.15 021 0.0047411 1
#> 1893: 285.3 40.15 021 0.0010524 1
#> 1894: 285.4 40.15 021 0.0111197 1
#> 1895: 285.4 40.15 021 0.0116498 1
Perform Polynomial Aggregation
We perform polynomial aggregation on the GridMET data using the calculated overlay weights.
polynomial_output <- stagg::staggregate_polynomial(
data = gridmet_pr, # The GridMET data retrived from ClimateR
overlay_weights = county_weights, # Weights for grid cells within each county
# in New Jersey
daily_agg = "none", # Because climateR provides data at the
# daily level already, it is not necessary
# to aggregate it up. If it were finer and
# we did want to aggregate it, we would
# choose "sum" since we would want the sum
# of precipitation over the time_agg of
# choice.
time_agg = "month", # The temporal level to aggregate daily
# transformed values to. Current options are
# "hour", day", "month", and "year". Note
# that "hour" is only available if daily_agg
# is set to "none"
degree = 3 # The highest order of the polynomial. Here
# this will create variable 3 columns:
# order_1, order_2, and order_3
)
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month
polynomial_output
#> year month poly_id order_1 order_2 order_3
#> <num> <num> <char> <num> <num> <num>
#> 1: 2010 1 037 30.79827 698.5901 19485.618
#> 2: 2010 2 037 49.60716 1194.6476 36381.718
#> 3: 2010 3 037 107.00895 4598.5119 248442.534
#> 4: 2010 4 037 39.93862 678.2517 12457.194
#> 5: 2010 5 037 43.34992 703.7214 14583.161
#> ---
#> 269: 2010 9 021 69.36754 3093.8891 208694.340
#> 270: 2010 10 021 31.27158 277.4310 2929.496
#> 271: 2010 11 021 30.83565 418.2693 7360.483
#> 272: 2010 12 021 36.97638 794.4837 19675.276
#> 273: NA NA 021 NA NA NA
Using data from any source using manual time specification
Other data types with layer names that do not follow the
stagg
naming convention can also be used with stagg by
specifying the start date and time interval of the data when calling
staggregate_*()
.
First, we rename all the layers to a format not compatible with
stagg
.
names(gridmet_pr) <- rep(NA, length(names(gridmet_pr)))
names(gridmet_pr[[1:3]]) # Print the first three layer names
#> [1] "layer.1" "layer.2" "layer.3"
We can still use staggregate in the same way as above and get the
same output, but simply have to manually specify the
start_date
and time_interval
of the data we
are using.
polynomial_output <- stagg::staggregate_polynomial(
data = gridmet_pr, # The GridMET data retrived from ClimateR,
# but with NA layer names
overlay_weights = county_weights,
daily_agg = "none",
time_agg = "month",
start_date = "2010-01-01", # Our data starts on January 1, 2010
time_interval = "1 day", # Our data has a time interval of one day
degree = 3
)
#> Rewriting the data's temporal metadata (layer names) to reflect a dataset starting on the supplied start date and with a temporal interval of 1 day
#> Skipping pre-transformation aggregation to daily level
#> Executing polynomial transformation
#> Aggregating by polygon and month
The new output is identical to the one obtained above.
polynomial_output
#> year month poly_id order_1 order_2 order_3
#> <num> <num> <char> <num> <num> <num>
#> 1: 2010 1 037 30.79827 698.5901 19485.618
#> 2: 2010 2 037 49.60716 1194.6476 36381.718
#> 3: 2010 3 037 107.00895 4598.5119 248442.534
#> 4: 2010 4 037 39.93862 678.2517 12457.194
#> 5: 2010 5 037 43.34992 703.7214 14583.161
#> ---
#> 269: 2010 9 021 69.36754 3093.8891 208694.340
#> 270: 2010 10 021 31.27158 277.4310 2929.496
#> 271: 2010 11 021 30.83565 418.2693 7360.483
#> 272: 2010 12 021 36.97638 794.4837 19675.276
#> 273: NA NA 021 NA NA NA