library(dplyr)
library(lubridate)
library(arrow)
library(readr)
library(climindi) # http://rfsaldanha.github.io/climindi/
library(zendown) # https://rfsaldanha.github.io/zendown/
Introduction
Climatological normals are averages of climate variables observed over a given period, usually months, across a 30-year range. Normals are commonly used as benchmarks against which recent or current conditions are compared, which makes them useful for identifying anomalies and characterizing the impacts of global warming.
Normals are usually computed from surface meteorological station data maintained by government meteorological institutions, and station coverage may be scarce or uneven across a territory, as in Brazil. Consequently, climatological normals are difficult to obtain for many regions and for administrative divisions such as municipalities.
I propose here a method to compute monthly climatological normals for the 1961–1990 period, and monthly aggregated climate indicators from 1991 to 2023, for Brazilian municipalities using a gridded climate dataset.
In a hurry? Jump to the download section ;-)
Methods
A climatological normal can be computed with data from different sources, including remote sensing and “area averages or points in gridded datasets” (WMO 2017). Several gridded climatological datasets cover the Brazilian territory, including ERA5-Land from Copernicus (Muñoz-Sabater et al. 2021) and the BR-DWGD dataset (Xavier et al. 2022), both offering several climatological indicators over a long time range with continuous updates.
Some research methods require climate data to be aggregated to the same spatial and temporal units as the other data used in statistical models, a fairly common procedure in epidemiology and economics studies. To address this, gridded data can be aggregated using zonal statistics (Saldanha et al. 2024).
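As a toy illustration of zonal statistics, assume each grid cell has already been assigned to the municipality (zone) it falls in; a zonal statistic is then a summary of the cell values within each zone. The base R sketch below uses made-up values and a simple unweighted mean and maximum. The actual workflow (Saldanha et al. 2024) works with real rasters and municipality polygons and may handle partial cell overlap differently, which is not shown here.

```r
# Toy zonal statistics in base R (made-up values, not BR-DWGD data).
# Each grid cell has a value and the id of the zone (municipality) it falls in.
cell_value <- c(20.1, 21.3, 22.0, 25.4, 26.0, 24.8)
cell_zone  <- c("A", "A", "A", "B", "B", "B")

# Summarise the cell values within each zone
zonal_mean <- tapply(cell_value, cell_zone, mean)
zonal_max  <- tapply(cell_value, cell_zone, max)

zonal <- data.frame(
  zone = names(zonal_mean),
  mean = as.numeric(zonal_mean),
  max  = as.numeric(zonal_max)
)
print(zonal)
```

In the real dataset each zone is a municipality identified by `code_muni`, and the summary is repeated for every date of the gridded series.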
I used the same methodology as that study to compute zonal statistics of climate indicators from the BR-DWGD project (Xavier et al. 2022). This data is available here and covers climatological indicators from 1961 to March 2024 for all Brazilian municipalities. I propose here to compute climatological normals and other aggregated climate indicators from this zonal statistics dataset.
In order to compute these indicators, I created an R package named {climindi}. The package provides helper functions to compute climatological normals and other aggregated indicators in a tidy way.
Normal indicators
The {climindi} package computes the average and the 10th and 90th percentiles as climatological normals.
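To make the definition concrete, here is a base R sketch of monthly normals for one location: for each calendar month, it takes all daily values from the 1961–1990 reference period and computes the mean and the 10th and 90th percentiles. The daily series is synthetic, and using `quantile()` with its default method is an assumption; {climindi} may compute percentiles differently.

```r
# Synthetic daily series for one location (illustrative, not BR-DWGD data)
set.seed(42)
dates <- seq(as.Date("1961-01-01"), as.Date("1995-12-31"), by = "day")
year  <- as.integer(format(dates, "%Y"))
month <- as.integer(format(dates, "%m"))
value <- 25 + 5 * sin(2 * pi * (month - 1) / 12) + rnorm(length(dates))

# Keep only the 1961-1990 reference period
ref <- year >= 1961 & year <= 1990

# Normal for one month: mean plus 10th and 90th percentiles
# (quantile() default type = 7 is an assumption about the method)
monthly_normal <- function(v) {
  c(normal_mean = mean(v),
    normal_p10  = unname(quantile(v, 0.10)),
    normal_p90  = unname(quantile(v, 0.90)))
}

# One row per calendar month
by_month <- split(value[ref], month[ref])
normals  <- do.call(rbind, lapply(by_month, monthly_normal))
normals  <- data.frame(month = as.integer(rownames(normals)), normals,
                       row.names = NULL)
print(round(normals, 2))
```

Days from 1991 onwards are excluded from the normal on purpose: they are the period later compared against it.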
Aggregated indicators
The {climindi} package provides functions to compute the following statistics for time-aggregated data: count of data points, average, median, standard deviation, standard error, maximum and minimum values, the 10th, 25th, 75th and 90th percentiles, and the variable-specific indicators listed below.
- Precipitation
  - Rain spells: count of rain spell occurrences, with 3 and 5 or more consecutive days with rain above the climatological normal average value
  - Count of days with precipitation above 1 mm, 5 mm, 10 mm, 50 mm, and 100 mm
  - Count of sequences of 3, 5, 10, 15, 20, and 25 or more days without precipitation
- Maximum temperature
  - Heat waves: count of heat wave occurrences, with 3 and 5 or more consecutive days with maximum temperature above the climatological normal value plus 5 °C
  - Hot days: count of hot days, when the maximum temperature is above the normal 90th percentile
  - Count of days with temperatures greater than or equal to 25, 30, 35, and 40 °C
- Minimum temperature
  - Cold spells: count of cold spell occurrences, with 3 and 5 or more consecutive days with minimum temperature below the climatological normal value minus 5 °C
  - Cold days: count of cold days, when the minimum temperature is below the normal 10th percentile
  - Count of days with temperatures less than or equal to 0, 5, 10, 15, and 20 °C
- Relative humidity
  - Count of dry spell occurrences, with 3 and 5 or more consecutive days with relative humidity below the climatological normal value minus 10 percent
  - Count of wet spell occurrences, with 3 and 5 or more consecutive days with relative humidity above the climatological normal value plus 10 percent
  - Count of dry days, when the relative humidity is below the normal 10th percentile
  - Count of wet days, when the relative humidity is above the normal 90th percentile
  - Count of days with relative humidity between 21% and 30% (Attention level)
  - Count of days with relative humidity between 12% and 20% (Alert level)
  - Count of days with relative humidity below 12% (Emergency level)
- Wind speed
  - Count of sequences of 3 and 5 or more days with wind speed below the climatological normal average
  - Count of sequences of 3 and 5 or more days with wind speed above the climatological normal average
- Evapotranspiration
  - Count of sequences of 3 and 5 or more days with evapotranspiration below the climatological normal average
  - Count of sequences of 3 and 5 or more days with evapotranspiration above the climatological normal average
- Solar radiation
  - Count of sequences of 3 and 5 or more days with solar radiation below the climatological normal
  - Count of sequences of 3 and 5 or more days with solar radiation above the climatological normal
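Many of the indicators above count runs of consecutive days beyond a threshold (rain spells, heat waves, cold spells, dry and wet spells), or days beyond fixed limits. A minimal base R sketch of these ideas follows, using run-length encoding (`rle()`) for spells and `cut()` for the relative humidity levels. The helpers `count_spells()`, `count_days()` and `rh_level()` are hypothetical, not part of {climindi}; treating a 6-day run as a single spell, and how values between 20% and 21% (or 30% and 31%) are classified, are assumptions.

```r
# Hypothetical helper: count runs of at least `min_len` consecutive TRUE days.
# A run of 6 qualifying days counts as one spell (assumption).
count_spells <- function(flag, min_len = 3) {
  flag[is.na(flag)] <- FALSE   # missing days break a spell
  runs <- rle(flag)
  sum(runs$values & runs$lengths >= min_len)
}

# Hypothetical helper: count days at or above a fixed threshold
count_days <- function(x, threshold) sum(x >= threshold, na.rm = TRUE)

# Hypothetical helper: relative humidity levels from the list above
# (intervals are [12, 21) for alert and [21, 31) for attention: an assumption)
rh_level <- function(rh) {
  cut(rh, breaks = c(-Inf, 12, 21, 31, Inf), right = FALSE,
      labels = c("emergency", "alert", "attention", "none"))
}

# Made-up daily maximum temperatures (°C) for one municipality and month
tmax <- c(30, 31, 38, 38, 39, 30, 29, 38, 38, 38, 38, 38, 38, 31, 30)
normal_mean <- 32  # made-up monthly normal mean

# Heat wave days: maximum temperature above the normal mean plus 5 °C
hot_flag <- tmax > normal_mean + 5
hw_3d <- count_spells(hot_flag, min_len = 3)  # 2
hw_5d <- count_spells(hot_flag, min_len = 5)  # 1
t_30  <- count_days(tmax, 30)                 # 14

rh_level(c(10, 12, 25, 30, 45))
```

The same `count_spells()` pattern covers the other spell indicators by changing the flag, for example `tmin < normal_mean - 5` for cold spells.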
Data source
The zonal statistics for Brazilian municipalities computed from the BR-DWGD project are described here. We can use the {zendown} package to download the data files directly from Zenodo.
Packages
To perform these computations, I needed to increase the R_MAX_VSIZE environment variable, as explained here.
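R reads `R_MAX_VSIZE` at startup, typically from a `.Renviron` file, so a new value only takes effect after restarting R. The sketch below is a hypothetical helper that adds or replaces that line in a given `.Renviron` file; the `100Gb` value is an illustrative assumption, not necessarily the value used here. `usethis::edit_r_environ()` is an interactive alternative.

```r
# Hypothetical helper: set R_MAX_VSIZE in a .Renviron file.
# `value` is an illustrative assumption; pick one that fits your machine.
set_max_vsize <- function(path = file.path(Sys.getenv("HOME"), ".Renviron"),
                          value = "100Gb") {
  new_line <- paste0("R_MAX_VSIZE=", value)
  old <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  # Drop any previous R_MAX_VSIZE entry before appending the new one
  old <- old[!startsWith(old, "R_MAX_VSIZE=")]
  writeLines(c(old, new_line), path)
  invisible(new_line)
}

# After restarting R, check the value that was picked up ("" if unset)
Sys.getenv("R_MAX_VSIZE")
```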
Precipitation (mm)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
pr_data <- zen_file(13906834, "pr_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "pr_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
pr_normal <- pr_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
pr_indi <- pr_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute precipitation indicators
  summarise_precipitation(
    value_var = value,
    normals_df = pr_normal
  ) |>
  # Ungroup
  ungroup()
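The `normals_df` argument suggests that each municipality's daily values are compared with the normal of the same municipality and month. A base R sketch of that matching step with made-up values: merge the daily data with a normals table on `code_muni` and `month`, flag days above the normal mean, and count them per municipality, year and month. How {climindi} performs this join internally is an assumption.

```r
# Made-up daily precipitation (mm) for two municipalities, January 1991
daily <- data.frame(
  code_muni = rep(c(1, 2), each = 4),
  date      = rep(seq(as.Date("1991-01-01"), by = "day", length.out = 4), 2),
  value     = c(0, 12, 15, 3, 1, 0, 22, 18)
)
daily$year  <- as.integer(format(daily$date, "%Y"))
daily$month <- as.integer(format(daily$date, "%m"))

# Made-up monthly normals: one row per municipality and month
normals <- data.frame(code_muni = c(1, 2), month = c(1L, 1L),
                      normal_mean = c(10, 20))

# Attach each day's normal, then flag days above the normal mean
joined <- merge(daily, normals, by = c("code_muni", "month"))
joined$above_normal <- joined$value > joined$normal_mean

# Days above normal per municipality, year and month
res <- aggregate(above_normal ~ code_muni + year + month,
                 data = joined, FUN = sum)
print(res)
```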
Export
write_parquet(x = pr_normal, sink = "pr_normal.parquet")
write_csv2(x = pr_normal, file = "pr_normal.csv")
write_parquet(x = pr_indi, sink = "pr_indi.parquet")
write_csv2(x = pr_indi, file = "pr_indi.csv")
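`readr::write_csv2()` writes semicolon-separated files with a comma as the decimal mark, the convention used in Brazil and other countries with a decimal comma. The CSV files should therefore be read back with `readr::read_csv2()` or, as in this base R sketch, with `read.csv2()`, which follows the same convention. The small table here is made up.

```r
# Made-up normals table
normal <- data.frame(code_muni = c(1, 2), month = c(1L, 1L),
                     normal_mean = c(10.25, 8.5))
f <- tempfile(fileext = ".csv")

# Base R equivalent of the csv2 convention: ";" separator, "," decimal mark
write.csv2(normal, f, row.names = FALSE)
readLines(f)[2]  # "1;1;10,25"

# read.csv2() uses the same defaults, so values come back as numbers
back <- read.csv2(f)
```

Reading these files with a plain comma-separated reader would split `10,25` into two fields, which is why the matching `*_csv2` reader matters.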
Evapotranspiration (mm)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
eto_data <- zen_file(13906834, "ETo_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "ETo_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
eto_normal <- eto_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
eto_indi <- eto_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute evapotranspiration indicators
  summarise_evapotrapiration(
    value_var = value,
    normals_df = eto_normal
  ) |>
  # Ungroup
  ungroup()
Export
write_parquet(x = eto_normal, sink = "eto_normal.parquet")
write_csv2(x = eto_normal, file = "eto_normal.csv")
write_parquet(x = eto_indi, sink = "eto_indi.parquet")
write_csv2(x = eto_indi, file = "eto_indi.csv")
Maximum temperature (°C)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
tmax_data <- zen_file(13906834, "Tmax_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "Tmax_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
tmax_normal <- tmax_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
tmax_indi <- tmax_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute maximum temperature indicators
  summarise_temp_max(
    value_var = value,
    normals_df = tmax_normal
  ) |>
  # Ungroup
  ungroup()
Export
write_parquet(x = tmax_normal, sink = "tmax_normal.parquet")
write_csv2(x = tmax_normal, file = "tmax_normal.csv")
write_parquet(x = tmax_indi, sink = "tmax_indi.parquet")
write_csv2(x = tmax_indi, file = "tmax_indi.csv")
Minimum temperature (°C)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
tmin_data <- zen_file(13906834, "Tmin_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "Tmin_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
tmin_normal <- tmin_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
tmin_indi <- tmin_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute minimum temperature indicators
  summarise_temp_min(
    value_var = value,
    normals_df = tmin_normal
  ) |>
  # Ungroup
  ungroup()
Export
write_parquet(x = tmin_normal, sink = "tmin_normal.parquet")
write_csv2(x = tmin_normal, file = "tmin_normal.csv")
write_parquet(x = tmin_indi, sink = "tmin_indi.parquet")
write_csv2(x = tmin_indi, file = "tmin_indi.csv")
Solar radiation (MJ m⁻²)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
rs_data <- zen_file(13906834, "Rs_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "Rs_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
rs_normal <- rs_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
rs_indi <- rs_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute solar radiation indicators
  summarise_solar_radiation(
    value_var = value,
    normals_df = rs_normal
  ) |>
  # Ungroup
  ungroup()
Export
write_parquet(x = rs_normal, sink = "rs_normal.parquet")
write_csv2(x = rs_normal, file = "rs_normal.csv")
write_parquet(x = rs_indi, sink = "rs_indi.parquet")
write_csv2(x = rs_indi, file = "rs_indi.csv")
Wind speed at 2m height (m/s)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
u2_data <- zen_file(13906834, "u2_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "u2_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
u2_normal <- u2_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
u2_indi <- u2_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute wind speed indicators
  summarise_windspeed(
    value_var = value,
    normals_df = u2_normal
  ) |>
  # Ungroup
  ungroup()
Export
write_parquet(x = u2_normal, sink = "u2_normal.parquet")
write_csv2(x = u2_normal, file = "u2_normal.csv")
write_parquet(x = u2_indi, sink = "u2_indi.parquet")
write_csv2(x = u2_indi, file = "u2_indi.csv")
Relative humidity (%)
Data
# Download the file from Zenodo and keep the municipality mean series, 1961-2023
rh_data <- zen_file(13906834, "RH_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "RH_3.2.3_mean") |>
  filter(date >= as.Date("1961-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()
Normal
rh_normal <- rh_data |>
  # Identify month
  mutate(month = month(date)) |>
  # Group by id variable and month
  group_by(code_muni, month) |>
  # Compute normal
  summarise_normal(
    date_var = date, value_var = value,
    year_start = 1961, year_end = 1990
  ) |>
  # Ungroup
  ungroup()
Indicators
rh_indi <- rh_data |>
  # Identify year
  mutate(year = year(date)) |>
  # Identify month
  mutate(month = month(date)) |>
  # Filter year
  filter(year >= 1991) |>
  # Group by id variable, year and month
  group_by(code_muni, year, month) |>
  # Compute relative humidity indicators
  summarise_rel_humidity(
    value_var = value,
    normals_df = rh_normal
  ) |>
  # Ungroup
  ungroup()
Export
write_parquet(x = rh_normal, sink = "rh_normal.parquet")
write_csv2(x = rh_normal, file = "rh_normal.csv")
write_parquet(x = rh_indi, sink = "rh_indi.parquet")
write_csv2(x = rh_indi, file = "rh_indi.csv")
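The seven variable sections above repeat the same steps with a different file name, series name, summary function and output prefix. One way to avoid the repetition is a lookup table that drives a single loop. This base R sketch only builds that table from the names used above and prints the planned outputs; the loop body where the real {zendown}, {arrow} and {climindi} calls would go is left as comments, and the table's column names are hypothetical.

```r
# Lookup table: dataset prefix, {climindi} summary function used above,
# and output prefix (column names are hypothetical)
vars <- data.frame(
  prefix = c("pr", "ETo", "Tmax", "Tmin", "Rs", "u2", "RH"),
  fun    = c("summarise_precipitation", "summarise_evapotrapiration",
             "summarise_temp_max", "summarise_temp_min",
             "summarise_solar_radiation", "summarise_windspeed",
             "summarise_rel_humidity"),
  out    = c("pr", "eto", "tmax", "tmin", "rs", "u2", "rh")
)
vars$file        <- sprintf("%s_3.2.3.parquet", vars$prefix)
vars$series      <- sprintf("%s_3.2.3_mean", vars$prefix)
vars$normal_file <- sprintf("%s_normal.parquet", vars$out)
vars$indi_file   <- sprintf("%s_indi.parquet", vars$out)

for (i in seq_len(nrow(vars))) {
  # In the real pipeline: download vars$file[i], keep name == vars$series[i],
  # compute the normal, call the function named vars$fun[i]
  # (e.g. via match.fun()), then write vars$normal_file[i] and vars$indi_file[i].
  message(vars$file[i], " -> ", vars$normal_file[i], ", ", vars$indi_file[i])
}
```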
Results and dataset download
The climatological normals and aggregated indicators for Brazilian municipalities can be downloaded from Zenodo in CSV and Parquet formats. Click the link below to access and download the data.
You can also download the dataset directly from R, using the {zendown} package.
Let’s check some results.
Maximum temperature, Rio de Janeiro, RJ, 2023
Observed and normal
# Observed daily maximum temperature for Rio de Janeiro (code 3304557) in 2023
tmax_data <- zen_file(13906834, "Tmax_3.2.3.parquet") |>
  open_dataset() |>
  filter(name == "Tmax_3.2.3_mean") |>
  filter(code_muni == 3304557) |>
  filter(date >= as.Date("2023-01-01")) |>
  filter(date <= as.Date("2023-12-31")) |>
  select(-name) |>
  collect()

# Monthly normals for the same municipality
tmax_normal <- zen_file(13934888, "tmax_normal.parquet") |>
  open_dataset() |>
  filter(code_muni == 3304557) |>
  collect()
library(ggplot2)
library(tidyr)
# Repeat each monthly normal on every day of its month in 2023
tmax_normal_exp <- tmax_normal |>
  mutate(date = make_date(2023, month, 1)) |>
  group_by(month) |>
  expand(
    date = seq.Date(floor_date(date, unit = "month"), ceiling_date(date, unit = "month") - days(1), by = "day"),
    normal_mean, normal_p10, normal_p90
  ) |>
  # Long format: one row per date and normal statistic
  pivot_longer(cols = starts_with("normal_")) |>
  # "normal_mean" -> "mean", etc.
  mutate(name = substr(name, 8, 100))
ggplot() +
  geom_line(data = tmax_data, aes(x = date, y = value)) +
  geom_line(data = tmax_normal_exp, aes(x = date, y = value, color = name)) +
  theme_bw() +
  labs(
    title = "Maximum temperature and climatological normal",
    subtitle = "Rio de Janeiro, RJ",
    color = "Normal (1961-1990)",
    x = "Date", y = "Celsius degrees"
  ) +
  theme(legend.position = "bottom", legend.direction = "horizontal")
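The `expand()` step above repeats each monthly normal on every day of its month, so the normals can be drawn as step lines against the daily observations. The same idea in base R, with made-up normal values for January and February 2023 and a hypothetical helper `expand_month()`:

```r
# Made-up monthly normals for two months
normal <- data.frame(month = c(1L, 2L), normal_mean = c(31.7, 32.2))

# Hypothetical helper: one row per day of month `m`, carrying its normal
expand_month <- function(m, value, year = 2023) {
  first <- as.Date(sprintf("%d-%02d-01", year, m))
  # Last day of the month: first day of the next month minus one day
  next_first <- seq(first, by = "month", length.out = 2)[2]
  days <- seq(first, next_first - 1, by = "day")
  data.frame(date = days, month = m, value = value)
}

daily_normal <- do.call(rbind, Map(expand_month, normal$month, normal$normal_mean))
nrow(daily_normal)  # 59 (31 + 28 days)
```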
Indicators
library(gt)
zen_file(13934888, "tmax_indi.parquet") |>
open_dataset() |>
filter(code_muni == 3304557) |>
filter(year == 2023) |>
select(-code_muni, -year) |>
collect() |>
gt() |>
fmt_number(
columns = where(is.double),
decimals = 2,
use_seps = FALSE
)
month | count | normal_mean | normal_p10 | normal_p90 | mean | median | sd | se | max | min | p10 | p25 | p75 | p90 | heat_waves_3d | heat_waves_5d | hot_days | t_25 | t_30 | t_35 | t_40 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1.00 | 31 | 31.73 | 26.93 | 36.51 | 30.56 | 31.44 | 3.92 | 0.70 | 35.32 | 21.36 | 23.99 | 29.28 | 32.96 | 34.65 | 0 | 0 | 0 | 2 | 4 | 1 | 0 |
2.00 | 28 | 32.23 | 27.80 | 36.05 | 32.67 | 32.22 | 2.47 | 0.47 | 37.20 | 28.26 | 30.04 | 30.92 | 34.91 | 35.86 | 0 | 0 | 3 | 1 | 3 | 5 | 0 |
3.00 | 31 | 31.15 | 26.95 | 34.75 | 31.63 | 31.01 | 2.21 | 0.40 | 36.72 | 27.88 | 29.21 | 30.32 | 33.14 | 35.03 | 0 | 0 | 4 | 1 | 5 | 4 | 0 |
4.00 | 30 | 28.83 | 25.03 | 32.98 | 27.86 | 27.79 | 2.33 | 0.42 | 32.16 | 23.10 | 24.92 | 26.17 | 29.54 | 30.68 | 0 | 0 | 0 | 3 | 4 | 0 | 0 |
5.00 | 31 | 27.19 | 23.23 | 31.03 | 26.33 | 27.45 | 2.55 | 0.46 | 30.42 | 20.67 | 23.26 | 24.11 | 28.06 | 29.22 | 0 | 0 | 0 | 3 | 1 | 0 | 0 |
6.00 | 30 | 26.14 | 21.77 | 30.30 | 26.55 | 26.39 | 2.56 | 0.47 | 31.11 | 21.69 | 23.39 | 25.09 | 28.06 | 30.46 | 0 | 0 | 2 | 2 | 2 | 0 | 0 |
7.00 | 31 | 25.85 | 21.44 | 30.29 | 25.99 | 25.40 | 3.40 | 0.61 | 35.50 | 20.93 | 22.13 | 23.04 | 28.41 | 29.46 | 0 | 0 | 3 | 6 | 3 | 1 | 0 |
8.00 | 31 | 26.79 | 21.34 | 31.84 | 27.29 | 26.51 | 4.91 | 0.88 | 37.73 | 17.49 | 22.32 | 24.37 | 30.79 | 33.19 | 1 | 0 | 4 | 6 | 5 | 1 | 0 |
9.00 | 30 | 26.82 | 21.49 | 32.93 | 30.39 | 30.81 | 4.17 | 0.76 | 36.94 | 21.40 | 23.91 | 27.73 | 33.35 | 35.00 | 3 | 0 | 6 | 4 | 6 | 3 | 0 |
10.00 | 31 | 27.59 | 22.33 | 33.40 | 28.49 | 29.49 | 4.54 | 0.82 | 37.85 | 20.91 | 23.06 | 24.81 | 32.65 | 33.71 | 1 | 0 | 4 | 5 | 6 | 1 | 0 |
11.00 | 30 | 29.02 | 23.87 | 34.44 | 31.21 | 30.75 | 4.79 | 0.88 | 40.94 | 24.80 | 25.35 | 26.20 | 34.72 | 37.07 | 1 | 1 | 5 | 4 | 5 | 5 | 1 |
12.00 | 31 | 30.41 | 25.32 | 35.41 | 31.30 | 31.32 | 3.27 | 0.59 | 38.47 | 25.13 | 26.78 | 29.20 | 33.47 | 34.47 | 0 | 0 | 2 | 1 | 5 | 3 | 0 |
Session info
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS 15.0.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Paris
date 2024-10-16
pandoc 3.2 @ /Applications/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
arrow * 17.0.0.1 2024-08-21 [1] CRAN (R 4.3.3)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.3.0)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.3)
bit 4.5.0 2024-09-20 [1] CRAN (R 4.3.3)
bit64 4.5.2 2024-09-22 [1] CRAN (R 4.3.3)
checkmate 2.3.2 2024-07-29 [1] CRAN (R 4.3.3)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.3)
climindi * 0.0.0.9000 2024-10-16 [1] local
colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.3.3)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.3)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
evaluate 1.0.1 2024-10-10 [1] CRAN (R 4.3.3)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.3)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.3.1)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.3.3)
gt * 0.11.1 2024-10-04 [1] CRAN (R 4.3.3)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.3.1)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.3.3)
knitr 1.48 2024-07-07 [1] CRAN (R 4.3.3)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.3.1)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.3)
rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.3.3)
sass 0.4.9 2024-03-15 [1] CRAN (R 4.3.1)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
withr 3.0.1 2024-07-31 [1] CRAN (R 4.3.3)
xfun 0.48 2024-10-03 [1] CRAN (R 4.3.3)
xml2 1.3.6 2023-12-04 [1] CRAN (R 4.3.1)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.3.3)
zendown * 0.1.0 2024-04-18 [1] Github (rfsaldanha/zendown@afbd73a)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────