The nseq package counts the sequences (runs) of consecutive values in a vector that meet conditions on both length and magnitude.

As an example, let’s use the airquality dataset. It contains daily air quality measurements in New York, from May to September 1973.

library(nseq)
data("airquality")

head(airquality)
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5    NA      NA 14.3   56     5   5
#> 6    28      NA 14.9   66     5   6

Examples

First, let’s consider the temperature measurements taken in June.

june_data <- subset(airquality, Month == 6)
library(ggplot2)

ggplot(june_data, aes(x = Day, y = Temp)) +
  geom_line() +
  geom_point() +
  theme_bw()

How many times did we have three or more days in a row with temperatures above 83F?

To answer this question, first look at the plot.

ggplot(june_data, aes(x = Day, y = Temp)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 83, color = "red") +
  theme_bw()

There are two sequences of days with temperatures above 83F: one lasting two days and one lasting five days. Only the second meets the condition of “three days or more”.

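Now, let’s use the nseq package to compute this for us. In the call below, a_op and a set the length condition (three or more days), while b_op and b set the magnitude condition on the temperature.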

trle_cond(june_data$Temp, a_op = "gte", a = 3, b_op = "gte", b = 83)
#> [1] 1
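
For intuition, the same count can be reproduced with base R's rle(), which collapses a vector into runs of identical values. This is only a sketch of the idea, assuming that "gte" means "greater than or equal to"; it is not necessarily how trle_cond() works internally.

```r
# Sketch: count runs of at least 3 consecutive June days at or above 83F.
data("airquality")
june_temp <- airquality$Temp[airquality$Month == 6]

# TRUE where the magnitude condition holds
# (assumed meaning of b_op = "gte", b = 83)
hot <- june_temp >= 83

# Collapse into runs: $values holds TRUE/FALSE, $lengths the run lengths
runs <- rle(hot)

# Keep runs that are TRUE and last 3 days or more
# (assumed meaning of a_op = "gte", a = 3)
n_runs <- sum(runs$values & runs$lengths >= 3)
n_runs
#> [1] 1
```

Because the comparison is `>=`, a day at exactly 83F would also satisfy the magnitude condition in this sketch.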

How many times did we have two or more days in a row with temperatures below 75F?

ggplot(june_data, aes(x = Day, y = Temp)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 75, color = "red") +
  theme_bw()

trle_cond(june_data$Temp, a_op = "gte", a = 2, b_op = "lte", b = 75)
#> [1] 2
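
To see which days make up those sequences, the runs can be listed with their start and end days. The following base R sketch uses rle() again; the cold_runs table and its column names are made up for illustration and are not part of nseq.

```r
# Sketch: list the June runs of days at or below 75F with start and end day.
data("airquality")
june <- subset(airquality, Month == 6)

runs <- rle(june$Temp <= 75)

# Positions where each run ends and starts within the June vector
end_pos <- cumsum(runs$lengths)
start_pos <- end_pos - runs$lengths + 1

# Hypothetical helper table: one row per run where the condition holds
cold_runs <- data.frame(
  start = june$Day[start_pos],
  end = june$Day[end_pos],
  length = runs$lengths
)[runs$values, ]

# Runs that also meet the length condition (two days or more)
qualifying <- cold_runs[cold_runs$length >= 2, ]
qualifying
```

The number of rows in qualifying should match the count returned by trle_cond() above, and the start and end columns can be checked against the plot.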

Grouping

You can use the nseq functions inside dplyr::summarise() to compute counts for groups.

For each month, how many sequences of three or more days had temperatures above that month’s average?

library(dplyr)

airquality |>
  summarise(
    avg = mean(Temp, na.rm = TRUE),
    count_3 = trle_cond(Temp, a_op = "gte", a = 3, b_op = "gte", b = avg), 
    .by = Month
  )
#>   Month      avg count_3
#> 1     5 65.54839       3
#> 2     6 79.10000       1
#> 3     7 83.90323       3
#> 4     8 83.96774       2
#> 5     9 76.90000       1
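
The grouped computation can also be sketched without dplyr or nseq, which makes the per-group logic explicit. The helper count_runs_gte() below is hypothetical and only mirrors the assumed behaviour of trle_cond() with a_op = "gte" and b_op = "gte".

```r
# Sketch: per-month count of runs of 3+ days at or above the monthly mean.
data("airquality")

# Hypothetical helper: count runs of at least `min_len` consecutive values
# that are greater than or equal to `threshold`
count_runs_gte <- function(x, min_len, threshold) {
  runs <- rle(x >= threshold)
  sum(runs$values & runs$lengths >= min_len)
}

# Split temperatures by month and use each month's own mean as the threshold
counts <- sapply(
  split(airquality$Temp, airquality$Month),
  function(temp) count_runs_gte(temp, min_len = 3, threshold = mean(temp))
)
counts
```

The result is a named vector with one count per month, which can be compared with the count_3 column above.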
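
To check these counts visually, plot the daily temperatures together with each month’s average, drawn as a red step line.

month_avg <- airquality |>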
  summarise(
    avg = mean(Temp, na.rm = TRUE),
    .by = Month
  )

airquality |>
  left_join(month_avg, by = "Month") |>
  mutate(date = as.Date(paste(1973, Month, Day, sep = "-"))) |>
  ggplot(aes(x = date)) +
    geom_line(aes(y = Temp)) +
    geom_step(aes(y = avg), color = "red") +
    geom_point(aes(y = Temp)) +
    theme_bw()