Forecasting Season 2024/25 Summary

State level quantile forecasting of flu and covid hospitalizations


Dmitry Shemetov & David Weber

May 22, 2025

Major Forecasting Challenges

  • non-stationarity - varying vaccination rates, strains, mobility, superspreader events, etc.
  • large seasonal variation (flu) - season start, duration, and peak size vary
  • lack of seasonal component (covid) - seasonal pattern in the data is weak (post-pandemic)
  • bad data quality - initial observations are incomplete, regular reporting delays, errors, etc.
  • few seasons to train on - NHSN data starts in 2020, the first two seasons are excluded because they were anomalous due to the pandemic, so effectively only 2 seasons of data are available for training

The Data

Recent Seasons Data: Flu

nhsn_archive_flu <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nhsn_archive_data"))
nhsn_latest <- nhsn_archive_flu %>%
  epix_as_of_current() %>%
  # Cut out the tiny fraction of 2019/20 data (there's like 58 data points)
  filter(time_value >= "2020-10-01") %>%
  add_season_info() %>%
  filter(geo_value %nin% c("as", "gu", "mp", "vi"))
ggplot(nhsn_latest, aes(x = season_week, y = value, color = season, size = season)) +
  geom_line() +
  facet_wrap(~geo_value, scale = "free") +
  theme(legend.text = element_text(size = 12)) +
  scale_size_manual(values = c(rep(0.5,5), 1))

Here we see the 2024/25 season in pink. It’s the worst flu season in 15 years and very different from the previous two seasons. Before that, the 2020/21 and 2021/22 seasons are essentially negligible due to the pandemic (so we’ve dropped them).

A Scientific American article offers some speculation as to why:

  • late start -> less protection + a second wave
  • different dominant strain (H3N2/H1N1)
  • decreased vaccination (though not that low)
  • lower levels of immunity from decreased exposure

This bears out somewhat in NSSP, but the increase there is not as extreme, so part of the rise may reflect a change in reporting.

Recent Seasons Data: Covid

nhsn_archive_covid <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nhsn_archive_data"))
nhsn_latest_covid <- nhsn_archive_covid %>%
  epix_as_of_current() %>%
  add_season_info() %>%
  filter(geo_value %nin% c("as", "gu", "mp", "vi"), season %nin% c("2019/20", "2020/21", "2021/22"))
ggplot(nhsn_latest_covid, aes(x = season_week, y = value, color = season, size = season)) +
  geom_line() +
  facet_wrap(~geo_value, scale = "free") +
  theme(legend.text = element_text(size = 12)) +
  scale_size_manual(values = c(rep(0.5, 2), 1))

Here we’ve dropped the 2020 through 2022 seasons because they absolutely swamp the past three years. Even so, depending on the geo, this year frequently shows no discernible season at all (see e.g. CA, FL, and TX); part of the reason is the late-summer wave, which is not visible in this data.

Data Revisions: Flu

nhsn_archive_flu$time_type <- "day"
revision_sum <- nhsn_archive_flu %>% epiprocess::revision_analysis(value, min_waiting_period = NULL)
av_re_spread <- revision_sum$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
  arrange(desc(rel_spread)) %>%
  filter(geo_value %nin% c("vi", "as", "gu"))
nhsn_filtered <- nhsn_archive_flu %>%
  filter(geo_value %in% av_re_spread$geo_value[1:18]) %>%
  filter(time_value >= "2024-11-19")
nhsn_filtered$DT %<>%
  mutate(geo_value = factor(geo_value, levels = av_re_spread$geo_value[1:18]))
autoplot(nhsn_filtered, "value") + facet_wrap(~geo_value, ncol = 3, scales = "free") + theme(strip.text.x = element_text(size = 8)) +
  labs(title = "States with the largest mean revision")

Here we see the revision behavior in the states with the largest mean revision. Note that some revisions are massive: NH, AZ, MN, MO.

Data Revisions: Covid

nhsn_archive_covid$time_type <- "day"
revision_sum <- nhsn_archive_covid %>% epiprocess::revision_analysis(value, min_waiting_period = NULL)
av_re_spread <- revision_sum$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
  arrange(desc(rel_spread)) %>%
  filter(geo_value %nin% c("vi", "as", "gu"))
nhsn_filtered <- nhsn_archive_covid %>%
  filter(geo_value %in% av_re_spread$geo_value[1:18]) %>%
  filter(time_value >= "2024-11-19")
nhsn_filtered$DT %<>%
  mutate(geo_value = factor(geo_value, levels = av_re_spread$geo_value[1:18]))
autoplot(nhsn_filtered, "value") + facet_wrap(~geo_value, ncol = 3, scales = "free") + theme(strip.text.x = element_text(size = 8)) +
  labs(title = "States with the largest mean revision")

Here we see the revision behavior in the states with the largest mean revision. Note that some revisions are massive: DC, AZ, MN, MO, HI.

Models

Our Models: Simplest

In order of increasing complexity/usage

  • linear: linearly extrapolate the last 4 weeks
  • climate_base: a climatological model; quantiles are estimated from historical data for that geo, over a 7-week window centered on the target date’s season week
  • climate_geo: a climatological model like climate_base, but it converts to rates and then estimates quantiles using all geos
  • climate_linear: a weighted ensemble of the climatological and linear models, with weights that vary by ahead and quantile level (plotted below)

weights <-
  make_ahead_weights(-1:3) %>%
  left_join(
    make_quantile_weights(covidhub_probs()),
    by = c("forecast_family"),
    relationship = "many-to-many"
  ) %>%
  mutate(weight = weight.x * weight.y) %>%
  select(forecast_family, quantile, ahead, weight)
weights %>%
  filter(forecast_family == "climate") %>%
  ggplot(aes(y = factor(ahead), x = factor(quantile), fill = weight)) +
  geom_tile() +
  scale_fill_viridis_c(limits = c(0,1))
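
To make the climatological approach concrete, here is a minimal sketch (not our production code) of climate_base-style quantile estimation: for each geo, pool historical values from a 7-week window centered on the target date’s season week and take empirical quantiles. The data frame historical and the helper name are hypothetical, and wrap-around at season boundaries is ignored.

# Hypothetical sketch of climate_base-style quantiles. `historical` has columns
# geo_value, season_week, value; `target_week` is the season week to forecast.
climate_quantiles <- function(historical, target_week, window = 3,
                              probs = c(0.025, 0.25, 0.5, 0.75, 0.975)) {
  historical %>%
    filter(abs(season_week - target_week) <= window) %>% # 7-week window
    group_by(geo_value) %>%
    reframe(quantile_level = probs, value = quantile(value, probs, na.rm = TRUE))
}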

Our Models: Autoregressive

  • windowed_seasonal: an AR model where the training data is restricted to a window extending 4 weeks before and after the current season week (pooled across training seasons)
    • The model is roughly x_t = x_{t-7} + x_{t-14}, where x is the scaled NHSN data (along with ILI+ and FluSURV) and t is the target date in days.
    • For covid, this model predicts on rates data.
    • For flu, this model also uses ILI+ and FluSURV data, with the forecasts done on variance-stabilized, scaled, and centered data, on a per-geo, per-source basis:
      • the variance stabilization is a 4th root
      • the scaling is by the difference between the 5th and 95th quantiles
      • the centering is so that the median is zero

This “training window” approach to handling seasonality gave a large improvement to the model score over the base AR model. The augmented data sources also improved the model score. We did not see much improvement from adding more lags or tweaking the whitening settings.
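
As a rough illustration (not the production pipeline), the per-geo, per-source whitening described above could look like the sketch below; the column names are assumptions.

# Hypothetical whitening following the steps above: 4th-root variance
# stabilization, scaling by the 5th-95th quantile spread, and centering the
# median at zero. Assumes a data frame with columns geo_value, source, value.
whiten <- function(df) {
  df %>%
    group_by(geo_value, source) %>%
    mutate(
      x = value^(1 / 4),
      spread = quantile(x, 0.95, na.rm = TRUE) - quantile(x, 0.05, na.rm = TRUE),
      whitened = (x - median(x, na.rm = TRUE)) / spread
    ) %>%
    ungroup() %>%
    select(-x, -spread)
}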

Our Models: Autoregressive with Exogenous Predictors

  • windowed_seasonal_nssp: like windowed_seasonal but without the data scaling, and using NSSP (emergency department visits) as an exogenous predictor (same lags).
    • The model is roughly x_t = x_{t-7} + x_{t-14} + y_{t-7} + y_{t-14}, where x is the scaled NHSN data (along with ILI+ and FluSURV), y is the NSSP data, and t is the target date in days.

Of the exogenous predictors we tried (NSSP, Google Symptoms, and NWSS), NSSP gave the largest improvement to the model score.
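
For concreteness, the lag structure above can be sketched as follows. This is not the production model (which is a quantile regression trained over the seasonal window); the data frame train_data and its columns nhsn and nssp are hypothetical, and with weekly data a lag of one row corresponds to 7 days.

# Build the lagged predictors used above: NHSN at lags 7 and 14 days and NSSP
# at the same lags (one row = one week), then fit a point-forecast analogue.
make_lagged <- function(df) {
  df %>%
    arrange(geo_value, time_value) %>%
    group_by(geo_value) %>%
    mutate(
      x_lag7  = lag(nhsn, 1),
      x_lag14 = lag(nhsn, 2),
      y_lag7  = lag(nssp, 1),
      y_lag14 = lag(nssp, 2)
    ) %>%
    ungroup()
}
fit <- lm(nhsn ~ x_lag7 + x_lag14 + y_lag7 + y_lag14, data = make_lagged(train_data))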

Timeline of models used

Flu Model

  • 2024-11-21 initial forecast, straight average of climate_base, climate_geo and linear
  • 2024-11-27 climate_linear ensemble instead of average (2 weeks)
  • 2024-12-11 introduce windowed_seasonal to ensemble (5 weeks)
  • 2025-01-15 introduce windowed_seasonal_nssp to ensemble
  • Final model ensembles windowed_seasonal (which uses ILI+ and FluSURV), windowed_seasonal_nssp (which doesn’t), and climate_linear

Covid Model

  • 2024-11-21 initial forecast, straight average of climate_base, climate_geo and linear
  • 2024-11-27 climate_linear ensemble instead of average (9 weeks)
  • 2025-01-29 windowed_seasonal_nssp introduced to ensemble
  • 2025-02-19 forecast using only windowed_seasonal_nssp
  • Final model uses only windowed_seasonal_nssp, except for states without nssp, where it ensembles windowed_seasonal and climate_linear

The takeaway here is mostly that we changed the model a lot as the season progressed.

Flu Scores

Flu Scores: WIS

flu_archive <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nhsn_archive_data"))
flu_current <- flu_archive %>%
  epix_as_of_current() %>%
  filter(geo_value %nin% c("as", "gu", "mp", "vi"))
flu_max <- flu_current %>% group_by(geo_value) %>% summarize(max_value = max(value))
compute_peak_season <- function(data_current, threshold = 0.5, start_of_year = as.Date("2024-11-01")) {
  season_length <- data_current %>% pull(time_value) %>% max() - start_of_year
  data_current %>%
    filter(time_value > start_of_year) %>%
    group_by(geo_value) %>%
    mutate(max_val = max(value)) %>%
    filter(value >= threshold * max_val) %>%
    summarize(first_above = min(time_value), last_above = max(time_value)) %>%
    mutate(
      duration = last_above - first_above,
      rel_duration = as.integer(duration) / as.integer(season_length))
}
classify_phase <- function(time_value, first_above, last_above, rel_duration, threshold) {
  case_when(
    rel_duration > threshold ~ "flat",
    time_value < first_above ~ "increasing",
    time_value > last_above ~ "decreasing",
    .default = "peak"
  ) %>% factor(levels = c("increasing", "peak", "decreasing", "flat"))
}
flu_flat_threshold <- 0.9
flu_within_max <- compute_peak_season(flu_current)

sanity_check_classifying <- flu_current %>%
  left_join(flu_within_max, by = "geo_value") %>%
  mutate(phase = classify_phase(time_value, first_above, last_above, rel_duration, flu_flat_threshold)) %>%
  group_by(geo_value) %>%
  distinct(phase)
repo_forecasters <- c(setdiff(unique(flu_scores$forecaster), our_forecasters), "CMU-TimeSeries")
flu_score_summary <- flu_scores %>%
  left_join(state_census, by = join_by(geo_value == abbr)) %>%
  group_by(forecaster) %>%
  mutate(
    min_wis = min(wis[wis > 1e-5]),
    min_ae = min(ae_median[ae_median > 1e-5])
  ) %>%
  summarize(
    mean_wis = round(Mean(wis), 2),
    wis_sd = round(sd(wis), 2),
    pop_norm_wis = round(Mean(wis *1e5/pop), 2),
    pop_norm_wis_sd = round(sd(wis * 1e5/pop), 2),
    geo_wis = round(GeoMean(wis, min_wis), 2),
    #nWISzero = sum(wis < 1e-5),
    mean_ae = round(Mean(ae_median), 2),
    ae_sd = round(sd(ae_median), 2),
    pop_norm_ae = round(Mean(ae_median*1e5/pop), 2),
    pop_norm_ae_sd = round(sd(ae_median * 1e5/pop), 2),
    geo_ae = round(GeoMean(ae_median, min_ae), 2),
    #nAEzero = sum(ae_median < 1e-5),
    mean_cov_50 = round(Mean(interval_coverage_50), 2),
    mean_cov_90 = round(Mean(interval_coverage_90), 2),
    n = n()
  ) %>%
  arrange(mean_wis)

The external forecasters are in bold. Our submitted forecasts are “CMU-TimeSeries”. The rest are our component models (as they exist now).

Takeaways:

  • windowed_seasonal_nssp attains very low WIS
  • climate_linear performs surprisingly well given its simplicity
  • TODO: more commentary

Flu Scores: Absolute Error

flu_ae_summary <- flu_score_summary %>%
  filter(forecaster %nin% c("seasonal_nssp_latest")) %>%
  select(forecaster, mean_ae, ae_sd, pop_norm_ae, pop_norm_ae_sd, mean_cov_50, mean_cov_90, n) %>%
  arrange(mean_ae)
datatable(
    flu_ae_summary,
    fillContainer = FALSE,
    options = list(
      initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
          "}"),
      pageLength = 25
    )
  ) %>%
  formatStyle("forecaster", target = c("cell"),
              fontWeight = styleEqual(repo_forecasters, rep("900", length(repo_forecasters))),
              textDecoration = styleEqual("CMU-TimeSeries", "underline")) %>%
  formatStyle(
    "mean_ae",
    background = styleColorBar(c(0, max(flu_ae_summary$mean_ae)), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "pop_norm_ae",
    background = styleColorBar(c(0, max(flu_ae_summary$pop_norm_ae)), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )

Flu Scores: WIS Rate Histogram

These scores are rates rather than counts, and color gives population. Note there’s an implied fat tail in the distribution of scores. Overall, the distributions of scores are very similar.

In the increasing phase, CMU-TimeSeries has a smaller left tail, but a more concentrated middle, and smaller right tail, so this doesn’t show up in the aggregate. In the peak phase, CMU-TimeSeries has a thinner left tail (meaning we have fewer cases where we’re dead on), which is enough to show up in the aggregate. The decreasing phase is about the same for the compared models.

ggplot(data, aes(x = wis, y = forecaster, fill = forecaster)) +
  geom_density_ridges(scale = 3, alpha = 0.7) +
  scale_y_discrete(expand = c(0, 0)) + # will generally have to set the expand option
  scale_x_continuous(expand = c(0, 0)) + # for both axes to remove unneeded padding
  scale_fill_brewer(palette = 4) +
  coord_cartesian(clip = "off") + # to avoid clipping of the very top of the top ridgeline
  theme_ridges() +
  facet_wrap(~phase) +
  labs(title = "WIS score histogram") +
  ylab("phase") +
  xlab("wis, population normalized") +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")


Flu Scores: Phase

We classify the season into three phases: increasing, peak, and decreasing. Increasing is before the first time the value exceeds a threshold, decreasing is after the last time it is above the threshold, and peak is the period in between. The threshold is set at 50% of the geo’s max value.

Note that our WIS/AE is in the top 5, but everyone’s scores are quite similar and washed out. We performed well in the increasing and decreasing phases; most of our error came during the peak phase.

phase_scores <-
  flu_scores %>%
  filter(forecaster %nin% c("seasonal_nssp_latest")) %>%
  left_join(flu_within_max, by = "geo_value") %>%
  mutate(phase = classify_phase(target_end_date, first_above, last_above, rel_duration, flu_flat_threshold)) %>%
  group_by(forecaster, phase) %>%
  summarize(
    wis_sd = round(sd(wis, na.rm = TRUE), 2),
    ae_sd = round(sd(ae_median, na.rm = TRUE), 2),
    across(c(wis, ae_median, interval_coverage_90), \(x) round(Mean(x), 2)),
    n = n(),
    our_forecaster = first(our_forecaster),
    .groups = "drop"
  )
sketch <- htmltools::withTags(table(
  class = 'display',
  style = "font-size: 11px;",
  thead(
    tr(
      th(colspan = 2, "forecaster"),
      th(colspan = 2, 'increasing'),
      th(colspan = 2, 'peak'),
      th(colspan = 2, 'decreasing')
    ),
    tr(
      list(th("peak_wis_rank"), th("name"),
           lapply(rep(c('wis', 'sd'), 3), th)
           )
    )
  )
))
phase_scores_wider <- phase_scores %>%
  select(forecaster, phase, wis, wis_sd, our_forecaster) %>%
  pivot_wider(names_from = phase, values_from = c(wis, wis_sd))
wis_max <- phase_scores_wider %>% select(wis_increasing, wis_peak, wis_decreasing) %>% max()
phase_scores_wider %>%
  select(forecaster, wis_increasing, wis_sd_increasing, wis_peak, wis_sd_peak, wis_decreasing, wis_sd_decreasing) %>%
  arrange(wis_peak) %>%
  datatable(
    fillContainer = FALSE,
    options = list(
      initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "11pt", "'});"),
          "}"),
      pageLength = 25
    ),
    container = sketch
  ) %>%
  formatStyle("forecaster", target = c("cell"),
              fontWeight = styleEqual(repo_forecasters, rep("900", length(repo_forecasters))),
              textDecoration = styleEqual("CMU-TimeSeries", "underline")) %>%
  formatStyle(
    "wis_increasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_peak",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_decreasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )


Flu: Sample Forecasts

We’re plotting the 80% CI along with the median. The locations were chosen based on the scores to give a sample of large and small states with large and small (population-normalized) WIS. Everything is scaled to rates per 100k so the locations are easier to compare directly; even so, the peak value varies drastically. Forecasters we produced are blue, while forecasters from other teams are red. They are ordered by mean_wis score, best to worst.

Flu: Forecasting on latest

The point here is to demonstrate the upper bound of what accurate nowcasting would get us: the model named seasonal_nssp_latest is windowed_seasonal_nssp fit using the latest (finalized) data rather than the data available in real time. Compare it with windowed_seasonal_nssp; it generally outperforms all other models by a good margin.
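
As a sketch of the distinction (using the archive loaded earlier; the forecast date below is an arbitrary example), real-time forecasts see only the snapshot available on the forecast date, while the latest variant sees the finalized series:

# What the real-time forecaster saw on a given forecast date vs. the finalized data.
forecast_date <- as.Date("2025-01-15")                            # arbitrary example date
as_of_snapshot <- nhsn_archive_flu %>% epix_as_of(forecast_date)  # data as reported by then
finalized <- nhsn_archive_flu %>% epix_as_of_current()            # data as reported now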

phase_scores <-
  flu_scores %>%
  left_join(flu_within_max, by = "geo_value") %>%
  mutate(phase = classify_phase(target_end_date, first_above, last_above, rel_duration, flu_flat_threshold)) %>%
  group_by(forecaster, phase) %>%
  summarize(
    wis_sd = round(sd(wis, na.rm = TRUE), 2),
    ae_sd = round(sd(ae_median, na.rm = TRUE), 2),
    across(c(wis, ae_median, interval_coverage_90), \(x) round(Mean(x), 2)),
    n = n(),
    our_forecaster = first(our_forecaster),
    .groups = "drop"
  )
sketch <- htmltools::withTags(table(
  class = 'display',
  style = "font-size: 11px;",
  thead(
    tr(
      th(colspan = 2, "forecaster"),
      th(colspan = 2, 'increasing'),
      th(colspan = 2, 'peak'),
      th(colspan = 2, 'decreasing')
    ),
    tr(
      list(th("peak_wis_rank"), th("name"),
           lapply(rep(c('wis', 'sd'), 3), th)
           )
    )
  )
))
phase_scores_wider <- phase_scores %>%
  select(forecaster, phase, wis, wis_sd, our_forecaster) %>%
  pivot_wider(names_from = phase, values_from = c(wis, wis_sd))
wis_max <- phase_scores_wider %>% select(wis_increasing, wis_peak, wis_decreasing) %>% max()
phase_scores_wider %>%
  select(forecaster, wis_increasing, wis_sd_increasing, wis_peak, wis_sd_peak, wis_decreasing, wis_sd_decreasing) %>%
  arrange(wis_peak) %>%
  datatable(
    fillContainer = FALSE,
    options = list(
      initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "11pt", "'});"),
          "}"),
      pageLength = 25
    ),
    container = sketch
  ) %>%
  formatStyle("forecaster", target = c("cell"),
              fontWeight = styleEqual(repo_forecasters, rep("900", length(repo_forecasters))),
              textDecoration = styleEqual("CMU-TimeSeries", "underline")) %>%
  formatStyle(
    "wis_increasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_peak",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_decreasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )

Covid Scores

Covid Scores: WIS

CMU-TimeSeries did significantly worse for covid; a large part of this is poor performance in the very first week of forecasts.

covid_archive <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nhsn_archive_data"))
covid_current <- covid_archive %>%
  epix_as_of_current() %>%
  filter(geo_value %nin% c("as", "gu", "mp", "vi"))
covid_max <- covid_current %>% group_by(geo_value) %>% summarize(max_value = max(value))
covid_within_max <- compute_peak_season(covid_current)

covid_flat_threshold <- 0.6
sanity_check_classifying <- covid_current %>%
  left_join(covid_within_max, by = "geo_value") %>%
  mutate(phase = classify_phase(time_value, first_above, last_above, rel_duration, covid_flat_threshold)) %>%
  group_by(geo_value) %>%
  distinct(phase)

repo_forecasters <- c(setdiff(unique(covid_scores$forecaster), our_forecasters), "CMU-TimeSeries")
covid_score_summary <- covid_scores %>%
  left_join(state_census, by = join_by(geo_value == abbr)) %>%
  group_by(forecaster) %>%
  mutate(
    min_wis = min(wis[wis > 1e-5]),
    min_ae = min(ae_median[ae_median > 1e-5])
  ) %>%
  summarize(
    mean_wis = round(Mean(wis), 2),
    wis_sd = round(sd(wis), 2),
    pop_norm_wis = round(Mean(wis *1e5/pop), 2),
    pop_norm_wis_sd = round(sd(wis * 1e5/pop), 2),
    geo_wis = round(GeoMean(wis, min_wis), 2),
    #nWISzero = sum(wis < 1e-5),
    mean_ae = round(Mean(ae_median), 2),
    ae_sd = round(sd(ae_median), 2),
    pop_norm_ae = round(Mean(ae_median*1e5/pop), 2),
    pop_norm_ae_sd = round(sd(ae_median * 1e5/pop), 2),
    geo_ae = round(GeoMean(ae_median, min_ae), 2),
    #nAEzero = sum(ae_median < 1e-5),
    mean_cov_50 = round(Mean(interval_coverage_50), 2),
    mean_cov_90 = round(Mean(interval_coverage_90), 2),
    n = n()
  ) %>%
  arrange(mean_wis)

Covid Scores: Absolute Error

covid_ae_summary <- covid_score_summary %>%
  arrange(mean_ae) %>%
  filter(forecaster %nin% c("seasonal_nssp_latest")) %>%
  select(forecaster, mean_ae, ae_sd, pop_norm_ae, pop_norm_ae_sd, mean_cov_50, mean_cov_90, n)
datatable(
    covid_ae_summary,
    fillContainer = FALSE,
    options = list(
      initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
          "}"),
      pageLength = 25
    )
  ) %>%
  formatStyle("forecaster", target = c("cell"),
              fontWeight = styleEqual(repo_forecasters, rep("900", length(repo_forecasters))),
              textDecoration = styleEqual("CMU-TimeSeries", "underline")) %>%
  formatStyle(
    "mean_ae",
    background = styleColorBar(c(0, max(covid_ae_summary$mean_ae)), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "pop_norm_ae",
    background = styleColorBar(c(0, max(covid_ae_summary$pop_norm_ae)), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )

Covid Scores: Aggregated By Forecast Date

[Figure: mean WIS by forecast date (Dec through Apr), one line per forecaster: CEPH-Rtrend_covid, climate_base, climate_geo_agged, climate_linear, CMU-TimeSeries, CovidHub-baseline, CovidHub-ensemble, ensemble_mix, ensemble_windowed, linear, seasonal_nssp_latest, UMass-ar6_pooled, UMass-gbqr, windowed_seasonal, windowed_seasonal_nssp. Axes: forecast date vs. mean WIS.]

Covid Scores: WIS Rate Histogram

These scores are rates rather than counts.

Note that again, the score distributions are very similar. You can see where CMU-TimeSeries performed worse at the beginning of the season in the top left. This was due to a naive use of the climatological baseline, which completely overestimated values in states that had no season.

Covid Scores: Phase

phase_scores <-
  covid_scores %>%
  filter(forecaster %nin% c("seasonal_nssp_latest")) %>%
  left_join(covid_within_max, by = "geo_value") %>%
  mutate(phase = classify_phase(target_end_date, first_above, last_above, rel_duration, covid_flat_threshold)) %>%
  group_by(forecaster, phase) %>%
  summarize(
    wis_sd = round(sd(wis, na.rm = TRUE), 2),
    ae_sd = round(sd(ae_median, na.rm = TRUE), 2),
    across(c(wis, ae_median, interval_coverage_90), \(x) round(Mean(x), 2)),
    n = n(),
    our_forecaster = first(our_forecaster),
    .groups = "drop"
  )
sketch <- htmltools::withTags(table(
  class = 'display',
  style = "font-size: 12px;",
  thead(
    tr(
      th(colspan = 2, "forecaster"),
      th(colspan = 2, 'increasing'),
      th(colspan = 2, 'peak'),
      th(colspan = 2, 'decreasing'),
      th(colspan = 2, 'flat')
    ),
    tr(
      list(th("peak_wis_rank"), th("name"),
           lapply(rep(c('wis', 'sd'), 4), th)
           )
    )
  )
))
phase_scores_wider <- phase_scores %>%
  select(forecaster, phase, wis, wis_sd, our_forecaster) %>%
  pivot_wider(names_from = phase, values_from = c(wis, wis_sd)) %>%
  select(forecaster, wis_increasing, wis_sd_increasing, wis_peak, wis_sd_peak, wis_decreasing, wis_sd_decreasing, wis_flat, wis_sd_flat) %>%
  arrange(wis_peak)
wis_max <- phase_scores_wider %>% select(wis_increasing, wis_peak, wis_decreasing, wis_flat) %>% max(na.rm = TRUE)
datatable(
  phase_scores_wider,
  fillContainer = FALSE,
  options = list(
    initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "12pt", "'});"),
          "}"),
    pageLength = 25
  ),
  container = sketch
) %>%
  formatStyle("forecaster", target = c("cell"),
              fontWeight = styleEqual(repo_forecasters, rep("900", length(repo_forecasters))),
              textDecoration = styleEqual("CMU-TimeSeries", "underline")) %>%
  formatStyle(
    "wis_increasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_peak",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_decreasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_flat",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )

Covid: Sample Forecasts

We’re plotting the 80% CI along with the median. The locations were chosen based on the scores to give a sample of large and small states with large and small (population-normalized) WIS. Everything is scaled to rates per 100k so the locations are easier to compare directly; even so, the peak value varies drastically. Forecasters we produced are blue, while forecasters from other teams are red. They are ordered by mean_wis score, best to worst.

[Figure: sample covid forecasts in rates per 100k by target_end_date (Nov through May) for CA, DC, HI, PA, and TX; forecasters shown: windowed_seasonal_nssp, CovidHub-ensemble, UMass-ar6_pooled, CMU-TimeSeries.]

Future Work

  • Account for the “always decreasing” behavior observed in nearly all forecasters (including external forecasters)
    • We’ve determined that this is due to data bias, so we need ways to mitigate that.
    • A possible solution is to split the data into phase components and fit a different model to each phase. (This was tried before, but didn’t work as well as hoped; perhaps it would do better with a better phase definition.)
  • Nowcasting and revision anticipation
  • Forecasting at the city/county level using NSSP (metrocasting)
  • Check evaluation robustness via data “fuzzing”
    • A way to mitigate over-weighting a forecaster’s performance on a single season
    • Possibilities include a parametric bootstrap with observational noise (additive and/or shifts) or leave-one-out training across seasons (e.g. leave 2023 out and train on 2022 and 2024)

Covid: Forecasting on latest

The point here is to demonstrate the upper bound of what accurate nowcasting would get us: the model named seasonal_nssp_latest is windowed_seasonal_nssp fit using the latest (finalized) data rather than the data available in real time. Compare it with windowed_seasonal_nssp; it generally outperforms all other models by a good margin.

phase_scores <-
  covid_scores %>% 
  left_join(covid_within_max, by = "geo_value") %>%
  mutate(phase = classify_phase(target_end_date, first_above, last_above, rel_duration, covid_flat_threshold)) %>%
  group_by(forecaster, phase) %>%
  summarize(
    wis_sd = round(sd(wis, na.rm = TRUE), 2),
    ae_sd = round(sd(ae_median, na.rm = TRUE), 2),
    across(c(wis, ae_median, interval_coverage_90), \(x) round(Mean(x), 2)),
    n = n(),
    our_forecaster = first(our_forecaster),
    .groups = "drop"
  )

sketch <- htmltools::withTags(table(
  class = 'display',
  style = "font-size: 11px;",
  thead(
    tr(
      th(colspan = 2, "forecaster"),
      th(colspan = 2, 'increasing'),
      th(colspan = 2, 'peak'),
      th(colspan = 2, 'decreasing')
    ),
    tr(
      list(th("peak_wis_rank"), th("name"),
           lapply(rep(c('wis', 'sd'), 3), th)
           )
    )
  )
))

phase_scores_wider <- phase_scores %>%
  select(forecaster, phase, wis, wis_sd, our_forecaster) %>%
  pivot_wider(names_from = phase, values_from = c(wis, wis_sd))

wis_max <- phase_scores_wider %>% select(wis_increasing, wis_peak, wis_decreasing) %>% max()

phase_scores_wider %>%
  select(forecaster, wis_increasing, wis_sd_increasing, wis_peak, wis_sd_peak, wis_decreasing, wis_sd_decreasing) %>%
  arrange(wis_peak) %>%
  datatable(
    fillContainer = FALSE,
    options = list(
      initComplete = htmlwidgets::JS(
          "function(settings, json) {",
          paste0("$(this.api().table().container()).css({'font-size': '", "11pt", "'});"),
          "}"),
      pageLength = 25
    ),
    container = sketch
  ) %>%
  formatStyle("forecaster", target = c("cell"),
              fontWeight = styleEqual(repo_forecasters, rep("900", length(repo_forecasters))),
              textDecoration = styleEqual("CMU-TimeSeries", "underline")) %>%
  formatStyle(
    "wis_increasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_peak",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  ) %>%
  formatStyle(
    "wis_decreasing",
    background = styleColorBar(c(0, wis_max), 'lightblue'),
    backgroundSize = '98% 88%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )

Final slide

Thanks:

  • The whole CMU Delphi Team (across many institutions)
  • Optum/UnitedHealthcare, Change Healthcare
  • Google, Facebook, Amazon Web Services
  • Quidel, SafeGraph, Qualtrics
  • Centers for Disease Control and Prevention
  • Council of State and Territorial Epidemiologists

Appendix

Scoring Metrics

  • Weighted interval score (WIS)
    • A discrete approximation to CRPS, which integrates the squared difference between the forecast CDF and the step function of the observed value (the colored area in the figure below, squared); a toy computation follows this list
  • Absolute error (AE)
    • |y_t - m|, where m is the median forecast
  • 80% prediction interval coverage (80% PI)
    • The proportion of times the true value falls within the central 80% prediction interval
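
As a toy illustration (not our scoring pipeline), WIS can be computed from a forecast’s quantiles via the identity that it equals twice the average pinball loss over a symmetric set of quantile levels that includes the median; the quantile values and observation below are made up.

# Toy WIS from quantile forecasts: twice the mean pinball (quantile) loss
# across the forecast's quantile levels (median included).
wis_toy <- function(quantile_levels, quantile_values, y) {
  pinball <- ifelse(
    y >= quantile_values,
    quantile_levels * (y - quantile_values),
    (1 - quantile_levels) * (quantile_values - y)
  )
  2 * mean(pinball)
}

wis_toy(
  quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9),
  quantile_values = c(5, 8, 10, 13, 18),  # made-up forecast quantiles
  y = 12                                  # made-up observed value
)
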
df <- data.frame(x = seq(-1, 2, length.out = 3000))

# Calculate y values for both functions
df$y1 <- pnorm(df$x, mean = 0.5, sd = 0.25)  # Normal CDF
df$y2 <- ifelse(df$x >= 0.75, 1, 0)        # Step function

# Plot
ggplot(df, aes(x = x)) +
  # Plot the functions with color aesthetic
  geom_line(aes(y = y1, color = "CDF of forecast")) +
  geom_line(aes(y = y2, color = "Eventually observed value")) +
  # Add the ribbon between the functions
  geom_ribbon(aes(ymin = pmin(y1, y2), ymax = pmax(y1, y2), fill = "WIS ~ ∫(F(x) - I(x ≥ .75))²"),
              alpha = 0.3) +
  # Customize colors and fills
  scale_color_manual(name = "", values = c("CDF of forecast" = "red",
                                          "Eventually observed value" = "blue")) +
  scale_fill_manual(name = "", values = c("WIS ~ ∫(F(x) - I(x ≥ .75))²" = "red")) +
  # Add legend inside plot
  theme(legend.position = c(0.2, 0.8),  # Adjust these values to position the legend
        legend.background = element_rect(fill = "white"),
        legend.text = element_text(size = 6)) +
  labs(x = "x", y = "CDF", title = "WIS example") +
  theme(plot.title = element_text(hjust = 0.5))

Explicit phase definitions: Flu

flu_current %>%
  filter(time_value > "2024-11-01") %>%
  autoplot(value, .facet_by = "geo_value") +
  geom_vline(data = flu_within_max, aes(xintercept = first_above)) +
  geom_vline(data = flu_within_max, aes(xintercept = last_above)) +
  facet_wrap(~geo_value, scale = "free") +
  theme(legend.position = "none")

Explicit phase definitions: Covid

no_phase <- covid_within_max %>% arrange(desc(rel_duration)) %>% filter(rel_duration > covid_flat_threshold)
covid_current %>%
  filter(time_value > "2024-11-01", geo_value %nin% no_phase$geo_value) %>%
  autoplot(value, .facet_by = "geo_value") +
  geom_vline(data = covid_within_max, aes(xintercept = first_above)) +
  geom_vline(data = covid_within_max, aes(xintercept = last_above)) +
  facet_wrap(~geo_value, scale = "free") +
  ylim(0, NA) +
  theme(legend.position = "none")

Explicit phase definitions: Covid flat

covid_current %>%
  filter(time_value > "2022-06-01") %>%
  filter(geo_value %in% no_phase$geo_value) %>%
  autoplot(value, .facet_by = "geo_value") +
  geom_vline(aes(xintercept = as.Date("2024-11-01"))) +
  ylim(0, NA) +
  theme(legend.position = "none")