
Overall takeaways

There is substantial under-reporting behavior that is fairly consistent within a single geo. We can probably improve our forecasts by modeling revision behavior. Further, flu and covid revision behavior is fairly strongly correlated; both are reported through the same channels by the same people, so this makes sense. We should look into the extra columns to see whether they provide useful information for handling revision behavior.

Flu revision

NHSN

Revision behavior

First we get the archive, remove data older than the first version so as not to clog up the revision behavior, and display the overall revision summary¹.

# Load the production archive, keep only data from the first version onward,
# drop the geos we don't forecast, and summarize the revision behavior.
nhsn_archive_flu <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nhsn_archive_data"))
nhsn_archive_flu <- nhsn_archive_flu$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nhsn_archive_flu$time_type <- "day"
revision_summary <- nhsn_archive_flu %>%
  epiprocess::revision_analysis(value, min_waiting_period = NULL)
revision_summary %>% print(quick_revision = 7)
## 
## ── An epi_archive spanning 2024-11-23 to 2025-05-03. ──
## 
## ── Min lag (time to first version):
##      min median     mean     max
##   4 days 4 days 4.2 days 25 days
## 
## ── Fraction of epi_key + time_values with
## No revisions:
## • 205 out of 1,282 (15.99%)
## 
## Quick revisions (last revision within 7 days of the `time_value`):
## • 357 out of 1,282 (27.85%)
## 
## Few revisions (At most 3 revisions for that `time_value`):
## • 925 out of 1,282 (72.15%)
## 
## 
## 
## ── Fraction of revised epi_key + time_values which have: 
## 
## Less than 0.1 spread in relative value:
## • 604 out of 1,077 (56.08%)
## 
## Spread of more than 2719.05 in actual value (when revised):
## • 7 out of 1,077 (0.65%)
## 
## 
## 
## ── Days until within 20% of the latest value:
##      min median     mean      max
##   4 days 4 days 6.7 days 160 days

So about a sixth have no revisions, around a quarter resolve within a week, and a bit under three-quarters have at most a few revisions. Just over half of those with revisions have little relative change (under 10% of the max value). The “actual value” spread isn’t really worth dwelling on because this is count data (so there being only 7 such cases doesn’t tell us much).

Here’s a plot of the version changes for all locations.

text_size <- 6
nhsn_archive_flu %>%
  autoplot(value) +
  theme(
    strip.text.x = element_text(size = text_size, margin = margin(.1, 0, .1, 0, "cm")),
    axis.text = element_text(size = text_size, angle = 45),
    legend.title = element_text(size = text_size),
    legend.text = element_text(size = text_size),
    legend.key.size = unit(0.5, "cm")
  ) +
  scale_size_manual(values = c(0.5))

Since this is probably too small to actually be legible, let’s figure out the states with the worst revision behavior and plot those, where “worst” means the largest average relative spread:

av_re_spread <- revision_summary$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
  arrange(desc(rel_spread)) %>%
  filter(geo_value %nin% c("vi", "as", "gu"))
av_re_spread %>%
  print(n=20)
## # A tibble: 54 × 2
##    geo_value rel_spread
##    <chr>          <dbl>
##  1 pr             0.371
##  2 ok             0.266
##  3 az             0.263
##  4 ar             0.220
##  5 nh             0.218
##  6 nm             0.217
##  7 id             0.215
##  8 dc             0.207
##  9 mo             0.205
## 10 or             0.175
## 11 mt             0.155
## 12 ri             0.145
## 13 ak             0.142
## 14 sd             0.134
## 15 md             0.128
## 16 ga             0.118
## 17 mn             0.117
## 18 ky             0.115
## 19 va             0.113
## 20 hi             0.111
## # ℹ 34 more rows

The worst 9, excluding the geos we don’t actually forecast (so no as, vi, or gu):

nhsn_archive_flu %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))

And the next 9 worst:

nhsn_archive_flu %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))

These cover a range of revision behaviors. Some are badly off on a single time_value, such as mn or nh, while most have fairly constant revisioning. az is unusual in having two bad versions, in which the entire history was revised down by a factor of 2 and then reverted to what it had been previously. Most are systematically reporting lower than the actual values, with some, such as nm, nh, ok, and ak, particularly bad about this. It is probably worth adding to revision_analysis (or perhaps a separate function) a key-level measure of how systematic the bias is; these biases seem likely to be estimable beforehand. Exceptions that include at least one case of over-reporting: mn, id, and nh. Those values are not later backfilled; they look more like bad estimates or entry errors.
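As a first cut at such a measure, here is a minimal sketch assuming nothing beyond the archive itself: the mean signed relative difference between each geo’s first-reported and latest values. The name signed_bias and the construction are hypothetical, not anything currently in epiprocess.

# Hypothetical key-level bias measure: mean signed relative difference between
# the first report and the latest value, per geo. Negative values suggest
# systematic under-reporting. A sketch only.
first_reports <- nhsn_archive_flu$DT %>%
  as_tibble() %>%
  group_by(geo_value, time_value) %>%
  slice_min(version, n = 1) %>%
  ungroup() %>%
  select(geo_value, time_value, first_value = value)
latest_values <- nhsn_archive_flu %>%
  epix_as_of_current() %>%
  select(geo_value, time_value, latest_value = value)
first_reports %>%
  inner_join(latest_values, by = c("geo_value", "time_value")) %>%
  group_by(geo_value) %>%
  summarize(signed_bias = mean((first_value - latest_value) / pmax(latest_value, 1), na.rm = TRUE)) %>%
  arrange(signed_bias)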

Data substitutions

And we actually need to compare this revision behavior with the values we manually substituted during the season:

data_substitutions <- readr::read_csv(
    here::here(glue::glue("flu_data_substitutions.csv")),
    comment = "#",
    show_col_types = FALSE
  ) %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
nhsn_archive_flu %>%
  autoplot(value, .facet_filter = geo_value %in% unique(data_substitutions$geo_value)) +
  geom_point(data = data_substitutions, aes(x = time_value, y = value)) +
  facet_wrap(~geo_value, scale = "free")

which doesn’t look all that great. To calculate how much closer to (or further from) the final value our substitutions were, we first construct the relevant snapshots:

final_values <- nhsn_archive_flu %>% epix_as_of_current() %>% mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
data_as_it_was <- map(
  data_substitutions$forecast_date %>% unique(),
  \(version) nhsn_archive_flu %>% epix_as_of(version) %>% mutate(forecast_date = version)
) %>%
  list_rbind() %>% mutate(time_value = round_date(time_value, unit = "week", week_start = 6))

and then we compute several notions of difference:

full_table <- data_substitutions %>%
  left_join(
    final_values,
    by = join_by(geo_value, time_value),
    suffix = c("_substituted", "_final")
  ) %>%
  left_join(
    data_as_it_was,
    by = join_by(geo_value, forecast_date, time_value)
  ) %>%
  rename(as_of_value = value)

diffs <- full_table %>%
  mutate(
    abs_diff = value_substituted - value_final,
    rel_diff = abs_diff / value_final,
    rel_rev_diff = abs_diff / (as_of_value - value_final)
  ) %>%
  arrange(rel_diff)
diffs %>%
  select(-forecast_date, -value_substituted, -value_final, -as_of_value) %>%
  print(n = 23)
## # A tibble: 23 × 5
##    geo_value time_value abs_diff rel_diff rel_rev_diff
##    <chr>     <date>        <dbl>    <dbl>        <dbl>
##  1 sd        2025-01-04      -16  -0.327         0.444
##  2 nm        2025-02-15      -77  -0.312         0.658
##  3 nj        2025-03-22      -84  -0.244         0.272
##  4 ok        2025-02-01     -266  -0.238         5.66 
##  5 ok        2025-02-15     -186  -0.237         0.660
##  6 ms        2025-02-01     -117  -0.218       117    
##  7 ca        2025-02-01     -525  -0.119         3.70 
##  8 de        2025-02-01      -10  -0.0714     -Inf    
##  9 nm        2025-02-01      -14  -0.0654        1.4  
## 10 id        2025-02-01       -9  -0.0566       -3    
## 11 ga        2025-02-15      -34  -0.0463        0.105
## 12 hi        2025-03-01        3   0.0448       -0.2  
## 13 az        2025-02-01       43   0.0533        0.683
## 14 nh        2025-03-15       21   0.266       Inf    
## 15 id        2025-01-04       81   0.335        -1.42 
## 16 tn        2025-03-22      122   0.396        -4.88 
## 17 nh        2025-03-22       27   0.509       Inf    
## 18 dc        2025-02-15       47   0.566        -2.47 
## 19 dc        2025-02-22       58   0.806        -2.64 
## 20 dc        2025-02-22       58   0.806        -4.83 
## 21 dc        2025-02-15       77   0.928       Inf    
## 22 dc        2025-02-15       77   0.928       Inf    
## 23 nh        2025-03-29       66   1.94          0.733

The table is sorted by rel_diff.

  • abs_diff gives how much our adjustment was over by (so positive means we were over the latest value).

  • rel_diff gives the difference relative to the latest value; for nh on 03/29 (the last entry in the table), we were over by nearly twice the latest value. Normalizing this way mostly compensates for the differing magnitudes of values across time and geo.

  • rel_rev_diff is the most appropriate to view as a substitution score. It divides abs_diff by how far off the value as of the forecast date was; for nh on 03/29 again, it is merely 73%, so we did bring the value closer to the final one. Any of these which are <1 are “successful” in the sense that we were closer to the latest value than the as_of value was. An infinite value means we adjusted a value that was never subsequently corrected. The sign of rel_rev_diff tells us whether our estimate and the as_of value were on the same side of the latest value (positive) or on opposite sides (negative). A worked example follows this list.
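To make these definitions concrete, here is the nh 03/29 row worked through. The underlying values are approximate back-solves from abs_diff, rel_diff, and rel_rev_diff in the table above, not the actual data.

# Approximate values for nh on 2025-03-29, back-solved from the table:
value_final <- 34        # latest value: abs_diff / rel_diff = 66 / 1.94
value_substituted <- 100 # our substitution: value_final + abs_diff
as_of_value <- 124       # value on the forecast date: value_final + abs_diff / rel_rev_diff
value_substituted - value_final                                 # abs_diff: 66, we were over
(value_substituted - value_final) / value_final                 # rel_diff: ~1.94
(value_substituted - value_final) / (as_of_value - value_final) # rel_rev_diff: ~0.73 < 1, an improvement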

For how many did we substitute a more accurate value?

mean(abs(diffs$rel_rev_diff) < 1)
## [1] 0.3478261

around 35%, so not a great track record overall. How about when we were lower than the target vs higher than the target?

diffs %>%
  mutate(is_over = rel_diff > 0) %>%
  group_by(is_over) %>%
  summarize(fraction_improved = mean(abs(rel_rev_diff) < 1))
## # A tibble: 2 × 2
##   is_over fraction_improved
##   <lgl>               <dbl>
## 1 FALSE               0.455
## 2 TRUE                0.25

So we did somewhat better when our substitution was below the latest value (45% improved) than when it was above (25% improved).

Overall, it turns out our value substitutions did not actually help much for flu.

NSSP

nssp_archive_flu <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nssp_archive_data"))
nssp_archive_flu <- nssp_archive_flu$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nssp_archive_flu$time_type <- "day"
revision_summary_nssp <- nssp_archive_flu %>%
  epiprocess::revision_analysis(nssp, min_waiting_period = NULL)
revision_summary_nssp %>% print(quick_revision = 7)
## 
## ── An epi_archive spanning 2024-11-20 to 2025-04-30. ──
## 
## ── Min lag (time to first version):
##      min median     mean     max
##   4 days 4 days 4.6 days 25 days
## 
## ── Fraction of epi_key + time_values with
## No revisions:
## • 159 out of 1,200 (13.25%)
## 
## Quick revisions (last revision within 7 days of the `time_value`):
## • 132 out of 1,200 (11%)
## 
## Few revisions (At most 3 revisions for that `time_value`):
## • 827 out of 1,200 (68.92%)
## 
## 
## 
## ── Fraction of revised epi_key + time_values which have: 
## 
## Less than 0.1 spread in relative value:
## • 899 out of 1,041 (86.36%)
## 
## Spread of more than 0.703 in actual value (when revised):
## • 44 out of 1,041 (4.23%)
## 
## 
## 
## ── Days until within 20% of the latest value:
##      min median     mean     max
##   4 days 4 days 4.8 days 25 days

So very few weeks (13%) have no revision at all, but ~70% have only a small number of revisions, and most of those that revise (86%) stay very close to their final values.

av_re_spread <- revision_summary_nssp$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
  arrange(desc(rel_spread)) %>%
  filter(geo_value %nin% c("vi", "as", "gu"))
nssp_archive_flu %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))

nssp_archive_flu %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))

These corrections are much more regular than NHSN’s, so we can likely do backcasting to adjust the values; a minimal sketch of the idea follows.
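The simplest version of such a backcast, as a sketch (the per-geo ratio and the example version date are illustrative choices, not the production correction): scale a snapshot by each geo’s historical average latest-to-first revision ratio.

# Hypothetical ratio backcast: multiply a geo's reports by its average
# latest/first revision ratio. A sketch, not the production method.
backcast_factors <- nssp_archive_flu$DT %>%
  as_tibble() %>%
  group_by(geo_value, time_value) %>%
  summarize(
    first = nssp[which.min(version)],
    latest = nssp[which.max(version)],
    .groups = "drop"
  ) %>%
  group_by(geo_value) %>%
  summarize(backcast_factor = mean(ifelse(first > 0, latest / first, NA), na.rm = TRUE))
nssp_archive_flu %>%
  epix_as_of(as.Date("2025-04-30")) %>% # snapshot on an example forecast date
  left_join(backcast_factors, by = "geo_value") %>%
  mutate(nssp_backcast = nssp * backcast_factor)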

Revision behavior

Correlation with latest

Does NSSP actually correlate better with the latest value than nhsn itself does? This was a property we were relying on through the season to generate our corrections.
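One way we might check this, as a sketch: snapshot both archives on an example forecast date, align NSSP’s Wednesday time_values to NHSN’s Saturday-indexed weeks, and compare per-geo correlations with the latest NHSN values. The version date and the use of Spearman correlation (NSSP is a percentage, NHSN a count) are choices of this sketch, not what ran in production.

# Per geo: does NSSP-as-of or NHSN-as-of track the latest NHSN values better?
version <- as.Date("2025-02-01") # arbitrary example forecast date
align_weeks <- function(x) mutate(x, time_value = round_date(time_value, unit = "week", week_start = 6))
latest_nhsn <- nhsn_archive_flu %>% epix_as_of_current() %>% align_weeks() %>% rename(latest = value)
nhsn_asof <- nhsn_archive_flu %>% epix_as_of(version) %>% align_weeks() %>% rename(nhsn_asof = value)
nssp_asof <- nssp_archive_flu %>% epix_as_of(version) %>% align_weeks() %>% rename(nssp_asof = nssp)
latest_nhsn %>%
  inner_join(nhsn_asof, by = c("geo_value", "time_value")) %>%
  inner_join(nssp_asof, by = c("geo_value", "time_value")) %>%
  group_by(geo_value) %>%
  summarize(
    cor_nhsn = cor(latest, nhsn_asof, method = "spearman", use = "complete.obs"),
    cor_nssp = cor(latest, nssp_asof, method = "spearman", use = "complete.obs")
  )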

Covid revision

And now for roughly the same idea, but for covid.

NHSN

First we get the archive, remove data older than the first version so as not to clog up the revision behavior, and display the overall revision summary².

nhsn_archive_covid <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nhsn_archive_data"))

nhsn_archive_covid <- nhsn_archive_covid$DT %>% filter(time_value >= "2024-11-19") %>%
  filter(geo_value %nin% c("vi", "as", "gu")) %>% as_epi_archive()
nhsn_archive_covid$time_type <- "day"
revision_summary <- nhsn_archive_covid %>%
  epiprocess::revision_analysis(value, min_waiting_period = NULL)
revision_summary %>% print(quick_revision = 7)
## 
## ── An epi_archive spanning 2024-11-23 to 2025-05-03. ──
## 
## ── Min lag (time to first version):
##      min median     mean     max
##   4 days 4 days 4.2 days 25 days
## 
## ── Fraction of epi_key + time_values with
## No revisions:
## • 202 out of 1,282 (15.76%)
## 
## Quick revisions (last revision within 7 days of the `time_value`):
## • 366 out of 1,282 (28.55%)
## 
## Few revisions (At most 3 revisions for that `time_value`):
## • 975 out of 1,282 (76.05%)
## 
## 
## 
## ── Fraction of revised epi_key + time_values which have: 
## 
## Less than 0.1 spread in relative value:
## • 612 out of 1,080 (56.67%)
## 
## Spread of more than 929.6 in actual value (when revised):
## • 6 out of 1,080 (0.56%)
## 
## 
## 
## ── Days until within 20% of the latest value:
##      min median     mean      max
##   4 days 4 days 6.8 days 160 days

About a sixth have no revisions, a bit under a third resolve within a week, and around three-quarters have at most a few revisions. Just under 60% of those with revisions have little relative change (under 10% of the max value). The “actual value” spread again isn’t worth dwelling on because this is count data (so there being only 6 such cases doesn’t tell us much).

Here’s a plot of the version changes for all locations.

nhsn_archive_covid %>%
  autoplot(value) +
  theme(
    strip.text.x = element_text(size = text_size, margin = margin(.1, 0, .1, 0, "cm")),
    axis.text = element_text(size = text_size, angle = 45),
    legend.title = element_text(size = text_size),
    legend.text = element_text(size = text_size),
    legend.key.size = unit(0.5, "cm")
  ) +
  scale_size_manual(values = c(0.5))

Since this is probably too small to actually be legible, let’s figure out the states with the worst revision behavior and plot those, where “worst” again means the largest average relative spread:

av_re_spread <- revision_summary$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
  arrange(desc(rel_spread)) %>%
  filter(geo_value %nin% c("vi", "as", "gu"))
av_re_spread %>%
  print(n=20)
## # A tibble: 54 × 2
##    geo_value rel_spread
##    <chr>          <dbl>
##  1 pr             0.358
##  2 az             0.291
##  3 dc             0.273
##  4 mo             0.262
##  5 ar             0.243
##  6 ok             0.220
##  7 nm             0.216
##  8 id             0.188
##  9 mt             0.186
## 10 ks             0.175
## 11 nh             0.171
## 12 ri             0.168
## 13 ky             0.161
## 14 hi             0.150
## 15 sd             0.136
## 16 va             0.128
## 17 mn             0.118
## 18 nv             0.114
## 19 or             0.111
## 20 co             0.105
## # ℹ 34 more rows

If you compare this with the average relative spread in the flu section, it is remarkably similar. The worst 9, excluding the geos we don’t actually forecast (so no as, vi, or gu):

nhsn_archive_covid %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))

And the next 9 worst:

nhsn_archive_covid %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))

Strictly visually, this seems to revise more than the flu data (which doesn’t fit well with the numeric revision analysis, so something odd is going on); perhaps the revisions are more chaotic. mn, instead of having a single time_value wrong in a single version, has its entire trajectory wrong in the same way that az does, possibly in the same version. az has revision behavior similar to its flu counterpart (a couple of weeks where they were wildly off).

nh has similar behavior, badly over-estimating a single specific time_value, but otherwise has the typical under-reporting problems.

dc has some new and wild behavior; this is likely as visually striking as it is because the values are so small.

mo also shows new revision behavior, primarily in the magnitude of the differences.

ok has more extreme under-reporting than in the case of flu, but again this seems likely to be a function of the number of cases, suggesting their under-reporting happens in absolute counts rather than relative terms³.

Like flu, these seem likely to be estimable beforehand.
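One rough way to probe the absolute-versus-relative question from the revision summary alone, assuming its spread and rel_spread columns: if a geo’s under-reporting is a roughly constant count, the absolute spread should be more stable across time_values than the relative spread, and vice versa.

# Sketch: compare per-geo coefficients of variation of the absolute and
# relative spreads. cv_abs < cv_rel hints that revisions behave like a
# constant count; the reverse hints at a constant ratio.
revision_summary$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(
    cv_abs = sd(spread, na.rm = TRUE) / mean(spread, na.rm = TRUE),
    cv_rel = sd(rel_spread, na.rm = TRUE) / mean(rel_spread, na.rm = TRUE)
  ) %>%
  mutate(looks_absolute = cv_abs < cv_rel) %>%
  arrange(cv_abs)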

Data substitutions

And we actually need to compare revision behavior with our estimates of the correct values:

data_substitutions <- readr::read_csv(
    here::here(glue::glue("covid_data_substitutions.csv")),
    comment = "#",
    show_col_types = FALSE
  ) %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
nhsn_archive_covid %>%
  autoplot(value, .facet_filter = geo_value %in% unique(data_substitutions$geo_value)) +
  geom_point(data = data_substitutions, aes(x = time_value, y = value)) +
  facet_wrap(~geo_value, scale = "free")

To calculate how much closer (or further) we were from the final value, first we construct the relevant snapshots:

final_values <- nhsn_archive_covid %>% epix_as_of_current() %>% mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
data_as_it_was <- map(
  data_substitutions$forecast_date %>% unique(),
  \(version) nhsn_archive_covid %>% epix_as_of(version) %>% mutate(forecast_date = version)
) %>%
  list_rbind() %>% mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
final_values %>% filter(time_value > "2025-01-03")
## An `epi_df` object, 964 x 3 with metadata:
## * geo_type  = nation
## * time_type = week
## * as_of     = 2025-05-09
## 
## # A tibble: 964 × 3
##    geo_value time_value value
##    <chr>     <date>     <dbl>
##  1 ak        2025-01-04     6
##  2 ak        2025-01-11    14
##  3 ak        2025-01-18     7
##  4 ak        2025-01-25     5
##  5 ak        2025-02-01     9
##  6 ak        2025-02-08    10
##  7 ak        2025-02-15    11
##  8 ak        2025-02-22    10
##  9 ak        2025-03-01    19
## 10 ak        2025-03-08    14
## # ℹ 954 more rows
full_table <- data_substitutions %>%
  left_join(
    final_values,
    by = join_by(geo_value, time_value),
    suffix = c("_substituted", "_final")
  ) %>%
  left_join(
    data_as_it_was,
    by = join_by(geo_value, forecast_date, time_value)
  ) %>%
  rename(as_of_value = value)

diffs <- full_table %>%
  mutate(
    abs_diff = value_substituted - value_final,
    rel_diff = abs_diff / value_final,
    rel_rev_diff = abs_diff / (as_of_value - value_final)
  ) %>%
  arrange(rel_diff)
diffs %>%
  select(-forecast_date, -value_substituted, -value_final, -as_of_value) %>%
  print(n = 23)
## # A tibble: 8 × 5
##   geo_value time_value abs_diff rel_diff rel_rev_diff
##   <chr>     <date>        <dbl>    <dbl>        <dbl>
## 1 ga        2025-02-15      -52  -0.198         0.473
## 2 nj        2025-03-22      -40  -0.129         0.149
## 3 ky        2025-03-22       -5  -0.0370        0.263
## 4 dc        2025-02-15        5   0.385        -1    
## 5 ar        2025-03-22       24   0.393        -0.8  
## 6 ak        2025-03-22        5   0.5          -1.67 
## 7 nh        2025-01-11       64   0.744       -10.7  
## 8 sd        2025-03-01       15   0.75         -3.75

The table is sorted by rel_diff.

  • abs_diff gives how much our adjustment was over by (so positive means we were over the latest value).

  • rel_diff gives the difference relative to the latest value.

  • rel_rev_diff on the other hand divides that by how far off the value as of the forecast date was. Any of these which are <1 are “successful” in the sense that we were closer to the latest value than the as_of value was. An infinite value means we adjusted a value that was never subsequently corrected. The sign of rel_rev_diff again tells us whether our estimate and the as_of value were on the same side of the latest value (positive) or on opposite sides (negative).

We did fewer substitutions for covid, but they were more consistently good, and there were no infinite values, i.e. no cases where we adjusted a value that was never corrected.

For how many did we substitute a more accurate value?

mean(abs(diffs$rel_rev_diff) < 1)
## [1] 0.5

50%, so overall basically a coin flip. How about when we were lower than the target vs higher than the target?

diffs %>%
  mutate(is_over = rel_diff > 0) %>%
  group_by(is_over) %>%
  summarize(fraction_improved = mean(abs(rel_rev_diff) < 1))
## # A tibble: 2 × 2
##   is_over fraction_improved
##   <lgl>               <dbl>
## 1 FALSE                 1  
## 2 TRUE                  0.2

When we were under, we always improved the value. However, when we were over, we usually made it worse (only 20% improved).

NSSP

nssp_archive_covid <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nssp_archive_data"))
nssp_archive_covid <- nssp_archive_covid$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nssp_archive_covid$time_type <- "day"
revision_summary_nssp <- nssp_archive_covid %>%
  epiprocess::revision_analysis(nssp, min_waiting_period = NULL)
revision_summary_nssp %>% print(quick_revision = 7)
## 
## ── An epi_archive spanning 2024-11-20 to 2025-04-30. ──
## 
## ── Min lag (time to first version):
##      min median     mean     max
##   4 days 4 days 4.6 days 25 days
## 
## ── Fraction of epi_key + time_values with
## No revisions:
## • 156 out of 1,200 (13%)
## 
## Quick revisions (last revision within 7 days of the `time_value`):
## • 127 out of 1,200 (10.58%)
## 
## Few revisions (At most 3 revisions for that `time_value`):
## • 1,055 out of 1,200 (87.92%)
## 
## 
## 
## ── Fraction of revised epi_key + time_values which have: 
## 
## Less than 0.1 spread in relative value:
## • 848 out of 1,044 (81.23%)
## 
## Spread of more than 0.162 in actual value (when revised):
## • 70 out of 1,044 (6.7%)
## 
## 
## 
## ── Days until within 20% of the latest value:
##      min median     mean     max
##   4 days 4 days 4.8 days 25 days

So few weeks have no revision at all (only 13%), but ~88% have only a small number of revisions, and most of those that revise (81%) stay close to their final values.

av_re_spread <- revision_summary_nssp$revision_behavior %>%
  group_by(geo_value) %>%
  summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
  arrange(desc(rel_spread)) %>%
  filter(geo_value %nin% c("vi", "as", "gu"))
nssp_archive_covid %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))

nssp_archive_covid %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))

A similar story to flu: these corrections are much more regular than NHSN’s, so we can likely do backcasting to adjust the values.

Correlations between the two archives

Visually, the flu and covid nhsn datasets seem to have related revision behavior. We should probably study this more carefully, but the question is a somewhat difficult one. One way to go about it is, for each (time_value, geo_value) pair, to correlate the sequence of reports across versions (a sketch of this follows). Alternatively, we could do the same for the differences between versions (so correlate the correction amounts).
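Here is a minimal sketch of the first idea, with the caveat that an epi_archive only stores rows that changed, so only versions where both signals were updated get matched.

# For each (geo_value, time_value) pair, correlate the flu and covid report
# sequences across versions. Only versions present in both archives match.
flu_versions <- nhsn_archive_flu$DT %>%
  as_tibble() %>%
  select(geo_value, time_value, version, flu = value)
covid_versions <- nhsn_archive_covid$DT %>%
  as_tibble() %>%
  select(geo_value, time_value, version, covid = value)
inner_join(flu_versions, covid_versions, by = c("geo_value", "time_value", "version")) %>%
  group_by(geo_value, time_value) %>%
  filter(n() >= 3) %>% # need a few shared versions for a meaningful correlation
  summarize(version_cor = cor(flu, covid), .groups = "drop") %>%
  arrange(version_cor)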


  1. min_waiting_period is NULL here since we’re plotting mid-season, while quick_revision = 7 because this is weekly data represented using days (because the versions are days).

  2. min_waiting_period is NULL here since we’re plotting mid-season, while quick_revision = 7 because this is weekly data represented using days (because the versions are days).

  3. which would definitely be a bit of an odd effect.