There is substantial under-reporting behavior that is fairly consistent for a single geo. We can probably improve our forecasts by including revision behavior. Further, flu and covid revision behavior is fairly strongly correlated; it is reported through the same channels by the same people, so this makes sense. We should look into the extra columns to see if they provide useful information for handling revision behavior.
First we get the archive, remove the data older than the first version so as not to clog up the revision behavior, and display the overall revision summary¹.
# Load the flu NHSN archive, dropping data older than the first version and
# the territories we don't forecast.
nhsn_archive_flu <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nhsn_archive_data"))
nhsn_archive_flu <- nhsn_archive_flu$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nhsn_archive_flu$time_type <- "day"
revision_summary <- nhsn_archive_flu %>% epiprocess::revision_analysis(value, min_waiting_period = NULL)
revision_summary %>% print(quick_revision = 7)
##
## ── An epi_archive spanning 2024-11-23 to 2025-05-03. ──
##
## ── Min lag (time to first version):
## min median mean max
## 4 days 4 days 4.2 days 25 days
##
## ── Fraction of epi_key + time_values with
## No revisions:
## • 205 out of 1,282 (15.99%)
##
## Quick revisions (last revision within 7 days of the `time_value`):
## • 357 out of 1,282 (27.85%)
##
## Few revisions (At most 3 revisions for that `time_value`):
## • 925 out of 1,282 (72.15%)
##
##
##
## ── Fraction of revised epi_key + time_values which have:
##
## Less than 0.1 spread in relative value:
## • 604 out of 1,077 (56.08%)
##
## Spread of more than 2719.05 in actual value (when revised):
## • 7 out of 1,077 (0.65%)
##
##
##
## ── Days until within 20% of the latest value:
## min median mean max
## 4 days 4 days 6.7 days 160 days
So around a sixth (16%) have no revisions, around a quarter resolve within a week, and a bit under three quarters have at most 3 revisions. Just over half of those with revisions have little relative change (under 10% of the max value). The “actual value” spread isn’t really worth dwelling on, because this is counts data (so there being only 7 such cases doesn’t tell us much).
Here’s a plot of the version changes for all locations.
text_size <- 6
nhsn_archive_flu %>% autoplot(value) +
  theme(
    strip.text.x = element_text(size = text_size, margin = margin(.1, 0, .1, 0, "cm")),
    axis.text = element_text(size = text_size, angle = 45),
    legend.title = element_text(size = text_size),
    legend.text = element_text(size = text_size),
    legend.key.size = unit(0.5, "cm")
  ) +
  scale_size_manual(values = c(0.5))
Since this is probably too small to actually be legible, let’s figure out the states with the worst revision behavior and plot those, “worst” here meaning the largest average relative spread.
av_re_spread <- revision_summary$revision_behavior %>%
group_by(geo_value) %>%
summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
arrange(desc(rel_spread)) %>%
filter(geo_value %nin% c("vi", "as", "gu"))
av_re_spread %>%
print(n=20)
## # A tibble: 54 × 2
## geo_value rel_spread
## <chr> <dbl>
## 1 pr 0.371
## 2 ok 0.266
## 3 az 0.263
## 4 ar 0.220
## 5 nh 0.218
## 6 nm 0.217
## 7 id 0.215
## 8 dc 0.207
## 9 mo 0.205
## 10 or 0.175
## 11 mt 0.155
## 12 ri 0.145
## 13 ak 0.142
## 14 sd 0.134
## 15 md 0.128
## 16 ga 0.118
## 17 mn 0.117
## 18 ky 0.115
## 19 va 0.113
## 20 hi 0.111
## # ℹ 34 more rows
The worst 9, excluding the geos we don’t actually forecast (so no as, vi, or gu):
nhsn_archive_flu %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))
And the next 9 worst:
nhsn_archive_flu %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))
These cover a range of revision behaviors; some are off on a single time_value in a terrible way, such as mn or nh, while most have fairly constant revisioning. az is unusual in having two bad versions, when they revised the entire history down by a factor of 2 and then reverted to what it had been previously. Most are systematically reporting lower than the actual values, with some, such as nm, nh, ok, and ak, particularly bad about this. It is probably worth adding a measure at the key level of how systematic the bias is to revision_analysis, or perhaps a separate function; these biases seem likely to be estimable beforehand. Exceptions that include at least one case of over-reporting: mn, id, and nh. These are not later backfilled; they seem more like bad estimates or entry errors.
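As a rough illustration of what such a per-geo bias measure could look like (a minimal sketch working directly from the archive’s DT, not an existing revision_analysis feature), we can average the signed relative difference between each first report and the latest value; negative numbers indicate systematic under-reporting:
signed_bias <- nhsn_archive_flu$DT %>%
  arrange(geo_value, time_value, version) %>%
  group_by(geo_value, time_value) %>%
  summarize(
    first_value = first(value),  # earliest report for this time_value
    latest_value = last(value),  # most recent report
    .groups = "drop"
  ) %>%
  # signed relative revision; pmax guards against dividing by zero counts
  mutate(rel_bias = (first_value - latest_value) / pmax(latest_value, 1)) %>%
  group_by(geo_value) %>%
  summarize(mean_rel_bias = mean(rel_bias, na.rm = TRUE)) %>%
  arrange(mean_rel_bias)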
And we actually need to compare this with our data substitutions, the manual estimates we used in place of the reported values:
data_substitutions <- readr::read_csv(
  here::here("flu_data_substitutions.csv"),
  comment = "#",
  show_col_types = FALSE
) %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
nhsn_archive_flu %>%
  autoplot(value, .facet_filter = geo_value %in% unique(data_substitutions$geo_value)) +
  geom_point(data = data_substitutions, aes(x = time_value, y = value)) +
  facet_wrap(~geo_value, scale = "free")
Which doesn’t look all that great. To calculate how much closer to (or further from) the final value we were, first we construct the relevant snapshots:
final_values <- nhsn_archive_flu %>%
  epix_as_of_current() %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
data_as_it_was <- map(
  data_substitutions$forecast_date %>% unique(),
  \(version) nhsn_archive_flu %>% epix_as_of(version) %>% mutate(forecast_date = version)
) %>%
  list_rbind() %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
And then we compute several notions of difference:
full_table <- data_substitutions %>%
  left_join(
    final_values,
    by = join_by(geo_value, time_value), suffix = c("_substituted", "_final")
  ) %>%
  left_join(
    data_as_it_was,
    by = join_by(geo_value, forecast_date, time_value)
  ) %>%
  rename(as_of_value = value)
diffs <- full_table %>%
  mutate(
    # how far our substituted value was above the latest value
    abs_diff = value_substituted - value_final,
    # the same, relative to the latest value
    rel_diff = abs_diff / value_final,
    # the same, relative to how far off the as-of value was
    rel_rev_diff = abs_diff / (as_of_value - value_final)
  ) %>%
  arrange(rel_diff)
diffs %>%
select(-forecast_date, -value_substituted, -value_final, -as_of_value) %>%
print(n = 23)
## # A tibble: 23 × 5
## geo_value time_value abs_diff rel_diff rel_rev_diff
## <chr> <date> <dbl> <dbl> <dbl>
## 1 sd 2025-01-04 -16 -0.327 0.444
## 2 nm 2025-02-15 -77 -0.312 0.658
## 3 nj 2025-03-22 -84 -0.244 0.272
## 4 ok 2025-02-01 -266 -0.238 5.66
## 5 ok 2025-02-15 -186 -0.237 0.660
## 6 ms 2025-02-01 -117 -0.218 117
## 7 ca 2025-02-01 -525 -0.119 3.70
## 8 de 2025-02-01 -10 -0.0714 -Inf
## 9 nm 2025-02-01 -14 -0.0654 1.4
## 10 id 2025-02-01 -9 -0.0566 -3
## 11 ga 2025-02-15 -34 -0.0463 0.105
## 12 hi 2025-03-01 3 0.0448 -0.2
## 13 az 2025-02-01 43 0.0533 0.683
## 14 nh 2025-03-15 21 0.266 Inf
## 15 id 2025-01-04 81 0.335 -1.42
## 16 tn 2025-03-22 122 0.396 -4.88
## 17 nh 2025-03-22 27 0.509 Inf
## 18 dc 2025-02-15 47 0.566 -2.47
## 19 dc 2025-02-22 58 0.806 -2.64
## 20 dc 2025-02-22 58 0.806 -4.83
## 21 dc 2025-02-15 77 0.928 Inf
## 22 dc 2025-02-15 77 0.928 Inf
## 23 nh 2025-03-29 66 1.94 0.733
The table is sorted by rel_diff. abs_diff gives how much our adjustment was over by (so positive means we were over the latest value). rel_diff gives the difference relative to the latest value; for nh on 03/29 (the last entry in the table), our estimate was nearly twice as large as the latest value. This normalization mostly compensates for the different magnitudes of values across time and geo. rel_rev_diff is the most appropriate to view as a substitution score: it divides abs_diff by how much the value as of the forecast date was off. For nh on 03/29 again, it was merely 73%, so we did bring the value closer to the actual one. Any of these with absolute value less than 1 are “successful” in the sense that we were closer to the latest value than the as-of value was. An infinite value tells us that we adjusted a value that hasn’t been corrected. The sign of rel_rev_diff is a bit confusing: positive means our estimate and the as-of value were both larger or both smaller than the latest value, while negative means one was larger and one smaller.
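To make the three columns concrete, here is the arithmetic for that nh row. The final and as-of values below are back-solved from the table (value_final ≈ 34, as_of_value ≈ 124) and should be treated as approximate reconstructions, not data pulled from the archive:
# hypothetical reconstruction of the nh 2025-03-29 row
value_substituted <- 100
value_final <- 34
as_of_value <- 124
abs_diff <- value_substituted - value_final       # 66: we were over the latest value
rel_diff <- abs_diff / value_final                # ~1.94: nearly twice the latest value
rel_rev_diff <- abs_diff / (as_of_value - value_final)  # ~0.733, positive: both were over
abs(rel_rev_diff) < 1  # TRUE: the substitution was closer than the as-of value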
For how many did we substitute a more accurate value?
mean(abs(diffs$rel_rev_diff) < 1)
## [1] 0.3478261
Around 35%, so not a great track record overall. How about substitutions lower than the target vs higher than the target?
diffs %>%
mutate(is_over = rel_diff > 0) %>%
group_by(is_over) %>%
summarize(fraction_improved = mean(abs(rel_rev_diff) < 1))
## # A tibble: 2 × 2
## is_over fraction_improved
## <lgl> <dbl>
## 1 FALSE 0.455
## 2 TRUE 0.25
So we did somewhat better when our substitution was below the target (45% improved) than when it was above (25% improved).
Overall, it turns out our value substitutions did not actually help much for flu.
Now the same revision analysis for the flu NSSP data:
# Load the flu NSSP archive with the same filtering as for NHSN.
nssp_archive_flu <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nssp_archive_data"))
nssp_archive_flu <- nssp_archive_flu$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nssp_archive_flu$time_type <- "day"
revision_summary_nssp <- nssp_archive_flu %>% epiprocess::revision_analysis(nssp, min_waiting_period = NULL)
revision_summary_nssp %>% print(quick_revision = 7)
##
## ── An epi_archive spanning 2024-11-20 to 2025-04-30. ──
##
## ── Min lag (time to first version):
## min median mean max
## 4 days 4 days 4.6 days 25 days
##
## ── Fraction of epi_key + time_values with
## No revisions:
## • 159 out of 1,200 (13.25%)
##
## Quick revisions (last revision within 7 days of the `time_value`):
## • 132 out of 1,200 (11%)
##
## Few revisions (At most 3 revisions for that `time_value`):
## • 827 out of 1,200 (68.92%)
##
##
##
## ── Fraction of revised epi_key + time_values which have:
##
## Less than 0.1 spread in relative value:
## • 899 out of 1,041 (86.36%)
##
## Spread of more than 0.703 in actual value (when revised):
## • 44 out of 1,041 (4.23%)
##
##
##
## ── Days until within 20% of the latest value:
## min median mean max
## 4 days 4 days 4.8 days 25 days
So very few weeks (13%) have no revision, but ~69% have only a small number of revisions, and most (86%) end up very close to their final values.
av_re_spread <- revision_summary_nssp$revision_behavior %>%
group_by(geo_value) %>%
summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
arrange(desc(rel_spread)) %>%
filter(geo_value %nin% c("vi", "as", "gu"))
nssp_archive_flu %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))
nssp_archive_flu %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))
These corrections are way more regular than nhsn’s, so we can definitely do backcasting to adjust the values.
Does NSSP actually correlate better with the latest NHSN value than NHSN itself does? This was a property we were relying on throughout the season to generate our corrections.
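One rough way to check, sketched under some assumptions (the version date is a hypothetical example, and round_date is used to align NSSP’s mid-week time_values with NHSN’s Saturdays):
version <- as.Date("2025-02-05")  # hypothetical example version
nhsn_final <- nhsn_archive_flu %>% epix_as_of_current() %>% rename(final_value = value)
nhsn_then <- nhsn_archive_flu %>% epix_as_of(version) %>% rename(as_of_value = value)
nssp_then <- nssp_archive_flu %>%
  epix_as_of(version) %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
nhsn_final %>%
  inner_join(nhsn_then, by = c("geo_value", "time_value")) %>%
  inner_join(nssp_then, by = c("geo_value", "time_value")) %>%
  group_by(geo_value) %>%
  summarize(
    # how well the as-of NHSN snapshot tracks the final values
    cor_nhsn_as_of = cor(as_of_value, final_value, use = "complete.obs"),
    # how well the as-of NSSP snapshot tracks the final NHSN values
    cor_nssp = cor(nssp, final_value, use = "complete.obs")
  )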
And now for roughly the same idea, but for covid.
First we get the archive, remove the data older than the first version so as not to clog up the revision behavior, and display the overall revision summary².
# Load the covid NHSN archive with the same filtering as for flu.
nhsn_archive_covid <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nhsn_archive_data"))
nhsn_archive_covid <- nhsn_archive_covid$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nhsn_archive_covid$time_type <- "day"
revision_summary <- nhsn_archive_covid %>%
  epiprocess::revision_analysis(value, min_waiting_period = NULL)
revision_summary %>% print(quick_revision = 7)
##
## ── An epi_archive spanning 2024-11-23 to 2025-05-03. ──
##
## ── Min lag (time to first version):
## min median mean max
## 4 days 4 days 4.2 days 25 days
##
## ── Fraction of epi_key + time_values with
## No revisions:
## • 202 out of 1,282 (15.76%)
##
## Quick revisions (last revision within 7 days of the `time_value`):
## • 366 out of 1,282 (28.55%)
##
## Few revisions (At most 3 revisions for that `time_value`):
## • 975 out of 1,282 (76.05%)
##
##
##
## ── Fraction of revised epi_key + time_values which have:
##
## Less than 0.1 spread in relative value:
## • 612 out of 1,080 (56.67%)
##
## Spread of more than 929.6 in actual value (when revised):
## • 6 out of 1,080 (0.56%)
##
##
##
## ── Days until within 20% of the latest value:
## min median mean max
## 4 days 4 days 6.8 days 160 days
Around a sixth have no revisions, a bit over a quarter resolve within a week, and around three quarters have a small number of revisions. Just over half (57%) of those with revisions have little relative change (under 10% of the max value). The “actual value” spread isn’t really worth dwelling on, because this is counts data (so there being only 6 such cases doesn’t tell us much).
Here’s a plot of the version changes for all locations.
nhsn_archive_covid %>% autoplot(value) +
  theme(
    strip.text.x = element_text(size = text_size, margin = margin(.1, 0, .1, 0, "cm")),
    axis.text = element_text(size = text_size, angle = 45),
    legend.title = element_text(size = text_size),
    legend.text = element_text(size = text_size),
    legend.key.size = unit(0.5, "cm")
  ) +
  scale_size_manual(values = c(0.5))
Since this is probably too small to actually be legible, let’s figure out the states with the worst revision behavior and plot those, “worst” here meaning the largest average relative spread.
av_re_spread <- revision_summary$revision_behavior %>%
group_by(geo_value) %>%
summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
arrange(desc(rel_spread)) %>%
filter(geo_value %nin% c("vi", "as", "gu"))
av_re_spread %>%
print(n=20)
## # A tibble: 54 × 2
## geo_value rel_spread
## <chr> <dbl>
## 1 pr 0.358
## 2 az 0.291
## 3 dc 0.273
## 4 mo 0.262
## 5 ar 0.243
## 6 ok 0.220
## 7 nm 0.216
## 8 id 0.188
## 9 mt 0.186
## 10 ks 0.175
## 11 nh 0.171
## 12 ri 0.168
## 13 ky 0.161
## 14 hi 0.150
## 15 sd 0.136
## 16 va 0.128
## 17 mn 0.118
## 18 nv 0.114
## 19 or 0.111
## 20 co 0.105
## # ℹ 34 more rows
Which, if you compare with the average relative spread in the flu section, is remarkably similar; a quick numeric check is sketched below.
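This sketch assumes the flu table from the earlier section was saved under the hypothetical name av_re_spread_flu before being overwritten:
inner_join(
  av_re_spread_flu %>% rename(flu_spread = rel_spread),
  av_re_spread %>% rename(covid_spread = rel_spread),
  by = "geo_value"
) %>%
  # rank correlation of per-geo average relative spreads across diseases
  summarize(spread_cor = cor(flu_spread, covid_spread, method = "spearman"))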
The worst 9, excluding the geos we don’t actually forecast (so no as, vi, or gu):
nhsn_archive_covid %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))
And the next 9 worst:
nhsn_archive_covid %>% autoplot(.facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))
Strictly visually, this seems to revise more than the flu data (which doesn’t actually fit well with the numeric revision analysis, so something odd is going on). Perhaps the revisions are more chaotic. mn, instead of having a single time point wrong for a single version, has its entire trajectory wrong in the same way that az does, possibly on the same version. az has similar revision behavior to flu (a couple of weeks where they were wildly off). nh has similar behavior in over-estimating a specific single time value quite badly, but otherwise has typical under-reporting problems. dc has some new and wild behavior; this is likely as visually striking as it is because the values are so small. mo has new revision behavior, primarily in the magnitude of the difference. ok has more extreme under-reporting than in the case of flu, but again this seems likely to be a factor of the number of cases, suggesting their under-reporting happens in absolute number of cases rather than relative³.
Like flu, these seem likely to be estimable beforehand.
And we actually need to compare the revision behavior with our estimates of the correct values:
data_substitutions <- readr::read_csv(
  here::here("covid_data_substitutions.csv"),
  comment = "#",
  show_col_types = FALSE
) %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
nhsn_archive_covid %>%
  autoplot(value, .facet_filter = geo_value %in% unique(data_substitutions$geo_value)) +
  geom_point(data = data_substitutions, aes(x = time_value, y = value)) +
  facet_wrap(~geo_value, scale = "free")
To calculate how much closer to (or further from) the final value we were, first we construct the relevant snapshots:
final_values <- nhsn_archive_covid %>%
  epix_as_of_current() %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
data_as_it_was <- map(
  data_substitutions$forecast_date %>% unique(),
  \(version) nhsn_archive_covid %>% epix_as_of(version) %>% mutate(forecast_date = version)
) %>%
  list_rbind() %>%
  mutate(time_value = round_date(time_value, unit = "week", week_start = 6))
final_values %>% filter(time_value > "2025-01-03")
## An `epi_df` object, 964 x 3 with metadata:
## * geo_type = nation
## * time_type = week
## * as_of = 2025-05-09
##
## # A tibble: 964 × 3
## geo_value time_value value
## <chr> <date> <dbl>
## 1 ak 2025-01-04 6
## 2 ak 2025-01-11 14
## 3 ak 2025-01-18 7
## 4 ak 2025-01-25 5
## 5 ak 2025-02-01 9
## 6 ak 2025-02-08 10
## 7 ak 2025-02-15 11
## 8 ak 2025-02-22 10
## 9 ak 2025-03-01 19
## 10 ak 2025-03-08 14
## # ℹ 954 more rows
full_table <- data_substitutions %>%
  left_join(
    final_values,
    by = join_by(geo_value, time_value), suffix = c("_substituted", "_final")
  ) %>%
  left_join(
    data_as_it_was,
    by = join_by(geo_value, forecast_date, time_value)
  ) %>%
  rename(as_of_value = value)
diffs <- full_table %>%
  mutate(
    # same definitions as in the flu section
    abs_diff = value_substituted - value_final,
    rel_diff = abs_diff / value_final,
    rel_rev_diff = abs_diff / (as_of_value - value_final)
  ) %>%
  arrange(rel_diff)
diffs %>%
select(-forecast_date, -value_substituted, -value_final, -as_of_value) %>%
print(n = 23)
## # A tibble: 8 × 5
## geo_value time_value abs_diff rel_diff rel_rev_diff
## <chr> <date> <dbl> <dbl> <dbl>
## 1 ga 2025-02-15 -52 -0.198 0.473
## 2 nj 2025-03-22 -40 -0.129 0.149
## 3 ky 2025-03-22 -5 -0.0370 0.263
## 4 dc 2025-02-15 5 0.385 -1
## 5 ar 2025-03-22 24 0.393 -0.8
## 6 ak 2025-03-22 5 0.5 -1.67
## 7 nh 2025-01-11 64 0.744 -10.7
## 8 sd 2025-03-01 15 0.75 -3.75
The table is sorted by rel_diff. abs_diff gives how much our adjustment was over by (so positive means we were over the latest value). rel_diff gives the difference relative to the latest value. rel_rev_diff, on the other hand, divides that by how much the value as of the forecast date was off. Any of these with absolute value less than 1 are “successful” in the sense that we were closer to the latest value than the as-of value was. An infinite value tells us that we adjusted a value that hasn’t been corrected. The sign of rel_rev_diff is a bit confusing; it tells us whether our estimate and the as-of value were both larger/smaller than the latest value, or one larger and one smaller.
We did fewer substitutions in Covid, but they were more regularly good, and there were no cases where we shouldn’t have changed the value and did.
For how many did we substitute a more accurate value?
mean(abs(diffs$rel_rev_diff) < 1)
## [1] 0.5
50%, so overall basically guessing. How about substitutions lower than the target vs higher than the target?
diffs %>%
mutate(is_over = rel_diff > 0) %>%
group_by(is_over) %>%
summarize(fraction_improved = mean(abs(rel_rev_diff) < 1))
## # A tibble: 2 × 2
## is_over fraction_improved
## <lgl> <dbl>
## 1 FALSE 1
## 2 TRUE 0.2
When we were under, we always improved the situation; when we were above, however, we usually made it worse (only 20% improved).
And the same NSSP analysis for covid:
# Load the covid NSSP archive with the same filtering as before.
nssp_archive_covid <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nssp_archive_data"))
nssp_archive_covid <- nssp_archive_covid$DT %>%
  filter(time_value >= "2024-11-19", geo_value %nin% c("vi", "as", "gu")) %>%
  as_epi_archive()
nssp_archive_covid$time_type <- "day"
revision_summary_nssp <- nssp_archive_covid %>% epiprocess::revision_analysis(nssp, min_waiting_period = NULL)
revision_summary_nssp %>% print(quick_revision = 7)
##
## ── An epi_archive spanning 2024-11-20 to 2025-04-30. ──
##
## ── Min lag (time to first version):
## min median mean max
## 4 days 4 days 4.6 days 25 days
##
## ── Fraction of epi_key + time_values with
## No revisions:
## • 156 out of 1,200 (13%)
##
## Quick revisions (last revision within 7 days of the `time_value`):
## • 127 out of 1,200 (10.58%)
##
## Few revisions (At most 3 revisions for that `time_value`):
## • 1,055 out of 1,200 (87.92%)
##
##
##
## ── Fraction of revised epi_key + time_values which have:
##
## Less than 0.1 spread in relative value:
## • 848 out of 1,044 (81.23%)
##
## Spread of more than 0.162 in actual value (when revised):
## • 70 out of 1,044 (6.7%)
##
##
##
## ── Days until within 20% of the latest value:
## min median mean max
## 4 days 4 days 4.8 days 25 days
So few weeks have no revision (only 13%), but ~88% have only a small number of revisions, and most (81%) are close to their final values.
av_re_spread <- revision_summary_nssp$revision_behavior %>%
group_by(geo_value) %>%
summarize(rel_spread = mean(rel_spread, na.rm = TRUE)) %>%
arrange(desc(rel_spread)) %>%
filter(geo_value %nin% c("vi", "as", "gu"))
nssp_archive_covid %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[1:9]) + theme(strip.text.x = element_text(size = 8))
nssp_archive_covid %>% autoplot(nssp, .facet_filter = geo_value %in% av_re_spread$geo_value[10:18]) + theme(strip.text.x = element_text(size = 8))
A similar story to flu: these corrections are way more regular than nhsn, so we can definitely do backcasting to adjust the values. A minimal sketch of what that could look like is below.
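This is an assumption-laden illustration, not something we currently run: it estimates a constant per-geo multiplicative correction from first-versus-latest ratios and applies it to the current snapshot.
first_vs_final <- nssp_archive_covid$DT %>%
  arrange(geo_value, time_value, version) %>%
  group_by(geo_value, time_value) %>%
  summarize(first_value = first(nssp), final_value = last(nssp), .groups = "drop") %>%
  filter(first_value > 0)  # avoid dividing by zero
correction_factors <- first_vs_final %>%
  group_by(geo_value) %>%
  summarize(correction = mean(final_value / first_value, na.rm = TRUE))
# scale the most recent snapshot; in practice this should only be applied to
# recent, still-revising time_values
backcast <- nssp_archive_covid %>%
  epix_as_of_current() %>%
  left_join(correction_factors, by = "geo_value") %>%
  mutate(nssp_adjusted = nssp * correction)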
Visually, the flu and covid nhsn datasets seem to have related revision behavior. We should probably study this more carefully, but the question is a somewhat difficult one. One potential way to go about it is, for each (time_value, geo_value) pair, to correlate the flu and covid time series across versions. Alternatively, we could do the same but for the differences between versions (so correlate the correction amounts). A sketch of the first approach:
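This sketch joins the two archives’ DTs on shared versions; it assumes each (geo_value, time_value) pair has at least a few common versions, and drops pairs with fewer than 3.
flu_covid_cor <- inner_join(
  nhsn_archive_flu$DT %>% rename(flu_value = value),
  nhsn_archive_covid$DT %>% rename(covid_value = value),
  by = c("geo_value", "time_value", "version")
) %>%
  group_by(geo_value, time_value) %>%
  filter(n() >= 3) %>%  # need a few shared versions for a meaningful correlation
  summarize(version_cor = cor(flu_value, covid_value), .groups = "drop")
# e.g. which geos revise flu and covid most similarly
flu_covid_cor %>%
  group_by(geo_value) %>%
  summarize(mean_cor = mean(version_cor, na.rm = TRUE)) %>%
  arrange(desc(mean_cor))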
¹ min_waiting_period is NULL here since we’re plotting mid-season, while quick_revision = 7 because this is weekly data represented using days (because the versions are days).
² As in the flu section: min_waiting_period is NULL since we’re plotting mid-season, and quick_revision = 7 because this is weekly data represented using days.
³ Which would definitely be a bit odd of an effect.