Delphi Forecast Reports

GitHub Repo

Overview

Weekly Fanplots 2024-2025 Season

2024-2025 Season Reports

2023-2024 Season Backtesting

Description of Forecaster Families

The main forecaster families were:

Notes:

Autoregressive models (AR)

Internal name: scaled_pop.

A simple autoregressive model, which predicts using

$$x_{t+k} = \mathrm{ar}(x)$$

where $x$ is the target variable and $\mathrm{ar}(x)$ is a linear combination of the target variable’s past values, which can be scaled according to each state’s population, whitened according to another scheme, or both. We found that lags (0, 7) were quite effective, with (0, 7, 14) and (0, 7, 14, 21) providing no discernible advantage, so we focused on lags (0, 7). In practice, our model was

$$x_{t+k} = x_t + x_{t-7}$$

where $k \in \{0, 7, 14, 21, 28\}$ is the forecast horizon.
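As a rough illustration of the lags (0, 7) model, the sketch below fits the two coefficients of the linear combination by ordinary least squares. It assumes weekly observations, so a 7-day lag is one step and a horizon of k days is `k // 7` steps. The names `fit_ar` and `predict_ar`, the absence of an intercept, and the omission of population scaling and quantile output are simplifications made here for illustration, not a description of the scaled_pop implementation.

```python
def fit_ar(x, ahead, lag=1):
    """Least-squares fit of x[t + ahead] ~ a * x[t] + b * x[t - lag] (no intercept)."""
    rows = [(x[t], x[t - lag], x[t + ahead]) for t in range(lag, len(x) - ahead)]
    if len(rows) < 2:
        raise ValueError("series too short for this lag and horizon")
    # Entries of the 2x2 normal equations.
    s00 = sum(u * u for u, _, _ in rows)
    s01 = sum(u * v for u, v, _ in rows)
    s11 = sum(v * v for _, v, _ in rows)
    r0 = sum(u * y for u, _, y in rows)
    r1 = sum(v * y for _, v, y in rows)
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("lagged predictors are collinear")
    # Solve the 2x2 system with Cramer's rule.
    a = (r0 * s11 - s01 * r1) / det
    b = (s00 * r1 - s01 * r0) / det
    return a, b


def predict_ar(x, coefs, lag=1):
    """Point forecast from the most recent value and the value `lag` steps earlier."""
    a, b = coefs
    return a * x[-1] + b * x[-1 - lag]


# Toy weekly series (made-up numbers, not NHSN data).
weekly = [120.0, 150.0, 210.0, 260.0, 300.0, 280.0, 240.0, 190.0, 160.0, 130.0]
coefs = fit_ar(weekly, ahead=1)  # one step ahead, i.e. k = 7 days
forecast = predict_ar(weekly, coefs)
```

A separate pair of coefficients would be fit for each horizon by changing `ahead`.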

Autoregressive models with seasonal features

Internal name: scaled_pop_seasonal.

We tried a few different ways of incorporating seasonal features:

Autoregressive models with exogenous features

Internal name: scaled_pop_seasonal.

These models could opt into the same seasonal features as the scaled_pop_seasonal forecaster, but also included exogenous features.

Flu exogenous features

Covid exogenous features

Autoregressive models with augmented data

Internal name: scaled_pop (with filter_source = "").

This forecaster is still the standard autoregressive model, but with additional training data. Inspired by UMass-flusion, the additional training data consisted of historical data from ILI+ and Flusurv+, which was brought to a comparable level with NHSN and treated as additional observations of the target variable (hence the name “augmented data”). Flusurv was taken from epidata, but ILI+ was constructed by Evan Ray and given to Richard (Berkeley Summer 2024 intern). Naturally, this forecaster was only used for flu, as the same data was not available for covid.

Scaling Parameters (Data Whitening)

We tried a few different approaches to data whitening.

Climatological

This was our term for a forecaster that directly forecast a distribution built from similar weeks from previous seasons (in analogy with baseline weather forecasting). We found that in some cases it made a reasonable baseline, though when the current season’s peak time was significantly different from the seasons in the training data, it was not particularly effective.
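The idea of forecasting from similar weeks in previous seasons can be sketched as follows. This is a minimal illustration under stated assumptions: each past season is a list of weekly values indexed by week of season, "similar weeks" means weeks within `window` of the target week, and the forecast is a set of empirical quantiles of the pooled values. The function names, the window size, and the quantile levels are hypothetical choices, not taken from the actual forecaster.

```python
def empirical_quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, for 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def climatological_forecast(past_seasons, season_week, window=1,
                            levels=(0.25, 0.5, 0.75)):
    """Quantiles of past observations within `window` weeks of `season_week`."""
    pooled = [
        season[w]
        for season in past_seasons
        for w in range(season_week - window, season_week + window + 1)
        if 0 <= w < len(season)
    ]
    if not pooled:
        raise ValueError("no historical observations near the requested week")
    return {q: empirical_quantile(pooled, q) for q in levels}


# Two made-up past seasons of four weeks each.
history = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
quantiles = climatological_forecast(history, season_week=1)
```

Because the pooled values depend only on the week of season, the forecast cannot react to a season whose peak arrives unusually early or late, which matches the weakness noted above.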

Linear Trend

A simple linear trend model that predicts the median using linear extrapolation from the past 4 weeks of data and then uses residuals to create a distributional forecast.
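A minimal sketch of this approach, assuming weekly data: a least-squares line is fit to the last 4 observations and extrapolated to give the median, and the same procedure is replayed at earlier time points to collect forecast errors. How the residuals are actually turned into a distribution is not specified above; here they are symmetrized (each error is pooled with its negative) so that the 0.5 level coincides with the extrapolated value. That choice, the function names, and the quantile levels are assumptions made for illustration.

```python
def empirical_quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, for 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def linear_extrapolate(window, ahead):
    """Fit a least-squares line to `window` and evaluate it `ahead` steps past its end."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    return y_mean + slope * (n - 1 + ahead - t_mean)


def linear_trend_forecast(x, ahead, n_fit=4, levels=(0.1, 0.5, 0.9)):
    """Median from linear extrapolation, spread from symmetrized past errors."""
    median = linear_extrapolate(x[-n_fit:], ahead)
    # Replay the forecaster on every earlier window whose target is observed.
    residuals = [
        x[end - 1 + ahead] - linear_extrapolate(x[end - n_fit:end], ahead)
        for end in range(n_fit, len(x) - ahead + 1)
    ]
    if not residuals:
        raise ValueError("series too short to estimate residuals")
    pooled = residuals + [-r for r in residuals]
    return {q: median + empirical_quantile(pooled, q) for q in levels}


# Made-up weekly counts.
counts = [1.0, 2.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0, 7.0, 10.0]
trend = linear_trend_forecast(counts, ahead=1)
```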

No Recent Outcome

This was a fall-back forecaster built for the scenario where NHSN data was not going to be reported in time for the start of the forecasting challenge.

A flusion-adjacent model pared down to handle the case of not having the target as a predictor.

$$\bar{x}_{t+k} = \langle y_{t-k} \rangle_{k=0:1} + \langle y_{t-k} \rangle_{t=0:3}$$

where y here is any set of exogenous variables.

Flatline

A simple last-observation-carried-forward (“LOCF”) forecaster that forecasts the last observed value and uses residuals to create a distributional forecast. This is what the FluSight-baseline is based on, so the two should be identical.
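The flatline idea can be sketched in a few lines. This is an illustration under assumptions, not the FluSight-baseline code: the point forecast is the last observed value, the residuals are the historical changes over `ahead` steps, and those changes are symmetrized so the distribution is centered on the last value. The function names and quantile levels are hypothetical.

```python
def empirical_quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, for 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def flatline_forecast(x, ahead, levels=(0.1, 0.5, 0.9)):
    """Carry the last value forward and spread it using past `ahead`-step changes."""
    if ahead < 1 or ahead >= len(x):
        raise ValueError("need 1 <= ahead < len(x)")
    last = x[-1]
    # Error the flatline forecast would have made at each earlier time point.
    residuals = [x[t + ahead] - x[t] for t in range(len(x) - ahead)]
    pooled = residuals + [-r for r in residuals]
    return {q: last + empirical_quantile(pooled, q) for q in levels}


# Made-up weekly counts.
flat = flatline_forecast([1.0, 2.0, 3.0, 4.0, 5.0], ahead=1)
```

With symmetrized residuals the 0.5 level always equals the last observed value, which is the defining property of a flatline forecast.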