Living Forecast: Using data science to break down the US election

As the pantomime of the US presidential election winds down for another four years, the vivisection of electoral forecasts has been under way as results trickle in from the last of the states.

There are two eminent forecast models for US politics: the FiveThirtyEight 2020 election forecast, led by Nate Silver, which gave Biden an 89% probability of winning; and the POTUS model built for The Economist by a team led by Andrew Gelman and Elliott Morris, which gave Biden a 97% probability of winning.

Watching from home, having never followed a US election before, I had not realised everything that comes into play: the electoral college votes, the state-by-state differences in practice and the requisite media circus. As the counts came in, a Biden win certainly did not feel like such a certain outcome.

For a binary outcome like the US presidential election (Trump or Biden), it is easy to mark any forecast as a pass or a fail depending on whether it picked the correct result on the balance of probabilities, in a way that isn't so obvious for other predictions we rely on widely, such as the weather or some macroeconomic forecasts. Yet a forecast of 89% is, by its own terms, expected to be wrong roughly one time in ten, so a single result cannot settle whether it was a good forecast. For anyone seeking a lesson in interpreting probability and understanding uncertainty in forecasts, Andrew Gelman's blog is a masterclass in forecast calibration and statistical reasoning from the Bayesian perspective.
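
Calibration is how a probabilistic forecast is judged over many events rather than one. The short Python sketch below is an illustration only: the helper names `brier_score` and `calibration_table` are hypothetical, and the forecasts and outcomes are made-up numbers, not election data. The Brier score is the mean squared difference between forecast probabilities and 0/1 outcomes (lower is better), and the calibration table compares the average forecast in each probability bin with how often the event actually happened.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=10):
    """Bin forecasts by probability; return (mean forecast, observed rate, count) per bin."""
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        # min() keeps p == 1.0 in the top bin instead of creating an extra one
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed_rate = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_forecast, observed_rate, len(pairs)))
    return table


if __name__ == "__main__":
    # Hypothetical record: ten events each forecast at 90%; nine of them happened.
    outcomes = [1] * 9 + [0]
    print(brier_score([0.9] * 10, outcomes))   # about 0.09
    print(brier_score([0.99] * 10, outcomes))  # about 0.098: overconfidence costs more
    print(calibration_table([0.9] * 10, outcomes))
```

In this toy record the 90% forecaster is perfectly calibrated despite one "wrong" call, while a forecaster who said 99% on the same events scores worse because the single miss is penalised heavily.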

For both the FiveThirtyEight and The Economist models, it is fascinating to see each team's post-election analysis now that most of the results are in. For instance, it seems that, as in 2016, the pre-election polling data were biased.
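
To see why a systematic polling bias matters, consider a deliberately crude toy model in which the true national margin is normally distributed around the polling average. This is not how either team's model works (both are far richer, and this sketch ignores the electoral college entirely); the function `win_probability` and the numbers used (an 8-point poll lead, a 4-point standard deviation) are hypothetical. A known bias shifts the centre of the distribution, while uncertainty about a bias shared by all polls adds variance that averaging more polls cannot remove.

```python
import math
from statistics import NormalDist


def win_probability(poll_margin, sampling_sd, bias_mean=0.0, bias_sd=0.0):
    """Toy model: true margin ~ Normal(poll_margin - bias_mean, sqrt(sampling_sd^2 + bias_sd^2)).

    bias_mean > 0 means the polls overstate the leader; bias_sd is uncertainty
    about a bias shared by every poll, so it adds to the variance directly.
    """
    total_sd = math.sqrt(sampling_sd ** 2 + bias_sd ** 2)
    if total_sd <= 0:
        raise ValueError("total standard deviation must be positive")
    return 1.0 - NormalDist(poll_margin - bias_mean, total_sd).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical 8-point lead in the polls with a 4-point standard deviation.
    for bias in (0.0, 2.0, 4.0):
        print(f"known bias {bias:.0f} pts -> P(win) = {win_probability(8.0, 4.0, bias_mean=bias):.3f}")
    # Unknown shared bias with a 3-point standard deviation widens the spread.
    print(f"uncertain bias -> P(win) = {win_probability(8.0, 4.0, bias_sd=3.0):.3f}")
```

Under these made-up inputs the win probability falls from roughly 0.98 with unbiased polls to about 0.93 and 0.84 as the polls overstate the lead by 2 and 4 points, and to about 0.95 when the shared bias is merely uncertain.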

This interrogation of results, under way even before the dust has settled on the vote counts in every state, is a critical part of genuine forecasting, and one that is so often glossed over in industry applications of data science. Here the model is not only tested out-of-sample but also assessed on its most fundamental elements:

· model specification;

· data input; and

· interpretability by the end-user.

In my experience, careful review of these three areas is what separates an out-of-the-box, elementary model-to-get-an-answer build from a model built to be built upon: a living forecast model that is built and rebuilt, torn down and re-fit, with results presented in ways that the user cares about and understands.

Written by Dr James McKeone.