What's next for data-driven election journalism? | #214 - April 25, 2023
Hopefully, a focus on polling methods and the substance of public opinion, not just predictions
Dear readers,
I am sure that by now most of you have seen the news that many (most?) of the journalists at the data-journalism website FiveThirtyEight have been laid off, and that Nate Silver himself expects to leave or be let go shortly.
I am not going to comment on whatever business calculations Disney and ABC News, which own FiveThirtyEight.com, are making. But I do want to take the opportunity to highlight what I see as a potential moment for political journalists to advance the way we cover elections. In particular, I see 2024 as an opportunity for “data-driven” (or perhaps “data-first”) election journalism to embrace a new set of guiding principles centered on (1) discussing the particular challenges of polling today and (2) covering the substance of public opinion more broadly, rather than making predictions for prediction’s sake or committing to narratives without first consulting the data. That, to me, was the big misstep at 538, and with Nate Silver in particular (with exceptions for some of the most talented politics journalists there).
Brief thoughts follow. I imagine I will write a longer version of this soon, so feedback is welcome.
In no particular order, here is a list of some specific problems that election forecasting faces:
Appetite for horse-race coverage is very high, and the supply of polls is high too. But some polls are better than others; which ones do we trust?
The loudest voices covering elections do not dive into the methodologies that generate polling numbers, preferring (in the most high-profile case) to rely instead on backwards-looking algorithms that give imperfect predictions of the quality of a poll based only on its performance in past elections. This exposes models to higher probabilities of failure.

A harder challenge than legitimate pollsters having different degrees of accuracy is the problem of illegitimate firms. Forecasts can be biased by the inclusion of ideologically motivated and even borderline unscientific polls; notable examples include the firm Rasmussen and amateur clubs of high school seniors. Modelers need frameworks for excluding these surveys from their models. Those standards may not be quantitative, but they can nevertheless be rigorous.

A key next step in the analysis of public opinion is a focus on the crosstabs. But sample sizes are low and response rates even lower. We cannot trust a single reading of demographic “tabs,” as the pollsters say, much less trends in them; a quick calculation after this list shows how fast the margin of error grows for subgroups. Journalists must collect more data here and acknowledge the interpretability issues.
Visualizing uncertainty is hard, but necessary. In recent years RealClearPolitics and 538 have chosen not to show uncertainty alongside their polling averages (538 displays it only on its forecast pages). This is a mistake.
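To make the crosstab problem concrete, here is a minimal sketch in Python, with hypothetical numbers, of how fast the margin of error grows when you slice a poll into demographic subgroups. It uses the standard formula for a simple random sample; real polls have weighting and design effects that make things even worse.

```python
import math

def moe_95(p: float, n: int) -> float:
    """95% margin of error for a proportion under simple random sampling."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# A 1,000-respondent poll sounds big, but its crosstabs are not.
topline  = moe_95(0.50, 1000)  # full sample: about +/- 3.1 points
subgroup = moe_95(0.50, 120)   # e.g., 18-29-year-olds: about +/- 8.9 points

print(f"Topline MOE:  +/- {topline:.1%}")
print(f"Subgroup MOE: +/- {subgroup:.1%}")
# Real surveys' design effects inflate these further, so a single
# crosstab reading is even noisier than this sketch suggests.
```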
We can generalize these problems into a set of broader issues for data journalists:
Reporters and editors in the mainstream press, as well as the audiences they serve, seek precision. It is the job of “news nerds” (as “data journalists” were once called) to convincingly sell them a quantified measure of (un)certainty.

Models, if they are to be used (and they are not always necessary to tell a story), must be robust to parameters changing over time, and must explore the uncertainty of both the data and the model. They should be only as complex as necessary; a toy sketch of what time-adaptation can look like follows this list.

Journalists doing data work must work closely with visualizers to produce charts, interactives, and so forth that convey the story at a level readers can understand.
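As a toy illustration of “robust to parameters changing over time” and “only as complex as necessary,” here is a sketch of a recency-weighted polling average governed by a single decay parameter. The poll numbers and the 14-day half-life are purely hypothetical; the point is that one interpretable knob controls how quickly old polls stop mattering.

```python
import numpy as np

def weighted_average(days_old: np.ndarray, margins: np.ndarray,
                     half_life: float = 14.0) -> float:
    """Polling average with exponential recency weights.

    half_life: days until a poll's weight falls by half. Tuning (or
    learning) this parameter is one way a model adapts as a race moves.
    """
    weights = 0.5 ** (days_old / half_life)
    return float(np.average(margins, weights=weights))

# Hypothetical polls: Democratic margin, and how many days old each is.
margins  = np.array([2.0, 4.5, 1.0, 3.0, 5.5])
days_old = np.array([1,   3,   10,  21,  35])

print(f"Average margin: D +{weighted_average(days_old, margins):.1f}")
```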
And we can apply these principles back to the current status quo of polling aggregation/election forecasting/what-have-you to glean some solutions:
We need to move beyond “Frankenstein” models of polls and elections, which shove different models of economics, politics, and polls into one probabilistic prediction and miss a lot of nuance about how data are generated and how various predictive features are correlated. In particular, we need to program models in statistical languages that propagate uncertainty through all parts of the model (Bayesians, rejoice), even if this means we sacrifice some interpretability.
We need to hire more data-driven reporters who treat polls as unequal. Politics is not sports; poll numbers are not recorded directly from an observable process with a single true answer (say, how many hits a baseball player records). Instead, election data journalists must investigate the “data-generating process” behind each number before deciding whether it is worth including in subsequent aggregation and forecasting models. This would help bridge the gap between our mental models and our existing statistical models of elections and politics.
We need to focus on visualizing the _distributions of outcomes_ that are generated by our polling aggregation and election forecasting models, rather than just the probabilities of outcomes occurring. This will be a big change for some, but the purpose of modeling elections, in general, is to invite readers to think through what might happen in a given contest if polls were “as wrong as they were in X year” or if “the campaign moves as rapidly as in Y year.” (A minimal simulation after this list shows the difference between a distribution and a headline probability.)
We need modelers to talk directly with the journalists writing about the news for mass audiences, so that they can better shape downstream coverage. This may look like teams of forecasters embedded with national politics desks and working alongside the nightly news shows, or a consortium of analysts working with multiple outlets (much like how the exit polls are run).
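And to sketch the distribution-first idea from above: here is a minimal Monte Carlo simulation (all inputs hypothetical) that propagates both sampling noise and correlated, industry-wide polling error into a full distribution of outcomes, instead of collapsing everything into one win probability.

```python
import numpy as np

rng = np.random.default_rng(538)

# Hypothetical inputs: a polling average and two sources of error.
polling_average = 2.0   # Democratic margin, in points
sampling_sd     = 1.5   # noise from finite samples and aggregation
systematic_sd   = 2.5   # industry-wide polling error, shared across polls

n_sims = 100_000
# Each simulation draws one shared systematic error and one sampling
# error, so uncertainty propagates rather than being averaged away.
outcomes = (polling_average
            + rng.normal(0, systematic_sd, n_sims)
            + rng.normal(0, sampling_sd, n_sims))

win_prob = (outcomes > 0).mean()
print(f"P(D wins): {win_prob:.0%}")

# The headline number hides the shape of the distribution; show it too.
lo, hi = np.percentile(outcomes, [5, 95])
print(f"90% of simulations fall between D {lo:+.1f} and D {hi:+.1f}")
```

The win probability and the distribution come from the same simulations; the difference is only in what we choose to show readers.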
An election data journalism website that implemented these solutions would be at the top of the field. These would be not just steps but leaps towards better election coverage in the US. With a sensible business model, such a site could draw an even bigger audience than pre-existing outlets and efforts.
(Interested readers may consult this paper for more.)
In regards to FiveThirtyEight: It was a revolutionary site when it launched. I think people forget just how bad coverage of polls and elections was prior to the “data journalism” revolution. Silver and his contemporaries at places like Pollster.com really enhanced our understanding of politics with tools like polling aggregation and models of uncertainty (“forecasts”) and with daily commentary on the horse race.
It is easy to mourn the suspected loss of those advances (if you assume FiveThirtyEight will cease to exist, which is not clear to me). But I think that is an overreaction. The optimist in me thinks that 538 was always just the first step in a longer project to deliver better data-driven election journalism to every news consumer in America. There are a lot of talented people out there trying new tricks and techniques to increase literacy about polls and push narratives closer to the data (and the people).
In Nate Silver’s 2014 essay “What The Fox Knows,” which announced the birth of a standalone FiveThirtyEight (then owned by ESPN), he urged journalists and consumers alike to “start making the news a little nerdier.” Nearly ten years later, there are even more tools at our disposal to achieve that mission.
More to come…
Elliott
Thanks for your column.
But I think 538 already did most of what you suggest in your column. And its podcast delved deeper and wider into polling, politics, and political issues.
I share your hopes for the future of data-driven journalism, but with cutbacks across the media universe, it’s hard to be optimistic.
I enjoyed 538’s journalists, website, and podcast, and think it was unique in the media landscape. What do you see on the horizon that will be better?