Confirmation bias is a hell of a phenomenon. The search for confirmatory evidence, or even the tendency to warp facts into confirmatory evidence, is even stronger when someone has staked their professional reputation on the high-profile successes (and failures) of their work. And confirmation bias is the best explanation I can think of for why Helmut Norpoth, an old-timer in election forecasting, a political scientist at Stony Brook and (I presume) an otherwise intelligent person, proclaims that Donald Trump has a 91-95% chance of winning the 2020 presidential election.
To some extent, I can’t even believe I have to write up this post. The model Norpoth is relying on for such claims (which he calls “The Primary Model”) is so bad that I had guessed most reasonable people would rightly reject it out of hand. But it turns out that most people are not experts in election forecasting, and many others aren’t reasonable political analysts, or, if they are, Norpoth has duped them.
Put simply, The Primary Model has several fatal flaws as a tool for predicting election outcomes. I’ll let Norpoth describe it for you:
It is a statistical model that relies on presidential primaries and an election cycle as predictors of the vote in the general election. This year the model has been calibrated to predict the Electoral College vote.
The technical description of the model is a bit harder to give you. That’s because Norpoth changes it every cycle to keep fixing his mistakes. In 2000, for example, he predicted that Al Gore would win the popular vote and the presidency, and he did the same with Donald Trump in 2016 (assigning an 87% probability that he would win the popular vote, most likely by 5 points, which of course he did not). Yet in both the press appearances for his model and in the online write-up, he ignores these facts. He says he has “modified” his model to fix these errors, and that his record should be adjusted retroactively to reflect his new predictions.
That is a lie akin to banks saying after the 2008 financial crisis “oh, you shouldn’t punish our AAA ratings because we know why our mortgage bundling was bad now, so we’re fixing it.” It’s too little too late.
But it’s not just the obscuring of his record and mis-predictions that makes Norpoth’s model bad; it’s the way that it is constructed. For his 2016 prediction, he described predicting the Democratic share of the two-party vote in November with (a) the Democratic candidate’s average share of the primary vote in New Hampshire and South Carolina; plus (b) the same for the Republican candidate; plus (c, d) the Democratic candidate’s share of the vote in 2012 and 2008. Hopefully you can now recognize why Norpoth calls it “The Primary Model”: it assumes that the outcome of the election will be determined by the share of the vote that the presidential candidates win in their primaries.
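Read literally, the 2016 description suggests a linear regression of roughly the following shape. This is only a sketch reconstructed from the prose: the symbols and coefficients are placeholders rather than Norpoth's published values, and the real formula may well differ.

```latex
\[
\hat{V}^{D}_{t} = \beta_0 + \beta_1 P^{D}_{t} + \beta_2 P^{R}_{t}
                + \beta_3 V^{D}_{t-1} + \beta_4 V^{D}_{t-2}
\]
```

Here \(\hat{V}^{D}_{t}\) would be the predicted Democratic share of the two-party vote in November, \(P^{D}_{t}\) and \(P^{R}_{t}\) the Democratic and Republican candidates' average primary shares in New Hampshire and South Carolina, and \(V^{D}_{t-1}\), \(V^{D}_{t-2}\) the Democratic vote shares in the two previous elections (2012 and 2008 for the 2016 forecast).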
In 2020, it appears Norpoth has added another variable to reward candidates when they are the incumbent president, though I cannot be sure because he won’t release the full formula.
There are some very obvious ways this model can break down. For one, if a candidate faces more challengers than usual (hello, 2020 is calling!) their share of the vote in the primary will be artificially deflated. Norpoth tries to adjust for this fact by using a candidate’s share of the votes cast only for the top-two candidates, but this does not actually solve the problem; at lower vote shares, small differences will get massively inflated by dividing by the sum. And the model will still be contorted by presidents who don’t face serious primary challenges in their party. This was the case in 2012 when Norpoth predicted Barack Obama would win 53% of the two-party vote for president (admittedly not too far off from the result), but he obtained this prediction by arbitrarily capping Obama’s vote share (which theoretically was close to 100%) at 65%, a number which (I presume) gave him the best results in the past.
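To make the top-two problem concrete, here is a minimal sketch of the arithmetic. The vote shares are hypothetical numbers chosen for illustration, and the function names are my own; only the 65% cap comes from the description above.

```python
def top_two_share(first: float, second: float) -> float:
    """Leader's percentage of the votes cast for the top two candidates only."""
    return 100.0 * first / (first + second)


def capped(share: float, cap: float = 65.0) -> float:
    """Clamp a primary share at a ceiling (65% is the cap described for 2012)."""
    return min(share, cap)


# Hypothetical two-way race: the leader is 2 points ahead of the runner-up.
two_way = top_two_share(51.0, 49.0)   # 51.0

# Hypothetical crowded field: the leader is still only 2 points ahead,
# but both candidates have small raw shares.
crowded = top_two_share(11.0, 9.0)    # 55.0

# Hypothetical near-uncontested incumbent, clamped by the cap.
incumbent = capped(top_two_share(95.0, 5.0))  # 65.0

print(f"two-way race:  {two_way:.1f}")
print(f"crowded field: {crowded:.1f}")
print(f"incumbent:     {incumbent:.1f}")
```

The same 2-point raw gap yields a top-two share of 51 in the two-way race and 55 in the crowded field, so the adjustment rewards or punishes candidates for the size of the field rather than for their strength.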
The model will also be led astray if there are demographic patterns to candidates’ vote shares in the primary contests (again, as there were in 2020). Joe Biden doesn’t technically even register in Norpoth’s parameter for New Hampshire vote share, as he finished fifth there. Norpoth only climbs out of this hole by adding South Carolina results for Democratic primaries since 2008.
There’s also the problem of over-confidence, and a technical issue with model training. Generally, you don’t want to evaluate your models on the same set of data you’re training them on. That’s like peeking in a hat for your name before drawing one out for a raffle. It’s like being in a room before the lights go out and then telling me what’s in it. In my field, we call this “cheating”: not in the playground sense, but in the “you presume your model knows more than it actually does about the world” sense. This is also why Norpoth can get away (with some people, at least) with saying he didn’t mis-predict some election outcomes: he simply trains a new Primary Model to predict them after he already has the results.
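The gap between in-sample and out-of-sample error is easy to demonstrate. The sketch below is not Norpoth's model: it fits a one-variable least-squares line to synthetic random numbers (deliberately pure noise, with a seed and ranges I picked arbitrarily) and compares the error measured on the training data with the error from leave-one-out validation, where each point is predicted by a model that never saw it.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def in_sample_mae(xs, ys):
    """Mean absolute error when the model is scored on its own training data."""
    a, b = fit_line(xs, ys)
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)


def leave_one_out_mae(xs, ys):
    """Mean absolute error when each point is predicted by a model fit without it."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (a + b * xs[i])))
    return sum(errors) / len(errors)


# Synthetic data: the "predictor" and the "outcome" are unrelated random numbers.
random.seed(0)
xs = [random.uniform(40, 60) for _ in range(12)]
ys = [random.uniform(45, 55) for _ in range(12)]

print(f"in-sample error:     {in_sample_mae(xs, ys):.2f}")
print(f"leave-one-out error: {leave_one_out_mae(xs, ys):.2f}")
```

For a least-squares fit, each held-out error is at least as large as the matching in-sample error, so the in-sample figure always flatters the model. Refitting after the results are in and then reporting the fit as a track record does the same thing.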
This year, Norpoth uses his model to predict that Donald Trump will win 362 electoral votes to Joe Biden’s 176. Honestly, I think such a prediction is laughable. The political gravity is indisputably tilted against the president right now. He’s in a much bigger hole than he ever was in 2016.
This is where Norpoth gets into confirmation bias territory with his forecast. Perhaps he’s looking around for other sources of information that seem to justify its bold conclusions. So you’ll notice on his website that he begins with a slew of indicators (I would consider them cherry-picked) that suggest the president is a slam-dunk for re-election. He uses a Gallup Poll showing Trump with a 49% approval rating in May. He asserts the coronavirus is good for the president because it makes him look like a wartime president (a fraught claim, to be sure). Only then does he dive into the model, implying the above patterns are evidence for why he is right.
Except, as we’ve shown here, Norpoth’s model is not often right. It is quite often unreliable. Statistically, it is severely flawed. Theoretically, it is lacking. And in presentation, it is constantly in search of confirmatory evidence. In 2016, The Primary Model predicted Donald Trump would win by 5 percentage points—a 7 percentage-point error, or 7 times as large as the error in national polls (and about 2 points over par at the state level).
Curiously, Norpoth has charted a way out for his bold predictions this year. And I don’t blame him! They are certainly bold! The following note appears on his website:
Caution: The massive disruptions caused by the Coronavirus outbreak may prompt me to revise the forecast, especially if there is a crack in Trump support.
Norpoth admitted the same in an appearance on Fox News’s “The Ingraham Angle” in late May, saying “Unless his approval rating collapses, I don’t think [the coronavirus] will have much bearing on my forecast.”
With the president’s approval rating sagging to near all-time lows, and his poll numbers against Joe Biden low and not budging, I do wonder if and how Norpoth will “revise” his model before November.