Lower your expectations for the polls
They are subject to many sources of error and aren’t as precise as you think
Friends,
I wrote this post before taking a trip to Shenandoah National Park yesterday. It’s only a short drive from my home in Washington, DC, one that we (my fiancée and I) do not make nearly enough. See the attached photo (from my ageing iPhone, so excuse the low quality). The relevant point from this anecdote is that I meant to finish and send this post before leaving but was too eager for the break from the city to do so.
…
As I have reflected on how the media covered polls and election forecasts this year — and how people have reacted to the election outcome — one take keeps jumping to the front of my mind. It’s that people seem to dramatically overestimate the precision of pre-election surveys. This might also help to explain the harsh reactions when the polls go awry.
Editor’s Note: This is a paid post for premium subscribers. If you are a subscriber and have friends or family who you think might learn something from this post, you should feel free to forward it to them regardless of their membership. And if you’re a free reader who got this from a friend, consider signing up for posts by clicking the button below!
It is not a mystery where the conventional wisdom about the precision of pre-election surveys comes from. Consider a reductive and short history of poll aggregation and election forecasting. Popular among academics and political scientists in the late twentieth century and the aughts, polling analysis was mainstreamed as a journalistic practice by Nate Silver in 2008. The enterprise enjoyed a series of high-profile successes during Barack Obama’s elections, with Silver calling 49 and 50 states correctly in 2008 and 2012, respectively. After that, statistical analysis of polls was heralded by the media as a soothsaying practice, with many analysts hired to peer into the crystal ball and Silver chief among them. A series of successful bets against other prominent media pundits cemented Silver’s reputation as the wisest of the bunch, his analysis buttressed by polls that some in the media viewed as infallible predictors when aggregated.
But several important misses by public pollsters shattered this view of public opinion data. Polls underestimated support for Brexit and Donald Trump in the 2016 election by a few percentage points. And if they could not be trusted to predict election results to a T, what could they be used for? After this year’s presidential race, in which polls overestimated Joe Biden’s vote margin by about 5-6 percentage points on average across states, some people have been so bold as to call them useless.
Of course, polls are not useless — or even less accurate than they used to be. On average between 1936 and 2020, national polls have missed the Democratic presidential candidate’s share of the vote by about four percentage points, and this year’s miss will land right around that average.
Instead, the problem is that many in the media fooled themselves into believing that polls were more accurate than they were.
At their core, polls are just measurements — guesses about how everyone in the target population (adults, voters, seniors, etc.) would feel and behave if we could talk to them all. But pollsters run into several problems that could cause those guesses to be wrong. We refer to these as sources of error and uncertainty, of which there are three main types:
Coverage error: The chance that the people you can reach via phone, the internet, mail, etc. — what pollsters call the sampling frame — are systematically unrepresentative of your target population.
Sampling error: The chance that the people you choose to call, the sample, just so happen (i.e., by random chance) to be unrepresentative of the sampling frame.
Non-response bias: The chance that the group of people who choose to answer pollsters’ solicitations and complete their surveys is systematically different from the sampling frame and target population. Usually, “different” is defined by respondents’ demographic traits, but pollsters are increasingly thinking about political variation too. (A toy simulation after this list shows how this kind of error behaves.)
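To make that last one concrete, here is a toy simulation in Python. The population size, support levels, and response rates are all made up for illustration; the point is only the shape of the problem. Even with a healthy number of respondents, a small gap in how willing each candidate’s supporters are to answer the phone shifts the estimate by several points, and interviewing more people does not fix it:

```python
import random

random.seed(42)

# Hypothetical electorate: 52% support candidate A, 48% candidate B.
population = [1] * 52_000 + [0] * 48_000  # 1 = supports A, 0 = supports B

# Made-up response rates: A supporters answer 10% of the time,
# B supporters only 8% of the time. That gap is the non-response bias.
def responds(supports_a):
    rate = 0.10 if supports_a else 0.08
    return random.random() < rate

# Contact a large random sample, then keep only the people who respond.
contacted = random.sample(population, 20_000)
respondents = [v for v in contacted if responds(v)]

estimate = sum(respondents) / len(respondents)
print(f"respondents: {len(respondents)}")
print(f"estimated support for A: {estimate:.1%} (true value: 52.0%)")
# With roughly 1,800 respondents the sampling-only margin of error is
# about +/- 2.3 points, but the estimate comes out several points too
# high (around 57% in expectation). The error is systematic, so a
# bigger sample does not make it go away.
```

The numbers are invented, but the behavior is the important part: weighting and modeling can reduce this kind of error, while the margin of error you see printed under a poll simply ignores it.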
It is tough to quantify exactly how much uncertainty is introduced by these sources of error, but here’s something you should know: The traditional margin of error reported by a pollster is not nearly large enough to cover them all. That’s because the traditional margin of error (MOE) captures only sampling error, which is why it shrinks as the number of people a pollster talks to grows. This is obviously not a good representation of the true uncertainty in the data, a fact we can quantify by measuring the error in pre-election polls. One study did just that and found that the “true” margin of error for a poll is at least twice as large as the one you see reported in the news.
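For readers who want to see the arithmetic, here is a minimal sketch of the standard MOE formula. The sample size is an assumption, and the simple doubling is just a rough rule of thumb reflecting that at-least-twice finding, not any particular pollster’s method:

```python
import math

def sampling_moe(n, p=0.5, z=1.96):
    """95% margin of error from sampling variation alone."""
    return z * math.sqrt(p * (1 - p) / n)

n = 800  # a fairly common sample size for a state poll
reported = sampling_moe(n)
print(f"reported MOE:     +/- {reported:.1%}")      # about +/- 3.5 points
print(f"rough 'true' MOE: +/- {2 * reported:.1%}")  # about +/- 6.9 points
```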
I have to wonder whether readers would think about polls differently if we reminded them of this uncertainty every time we mentioned public opinion. Of course, this is what our election forecasting models try to accomplish, but we only get so much attention in a media environment that is otherwise inundated with headlines like “Biden Up 17 Points in Wisconsin” and “2020 Election Poll: Biden Leads Trump by Double Digits.” Also of note: Those headlines tend to play better on social media than a forecast that says “Biden and Trump are within the margin of error for historical accuracy of pre-election polls,” which only makes the uncertainty harder to communicate.
…
I’m at the point with this piece where I may start rambling, so I’ll leave it here for now. The point is that individual polls are less accurate, and systematic biases more common, than we think. Internalizing the measurement error in public opinion surveys might keep people from being burned by the outcome the next time an average or above-average miss occurs.
Looking east into Virginia at Moormans River:
I don't think we have anyone in the mainstream media talking about polling the way it should be covered. This means that the inaccurate conventional-wisdom takes on polling remain.
In 2012, the conventional wisdom was that polls were imperfect and we shouldn't trust them. The polls were pretty accurate and Obama got reelected fairly easily. The general reaction was that we should trust polls.
In 2016, the conventional wisdom was that polls were perfect, but a normal polling error occurred and Trump got elected. The general reaction was that we shouldn't trust polls.
In 2020, the polls being off again has led to the reaction that we should never trust polls again.
Maybe it was just people's bias talking, but I saw a lot of takes that Biden would win by double digits and win 413 electoral votes. I personally thought that was a bit on the crazy side. Biden winning by double digits and carrying Texas seemed unlikely given polarization. I think this may have caused the reaction of "we should never trust the polls again". The "blue shift" effect may also have contributed to that narrative, since Trump was leading in MI, WI, and PA by a lot on Election Night.
Hello Elliott, if the margin of error generally reported only reflects sampling error, and the “truer” margin of error is really double or more the sampling error, that means typical state polling error is 6 to 10 to maybe 13 percent. It would seem, then, that polls would be “reliable” only in states that knowledgeable politicians would be able to call without polling — meaning it is likely that Utah goes Republican or DC goes Democratic. In states in the Lean Dem, Lean Rep, or Tossup categories, to use the Cook Political Report’s terms, polls provide little directional insight. Do I have that right?