I’m collecting responses to this question for a larger project I’m working on. What do you want to know about polling?
(Maybe you think you know everything. In that case, what do you find most important about polls?)
Ok, discuss!
EDIT 10:31 PM EST: I’m going offline now—thanks all for the engagement. I think we all learned a lot and I got some great ideas for my project. Do feel free to leave residual comments, however insightful!
Your responses will help me procrastinate....
It's clear that the roughly 100 million 2016 nonvoters decided that election. I'd like to see polling of that group.
What I can say is that they seem to be much more inclined to vote for Democratic candidates than for Republican ones. The bigger problem for Dems is getting them to actually vote. Turnout is tough! Will have more on this as we get closer to November.
See https://www.washingtonpost.com/magazine/2020/01/21/i-was-60s-socialist-todays-progressives-are-danger-repeating-my-generations-mistakes/?arc404=true.
A long read I’ll have to save for tomorrow. Thanks for sharing!
What I want to see in polls is a question that asks the respondent where they primarily get their news from. My hypothesis is that the best indicator for support / opposition to an issue is the answer to that question. And yet, I rarely ever see that question asked.
Yup! Echo chambers are a thing. Pew has great work on this.
I feel like I've seen aspects of this before somewhere, but I think it would be interesting to have graphs/discussion of the (theoretical) relationship between sample size and margin of error in polls, compared to the empirical relationship across the past few elections. Especially topical given the discussion over the small sample sizes of many Democratic primary polls.
Good idea. The margin of error cited in a poll is often much smaller than its predictive margin of error (as you note), typically due to unreliability in a poll's weighting scheme or likely voter filter. I'll explore some visualizations of this.
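For reference, the textbook relationship is MOE = z·sqrt(p(1−p)/n), which shrinks like 1/sqrt(n). A minimal sketch of the theoretical curve (illustrative sample sizes only):

```python
import math

# 95% margin of error for a proportion; the worst case is p = 0.5
def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

for n in (200, 400, 800, 1500, 3000):
    print(f"n = {n:4d} -> MOE = ±{100 * margin_of_error(n):.1f} pts")
```

The gap between this theoretical curve and the empirical errors of the past few elections is exactly the comparison worth plotting.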
Although polling has been quite good in recent years at predicting vote share, isn't it still reasonable to expect at least one polling miss during the primaries, like we saw in Michigan in 2016? If so, which states would you watch for a potential poll miss?
Many pollsters (in Michigan in particular) do not seem to be adapting to their 2016 polling errors. It seems likely we’ll see another big one, though perhaps not on the scale of MI’s 2016 Democratic primary (the 20-point miss was historically large).
As you approach more and more polls from MRP / model-based approaches, how does that change the way you quantify and, more importantly, think about uncertainty?
I'd like to see some applied theory about the relationship of opinion formation, salience, and language usage to question asking.
Response weighting prevents response bias from "skewing" the topline results, but wouldn't the variance for undersampled groups be higher than for more-sampled groups? Would it be correct to claim that projections of a candidate's standing with frequently undersampled groups (younger, Latino, less educated, etc.) based on polling ought to have wider error bars, so to speak?
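Broadly yes: subgroup estimates rest on fewer interviews, and weighting a subgroup up inflates variance further. A minimal sketch using Kish's effective-sample-size approximation (hypothetical weights, not any particular pollster's scheme):

```python
import numpy as np

# Kish's approximation: n_eff = (sum w)^2 / sum(w^2).
# Unequal weights shrink the effective sample size.
def effective_n(weights):
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w @ w)

rng = np.random.default_rng(1)
n = 1000
# hypothetical: 10% of respondents come from an undersampled group weighted up 3x
weights = np.where(rng.random(n) < 0.1, 3.0, 1.0)

n_eff = effective_n(weights)
print(f"n = {n}, effective n = {n_eff:.0f}")
print(f"MOE unweighted: ±{100 * 1.96 * 0.5 / np.sqrt(n):.1f} pts")
print(f"MOE weighted:   ±{100 * 1.96 * 0.5 / np.sqrt(n_eff):.1f} pts")
```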
Clear up confusion about measurement error, and discuss other types of errors in polling data.
Good idea. Non-sampling error is big, like twice as large as the typical margin of error.
good morning!
I wonder if the "who do you think is going to win" question has been looked at recently for validation
It’s not very helpful. If 50% of people think something will happen, the science tells us that has practically no correlation with the event’s actual probability.
Do head-to-head polls take into account differing turnout rates between, say, dem nominees vs Trump, or do they assume the same group of voters would be voting in each scenario?
Yeah, likely voter screens typically pick this up because we leverage respondents’ stated propensities to vote in the model. Not a huge worry for polls, but a good strategic question!
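A minimal sketch of that idea (hypothetical respondents; real likely-voter models are more elaborate): weight each respondent's matchup answer by their stated chance of voting, so differential turnout flows into the topline.

```python
import numpy as np

# hypothetical head-to-head responses with self-reported turnout propensities
choice = np.array(["Dem", "Rep", "Dem", "Rep", "Dem", "Rep"])
p_vote = np.array([0.9, 0.8, 0.3, 0.95, 0.6, 0.5])  # stated chance of voting, 0-1

# propensity-weighted toplines: low-propensity answers count for less
for cand in ("Dem", "Rep"):
    share = p_vote[choice == cand].sum() / p_vote.sum()
    print(f"{cand}: {100 * share:.1f}%")
```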
How do pollsters deal with most people not answering their phones unless they know who it is? Seems like there could be a bias there. Like, what subset of voters always answers their cell?
A good question. We do know that the type of people who take polls are much more political than the ones who don’t. So on questions of voting, political volunteering, etc., polls massively overestimate engagement. Pollsters are typically careful of this—few use polls to gauge political activity. I’d note this doesn’t create a whole lot of partisan bias, though. Once a pollster gets enough randomly sampled people on the phone, they can pretty much weight their way into a good sample after that.
Sorry, my mind is filled with polling questions now lol
How would practices differ if we elected officials with other electoral systems? Going from the electoral college to a national popular vote would mean looking more at the national polls than state polls. But what would pollsters do for an approval-based voting system? Or ranked choice? Or condorcet systems?
In Maine, pollsters have largely adapted by asking voters their second- and third-choice preferences. This produced reliable outcomes in 2018 (when the state first used IRV for federal elections). It is not so complex a method to adapt to polls. We have also experimented with asking voters to rank their preferences and then simulating IRV or AV from there.
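The simulation step is simple enough to sketch. A minimal IRV tally (hypothetical ballots; ties here are broken arbitrarily):

```python
from collections import Counter

def irv_winner(ballots):
    """Instant-runoff: each round, count every ballot for its highest-ranked
    surviving candidate; stop at a majority, else eliminate the trailer."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader
        remaining.discard(min(tally, key=tally.get))

# hypothetical ranked responses from a poll
ballots = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A"),
           ("C", "B"), ("C", "B"), ("C", "A")]
print(irv_winner(ballots))  # C wins after A's votes transfer
```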
any chance we get to see how you implemented the dynamic dirichlet regression model?
Here’s a hint: https://discourse.mc-stan.org/t/dirichlet-regresion-using-brms/8591
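For non-brms folks, here's a rough Python analogue: a toy, static Dirichlet regression fit by maximum likelihood — not the dynamic model itself, just the core idea of modeling vote shares with log-linear concentration parameters (assumes numpy/scipy; all numbers are made up):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import dirichlet

rng = np.random.default_rng(0)

# toy data: N polls of K candidates whose shares drift over time t
N, K = 200, 3
t = np.linspace(0.0, 1.0, N)
B_true = np.array([[1.5, 1.0], [1.5, -1.0], [1.2, 0.0]])  # (intercept, slope) per candidate
alpha_true = np.exp(B_true[:, 0] + B_true[:, 1] * t[:, None])  # N x K concentrations
y = np.vstack([rng.dirichlet(a) for a in alpha_true])

# negative log likelihood of the log-linear concentration model
def nll(params):
    B = params.reshape(K, 2)
    alpha = np.exp(B[:, 0] + B[:, 1] * t[:, None])
    return -sum(dirichlet.logpdf(y[n], alpha[n]) for n in range(N))

fit = minimize(nll, np.zeros(2 * K), method="Nelder-Mead",
               options={"maxiter": 5000})
print(fit.x.reshape(K, 2))  # should land near B_true
```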
How do you handle questions where the poll respondent may not want to answer truthfully or may unconsciously give a false response?
Like: Do you prefer a male/white/straight candidate or not?
Basically, how do you reliably account for biases people don’t know they have?
People “in the biz” call this “social desirability bias”. I’m not sure it affects support for male/white/straight candidates, but we do see it pop up for topics such as income or others of a more... sensitive... nature. One way to decrease the severity of such biases is to administer the survey online, where social desirability effects appear depressed relative to phone interviews. Read this for more! https://www.pewresearch.org/fact-tank/2017/08/04/personal-finance-questions-elicit-slightly-different-answers-in-phone-surveys-than-online/
Education aside, what are the most important demographics to weight by?
Race and age! More advanced weighting schemes might also include political variables, such as party registration and past vote.
are party registration and past vote reliable? seems like party registration fluctuates a good amount, and people might not be great at remembering past votes.
The trick is to get people’s past vote as soon as possible after they vote. The concern about party registration is most pronounced in times of instability, but since 2010 party ID has been basically stable nationally. When I weight, I use benchmarks relative to the 2016 election, allowing for some uncertainty in the benchmark.
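For the curious, the workhorse behind many of these schemes is raking (iterative proportional fitting): cycle through the weighting variables, scaling weights until each margin matches its benchmark. A minimal sketch (hypothetical respondents and targets):

```python
import numpy as np

def rake(sample, targets, n_iter=50):
    """Iterative proportional fitting: adjust weights until the weighted
    margin of every variable matches its target proportions."""
    w = np.ones(len(next(iter(sample.values()))))
    for _ in range(n_iter):
        for var, target in targets.items():
            for level, share in target.items():
                mask = sample[var] == level
                current = w[mask].sum() / w.sum()
                if current > 0:
                    w[mask] *= share / current
    return w / w.mean()

# hypothetical respondents and population benchmarks
sample = {
    "age":   np.array(["18-34", "65+", "65+", "18-34", "65+", "65+"]),
    "party": np.array(["D", "D", "R", "R", "D", "R"]),
}
targets = {
    "age":   {"18-34": 0.5, "65+": 0.5},
    "party": {"D": 0.5, "R": 0.5},
}
print(np.round(rake(sample, targets), 2))
```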
oh, interesting! do you have any blog posts that walk through the different weighting schemes people use? I'd love to see more about weighting based on party ID and past votes
This is one of the biggest components of the project I’m working on!
Can't wait to see this project then!
To follow up on weighting by race, age, and education... how do you account for geographic variability? e.g. how well Beto was doing in Texas in the Democratic field, and Kamala in California, etc.
How much does it cost a news outlet to do a reliable high-quality poll and what is the median cost of news outlet polling?
A good question! Some estimates for the highest-quality polls, such as those from Pew Research, put costs in the six figures. Others are lower, say $40,000 for a sample of 1,000 adults, and there are other ways to decrease costs too. But the main answer is that they’re expensive, and will only get more expensive as response rates decrease!
do you have any gauge as to how much cheaper online polling is compared to phone polling?
They can be half the cost, or sometimes even cheaper.
nice. are the main drawbacks with online polling just in selection bias of who chooses to answer?
Right. Have you read this piece? It outlines the struggles of online polling very well. https://www.nytimes.com/2019/07/02/upshot/online-polls-analyzing-reliability.html
I'm a fan of Nate Cohn, but have not read this yet. Will read!
A good suggestion, which will require a long explanation. I’ll devote a lot of space in the project to this.