Hey Elliott, thank you for sharing - the paper was an interesting read. Since they had the data available, they were able to estimate the core winner based on individual responses*, but I'm wondering what your thoughts are on a more general application to public polls where we may only have the toplines.
For example, some polls only ask about the top two candidates while others ask about a whole bunch of candidates, and that can affect how the top two are scored. On one end of the analysis spectrum, you can consider the poll groups separately, like Nathaniel Rakich did in the linked article. On the other end of the spectrum, you can analyze them as one group but only consider the top candidates, like in Drew Linzer's presidential forecast paper. Both of those approaches feel like they leave some information on the table. I've played around with mixing the different response scales in a model by estimating the proportion of 'other' voters who would respond with candidate A/B if their first choice isn't on the ballot (rough sketch after the links below), but it also feels... odd (the math checks out, but it's like I'm violating some statistical rule).
*edit: this is in reference to the ranked sample, but they found it generally matched up w/the pairwise sample too.
Rakich's FTE article: https://fivethirtyeight.com/features/desantis-is-polling-well-against-trump-as-long-as-no-one-else-runs/
Linzer's paper: https://votamatic.org/wp-content/uploads/2013/07/Linzer-JASA13.pdf
My (brief) writeup: https://www.thedatadiary.net/posts/2023-01-21-trump-vs-desantis-in-2024-republican-primary-polling/
The model underneath: https://github.com/markjrieke/thedatadiary.net/blob/master/posts/2023-01-21-trump-vs-desantis-in-2024-republican-primary-polling/scripts/primary_poll_model.stan
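For concreteness, here's a toy version of the reallocation idea in Python - this is not the Stan model linked above, just a sketch that fixes a single hypothetical parameter p_a (the share of 'other' supporters who would break toward candidate A in a forced head-to-head); in the model itself that proportion gets estimated rather than assumed:

```python
# Toy illustration (not the linked Stan model): convert a full-field
# topline into an implied head-to-head share, assuming a single
# reallocation parameter p_a for how 'other' voters would break.

def implied_head_to_head(a: float, b: float, other: float, p_a: float) -> float:
    """Return candidate A's implied two-way share.

    a, b, other -- full-field topline proportions
    p_a         -- assumed share of 'other' voters who would pick A
                   if their first choice isn't offered (the rest go to B)
    """
    a_two_way = a + p_a * other
    b_two_way = b + (1 - p_a) * other
    return a_two_way / (a_two_way + b_two_way)

# e.g. a poll with A at 45%, B at 30%, everyone else at 25%, and an
# assumed 40% of 'other' supporters breaking toward A:
print(implied_head_to_head(0.45, 0.30, 0.25, p_a=0.4))  # 0.55
```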
Hi Mark. Very cool. I wonder what happens if you model support as a function of the number of options?