Why Didn't Polls Predict the 2020 Election Results?
Known problems from 2016 and one new one may help explain the discrepancies.
Posted Nov 09, 2020
"To state the obvious, this is not a random sampling error because it was shared across all pollsters in the same direction. This is some kind of large systematic error, far larger than typically occurs in a presidential election year." —Sam Wang after the 2016 election.
In early November 2020, most influential polls predicted a decisive victory for former Vice President Joe Biden over President Trump. They also made other specific predictions, such as a comfortable Biden victory in Florida and an increased number of seats for Democrats in the House.
These predictions did not pan out. The Presidential election was much closer and took days to be called, with recounts and legal challenges still ongoing a week later. The shift in voter preference towards Democrats was also much weaker than predicted. And when all the votes are tallied, the Democratic party may lose House seats instead of gaining them.
This is the second consecutive Presidential election where polls haven't been able to forecast significant outcomes correctly. Each time, most polls underestimated voter support for President Trump and the Republican party.
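The distinction drawn in the opening quote, between random sampling error and a systematic error shared by all pollsters, can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true level of support, the size of the shared bias, and the number and size of the polls are hypothetical values, not estimates from any real election.

```python
import random

def simulate_poll(true_share, n, bias, rng):
    """Simulate one poll of n respondents.

    Each respondent reports support for candidate A with
    probability true_share + bias (bias is the systematic error).
    """
    p = true_share + bias
    supporters = sum(1 for _ in range(n) if rng.random() < p)
    return supporters / n

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_SHARE = 0.48        # hypothetical true support for candidate A
SHARED_BIAS = 0.03       # hypothetical error common to every poll
N_POLLS, SAMPLE_SIZE = 50, 1000

# Polls with random sampling error only.
unbiased = [simulate_poll(TRUE_SHARE, SAMPLE_SIZE, 0.0, rng) for _ in range(N_POLLS)]
# Polls that all share the same systematic error.
biased = [simulate_poll(TRUE_SHARE, SAMPLE_SIZE, SHARED_BIAS, rng) for _ in range(N_POLLS)]

avg_unbiased = sum(unbiased) / N_POLLS
avg_biased = sum(biased) / N_POLLS

print(f"true share:               {TRUE_SHARE:.3f}")
print(f"average of unbiased polls: {avg_unbiased:.3f}")
print(f"average of biased polls:   {avg_biased:.3f}")
```

In this toy setup, averaging many polls washes out the random error, so the unbiased average lands close to the true share. The shared bias does not average out: the biased polls agree with one another and are all wrong in the same direction.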
After the 2016 election, social scientists provided many reasons for this failure, which may help explain what happened in 2020. In this blog post, I want to explore four reasons why polls haven't done a good job with crucial predictions in the two elections. Some of the underperformance may be specific to President Trump's unique attributes. However, there are enough red flags to worry political pollsters and market researchers about the reliability and sustainability of survey-based research.
Why did the 2016 polls underestimate support for President Trump?
A political poll is a carefully designed and executed survey of a representative sample of respondents. Individuals who are eligible and likely to vote are asked about their voting preferences (among other things). Their responses are used to predict the election's outcome. Despite this seemingly straightforward methodology, there are significant difficulties involved in conducting a poll. Let's consider three of the most significant challenges discovered after the 2016 election. (To a lesser or greater extent, these factors are also likely to have played a role in the 2020 election.)
1) Voter turnout is difficult to predict.
One of the main culprits for the poor polling performance in 2016 was the failure to predict which eligible voters would cast their votes. Pollsters often use voter files to select their samples. The logic is to include people who have voted before and those who are eligible to vote. However, as psychologists know all too well, past behavior is no guarantee of future behavior. If one group of voters systematically votes at higher or lower rates than the baseline, this can substantially throw off predictions. As one example, in 2016, far more working-class, rural white men turned out and voted for President Trump than was expected based on their previous track records.
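To see how a turnout misjudgment in a single group can throw off a prediction, here is a minimal worked example. All of the numbers (group sizes, turnout rates, and levels of support) are hypothetical and chosen only to make the arithmetic easy to follow. They are not estimates for any real electorate.

```python
def predicted_share(groups):
    """Return candidate A's share of votes cast.

    groups is a list of tuples:
    (share of eligible voters, turnout rate, share supporting candidate A)
    """
    votes_cast = sum(size * turnout for size, turnout, _ in groups)
    votes_for_a = sum(size * turnout * support for size, turnout, support in groups)
    return votes_for_a / votes_cast

# What the pollster assumed, based on past turnout (hypothetical values).
assumed = [
    (0.40, 0.50, 0.65),  # group 1: leans toward candidate A
    (0.60, 0.60, 0.40),  # group 2: leans away from candidate A
]

# What happened: only group 1's turnout differs from the assumption.
actual = [
    (0.40, 0.62, 0.65),
    (0.60, 0.60, 0.40),
]

print(f"predicted share for A: {predicted_share(assumed):.1%}")
print(f"actual share for A:    {predicted_share(actual):.1%}")
```

With these made-up inputs, candidate A is predicted to get about 48.9% of the vote but actually gets about 50.2%. Nobody changed their mind; one group simply turned out at a higher rate than its track record suggested.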
On the flip side, many respondents who say in a poll that they intend to vote may not muster the enthusiasm to go out and vote on election day. Voting requires effortful behaviors such as taking time out of one's schedule, finding a polling station, traveling to it, waiting in line, and so on. In the 2016 election, Black Americans, a group that was more supportive of Secretary Clinton as a whole, showed a marked decrease in turnout on election day. This lack of follow-through can stem from many contextual factors, such as bad weather, not finding the time to vote, or other idiosyncratic personal circumstances, as well as from how likeable voters find the candidates and how much affinity they feel for them.
For the 2020 election, the Covid-19 pandemic may also have played a role. For example, Democratic voters were, on the whole, more concerned about virus spread and may have been more reluctant to go out and vote, despite telling pollsters they would. Contextual factors can influence voting behavior systematically but are virtually impossible for pollsters to account for in advance when making predictions. Their effects can only be discovered after the fact, and often not even then.
2) Voter preference can change between the poll and the election.
There is a time lag between voters telling the pollster which candidates they will vote for and their voting behavior on election day. During this period, their voting preferences may change as new information about candidates becomes available, or as past voting habits, which they had downplayed when answering the poll, reassert themselves. This latter issue can be an influential driver of choice for those voters who vote along party lines. Additionally, many voters make up or change their minds at the last minute, which pollsters can't capture well. In 2016, these effects were significant and asymmetric for the two presidential candidates. One study found that:
"Trump picked up 4.0 percentage points among people who hadn't been with him in mid-October, and shed just 1.7 percentage points for a net gain of 2.3 points. Clinton picked up a smaller fraction — 2.3 points — and shed 4.0 points for a net loss of 1.7 points."
3) Voters provide socially desirable responses in polls.
In surveys, when questions are controversial or reflect on a person's character or reputation, respondents tend to be reticent about giving honest responses. Instead, they provide answers that they believe are socially acceptable and show them in a positive light to the interviewer and others. For the 2016 election, the controversial aspects of President Trump's candidacy, such as his provocative tweets, off-color comments about women, and his stance on issues like immigration, race relations, healthcare, and the environment, meant that many voters were reluctant to openly voice their support for him.
Public opinion researchers refer to this pattern (voting for Trump despite not expressing an intention to do so) as the "Shy Trump Supporter Hypothesis." They have suggested that it led to an underestimation of support for President Trump in the 2016 polls. Although the hypothesis has been widely discussed in the media, it is worth noting that studies haven't found reliable evidence for it.
While it is not yet clear whether there were a lot of shy Trump voters in 2020, we can hypothesize that given Trump's controversial image combined with the growth of cancel culture and vitriolic social media debate since 2016, reticent responding is likely to have played at least some role in 2020 pre-election polling.
A fourth ominous challenge? Intentional misleading by respondents.
These three problems with polling have been well documented. There are dozens of papers in the social sciences (including psychology) studying these issues and considering how to resolve them. However, a fourth concern hasn't received as much attention and could be the most serious one of all: the loss of trust between researchers and respondents.
Trust is essential for conducting successful survey research. As a researcher, each time I conduct a survey, a powerful two-way bond of trust must be established with respondents.
Implicit in the request to participate in a survey is my trust in the participants' responses and their motivations. I trust they will tell me what they really think and give me their real opinions, preferences, and intentions, as they know and believe them. In turn, respondents trust the researcher's motives and abilities. My respondents trust that I will use their responses for the research's stated purpose, without distorting them. The validity and reliability of survey data and results hinge on this two-way trust. But what if this essential bond of trust between the pollster and the respondent is breaking?
Many respondents may decline to participate in polls and surveys, making it harder to compose representative samples. (We have already seen this as response rates continue to decline every year.) And even when they participate, respondents may not take the questions seriously, providing superficial or incomplete responses. Worst of all, lacking trust, some participants may deliberately give misleading responses that lead the researcher to draw inaccurate conclusions. Not only do we not know how many respondents willfully provided misleading answers to pollsters before the election, but it is not clear how we can find out.
In today's environment of extreme partisanship, political stance-taking by previously neutral organizations and institutions, vitriolic discourse, and lack of tolerance for opposing views, the bond of trust that has sustained survey-based research for decades is fraying rapidly. We should be concerned about the future of political polling in particular, and survey-based research as a whole. The eroding trust of respondents represents an existential threat to the political polling and marketing research industries.