Our UC prediction got dunked on. Here’s why.

Our previous post did not age well.

By Kevin Bi, Lucy Li & Seth Billiau

Over the past four years, HODP has taken inspiration from Nate Silver’s FiveThirtyEight in our approach to Harvard data. This year, we decided to predict presidential elections like them too. HODP predicted that the winners of the Undergraduate Council presidential election would be Sanika and Rushi, but the results were even more unexpected than we could have thought.

We noted that this was the most uncertain election that we had predicted to date due to the unusual nature of the campaign. However, our main concern was that Aditya and Andrew, who ran a campaign to abolish the UC, would mobilize voters who would not typically vote in the election and who would not fill out our survey. We believed that if any ticket was going to upset Sanika and Rushi, it would be Aditya and Andrew. We were wrong about that too.

The results of the election were as follows:

1. James Mathew and Ifeoma “Ify” White-Thorpe

2. Aditya Dhar and Andrew Liang

3. Sanika Mahajan and Rushi Patel

4. Prashanth “PK” Kumar and Michael “Mike” Raji

5. Thor Larson and Case McKinley

On the bright side, we were correct about Aditya and Andrew placing second. Unfortunately, that was pretty much the only thing we were right about. We managed to get the fourth and fifth place finishers flipped, and we obviously got the winner and third place incorrect. Clearly, a lot went wrong, but we believe there are three main reasons why our predictions missed the mark.

What went wrong

Our sample was unrepresentative — garbage in, garbage out.

The first, and maybe simplest, explanation for why we got it wrong is that our poll drastically undersampled James and Ify supporters. In our sample, James and Ify received approximately 19% of all first place votes, while in the actual election they won approximately 28% of first place votes. This means that the actual share of James and Ify supporters was about 40% higher in the voting population than in our sample.

However, this was far from the only issue with our sample. Even if James and Ify's support was 40% higher than our sample suggested, Sanika and Rushi supporters would still have comprised 37% of our sample. In the actual election, Sanika and Rushi received only 26% of first place votes, which suggests that our sample heavily overrepresented their supporters. We anticipated that James and Ify would need to more than double their level of support in order to win, but that calculation held the support of the other candidates constant. The combination of overrepresenting Sanika and Rushi supporters and underrepresenting James and Ify supporters produced a result we did not anticipate at all.
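To make the scale of the mismatch concrete, here is a minimal sketch of the comparison, using the rounded first-place shares quoted above (the exact relative gaps depend on the unrounded vote counts):

```python
# Comparing our poll's first-place shares to the actual election shares.
# Shares are the rounded percentages quoted in the text above.

poll_share = {"James & Ify": 0.19, "Sanika & Rushi": 0.37}
actual_share = {"James & Ify": 0.28, "Sanika & Rushi": 0.26}

for ticket in poll_share:
    # Relative over-/under-representation of each ticket in our sample
    rel = (actual_share[ticket] - poll_share[ticket]) / poll_share[ticket]
    print(f"{ticket}: {rel:+.0%} in the electorate versus our sample")
```

A positive number means the ticket was undersampled; a negative number means it was oversampled. Both errors pushed our prediction in the same wrong direction.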

While some sampling error is expected in any poll, error of this magnitude suggests deeper issues with our methodology. One possible reason is that we misunderstood James and Ify's supporters. At face value, James and Ify ran a more traditional UC campaign, but it is possible that they also mobilized voters who were less likely to fill out our poll. A second possibility is that Sanika and Rushi's supporters were disproportionately energetic, and therefore more willing to fill out the poll, but less common in the overall voting population.

Strategic voting.

The second possible reason our poll was inaccurate is that while voters have no incentive to vote strategically in our poll, they do have an incentive to vote strategically in the actual election. This is due to the nature of the Borda count, the voting mechanism used in the UC presidential election. In the UC's Borda count, a candidate ranked nth on a ballot receives 1/n points from that voter. This mechanism is particularly susceptible to strategic voting.
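To make the mechanism concrete, here is a minimal sketch of the 1/n scoring rule described above (this reciprocal-rank variant of Borda is sometimes called the Dowdall system). The ballots and ticket names are made up for illustration:

```python
# Sketch of the 1/n Borda variant: a candidate ranked n-th on a ballot
# earns 1/n points from that ballot. Ballots below are invented.

from collections import defaultdict

def borda_scores(ballots):
    """Each ballot lists tickets best-first; rank n earns 1/n points."""
    scores = defaultdict(float)
    for ballot in ballots:
        for rank, ticket in enumerate(ballot, start=1):
            scores[ticket] += 1 / rank
    return dict(scores)

ballots = [
    ["A", "B", "C"],  # A earns 1, B earns 1/2, C earns 1/3
    ["B", "A", "C"],
    ["A", "C", "B"],
]
print(borda_scores(ballots))
```

Note how steeply the points fall off: the gap between 1st and 2nd place (1/2 point) is much larger than the gap between 4th and 5th (1/20 point), which is what makes a voter's top ranking so valuable and a rival's bottom ranking so cheap to give.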

Suppose that a voter believes some candidate is in first place. In this election, the general perception based on social media activity and UC endorsements (many of the same indicators we used to make our prediction) likely held that Sanika and Rushi were the frontrunners. Supporters of other candidates who voted strategically would therefore have an incentive to rank Sanika and Rushi last, even if this did not accurately represent their preferences.
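A back-of-the-envelope calculation shows why this kind of "burying" matters under 1/n scoring. Suppose, hypothetically, that a bloc of voters drops a frontrunner from 2nd place to 5th place on their ballots (the bloc size below is invented for illustration):

```python
# Cost of "burying" under 1/n scoring: moving a ticket from rank 2 to
# rank 5 on a ballot changes its points from 1/2 to 1/5. The number of
# strategic ballots below is a made-up illustration.

honest_points = 1 / 2    # frontrunner honestly ranked 2nd
buried_points = 1 / 5    # frontrunner strategically ranked 5th (last of 5)
strategic_ballots = 200  # hypothetical bloc of strategic voters

loss = (honest_points - buried_points) * strategic_ballots
print(f"Frontrunner loses {loss:.0f} points to burying")
```

Each such ballot costs the buried ticket 0.3 points, so even a modest bloc of strategic voters can swing a race decided by a double-digit point margin.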

Data provided by the UC Election Commission supports this hypothesis. Sanika and Rushi received the most 5th place votes of any ticket, and nearly as many 5th place votes as 1st place votes. Aditya and Andrew suffered similarly: they received the second most 5th place votes and the fewest 2nd place votes. This suggests that supporters of other campaigns either voted strategically or heavily disliked the two perceived frontrunners.

Viral video.

Last but certainly not least, halfway through the election period, James and Ify's campaign video went viral on Twitter, receiving several celebrity endorsements and more than two million likes. We didn't place much weight on this video when making our prediction, and that may have been a mistake. It's possible that the video brought greater attention to the UC election and turned out students who otherwise would not have voted. Voter turnout was substantially higher than we anticipated, increasing by nearly 900 votes over the year before. If the video did turn out a large number of voters, then it's unsurprising that our poll was wrong, since these students would not have been following the election closely enough to respond to our survey. Our survey also closed very shortly after the video went viral, so we likely could not capture the video's full effect.

Like so many NBA players, we were dunked on by DWade.

Our communication of results.

Perhaps our greatest mistake was the way we calculated and presented our headline number. We wrote that Sanika and Rushi would have a 98% chance of winning if our sample was representative, a scenario in which sampling error would be the only source of variability. While we expressed concerns that our sample was not representative, we did not communicate those concerns effectively. In hindsight, presenting the probability this way likely left the impression that Sanika and Rushi were nearly guaranteed to win, which was not the case.
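For context, here is a minimal sketch of the kind of "sampling error only" calculation that produces a number like 98%: resample respondents from the poll's shares and count how often the poll leader comes out on top. The shares, sample size, and use of first-place counts (rather than full Borda scores) are all simplifying assumptions for illustration, not our actual methodology:

```python
# Simulated win probability when sampling error is the only source of
# variability. All numbers are illustrative, not our poll's figures, and
# we score by first-place counts rather than the full Borda count.

import random
from collections import Counter

random.seed(0)  # reproducible illustration

poll = {"Sanika & Rushi": 0.40, "Aditya & Andrew": 0.25,
        "James & Ify": 0.19, "Others": 0.16}
n_respondents = 300
n_sims = 5_000

tickets = list(poll)
weights = list(poll.values())

wins = 0
for _ in range(n_sims):
    # Redraw a poll of the same size from the assumed population shares
    draw = random.choices(tickets, weights=weights, k=n_respondents)
    leader, _ = Counter(draw).most_common(1)[0]
    if leader == "Sanika & Rushi":
        wins += 1

print(f"Win probability under sampling error alone: {wins / n_sims:.1%}")
```

The crucial caveat lives in the conditional: this probability assumes the poll's shares match the population's, which is exactly the assumption that failed for us.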

Some interesting data about this election.

If there was one thing we were right about, it was the unusual nature of this election. Turnout, at 3,762 voters, represented 56% of the Harvard College population. This is the first time since HODP's founding that a majority of undergraduates have voted in a UC presidential election.

This election was also by far the closest in recent memory. This year, James and Ify led Aditya and Andrew by just 72 points, and Sanika and Rushi by 92 points. Only 98 first place votes (2.6% of all votes) separated Aditya and Andrew, who received the most first place votes at 1,063, from third-place Sanika and Rushi. By contrast, in 2018, Sruthi and Julia led their nearest competitors by over 600 points; in 2017, Cat and Nick won 55% of first choice votes; and in 2016, Yasmin and Cameron received close to 300 more first place votes than their nearest competition.


But all of this belabors the point. We were wrong, and we will be looking for ways to improve our predictions for next year. In the meantime, congratulations to UC President-Elect James Mathew and Vice President-Elect Ifeoma White-Thorpe! We hope they will work to increase data transparency and data-driven decision making at the UC.
