Quantitative Analysis of Harvard’s COVID-19 Response

How does Harvard's response track with the evolution of the global pandemic?

By Henry Austin & Kelsey Wu, 04-03-2020


Over the past few months, coronavirus has spread across the globe, eventually arriving at the gates of Harvard Yard. In light of recent administrative responses to the pandemic, we decided to track community messages from Harvard administration and health services over time. This included all messages from Harvard’s Updates & Community Messages page and emails sent to all Harvard community members. Through quantitative text analysis in R, we determined the most frequently used words in each correspondence, seeking to understand the university’s concerns, priorities, and actions as the situation evolved.

Here’s an approximate timeline of Harvard’s correspondences:

  • January 24: Harvard University Health Services (HUHS) issued its initial warnings, informing students that HUHS would be “monitoring the global concern for the novel coronavirus coming out of Wuhan, China.” HUHS also advised students to take general health precautions (e.g. washing hands often, coughing into a tissue, and avoiding contact with sick individuals) and discouraged travel to China.
  • Late February: The university began restricting travel to China, South Korea, Italy, Iran, and other countries with a CDC Level 3 warning. In addition, inbound travelers were asked to self-isolate and complete confidential health forms.
  • Early March: As many know, Harvard’s response to the virus escalated dramatically in March. HUHS prohibited all university-related international travel and non-essential domestic travel, and strongly discouraged university affiliates from personal travel. The administration began canceling events with more than 100 attendees and asked affiliates to familiarize themselves with Zoom.
  • March 10: Students received notice to leave campus as soon as possible and no later than March 15, and all classes were moved online. Some exceptions were granted.
  • March 13: The university informed its community of the first confirmed Harvard-affiliated case.
  • Last 2 weeks of March: The administration thanked the Harvard community for its flexibility in adjusting to these changes. It also released additional guidelines for essential personnel, emphasized the importance of anonymity during this crisis, and announced the cancellation of commencement. On March 24, President Larry Bacow announced that he and his wife had tested positive for the virus.

Data and Interpretation

To build the table below, we used R (tidyverse and tokenizers packages) to list all the words in each correspondence and their frequency of appearance. We removed common words such as “and,” “the,” and “of” that carry little meaning. The table shows the 10 most frequently used words in each correspondence, in descending order of frequency. The last row of the table shows the 10 most frequently used words across all correspondences.
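The R code itself is not included in this article. As a rough illustration of how such a table can be assembled, here is a minimal Python sketch (not the authors’ R code) that assumes each correspondence has already been reduced to word counts with common words removed. The function names `top_words` and `build_table`, the row label “All correspondences,” and the toy counts in the example are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def top_words(counts: Counter, n: int = 10) -> list[str]:
    """Return the n most frequent words, highest count first.

    Ties keep the order in which words were first counted (Counter.most_common).
    """
    return [word for word, _ in counts.most_common(n)]


def build_table(per_message: dict[str, Counter], n: int = 10) -> dict[str, list[str]]:
    """Top-n words for each correspondence, plus a final row pooled over all of them."""
    table = {label: top_words(counts, n) for label, counts in per_message.items()}
    pooled = Counter()
    for counts in per_message.values():
        pooled.update(counts)  # adds counts word by word across correspondences
    table["All correspondences"] = top_words(pooled, n)
    return table


# Hypothetical toy counts, not Harvard's actual data.
example = {
    "Jan 24": Counter({"travel": 5, "china": 4, "health": 2}),
    "Mar 10": Counter({"students": 6, "campus": 5, "travel": 3}),
}
for row, words in build_table(example, n=2).items():
    print(row, words)
```

With `n=2`, the pooled row comes out as `['travel', 'students']`, because “travel” totals 8 across the two toy messages. One design choice worth noting: pooling raw counts lets longer messages weigh more heavily in the overall row; averaging each word’s share per message would weight every correspondence equally.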

The initial emails in late January reflect the preparations of Harvard University Health Services (HUHS) and the Harvard administration for the pandemic. Namely, Harvard was developing protocols in case COVID-19 arrived in the U.S. or on Harvard’s campus. During this time, words such as “travel,” “international,” and “China” were used frequently as the administration warned the community of potential dangers and the uncertainty surrounding the novel coronavirus.

This word choice suggests that COVID-19 was still seen as a remote threat, and it set a tone of comfort and security on campus. At this time, only a handful of cases had been confirmed in the United States.

Given the prevalence of the word “China” in Harvard’s coronavirus correspondence, we decided to track its frequency over time. Initially (late January and early February), when the pandemic was largely centered in mainland China, use of the word steadily increased. At the end of February, it dropped to zero. These changes are documented in the graph below.
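Tracing one word over time only requires lining up per-message counts by date. A minimal Python sketch, assuming each message is available as a (date, text) pair; `word_series` and the sample messages are hypothetical, and the simple letters-only tokenizer is an assumption rather than the authors’ R tokenizer. It reports raw counts; the article does not say whether its graphs use raw counts or counts normalized by message length, and dividing by `len(tokens)` would give the normalized variant. The same function applies to “community” and “travel.”

```python
from __future__ import annotations

import re
from datetime import date

# Lowercased runs of letters; apostrophes and punctuation split words,
# so "China's" yields "china" and "s".
WORD = re.compile(r"[a-z]+")


def word_series(messages: list[tuple[date, str]], target: str) -> list[tuple[date, int]]:
    """Count how often `target` appears in each message, in date order."""
    target = target.lower()
    series = []
    for sent, text in sorted(messages, key=lambda m: m[0]):
        tokens = WORD.findall(text.lower())
        series.append((sent, tokens.count(target)))
    return series


# Hypothetical messages, not Harvard's actual text.
messages = [
    (date(2020, 2, 28), "Please review the updated guidance."),
    (date(2020, 1, 24), "Avoid travel to China. Cases in China are rising."),
]
print(word_series(messages, "China"))
```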

In fact, the use of the word “China” in Harvard’s messages fell away just as U.S. cases began to skyrocket. This trend is intuitive: as the pandemic crossed new borders and affected new communities around the globe, the university refocused its response on domestic spread.

During early March, as the pandemic intensified across the country, Harvard’s correspondence became increasingly focused on the Harvard community.

Rather than “China” and “international,” the most frequent words shifted to “dorms,” “campus,” and “students.” As the virus spread around the world, especially within the United States, university leaders likely realized that the pandemic was not a remote “international” issue but one that could turn campus life upside down.

Correspondence also focused heavily on the word “community” (see “Frequency of ‘community’ across Harvard’s messages” graph below). Although “community” appeared in every correspondence, it was used less often while “China” was used more often, and it became more prominent as “China” faded from Harvard’s messages. As noted earlier, this points to Harvard’s increasing emphasis on unifying its community, especially as the first Harvard-affiliated cases were reported.

Messages sent after students left campus (March 15) and after the second Harvard-affiliated case (March 16) used “community” less frequently. At this point, the university focused on establishing protocols for essential personnel and on thanking students, faculty, and community members. Words such as “students,” “protect,” “campus,” and “best” began to appear more frequently in early March. There was a strong commitment by the administration, or at least the appearance of one, to prioritizing the safety of students.

Another prevalent word was “travel” (see “Frequency of ‘travel’ across Harvard’s messages” graph below). As the frequency of “travel” increased, “China” and “international” also appeared more frequently. During this time, the university discouraged travel and warned students of the risks and precautions should travel be necessary. However, as the pandemic moved closer to Harvard, use of “travel” decreased, and Harvard began focusing more heavily on issues directly related to campus and the student population in Cambridge. After March 10, when students were told they would have to leave campus within five days, “travel” became frequent again as the university communicated information about leaving campus.

In early March, the tangible impacts of the pandemic on Harvard’s campus became evident. Emails began to convey a stronger sense of urgency, using words such as “possible,” “strongly,” and “protect.” This call to action culminated in the university’s March 10 announcement giving students five days to leave campus.

After students left campus, the tone of Harvard’s correspondence became increasingly dire, in line with the worsening state of the U.S. pandemic.

The administration used words such as “emergency,” “essential,” and “guidance” frequently to convey the gravity of the crisis. This tone differs drastically from the initial warnings in late January.

Overall, as one might expect, the urgency of the correspondence aligns closely with the domestic upsurge of COVID-19. The virus, initially treated as purely an international travel hazard, rapidly became a campus emergency that prompted unprecedented action by university leadership.

Further investigation of this data could assess whether the university’s response was adequate or even warranted. Should the university have acted sooner and with greater urgency, given the rising number of U.S. cases? Was the response unnecessary? Or was it well timed? As more information is published, data-driven answers to these questions may help guide responses to future crises.

Data Analysis Procedures

First, we collected all correspondences from Harvard and HUHS about COVID-19. This included all messages from Harvard’s Updates & Community Messages page and emails sent to all Harvard community members. We then used R (tidyverse and tokenizers packages) to produce a list of all the words in each correspondence and their frequency of appearance.
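For readers who want to reproduce the counting step outside R, here is a minimal Python sketch. It is not the authors’ code: the letters-only regular expression is an assumption that only approximates `tokenize_words()` from the R tokenizers package, which likely treats numbers and apostrophes differently. `word_frequencies` and `frequencies_by_message` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import Counter

# Lowercased runs of ASCII letters: a rough stand-in for tokenizers::tokenize_words().
WORD = re.compile(r"[a-z]+")


def word_frequencies(text: str) -> Counter:
    """Map each word in one correspondence to the number of times it appears."""
    return Counter(WORD.findall(text.lower()))


def frequencies_by_message(messages: dict[str, str]) -> dict[str, Counter]:
    """Apply word_frequencies to every correspondence, keyed by its label."""
    return {label: word_frequencies(body) for label, body in messages.items()}


# Hypothetical one-line "correspondence" for illustration.
counts = frequencies_by_message({"Jan 24": "Wash hands often. Wash, then dry."})
print(counts["Jan 24"].most_common(2))
```

Because the pattern keeps only letters, a token like “COVID-19” is counted as “covid”; a closer match to the R package would need a tokenizer that handles digits and hyphens explicitly.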

To remove common function words such as articles, prepositions, and pronouns (e.g., “the,” “and,” “of,” “I”), we used the Google Web Trillion Word Corpus, a dataset produced by Google’s web crawlers containing English word n-grams and their observed frequency counts. Essentially, the Corpus provides a list of the most commonly used words and each word’s frequency (measured as the percentage of the Trillion Word Corpus made up of that word). In our analysis of Harvard’s correspondences, we eliminated all words with a Corpus frequency above 0.1%; in other words, we kept only words that occur less than once per 1,000 words of the Corpus. This threshold removed words that offered little insight, which was key to identifying how the university framed the outbreak and its response. We completed this analysis for each correspondence between January 24 and March 27, 2020.
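The filtering rule amounts to a threshold on each word’s share of the reference corpus. A minimal Python sketch, assuming the corpus is available as a mapping from word to raw count together with the corpus’s total word count; the example numbers are made up and are not real Corpus figures, and `common_words` and `drop_common` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter

THRESHOLD = 0.001  # 0.1% of the corpus, i.e. more than once per 1,000 words


def common_words(corpus_counts: dict[str, int], corpus_total: int,
                 threshold: float = THRESHOLD) -> set[str]:
    """Words whose share of the reference corpus exceeds `threshold`."""
    return {w for w, c in corpus_counts.items() if c / corpus_total > threshold}


def drop_common(counts: Counter, stop: set[str]) -> Counter:
    """Keep only words that are not in the stop set."""
    return Counter({w: c for w, c in counts.items() if w not in stop})


# Made-up reference counts for illustration only.
corpus = {"the": 23_000, "and": 12_000, "travel": 80, "campus": 30}
stop = common_words(corpus, corpus_total=1_000_000)
print(sorted(stop))
print(drop_common(Counter({"the": 9, "campus": 4, "and": 3, "travel": 2}), stop))
```

Here only “the” (2.3%) and “and” (1.2%) exceed the 0.1% cutoff. Words missing from the reference corpus are never treated as common, so rare or new terms pass through the filter.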

Learn More

Data on the COVID-19 pandemic can be found on the CDC website and in this map created by Johns Hopkins University. Sources used for this project can be found here and here. The most up-to-date information on Harvard’s response can be found on the university’s COVID-19 response website.

Harvard Open Data Project