Over the past few months, coronavirus has spread across the globe, eventually arriving at the gates of Harvard Yard. In light of recent administrative responses to the pandemic, we decided to track community messages from Harvard administration and health services over time. This included all messages from Harvard’s Updates & Community Messages page and emails sent to all Harvard community members. Through quantitative text analysis in R, we determined the most frequently used words in each correspondence, seeking to understand the university’s concerns, priorities, and actions as the situation evolved.
Here’s an approximate timeline of Harvard’s correspondences:
To produce the table below, we used R (tidyverse and tokenizers packages) to produce a list of all the words in each correspondence and their frequency of appearance. We removed common words such as “and,” “the,” and “of” that did not contribute much meaning. The table shows the top 10 most frequently used words in each correspondence, in descending frequency. The last row of the table shows the top 10 most frequently used words among all correspondences.
The initial emails in late January reflect preparations for the pandemic by Harvard University Health Services (HUHS) and the Harvard administration. Namely, Harvard was in the process of producing protocol for COVID-19’s arrival in the U.S. or on Harvard’s campus. During this time, words such as “travel”, “international”, and “China” were used frequently as the administration warned the community of the potential dangers and uncertainty surrounding the novel coronavirus.
Given the prevalence of the word “China” in Harvard’s coronavirus correspondence, we decided to track the frequency of its appearance over time. Initially (late January and early February), when the outbreak was largely centered on mainland China, the use of the word steadily increased. At the end of February, it dropped to zero. These changes are documented in the graph below.
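As a rough illustration of how a single word can be tracked across dated messages, here is a minimal Python sketch. The original analysis was done in R, and its code is not reproduced here. The function names `word_count` and `track_word`, the use of raw counts rather than a normalized rate, and the placeholder messages are all assumptions made for illustration.

```python
import re
from datetime import date


def word_count(text, word):
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def track_word(messages, word):
    """Return (date, count) pairs in chronological order.

    messages: iterable of (date, text) pairs, one per correspondence.
    """
    ordered = sorted(messages, key=lambda m: m[0])
    return [(sent, word_count(text, word)) for sent, text in ordered]


# Placeholder messages with invented text; only the structure matters.
messages = [
    (date(2020, 2, 1), "Travel to China is restricted. Travelers from China should self-isolate."),
    (date(2020, 1, 24), "We are monitoring the outbreak in China."),
    (date(2020, 3, 10), "Students must leave campus."),
]

for sent, count in track_word(messages, "China"):
    print(sent.isoformat(), count)
```

Raw counts favor longer messages, so dividing each count by the message's total word count would be a reasonable variation when message lengths differ widely.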
The use of the word “China” in Harvard’s messages fell away just as U.S. cases began to skyrocket. This trend is intuitive: as the pandemic crossed new borders and affected new communities around the globe, the university shifted the focus of its response to the domestic spread.
The most frequent words shifted from “China” and “international” to “dorms,” “campus,” and “students.” As the virus spread around the world, and especially within the United States, university leaders likely realized that the pandemic was not a remote “international” issue. Rather, it was one that could turn campus life upside down.
Correspondence also focused heavily on the word “community” (see “Frequency of ‘community’ across Harvard’s messages” graph below). Although “community” appeared in all correspondences, it is interesting to note that “community” increased in frequency as “China” decreased in frequency. Like the shift toward campus-related words, this indicates Harvard’s increasing emphasis on unifying its community, especially as the first Harvard-affiliated cases were reported.
Messages sent after students left campus (March 15) and after the second Harvard-affiliated case (March 16) used “community” less frequently. At this point, the university focused on establishing protocol for essential personnel and thanking students, faculty, and community members. Words such as “students”, “protect”, “campus”, and “best” began to appear more frequently during this period. There was a strong commitment by the administration, or at least the appearance of one, to prioritizing the safety of students.
Another prevalent word was “travel” (see “Frequency of travel across Harvard’s messages” graph below). As the frequency of “travel” increased, words such as “China” and “international” also appeared more frequently. During this time, the university discouraged travel and warned students of risks and precautions, should travel be necessary. However, as the pandemic moved closer to Harvard, the use of “travel” decreased. Harvard began focusing more heavily on issues related directly to campus and the student population in Cambridge. Following March 10, when students were informed they would have to leave campus in five days, the use of “travel” became frequent again as the university communicated information about leaving campus.
In early March, the tangible impacts of the pandemic on Harvard’s campus became evident. Email content began reflecting a stronger sense of urgency, utilizing words such as “possible”, “strongly”, and “protect”. This call to action was realized through the University’s announcement on March 10, when students received notice to leave campus within five days.
After students left campus, the tone of Harvard’s correspondence became increasingly dire, in line with the worsening state of the U.S. pandemic.
Overall, the urgency of the correspondence, as one might expect, aligns closely with the domestic upsurge of COVID-19. Initially seen as a solely international travel hazard, the virus rapidly progressed into a campus emergency, resulting in unprecedented action by campus leadership.
Further investigation of this data could assess whether the University’s response was adequate or called for. Should the university have acted sooner with greater urgency (given the increasing number of cases in the U.S.), was the response unnecessary, or was it exactly on time? As more information is published, data-driven answers to these questions may provide useful proposals for future crises.
First, we collected all correspondences from Harvard and HUHS about COVID-19. This included all messages from Harvard’s Updates & Community Messages page and emails sent to all Harvard community members. We then used R (tidyverse and tokenizers packages) to produce a list of all the words in each correspondence and their frequency of appearance.
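The word-counting step can be sketched as follows. This is a Python illustration, not the original code, which used R's tidyverse and tokenizers packages. The regular-expression tokenizer, the function names `tokenize` and `top_words`, the tiny stop-word set, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stand-in for the common-word filter; the project itself
# filtered on Google Web Trillion Word Corpus frequencies instead.
STOP_WORDS = {"and", "the", "of", "to", "a", "in", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


# Made-up sample text, not an actual Harvard message.
sample = "Travel to China is discouraged. Travel guidance for the community will follow."
print(top_words(sample, n=3))
```

Running `top_words` on the concatenated text of every message would give the overall ranking across all correspondences.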
To remove common function words such as articles, conjunctions, prepositions, and pronouns (e.g. the, and, of, I), we used the Google Web Trillion Word Corpus, a dataset produced by Google’s web crawlers containing English word n-grams and their observed frequency counts. Essentially, the Corpus provides a list of the most commonly used words and each word’s frequency (measured as the percentage of the Trillion Word Corpus made up of that word). In our analysis of Harvard’s correspondences, we eliminated all words with a Corpus frequency above 0.1%. In other words, we kept only words that occur less than once in every 1,000 Corpus words. This threshold removed words that wouldn’t give us meaningful insight, which was key to identifying how the university framed the outbreak and its response. We completed this analysis for each correspondence between January 24 and March 27, 2020.
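The threshold filter described above can be sketched in a few lines of Python. The original work was done in R, so this is an illustration rather than the authors' code. The function name `filter_common_words`, the dictionary layout, and the corpus shares below are assumptions; the shares are invented for the example and are not real Corpus figures.

```python
def filter_common_words(word_counts, corpus_share, threshold=0.001):
    """Drop words whose share of the reference corpus exceeds `threshold`.

    word_counts:  dict of word -> count within one message
    corpus_share: dict of word -> fraction of the reference corpus (0 to 1)
    Words missing from corpus_share are kept, i.e. treated as rare.
    """
    return {
        word: count
        for word, count in word_counts.items()
        if corpus_share.get(word, 0.0) <= threshold
    }


# Illustrative shares only; these are not real Corpus figures.
corpus_share = {"the": 0.04, "of": 0.02, "travel": 0.0002, "campus": 0.00005}
message_counts = {"the": 12, "of": 7, "travel": 5, "campus": 3, "huhs": 2}

print(filter_common_words(message_counts, corpus_share))
```

A threshold of 0.001 corresponds to the 0.1% cutoff. Keeping words that are absent from the reference list is a design choice made here so that institution-specific terms such as "huhs" survive the filter.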
Data on the COVID-19 pandemic can be found on the CDC website and in this map created by Johns Hopkins University. Sources used for this project can be found here and here. The most up-to-date information on Harvard’s response can be found on the university’s COVID-19 response website.