Every year, Harvard hosts a notable speaker to deliver a commencement speech as part of its graduation ceremony. In the last twenty years, icons from Bill Gates to Oprah Winfrey have spoken to the graduating students and have covered a variety of topics to jumpstart the class’s post-academic, “real world” lives. Our group sought to understand what, if any, correlation existed between these speakers’ words and the situation in the world at large.
An understanding of these ideas reflects not only which global topics are the most salient and pressing, but also our responsibilities as students to act on them.
Data was collected from commencement speech transcripts from 2000 to 2020, with the exclusion of 2001, 2003, and 2006, for which no accurate transcripts could be found. These texts were then quantified and analyzed using Text Analytics from Microsoft Azure and a second text-processing API. The data on speeches was compared to web-based search data, namely from Google Trends and Wikipedia’s Timeline of the 21st Century.
Each speech was first analyzed using Microsoft Azure’s sentiment analysis, which scored the percentages of positive, neutral, and negative content in the speech. As shown below, Harvard commencement speeches have been largely negative in tone, with only two speeches reaching a positivity score of 50%.
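A full transcript is usually longer than a sentiment service will accept as a single document, so one plausible way to get speech-level percentages is to score the transcript in pieces and combine the results. The sketch below illustrates that idea only. The `score_chunk` callable is a hypothetical stand-in for whatever call returns positive, neutral, and negative scores for a piece of text, and the 5,000-character default and the length-weighted average are assumptions, not necessarily how the scores in this project were produced.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # expected keys: "positive", "neutral", "negative"


def split_into_chunks(text: str, max_chars: int = 5000) -> List[str]:
    """Greedily pack whole words into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def speech_sentiment(
    text: str,
    score_chunk: Callable[[str], Scores],  # hypothetical wrapper around the sentiment service
    max_chars: int = 5000,
) -> Scores:
    """Length-weighted average of per-chunk sentiment scores for one speech."""
    chunks = split_into_chunks(text, max_chars)
    total_chars = sum(len(chunk) for chunk in chunks)
    result: Scores = {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    for chunk in chunks:
        scores = score_chunk(chunk)
        weight = len(chunk) / total_chars
        for label in result:
            result[label] += weight * scores[label]
    return result
```

Weighting by chunk length keeps a short closing chunk from counting as much as a full-size one; a simple unweighted mean would be an equally defensible choice.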
A subjective examination of Microsoft Azure’s sentiment analysis suggests that the software flags controversial topics or words related to global problems as negative, even when they are presented in a hopeful manner. For instance, a passage describing a student who “has every reason to be cynical” but chose instead to follow a sense of purpose and “bring people along with him” would be seen as hopeful or positive by most readers, yet the words describing the struggles he overcame resulted in a negativity score of 83%. Applying this to the speeches at large, the consistently high negativity scores may indicate that Harvard students are made aware of global problems not to impose a pervasive sense of pessimism, but to frame those problems as something they can help eliminate in pursuit of a better future.
To improve our analysis, we used a second text-processing API whose sentiment analysis produced a more nuanced interpretation of the overall sentiment of each speech. Using this API, we found a similar sentiment pattern across the speeches, but with consistently lower negativity scores: it was able to identify hopeful sentences as predominantly positive or neutral, even when a sentence included negative words.
Next, we compared the sentiment of each speech to the general global environment at the time. To do this, Wikipedia’s Timeline of the 21st Century was used as a proxy for the past 20 years as it is relatively consistent in presentation and format, while Google search data was used for more recent years as well. The individual sentiment scores for every year of Wikipedia’s entries are shown below. Also, a comparison of the speech versus Wikipedia summary yields the following:
Next, we performed a similar comparison between the sentiment of each year’s commencement speech and year summary using the text-processing API. Again, we found a slight negative correlation between the negativity of the speech and of the year.
Speculatively, the slight negative correlation may allude to a perceived responsibility on the part of the Harvard community to maintain a sense of optimism during difficult times; however, due to the small sample size, weak correlation, and lack of detail in Wikipedia’s summary, further research is necessary with a larger dataset of commencement speeches.
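For readers who want to see what a correlation between speech negativity and year negativity involves, here is a minimal sketch of a Pearson correlation coefficient. We assume Pearson's r for illustration; the write-up does not state which measure was used, and the numbers in the usage example are invented placeholders, not our results.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT the project's data).
    speech_negativity = [0.62, 0.55, 0.70, 0.48, 0.66]
    year_negativity = [0.40, 0.52, 0.35, 0.50, 0.45]
    print(f"r = {pearson_r(speech_negativity, year_negativity):.2f}")
```

With fewer than twenty data points, a coefficient close to zero can easily arise by chance, which is why a larger set of speeches would be needed before reading much into the sign.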
Each speech was then analyzed for key phrases using Text Analytics’ key phrase extraction API. The API used natural language processing to identify and evaluate approximately 200 key words and phrases in each transcript.
To better understand this data, we began by looking for the key phrases that were repeated most often among the speeches. We wanted to learn whether certain words and ideas were universal to Harvard commencement speeches, regardless of the specific speaker. The word cloud below is a visual representation of the frequency of words and phrases that appeared in Harvard commencement speeches between 2000 and 2020. The word cloud provides a high-level snapshot of some of the important ideas, such as an outward focus on other people, the world, and the future, that Harvard wishes to instill in its students.
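Finding the phrases shared across speeches comes down to a frequency count. The sketch below counts, for each key phrase, how many speeches it appears in. The input shape (a mapping from year to that speech's extracted key phrases), the lowercasing step, and the choice to count speeches rather than raw occurrences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def phrase_document_frequency(key_phrases_by_year: Dict[int, Iterable[str]]) -> Counter:
    """Count the number of speeches in which each key phrase appears."""
    counts: Counter = Counter()
    for phrases in key_phrases_by_year.values():
        # A set ensures a phrase repeated within one speech is counted once.
        unique = {p.strip().lower() for p in phrases if p.strip()}
        counts.update(unique)
    return counts


if __name__ == "__main__":
    # Toy input, not real extraction output.
    toy = {2007: ["World", "people", "world"], 2008: ["world", "future"]}
    print(phrase_document_frequency(toy).most_common(3))
```

A counter like this can be handed to most word cloud tools, which size each phrase in proportion to its count.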
Next, we found the percentage of keywords in each of the speeches that were related to what was going on in the world. Again, we used Wikipedia’s Timeline of the 21st Century to guide our comparison. However, we recognize the limitations of using Wikipedia as a reference for real-world significance because Wikipedia’s summary only covers a narrow scope of the world’s events. As such, the graph below is more helpful for comparing the speeches relative to each other, rather than focusing on the individual percentages.
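One simple way to compute such a percentage is to treat a speech keyword as related to world events when it also appears among the key phrases of that year's Wikipedia summary. The sketch below uses exact matching after lowercasing, which is an assumption; the matching rule actually used may have been looser, and the function name is ours.

```python
from typing import Iterable


def percent_related(speech_phrases: Iterable[str], event_phrases: Iterable[str]) -> float:
    """Percentage of distinct speech key phrases that also occur in the year's event phrases."""
    speech = {p.strip().lower() for p in speech_phrases if p.strip()}
    events = {p.strip().lower() for p in event_phrases if p.strip()}
    if not speech:
        return 0.0
    return 100.0 * len(speech & events) / len(speech)


if __name__ == "__main__":
    # Toy phrases, not real extraction output.
    speech = ["Climate change", "friends", "war", "hope"]
    events = ["climate change", "War", "election"]
    print(percent_related(speech, events))
```

Because exact matching misses paraphrases (for example "global warming" versus "climate change"), percentages computed this way are best read relative to one another rather than as absolute measures.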
To give a more comprehensive view of pertinent real-world topics in recent years, we incorporated data from Google using an unofficial Google Trends API. One of its features is the ability to track the interest over time of specific terms by year and location. “Interest over time” is expressed as numbers from 0 to 100 that “represent search interest relative to the highest point on the chart”. For example, the term “American election” might reach a value of 100 in November, when the term is at its peak popularity, whereas in another month it might have a value of 50 because the term is only half as popular then. Finally, a score of 0 means that there was not enough data for the term.
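The "relative to the highest point" scaling can be illustrated with a few lines of arithmetic. This sketch covers only the rescale-to-peak step on made-up monthly counts; Google Trends applies additional normalization of its own that is not reproduced here, and the function name is hypothetical.

```python
from typing import List, Sequence


def scale_to_peak(raw_counts: Sequence[float]) -> List[int]:
    """Rescale values so the largest becomes 100 and the rest are proportional."""
    peak = max(raw_counts, default=0)
    if peak <= 0:
        # Nothing to scale against: report 0 everywhere.
        return [0 for _ in raw_counts]
    return [round(100 * count / peak) for count in raw_counts]


if __name__ == "__main__":
    # Made-up counts: the second month is the peak, the first is half as popular.
    print(scale_to_peak([20, 40, 10]))
```

One consequence of this scaling is that values for different terms are not directly comparable in absolute terms, since each term is measured against its own peak.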
We took the common key phrases from above and checked to see their interest over time for the year they were mentioned in the speech. We then took the average of these values and plotted them below:
The average interest over time of the phrases mentioned in each year’s commencement speech ranged from a minimum of about 56 to a maximum of about 69. In other words, the topics raised in these speeches were at least somewhat relevant to events occurring around the world. This makes sense, since these speeches are, in a way, the formal welcome into the “real world” after college.
There are some limitations to using an unofficial API, as it was not created by Google itself. Furthermore, we had to exclude years prior to 2006, as the data available in Google Trends for those years was limited and did not accurately reflect their major events.
Some questions remain unanswered because of the limitations of the available datasets. First, the slight negative correlation between the negativity of the speech and of the year leads us to question whether such a weak trend is actually meaningful; answering this would require analyzing more commencement speech data. Second, if the trend is real, what explains it? Although we speculate that it may reflect an urge to remain optimistic during difficult times, this is only a hypothesis. The trend might also indicate something else entirely, and more research is needed before drawing any definitive conclusions.
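The averaging step behind these figures can be sketched as follows. We assume each phrase comes with a list of interest values for the year of the speech, and that a year's score is the mean of the per-phrase means; the exact aggregation used in the project may differ, and the function name and input shape are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def average_interest(interest_by_phrase: Dict[str, Sequence[float]]) -> Optional[float]:
    """Mean over phrases of each phrase's mean interest during the speech year."""
    per_phrase = [mean(values) for values in interest_by_phrase.values() if values]
    if not per_phrase:
        return None  # no usable data for this year
    return mean(per_phrase)


if __name__ == "__main__":
    # Toy values, not real Google Trends output.
    toy_year = {"world": [50, 100], "future": [25, 75]}
    print(average_interest(toy_year))
```

Averaging per phrase first keeps a phrase with more data points from dominating the year's score. Whether zero values (insufficient data) should be dropped before averaging is a separate choice that this sketch leaves untouched.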
Through our research, our group sought to understand what, if any, correlation existed between Harvard commencement speakers’ words and the situation in the world at large. As the final call to action for Harvard students as they leave the college and go into the larger world, commencement addresses give a unique image of what Harvard believes a graduate’s responsibilities are.
From our analysis of how common words and phrases in a given commencement speech mirrored relevant events occurring in the world, we found that commencement speeches did reference world events in a significant way, consistent with Harvard’s larger mission “to educate the citizens and citizen-leaders for our society.”
You can check out the analysis code and raw data for this project here.