Harvard Commencement Speeches: Historical Context and Themes

Is the final call to action for graduates reflective of the world at large?

By Justin Ye, Cindy Liu, Jacinta Olonilua & Katherine McPhie

Every year, Harvard hosts a notable speaker to deliver a commencement speech as part of its graduation ceremony. In the last twenty years, icons from Bill Gates to Oprah Winfrey have spoken to the graduating students and have covered a variety of topics to jumpstart the class’s post-academic, “real world” lives. Our group sought to understand what, if any, correlation existed between these speakers’ words and the situation in the world at large:

How does the final call to action in a commencement speech change as the world changes?

Understanding this relationship reveals not only which global topics are the most salient and pressing, but also our responsibilities as students to act on them.


Data was collected from commencement speech transcripts from 2000 to 2020, with the exclusion of 2001, 2003, and 2006, for which no accurate transcripts could be found. These texts were then quantified and analyzed using Text Analytics from Microsoft Azure and a second text-processing API. The data on speeches was compared to web-based search data, namely from Google Trends and Wikipedia’s Timeline of the 21st Century.

Sentiment Analysis

Each speech was first analyzed using Microsoft Azure’s sentiment analysis, which scored the percentages of positive, neutral, and negative content in the speech. As shown below, Harvard commencement speeches have been largely negative in tone, with only two speeches reaching a positivity score of 50%.

Microsoft Azure’s sentiment analysis API reports the percentage of each year’s commencement speech that is positive, neutral, or negative in tone. By this measure, Harvard commencement speeches are generally more negative than positive.
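As a rough illustration of how per-speech percentages like these could be derived, the sketch below aggregates sentence-level sentiment labels into shares. It assumes each sentence has already been labeled positive, neutral, or negative by a sentiment service; the function name `sentiment_shares` and the sample labels are hypothetical, and this is not necessarily how Azure or our pipeline computed the published scores.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")

def sentiment_shares(sentence_labels):
    """Return the percentage of sentences carrying each sentiment label."""
    counts = Counter(sentence_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No labeled sentences: report zero for every label.
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}

# Hypothetical labels for a four-sentence excerpt (not real speech data).
example_labels = ["negative", "negative", "neutral", "positive"]
shares = sentiment_shares(example_labels)
print(shares)
```

Counting sentences equally is a simplifying assumption; weighting by sentence length or by the service's confidence scores would be reasonable alternatives.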

A subjective examination of Microsoft Azure’s sentiment analysis appears to show that the software flags controversial topics or words related to global problems as negative, even when they are presented in a hopeful manner. For instance, a passage describing a student who “has every reason to be cynical” but chose instead to follow a sense of purpose and “bring people along with him” would be seen as hopeful or positive by most readers, yet the words describing the struggles he overcame resulted in a negativity score of 83%. Applying this to the speeches at large, the consistently high negativity scores may indicate that speakers raise global problems not to impose an omnipresent pessimism on Harvard students, but to present those problems as a chance to eliminate them and bring about a better future.

To improve our analysis, we used a second text-processing API whose sentiment analysis produced a more nuanced interpretation of the overall sentiment of each speech. Using this API, we found a similar sentiment pattern across the speeches, but with consistently lower negativity scores: the text-processing API was able to identify hopeful sentences as predominantly positive or neutral, even if the sentence included negative words.

A text-processing API with more nuanced sentiment analysis shows that Harvard commencement speeches are generally more positive and neutral in tone.

Next, we compared the sentiment of each speech to the general global environment at the time. To do this, we used Wikipedia’s Timeline of the 21st Century as a proxy for world events over the past 20 years, as it is relatively consistent in presentation and format; Google search data was additionally used for more recent years. The individual sentiment scores for every year of Wikipedia’s entries are shown below, followed by a comparison of each speech against the Wikipedia summary for the same year:

Using Microsoft Azure’s API to compare the sentiment between each year’s Wikipedia summary entry and commencement speech indicates a slight, albeit weak, negative correlation.

Next, we performed a similar comparison between the sentiment of each year’s commencement speech and year summary using the text-processing API. Again, we found a slight negative correlation between the negativity of the speech and of the year.

Analysis by the text-processing API reinforces the trend found using the Azure API. Both methods show a slight negative correlation between the negativity of the speech and of the year.
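For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a Pearson correlation between two lists of yearly negativity scores. The function name `pearson_r` and the sample values are hypothetical placeholders, not our data, and the article does not state which correlation measure was actually used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical negativity scores (fractions), one entry per year.
speech_negativity = [0.60, 0.55, 0.70, 0.50]
year_negativity = [0.40, 0.50, 0.35, 0.45]
r = pearson_r(speech_negativity, year_negativity)
print(round(r, 3))
```

With fewer than twenty data points, a coefficient close to zero should be read cautiously, since a handful of unusual years can change its sign.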

Speculatively, the slight negative correlation may allude to a perceived responsibility on the part of the Harvard community to maintain a sense of optimism during difficult times; however, due to the small sample size, weak correlation, and lack of detail in Wikipedia’s summary, further research is necessary with a larger dataset of commencement speeches.

Key Phrase Analysis

Each speech was then analyzed for key phrases using Text Analytics’ key phrase extraction API. The API used natural language processing to identify and evaluate approximately 200 key words and phrases in each transcript.

To better understand this data, we began by looking for the key phrases that were repeated most often among the speeches. We wanted to learn whether certain words and ideas were universal to Harvard commencement speeches, regardless of the specific speaker. The word cloud below is a visual representation of the frequency of words and phrases that appeared in Harvard commencement speeches between 2000 and 2020. It provides a high-level snapshot of some of the important ideas that Harvard wishes to instill in its students, such as an outward focus on other people, the world, and the future.

The most common key phrases in Harvard commencement speeches appear to be ones pertaining to the lives of people around the world.
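The counting behind a word cloud like this can be sketched in a few lines. The example below assumes the extracted key phrases are available as a mapping from year to a list of phrases, and counts each phrase at most once per speech; the function name `phrase_frequencies` and the sample phrases are hypothetical.

```python
from collections import Counter

def phrase_frequencies(key_phrases_by_year):
    """Count in how many speeches each (case-normalized) key phrase appears."""
    counts = Counter()
    for phrases in key_phrases_by_year.values():
        # A set ensures a phrase is counted once per speech.
        counts.update({p.strip().lower() for p in phrases if p.strip()})
    return counts

# Hypothetical key phrases, not taken from the real transcripts.
sample = {
    2018: ["World", "people", "future"],
    2019: ["people", "world ", "climate"],
    2020: ["people"],
}
counts = phrase_frequencies(sample)
print(counts.most_common(2))
```

Counting once per speech highlights ideas shared across speakers; counting every occurrence would instead favor phrases that one speaker repeated heavily.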

Next, we found the percentage of key phrases in each speech that were related to what was going on in the world. Again, we used Wikipedia’s Timeline of the 21st Century to guide our comparison. However, we recognize the limitations of using Wikipedia as a reference for real-world significance, because its summary covers only a narrow range of the world’s events. As such, the graph below is more helpful for comparing the speeches relative to each other than for interpreting the individual percentages.

Only a small portion of the key phrases in Harvard commencement speeches appear in each year’s Wikipedia summary.
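One simple way to compute such a percentage is a case-insensitive substring match of each key phrase against the year’s summary text. The sketch below assumes that matching rule; the function name `overlap_percentage` and the sample text are hypothetical, and the matching used in our analysis may have differed.

```python
def overlap_percentage(key_phrases, summary_text):
    """Percentage of distinct key phrases that occur in the summary text."""
    summary = summary_text.lower()
    phrases = {p.strip().lower() for p in key_phrases if p.strip()}
    if not phrases:
        return 0.0
    matched = sum(1 for phrase in phrases if phrase in summary)
    return 100.0 * matched / len(phrases)

# Hypothetical phrases and summary, for illustration only.
phrases = ["climate change", "financial crisis", "gratitude", "courage"]
summary = "A global financial crisis unfolded while climate change talks continued."
pct = overlap_percentage(phrases, summary)
print(pct)
```

Plain substring matching can overcount short phrases (for example, "war" also matches "award"), so matching on word boundaries would be a stricter alternative.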

Google Trends Analysis

To give a more comprehensive view of pertinent real-world topics in recent years, we incorporated data from Google using an unofficial Google Trends API, one feature of which is the ability to track the interest over time of specific words by year and location. “Interest over time” is expressed as numbers from 0 to 100 that “represent search interest relative to the highest point on the chart”. For example, the term “American election” might reach a value of 100 during the month of November, because that is when the term is at its peak popularity, whereas in another month it might have a value of 50, because at that point the term is only half as popular. Finally, a score of 0 means that there was not enough data for the term.
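The 0 to 100 scale described above can be illustrated with a small worked example that rescales a series relative to its own peak. This is a simplified sketch: the function name `relative_interest` and the raw numbers are hypothetical, and Google’s actual normalization works on shares of total search volume rather than raw counts.

```python
def relative_interest(raw_values):
    """Rescale a series so its highest point is 100 and others are relative to it."""
    peak = max(raw_values, default=0)
    if peak <= 0:
        # Mirrors the "not enough data" case, where every score is 0.
        return [0 for _ in raw_values]
    return [round(100 * value / peak) for value in raw_values]

# Hypothetical monthly search volumes for one term.
monthly_volume = [200, 400, 100]
scaled = relative_interest(monthly_volume)
print(scaled)
```

Because every series is scaled to its own peak, a value of 50 for one term is not directly comparable to a value of 50 for another term queried separately.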

We took the common key phrases from above and checked to see their interest over time for the year they were mentioned in the speech. We then took the average of these values and plotted them below:

A comparison between the key phrases that appear in Harvard speeches and their popularity among Google searches indicates that Harvard speeches generally address topics that are relevant in the world at the time.
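The averaging step can be sketched as follows, assuming each key phrase has a list of interest-over-time values for the year of the speech. Here each phrase is first averaged over the year and the phrase averages are then averaged together; the function name `average_interest` and the values are hypothetical, and this two-step average is an assumption about the procedure.

```python
from statistics import mean

def average_interest(interest_by_phrase):
    """Average the per-phrase yearly mean interest across all phrases."""
    phrase_means = [mean(series) for series in interest_by_phrase.values() if series]
    return mean(phrase_means) if phrase_means else 0.0

# Hypothetical interest-over-time values (0-100) for one speech year.
interest_2019 = {
    "people": [60, 80],
    "world": [50, 70],
}
yearly_average = average_interest(interest_2019)
print(yearly_average)
```

Averaging per phrase first keeps a phrase with more data points from dominating the yearly figure.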

Across the years studied, the average interest over time of phrases mentioned in the commencement speeches ranged from a minimum of about 56 to a maximum of about 69. In other words, the topics raised in these speeches were reasonably relevant to events occurring around the world. This makes sense, given that these speeches serve, in a way, as the formal welcome into the “real world” after college.

It seems fitting for a speech to recap important events from the past year, giving graduates a point of reference as they prepare to enter an entirely new environment.

Using an unofficial API has some limitations, as it was not created or supported by Google itself. Furthermore, we had to remove years prior to 2006, as the scope of data available in Google Trends was limited and did not accurately reflect major events of those years.

Unanswered Questions

Some questions are left unanswered by our work due to the limitations of the available datasets. First, the slight negative correlation between the negativity of the speech and of the year leads us to question whether such a slight trend is actually meaningful; answering this would require analyzing more commencement speech data. Second, if the trend does exist, what is the reason for it? Although we might speculate that it reflects an urge to remain optimistic even during difficult times, this is only a hypothesis. The trend might also be indicative of something else entirely, and more research would be needed before drawing any definitive conclusions.


Through our research, our group sought to understand what, if any, correlation existed between Harvard commencement speakers’ words and the situation in the world at large. As the final call to action for Harvard students as they leave the college and go into the larger world, commencement addresses give a unique image of what Harvard believes a graduate’s responsibilities are.

From our analysis of how common words and phrases in a given commencement speech mirrored relevant events occurring in the world, we found that commencement speeches did reference world events in a significant way, consistent with Harvard’s larger mission “to educate the citizens and citizen-leaders for our society.”

You can check out the analysis code and raw data for this project here.

Harvard Open Data Project
© 2016-2023, Built with Sanity & Gatsby