
Who Gets Mentioned in The Crimson?

The demographics of the most discussed Harvard students, as revealed by Crimson mentions.


By Carmen Chan, Teddy Lin, Felicia Roman & Christina Xiao
03-13-2021

The best part of reading The Crimson is the sudden awareness of “Oh hey, I know that person!” when a friend gets name-dropped in an article. (Some of us experience that rush more than others.) Who are these Crimson-famous students, and what do we know about them? Here at HODP, we’ve analyzed all current Harvard students mentioned by name in The Crimson within the past four years to find out what the ultimate Crimson-famous Harvard student looks like in terms of their Harvard introduction: year, House, concentration, sports, and Crimson affiliation.


Data and Methodology

We started with a dataset containing the names of every current undergraduate mentioned in The Crimson between 2016 and 2020, extracted using named entity recognition. We then scraped the Harvard College Facebook to find the names, years, and House affiliations of all current students. We also scraped the Harvard athletic teams’ rosters to associate student athletes with their respective teams.
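One step this pipeline needs is linking the names pulled out by named entity recognition to students in the College Facebook. The sketch below is a minimal illustration that assumes exact matching on normalized full names, which is our assumption and not necessarily the method used. `normalize` and `count_student_mentions` are hypothetical helper names, and the names in the example are made up.

```python
import re
import unicodedata
from collections import Counter


def normalize(name: str) -> str:
    # Assumption: strip accents, lowercase, and collapse punctuation/whitespace
    # so that "José  Smith" and "jose smith" compare equal.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return " ".join(re.sub(r"[^a-z\s-]", " ", ascii_name.lower()).split())


def count_student_mentions(extracted_names, directory_names):
    """Hypothetical helper: count how often each directory student appears
    among NER-extracted names, using exact normalized full-name matches."""
    directory = {normalize(n): n for n in directory_names}
    counts = Counter()
    for raw in extracted_names:
        key = normalize(raw)
        if key in directory:
            counts[directory[key]] += 1
    return counts


# Made-up example data.
extracted = ["José Smith", "jose smith", "Jane Doe", "Random Person"]
directory = ["José Smith", "Jane Doe", "Ann Lee"]
counts = count_student_mentions(extracted, directory)
print(counts)
```

Exact matching is simple but misses nicknames and middle names, and it cannot tell apart two people who share a name, so any real pipeline would need extra handling for those cases.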

Using Python libraries like Google search, matplotlib, and pandas, we were able to find some interesting patterns regarding which groups of students were mentioned most often in The Crimson.


Analysis

What class year gets mentioned most?


First, we took a look at the simplest part of the Harvard intro: class year. Our data spans four years, which would be a full cycle from freshman to senior year for the Class of 2021, assuming a four-year college experience. Since each class of current students could only have been mentioned in The Crimson for as many years as it had been enrolled, we expected current seniors to appear most often in the data:



As we can see, the Class of 2021 was indeed the most discussed class year, with a whopping 3,234 total mentions. The Class of 2022 followed with 2,212 mentions, the Class of 2023 with 904 mentions, and the newest class, the Class of 2024, with only 203 mentions.

To control for time enrolled, we assumed that each class was mentioned at a roughly constant rate per year, so we divided each class year’s total number of Crimson mentions by the number of years it had been at Harvard to get the following graph:




Here, we can see that while the Class of 2021 is still the most mentioned class year, the differences between class years have leveled off slightly. The Class of 2024 still has relatively few mentions, which we think could be attributed to how new and unconnected incoming freshmen are. It’s also possible that students are mentioned increasingly often in The Crimson over their four years at Harvard because they take on more leadership positions and become more involved in campus matters, making them more relevant to Crimson articles. Whatever the case, it seems a pretty safe bet to say that the Class of 2021, our current seniors, is the most Crimson-mentioned class year.
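The per-year normalization is simple arithmetic, sketched here with the totals reported above. The number of years each class had been enrolled is our assumption: four for the Class of 2021 and one fewer for each later class, matching the four-year window.

```python
# Total Crimson mentions per class year, as reported above.
total_mentions = {2021: 3234, 2022: 2212, 2023: 904, 2024: 203}


def years_enrolled(class_year: int, senior_class: int = 2021) -> int:
    # Assumption: the senior class has been enrolled for the full four years
    # of the data window, and each later class for one year fewer.
    return 4 - (class_year - senior_class)


# Mentions per year enrolled, for each class.
per_year = {cy: n / years_enrolled(cy) for cy, n in total_mentions.items()}
for cy, rate in per_year.items():
    print(f"Class of {cy}: {rate:.1f} mentions per year")
```

Under this assumption the Class of 2021 works out to 808.5 mentions per year and the Class of 2024 to 203.0.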


What House is mentioned most?

Next, we wanted some Crimson-verified evidence for which House held the most mentioned students. We did not include first-year dorms because there are a large number of them and first-year students rotate out of their first-year dorms rather quickly. Taking a look at how many students from each House, as listed in the Harvard Facebook, were mentioned in The Crimson, we got the following results:






By mentions alone, Adams students were the most discussed by far, with 668 total mentions. Meanwhile, Eliot and Pforzheimer students were both pretty low in mentions, with 166 and 159 total mentions, respectively.

But once we controlled for House size, as collected from the College Facebook data, we found a slightly different story:



While Adams definitely still leads the pack, with an impressive 2.98 average mentions per student, a few other Houses have shifted in rank: namely Currier, Cabot, and Kirkland, as well as Pforzheimer and Eliot.
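Controlling for House size amounts to dividing each House’s total mentions by its number of residents. The pandas sketch below assumes a table with one row per student from the College Facebook and a `mentions` column that is 0 for students never mentioned. The column names and the data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per student in the directory,
# with that student's Crimson mention count (0 if never mentioned).
students = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "house": ["Adams", "Adams", "Eliot", "Eliot", "Eliot", "Quincy"],
    "mentions": [5, 1, 0, 2, 1, 2],
})


def mentions_by_house(df: pd.DataFrame) -> pd.DataFrame:
    """Total mentions, House size, and mentions per student, highest average first."""
    summary = df.groupby("house").agg(
        total_mentions=("mentions", "sum"),
        house_size=("name", "count"),
    )
    summary["mentions_per_student"] = summary["total_mentions"] / summary["house_size"]
    return summary.sort_values("mentions_per_student", ascending=False)


summary = mentions_by_house(students)
print(summary)
```

The denominator has to count every resident, including those with zero mentions; averaging only over mentioned students would measure something different.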

Interestingly, Adams, Quincy, and Lowell, the top three most represented Houses, are the ones closest to The Crimson building.

Given the strikingly high average Crimson mentions for Adams House, we wondered whether proximity to The Crimson building led more Adams students to become writers for The Crimson. Crimson writers sometimes sign their names under the articles they write, which boosts their mentions in The Crimson as detected by named entity recognition. Unfortunately, our named entity recognition approach doesn’t distinguish between names found in the main body of an article and those in the signed addendum. This could explain why Adams House’s average Crimson mentions per student were so much higher than other Houses’.

We ran a statistical test to determine the significance of Adams House’s comparatively high average Crimson mentions per student, using normal approximations of binomial distributions for each of the 12 Houses. Based on the average mentions per article by House, we calculated a z-score of 2.60, with a p-value of 0.0094. Since the p-value is below the conventional 0.05 threshold, the result is statistically significant. So Adams House indeed holds the most mentioned students in The Crimson!
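Converting a z-score into a p-value only needs the standard normal distribution. The sketch below assumes a two-sided test, which is consistent with the reported p-value for z = 2.60; `two_sided_p` is a hypothetical helper name.

```python
import math


def two_sided_p(z: float) -> float:
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.05  # conventional significance threshold
p = two_sided_p(2.60)
print(f"p = {p:.4f}, significant at alpha = {alpha}: {p < alpha}")
```

A z-score of 2.60 gives a p-value of roughly 0.009, well under 0.05. A one-sided test would halve that value.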

The significantly higher number of mentions of Adams House residents in The Crimson is perhaps a reflection of the House’s relative desirability. Or maybe Adams House just has the most interesting or well-connected students! Whatever it is, The Crimson definitely writes about them the most frequently.


Which concentrators are mentioned the most?

Next, we wondered what concentrations students commonly mentioned in The Crimson were pursuing. Using data from the College Facebook, we were able to find the following results:



We chose to show only the top 10 concentrations as mentioned in The Crimson for clarity. From these top 10 concentrations, we can see Economics concentrators are the most often mentioned in The Crimson, at 865 overall mentions. It’s also interesting to see how many of these top concentrations as mentioned in The Crimson are in the social sciences and humanities departments, with engineering fields nowhere to be seen.

We were curious to see how the distribution of concentrations mentioned in The Crimson measured up to the college’s overall distribution, so we looked to the College Facebook again to see the top 10 concentrations by student population across all enrolled students at Harvard College.



Compared to the concentrators mentioned in The Crimson, the actual spread of concentrations at Harvard is slightly different. While Economics, Government, Applied Mathematics, Neuroscience, and Psychology are all top 10 concentrations in both the College and The Crimson, there are some notable differences between the two populations.

Undeclared students, who vastly outnumber all other concentrators at the College, are not mentioned nearly as often in The Crimson, perhaps because they have yet to specialize in a field that would make them more likely to be reported on. Computer Science, notably the second most declared concentration at the College, isn’t within the top 10 most mentioned concentrations in The Crimson; it’s at number 12.
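One way to line up the two rankings is to rank concentrations separately by Crimson mentions and by enrollment, then flag those in the top k of both. This is a sketch with made-up counts; `compare_rankings` is a hypothetical helper, and k = 3 is used only so the toy example has something to show.

```python
import pandas as pd


def compare_rankings(crimson: pd.Series, college: pd.Series, k: int = 10) -> pd.DataFrame:
    """Rank concentrations in both populations and flag those in the top k of each."""
    table = pd.DataFrame({
        "crimson_rank": crimson.rank(ascending=False, method="min"),
        "college_rank": college.rank(ascending=False, method="min"),
    })
    # A concentration missing from one Series gets a NaN rank, which compares as False.
    table["top_k_in_both"] = (table["crimson_rank"] <= k) & (table["college_rank"] <= k)
    return table.sort_values("college_rank")


# Made-up counts for illustration only.
crimson = pd.Series({"Economics": 50, "Government": 40, "Computer Science": 5, "Undeclared": 10})
college = pd.Series({"Undeclared": 300, "Computer Science": 120, "Economics": 110, "Government": 60})
table = compare_rankings(crimson, college, k=3)
print(table)
```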

So it seems that concentrators as represented in The Crimson aren’t exactly representative of the larger Harvard College community of concentrators. But one thing’s for certain: Economics is both the most common declared concentration in the College and the most mentioned concentration in The Crimson.

Are athletes more likely to be mentioned?

Since The Crimson has a thriving Sports section, we felt that Harvard students who played varsity sports would have a higher chance of being name-dropped in The Crimson. We averaged the mentions of athletes versus those of non-athletes to see which group is more discussed in The Crimson:





On average, an athlete is mentioned approximately 1.94 times in The Crimson, whereas a non-athlete is mentioned only around 1.13 times. We wondered whether this difference in mentions was significant, so we performed a test to find out.
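The group averages come from a single groupby. In this sketch, a student counts as an athlete if their name appears on any scraped roster. The toy data and column names are hypothetical. Whether students with zero mentions are included is also an assumption, and it changes the averages noticeably.

```python
import pandas as pd

# Hypothetical toy data: one row per student, with an athlete flag
# (name found on any varsity roster) and a Crimson mention count.
students = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "is_athlete": [True, True, False, False, False],
    "mentions": [3, 1, 2, 0, 1],
})

# Label the two groups with strings so the result is easy to index.
students["group"] = students["is_athlete"].map({True: "athlete", False: "non-athlete"})
avg = students.groupby("group")["mentions"].mean()
print(avg)
```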

Under a null hypothesis in which athletes and non-athletes were equally likely to be mentioned in The Crimson, we calculated a z-score of 17.46 for the total number of athlete mentions. In other words, the observed total lies more than 17 standard deviations above what that null hypothesis would predict. So student athletes are indeed more likely to be mentioned in The Crimson than their non-athlete counterparts; we suspect this is due to The Crimson’s Sports section.
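A test like this can be sketched with a normal approximation to the binomial. Under our assumed null, each mention lands on an athlete with probability equal to athletes’ share of the student body. The counts in the example are hypothetical, not the study’s data, and `binomial_z` is a hypothetical helper name.

```python
import math


def binomial_z(successes: int, trials: int, p0: float) -> float:
    """z-score of an observed count under a Binomial(trials, p0) null,
    using the normal approximation (mean n*p0, sd sqrt(n*p0*(1-p0)))."""
    mean = trials * p0
    sd = math.sqrt(trials * p0 * (1 - p0))
    return (successes - mean) / sd


# Hypothetical numbers: 1,000 total mentions, athletes are 20% of students,
# and 300 of the mentions are of athletes.
z = binomial_z(successes=300, trials=1000, p0=0.2)
print(f"z = {z:.2f}")
```

Treating each mention as an independent draw is a simplification, since the same student is often mentioned repeatedly. A very large z-score should therefore be read as strong evidence of a difference rather than a precise measure of it.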


What sports teams are most mentioned?

Now that we know athletes are referenced more than non-athletes, we wanted to find out what sports the most Crimson-covered athletes play. With the data scraped from the athletic team rosters, we found the following:




Basketball players were the most mentioned in The Crimson, with 249 mentions, followed by ice hockey players at 182, football players at 157, and track and field athletes at 134. Meanwhile, skiers, heavyweight rowers, and cross country runners were all pretty rarely discussed, with 5, 3, and 0 total Crimson mentions, respectively.
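Counting mentions by team means joining the scraped rosters to the mention counts. The pandas sketch below assumes both tables share an exact `name` column; real rosters would need the same name normalization as the directory match. All data here is made up.

```python
import pandas as pd

# Hypothetical toy data standing in for scraped rosters and per-student mention counts.
rosters = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "sport": ["Basketball", "Basketball", "Ice Hockey", "Skiing"],
})
mentions = pd.DataFrame({
    "name": ["A", "B", "C", "X"],  # "X" is mentioned but not on any roster
    "mentions": [4, 2, 3, 7],
})

# Left join keeps every rostered athlete; athletes never mentioned get 0.
merged = rosters.merge(mentions, on="name", how="left")
merged["mentions"] = merged["mentions"].fillna(0).astype(int)
by_sport = merged.groupby("sport")["mentions"].sum().sort_values(ascending=False)
print(by_sport)
```

A left join from the rosters keeps teams with zero mentions, such as cross country, in the results. An inner join would silently drop them.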


These patterns seem consistent with sports coverage outside Harvard, where large team sports are generally discussed more than niche or individual sports, although there are definitely exceptions, like baseball players’ relatively low mention count. We’d have to look more closely at what The Crimson’s Sports section tends to publish to find out more!

Conclusions


All in all, the most likely demographics to be mentioned in The Crimson, based on current students name-dropped in the past 4 years of Crimson articles, are: the Class of 2021, Adams House residents, students concentrating in Economics, and student athletes (specifically in basketball). So to all seniors, Adams residents, Ec concentrators, or varsity basketball players: see you in the next Crimson article!


Harvard Open Data Project
© 2016-2021, Built with Sanity & Gatsby
