Hacking Harvard open data to fight crime, save energy, and improve student life

An introduction to the people working on the Harvard Open Data Project.

By Athena Kan • 08-27-2016

Brian Sapozhnikov and the Harvard Computing Society couldn’t figure out when the next shuttle around campus was coming or when their laundry was done, and Harvard didn’t offer any useful apps or services to help with that. But they discovered public Harvard APIs that no one knew about, and since then, they’ve been building a laundry and shuttle tracking app.

Harvard offers over 2,700 courses per semester, but undergrads weren’t happy with Harvard’s official course catalog. Using public but obscure raw course data, students in Harvard’s introductory CS course (CS50) made their own course catalog, which is still the most popular way for Harvard students to find courses.

As we’ve seen from stories like the course catalog and the shuttle tracker, Harvard students can quickly solve problems facing the student body once they’re able to find and use open data.

And Harvard is a prolific creator of data: the university has published everything from historical tuition costs to data about every one of Harvard libraries’ 12 million books.

The problem? Students can’t find this data in the first place. Much of Harvard’s data is decentralized, lost in a tangle of subdomains. And even once they find it, they often can’t use it: the data is frequently in non-machine-readable formats like PDFs or graphs.

Enter the Harvard Open Data Project

That’s why Neel Mehta, Brian Sapozhnikov, and I teamed up with other students and faculty around Harvard to start the Harvard Open Data Project (HODP). We’re a student-faculty team dedicated to opening and analyzing Harvard data to empower our community members to improve campus life.

We’re working with faculty and advisors across the university, including:

Nick Sinai, the former deputy CTO of the United States who led President Obama’s Open Data Initiative. He’s now an adjunct professor at Harvard.
Erie Meyer, a founding member of the U.S. Digital Service and a Joan Shorenstein Fellow this year.
David Eaves, a professor at Harvard’s Kennedy School of Government (HKS) who teaches about technology and government.
Jim Waldo, a computer science and HKS professor and the former Harvard CTO.

We’re also working with some of Harvard’s finest classes and organizations: the Harvard Academic Computing Committee, Harvard’s introductory computer science course (the aforementioned CS50, which boasts 700+ students every year), Harvard’s data science course (CS109), HackHarvard, Harvard Data Ventures, and more.

Cataloging and hacking

Our ultimate goals are to convince Harvard to start standardizing and organizing its data and to empower Harvard students and community members to make powerful apps, services, and policies with this data.

To that end, we’re starting off by cataloging and cleaning Harvard’s messy and scattered data and getting Harvard students to build examples of impactful products and policies using this data.

We’re partnering with organizations around campus to gather data, analyze it, and produce apps, visualizations, and policies to improve our community and student life. To make it happen, we’ve brought in a team of approximately 30 undergrads who are passionate about technology, policy, and hacking to improve campus life. We’ve chosen a star-studded lineup of students to lead these projects:

Harshita Gupta: estimating the price of course books
Chris Kuang: creating a Harvard club/organization database
Ike Jin Park: tracking Harvard buildings’ energy consumption patterns
Yong Li Dich: understanding food waste in Harvard dining halls
Nabib Ahmed: mapping Harvard police crime logs

Each team lead has built a cross-functional team of 4–5 students who will be working on everything from data science and visualization to policy writing and partnerships. They’ll be writing about their projects on this publication, so follow us to learn about the exciting apps, visualizations, and policy recommendations they’ll be developing!

Brian Sapozhnikov will also be leading a team to develop Harvard’s first open data catalog, where we’ll centralize and showcase Harvard’s scattered datasets and spotlight some exciting apps that students have already made with them. Neel Mehta and I will lead an R&D team to forge new partnerships, connect with media outlets, and find our next big projects.

It’s an exciting time for all of us at the Harvard Open Data Project, and we can’t wait to see where we can make an impact around Harvard. Check us out, and be in touch!