Outside Their Bubbles
The Undergraduate Data Science Hangout connects students and faculty across seemingly disparate disciplines.
Brent Cebul and Shane Jensen share a research interest in urban planning—but that’s where their professional overlap ends.
Or does it?
Cebul, Assistant Professor of History, examines how urban planning policies have reinforced economic and racial segregation over time, while Jensen, Professor of Statistics at Wharton, analyzes modern data such as Census demographics and land use zoning information in relation to a city’s crime rates. However different their approaches, both researchers’ work provided perfect fodder for discussion in this year’s Undergraduate Data Science Hangout, held every Thursday over eight weeks this summer.
“Brent and Shane each addressed how to establish causation in their own datasets, which were fascinating—but the best part was seeing the linkages between a statistician in Wharton and a historian in Penn Arts & Sciences,” says Bhuvnesh Jain, Walter H. and Leonore C. Annenberg Professor in the Natural Sciences, who coordinated the Hangout. “What makes data science so appealing to me is that shared methodologies connect students and faculty across disciplines.”
Data science is the study of data and how to extract knowledge from it. Any Penn student whose summer research involved the quantitative analysis of data, whether in the humanities, social sciences, or natural sciences, could join the Data Science Hangout remotely. Introduced in summer 2019 as an informal classroom experience in which students could share their work, learn about analytics tools, listen to faculty talks, and collaborate on projects, the event was more structured this time around: each week featured two online faculty presentations followed by a data science tutorial.
This year, researchers from 10 departments covered 22 topics ranging from virus tracking to electoral politics to the universe’s expansion. The eclectic mix surprised physics major Sarah Kane, C’23, whose research project involved searching for new planets.
“The hangouts really let you see outside your own little bubble of data analysis,” she says. “Especially in the hard sciences, like physics and chemistry and biology, it is easy to think of data as just numbers and points on a graph. But there were data scientists from so many other fields—linguistics, psychology, criminology—that I realized data can be words, or really almost anything. That is a valuable lesson, especially for an undergrad.”
Although she’s majoring in biology, Katelyn Boese, C’23, found herself most enthralled with a presentation by Phil Gressman, Professor of Mathematics, who used mathematical models to simulate the spread of COVID-19 on college campuses.
“We got to learn how building a simulation requires synthesis of a lot of real-world information—in this case facts and statistics about how the virus transmits,” she says. “And, of course, it was very relatable to all of us.”
Jain developed the 2020 program with David Brainard, RRL Professor of Psychology and Associate Dean for the Natural Sciences, and Ann Vernon-Grey, Senior Associate Director for Undergraduate Research. Sara Casella, a doctoral student in economics, helped organize the activities and taught some of the tutorials. Undergraduate students are often wrangling large datasets for the first time, and the group sought to expose them not only to different data analysis techniques, but also to peers whose research looked nothing like their own.
“When I entered academia, I thought a variety of disciplines would be within easy reach. But, in fact, it is very hard to step out of one’s narrow—even sub-disciplinary—boundaries,” Jain says. “One of my motivations for the Hangout was to interact across disciplines while also showing students that they can expand their horizons, one week posing a question in sociology and the next week posing a question in astronomy.”
And the questions kept coming, he says; almost every week, no matter what the subject, Q&A sessions ran so long that eventually he had to intervene.
“Students were really engaged with the talks. The Hangout was a way to fulfill their needs and also give us ideas for how to build something bigger around data science programming in the future,” Jain says. “It’s nice to have ways to bring students from across the university together to learn the same set of analytics tools they all need to use.”