Big data—the analysis of extremely large data sets that reveals patterns and associations—is at the forefront of the modern research process. This past summer, Penn Arts & Sciences offered two opportunities for students—at all levels and in all disciplines—to share their current work involving big data and learn about the analytics tools available to them: The Data Science Hangout, held in the Collaborative Classroom in the Weigle Information Commons at the Van Pelt Library, and the Introduction to Python for Data Science Summer Boot Camp.
The Data Science Hangout was designed as an informal gathering for undergraduate students whose summer research involved the quantitative analysis of data sets, including variants of machine learning. It gave students an opportunity to present their work to peers, learn about research being conducted by undergraduates in other departments, and listen to talks on data science from faculty. The program was overseen by faculty mentors Bhuvnesh Jain, Walter H. and Leonore C. Annenberg Professor in the Natural Sciences; David Brainard, RRL Professor of Psychology and Associate Dean for the Natural Sciences; and Emily Hannum, Professor of Sociology and Associate Dean for the Social Sciences.
Sebastian Gonzalez, C’20, and Tara DaCunha, C’22, both physics and astronomy students mentored by Jain, attended the Hangout. Gonzalez, who presented on neural networks, says that before the Hangout, he didn’t fully understand how data science and analysis could be used in other disciplines. “The meetings gave me a chance to branch out and see what others were studying,” he says.
Other attendees included Jennifer Locke, C’22, a physics and astronomy student working with Masao Sako, Associate Professor of Physics and Astronomy, to use Dark Energy Survey data to categorize variable stars; Kassidy Houston, C’21, who studies psycholinguistics and uses data sets to find patterns in speech; and Lilian Zhang, C’22, a student in the Biological Basis of Behavior Program who also studies Spanish and is applying big data to psychology and cognitive science.
"There are commonalities across all of these fields in the ways you go about trying to make inferences from large data sets," says guest speaker Martha Farah, Walter H. Annenberg Professor in the Natural Sciences, who is working with her students to analyze large, publicly available neuroimaging databases to study whether people's adult socioeconomic status and its relation to intelligence can, in part, be explained by differences in brain structure. "And so even though a particular field of research might have nothing to do with what one of these students has been studying in school, they are still able to understand and ask excellent questions."
Robert DeRubeis, Professor of Psychology, who previously organized a data tournament in which teams from universities and labs across the world used big data to present strategies for improving and personalizing patient treatment, was also a guest speaker.
The Introduction to Python for Data Science Summer Boot Camp for graduate students served as a primer on Python, a powerful programming language that is one of the main languages used in modern machine learning and data analysis. It did not require any prior programming experience.
The program was led by Dillon Brout, GR’19, a postdoctoral researcher in the Department of Physics and Astronomy who learned how to program as an undergraduate. Cyrille Doux, a Physics and Astronomy postdoc, and Sara Casella, a graduate teaching assistant in the Department of Economics, also acted as mentors. The course was sponsored by the School of Arts & Sciences Office of the Dean, the Center for Particle Cosmology, and MindCORE, Penn's hub for the integrative study of the mind.
Doux is a cosmologist who uses large catalogs of galaxies to answer questions about the universe. He says Python is a programming language with a simple syntax that gives you a clear interface with your data. "You don't need to go very deep into understanding how you're manipulating numbers and the memory on the computer itself for it to be a powerful tool," he says.
Yiran Chen, a first-year doctoral candidate in linguistics, is using behavioral data to study topics like tonal phenomena and the intersection of music and language in an interdisciplinary effort with MindCORE. Jennifer Stiso, a fourth-year doctoral student in neuroscience at the Perelman School of Medicine who studies learning and memory, is using recordings of brain and behavioral data from subjects all over the world who participate in tasks online.
"A lot of people are interested in jobs related to data science," says Stiso. "If you're trying to figure out which customers are going to want a specific deal, or if you’re a healthcare company trying to determine who responds better to different kinds of treatments, both of those questions benefit from big data analysis."