When Bhuvnesh Jain and Greg Ridgeway co-founded the Data Driven Discovery Initiative (DDDI) in 2021, their initial goal was to support the undergraduates, graduate students, and postdocs across the School of Arts & Sciences (SAS) who were applying data science tools in their studies and research.
As DDDI progressed, though, Jain kept hearing about the intriguing work being done by staff data scientists employed in labs throughout SAS and beyond, in departments that, at first glance, had little to do with each other. Whether it was the political scientists analyzing massive amounts of voting and polling data from national elections or the radiologists turning to artificial intelligence to glean insights from brain images, these researchers were all using the same tools and encountering similar challenges. But there was no central support system and few points of connection, at least around the technical aspects of their research.
“They seemed so spread out and almost isolated,” says Jain, Walter H. and Leonore C. Annenberg Professor in the Natural Sciences. “But our analysis questions have some common elements, so we can learn from each other.”
Jain’s observation spurred a new effort within DDDI to connect Penn’s more than 100 staff data scientists, professionals who work alongside faculty and use advanced computer science techniques to push research forward in fields as disparate as linguistics, psychology, and biology. Newer developments in computer science and artificial intelligence are making such connections possible. For example, a machine learning algorithm can analyze an image the same way whether it shows a neuron or a galaxy. That means data scientists in neuroscience and astrophysics, for instance, are suddenly speaking the same language.
Jain, along with Colin Twomey, DDDI’s executive director and a former DDDI postdoc, wanted to provide a forum for these researchers to talk to one another. “It presents an opportunity to discover a ton of fascinating different topics that are worked on at Penn,” Twomey says. “If you’re working in your lab in biology, you might not fully be aware of what’s going on in linguistics. And yet they’re using similar tools and addressing sometimes similar problems.”
Speed Dating for Data Scientists
The job of data scientist has taken on its modern form over the past 20 years. These scientists use the latest computer science techniques to store, analyze, model, and visualize the increasingly vast amounts of data generated by modern living and collected by researchers.
Even when the applications look different, the tools are often the same, Twomey says. The data scientists could learn a lot from each other about how to apply those resources in their own work—but only if they have the chance to interact, he adds.
Jain, Twomey, Ridgeway, and their DDDI colleagues decided to create that opportunity. They started with a low-stakes social event, held in December 2023 in the common space used by the research data and digital scholarship librarians in Van Pelt-Dietrich Library. Though SAS was the main focus, they also advertised to colleagues in Penn Medicine, Penn Engineering, and Wharton.
At the first of what they called Penn Data Science Meetups, 30 scientists from across the schools came to hear lightning talks and network with colleagues. One speaker, for example, talked about how Penn Libraries researchers are using machine learning to digitize archival records, some of which date back hundreds or thousands of years.
Teaching a computer to recognize letters and characters so different from modern ones is a massive challenge. “They’re using the best that machine learning has these days to try and tackle these challenges specific to this really unique dataset,” Twomey says. “I would never have known that if we didn’t foster this opportunity to have folks come share the work they do.”
Networking Nodes
After the first successful meeting, the DDDI team organized a second one in May 2024, this time in the Singh Center for Nanotechnology’s Glandt Forum. The audience expanded to include the Penn Institute for Biomedical Informatics and the Innovation in Data Engineering and Science initiative out of Penn Engineering.
In addition to the lightning talks, the meeting featured a panel of senior data scientists speaking about their career paths and their views on the future of the field. Lindsay Warrenburg, a senior data scientist at Penn Medicine’s Healthcare Transformation Institute, was among the speakers.
Warrenburg joined Penn in July 2023 and learned about DDDI through a talk held at the Erdős Institute, a data science career development program of which Warrenburg is associate director. She saw the value of DDDI’s efforts to connect data scientists and reached out to Twomey. “In this type of setting, people are pretty siloed,” Warrenburg says. “They don’t even know that other data scientists really exist and how to find them.”
The DDDI meetups help to break down those silos. “Different disciplines can use slightly different methodologies,” Warrenburg says. “So, you not only learn perhaps a new way of approaching a problem, but it creates a more holistic understanding of what data science can be.”
The Future of Data Science
Data scientists, and the researchers and students they work with, will only have more to learn from each other as technology advances, according to Jain. “The knowledge of state-of-the-art AI is something that a lot of faculty and staff are now interested in,” he says.
The University, in fact, is betting big on researchers’ interest in artificial intelligence: Last October, Provost John L. Jackson, Jr. announced the Penn Advanced Research Computing Cluster, an initiative that, among other things, enables all Penn researchers to access a high-performance computing cluster at a nearby data center. The program will double the computing capacity available to researchers, opening new avenues for scientists in many fields.
That means staff data scientists will have even more questions, ideas, and opportunities to discuss with each other, says Jain, who adds that he hopes DDDI’s efforts will serve as scaffolding upon which this group of Penn scientists can build their own community.
DDDI is planning another meetup before the end of spring semester and eventually wants to expand to include researchers across the University. “Penn has this really amazing and diverse community of people,” Twomey says. “Without creating the opportunity to meet people doing this kind of work, it’s not going to happen just by chance.”