OMNIA 101: Generative AI

Bhuvnesh Jain and Greg Ridgeway, co-directors of the Data Driven Discovery Initiative, explain ChatGPT and generative AI, and what this technology means for learning and education.

Friday, May 19, 2023

By Lauren Rebecca Thacker

Illustration by Marcin Wolski

With the release of ChatGPT and Microsoft’s AI-powered Bing search engine, generative AI is on people’s minds. Chatbots can retrieve information, create code, and, in some cases, participate in conversations that can feel eerily human. But how do these AI chatbots work, and what does this technology mean for learning and education? To find out, we spoke with the co-directors of Penn Arts & Sciences’ Data Driven Discovery Initiative, Bhuvnesh Jain, Walter H. and Leonore C. Annenberg Professor in the Natural Sciences, and Greg Ridgeway, Professor and Chair of Criminology.


How does generative AI work?

BJ: Essentially all the text that exists on the internet is fed into a deep learning network with the goal of ranking the set of words that are most likely to occur next, given a prompt. It doesn’t know any underlying logic, whether it’s the logic of grammar or math or causation. It simply makes the best possible guess based on the preceding sequence of words and its huge “training data.”
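The next-word guessing Jain describes can be illustrated with a toy sketch (a real model uses a deep network over a huge corpus, not word counts, but the prediction task is the same): count which word most often follows each word in a tiny "training" text, then predict by picking the most frequent continuation. The corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny invented "training corpus" standing in for the internet-scale text
# a real model is trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count next-word frequencies for each preceding word.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = next_word_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" (follows "the" twice, more than any other word)
```

As Jain notes, nothing here knows grammar or causation; the model simply guesses the statistically likeliest continuation, which is why scale and training data matter so much.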

Are there meaningful differences between chatbots?

GR: The training dataset is really key in this model of just trying to predict the next word. If you train a chatbot on The New York Times, for example, it’s going to have a particular style that replicates the kind of vocabulary and phrases in that publication. If you train it on Twitter, it’s going to be wildly different.

And then there is the fact that private organizations own the chatbots. We don’t know how they decide on guardrails. I played with this idea and asked a chatbot to write a plan to overturn the 2020 election. And it would not do that. It chastised me and said I should have more trust in my elected officials. That is not a stance the chatbot learned from the internet as a whole; it exists because someone engineered the bot in a certain direction.


What does the rise of generative AI mean for your research?

GR: For a criminologist, there are a lot of data buried in text records. For example, we might look at a transcript to see whether a case involved defensive gun use. Previously, researchers would rely either on armies of research assistants or on customized, task-specific machine-learning algorithms. But now you can pull the data more easily and start the analysis.

BJ: In astronomy and physics, numbers and images and plots are very important, and they’re not things ChatGPT is designed for. But even in my field, I find its ability to summarize complex parts of the literature quite impressive. It makes mistakes, but it is also able to answer questions about subtle topics like light bending by a black hole.

What about the classroom?

GR: I teach a data science class and the typical student has never coded before. In the past, my goal was to teach them how to write code from scratch. But now ChatGPT and other code builders can do that for them. So, I’ve raised my expectations for what my students can do. For students, there is a lot of distance between asking the question and using data to answer it. But with ChatGPT as an aide, the distance shrinks. 

BJ: I totally agree. Recently, I guided students through a live ChatGPT learning session. In a 90-minute class, they were able to produce code for the challenging task of separating a star from a galaxy in faint images—one of the central image analysis problems in astronomy. I didn’t expect them to understand everything, but they understood enough. And then in a follow-up lecture, I was able to walk them through and deepen their understanding.
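To give a flavor of the star/galaxy problem Jain mentions (this is a simplified illustration, not the students’ actual code): stars are point sources, so their light is concentrated in a small radius, while galaxies are extended. A toy classifier can compare the flux inside a small aperture to the flux inside a larger one. The Gaussian light profiles and threshold radii below are assumptions made for the sketch.

```python
import numpy as np

def gaussian_blob(size, sigma):
    """Synthetic circular source: a normalized 2-D Gaussian light profile."""
    y, x = np.mgrid[:size, :size] - (size - 1) / 2
    img = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return img / img.sum()

def concentration(img, r_in=2.0, r_out=6.0):
    """Fraction of the light within r_in relative to the light within r_out."""
    size = img.shape[0]
    y, x = np.mgrid[:size, :size] - (size - 1) / 2
    r = np.hypot(x, y)
    return img[r <= r_in].sum() / img[r <= r_out].sum()

star = gaussian_blob(size=21, sigma=1.0)    # point-like: narrow, PSF-width profile
galaxy = gaussian_blob(size=21, sigma=3.0)  # extended: broader profile

# A point source packs most of its light into a small radius, so a simple
# threshold on this concentration ratio separates the two classes.
print(concentration(star) > concentration(galaxy))  # True
```

Real survey pipelines use more sophisticated versions of this idea (and, increasingly, machine learning), but the concentration cut captures the core intuition the students were coding toward.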

What is the future of generative AI?

BJ: I think at least every six months, we’re going to learn about capabilities that will surprise us. It may seem like a cliché, but I’m quite excited and occasionally scared about possibilities we have not even foreseen. 

It will certainly improve on its existing capabilities. But as with technological changes of the past, which ones will actually get adopted on a large scale remains to be seen. I think there’s a good chance that its impact on scientific research will be largely positive. No guarantees, but I expect generative AI will boost learning and exploration, and emerge as a versatile digital assistant.