These are just some of the critical questions that decision makers in the criminal justice system—judges, magistrates, parole boards, and prison administrators among them—must answer every day. They weigh the facts and make their decisions, often with too little time and incomplete information, which can result in poor outcomes.
Richard Berk wants to increase the chances that these decisions will be more accurate and more fair by applying big data to criminal justice practice. Berk, a professor of criminology, is an international leader in the use of machine learning: he amalgamates very large datasets with hundreds of thousands of observations and many hundreds of variables. Berk uses many different data sources—including records from police departments, courts, prisons, and departments of probation and parole—to provide information about large numbers of offenders.
The data is fed into a computer; then, Berk’s algorithms find relationships and patterns that can “dramatically improve forecasting the likelihood individuals will engage in criminal behavior,” he says.
He uses standard programming languages that provide the basic structure for the algorithms he needs, then hand-tailors the code for each application. He passes the resulting software along to the criminal justice agencies, which implement it on their own computers with their own data sets.
Berk, who is also chair of the Department of Criminology and holds an appointment as a professor of statistics in Wharton, describes his approach to creating what he calls “categories of risk” for potential offenders as an “actuarial method to help inform criminal justice decisions, but not to determine those decisions.” He has expounded on his method in many academic papers and his most recent book, Criminal Justice Forecasts of Risk: A Machine Learning Approach.
“There’s no predetermined model,” he says. “We give the information we have to the computer and let an algorithm figure out what the relationships are.”
One predictive relationship, for example, is between the age at which someone commits their first offense and future criminal activity.
“Individuals who start early are predisposed to longer and more violent criminal careers,” Berk says. “The algorithm also confirmed that street crime is a young man’s game that peaks in his late teens and early twenties, after which the risks fall off very rapidly. In contrast, domestic violence offenders can remain high risk into their forties and beyond.”
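How an algorithm can surface such a relationship without a prespecified model can be pictured with a deliberately tiny sketch. This is not Berk’s actual software, which works with hundreds of variables and far more powerful learners; it is a one-split “stump” learner applied to a handful of invented records, showing that the split on age at first offense is discovered from the data rather than specified in advance.

```python
# Illustrative sketch only -- not Berk's software. A one-split "stump"
# learner searches synthetic records for the single feature/threshold
# that best separates later-rearrested offenders from the rest.
# No relationship is specified in advance; the split is discovered.

def best_stump(records, labels):
    """Return (feature_index, threshold, errors) for the best split,
    predicting rearrest when the feature value is <= the threshold."""
    best = (None, None, len(records) + 1)
    for f in range(len(records[0])):
        for t in sorted({rec[f] for rec in records}):
            errors = sum((rec[f] <= t) != bool(lab)
                         for rec, lab in zip(records, labels))
            if errors < best[2]:
                best = (f, t, errors)
    return best

# Hypothetical records: [age at first offense, prior arrests]
records = [[14, 5], [16, 3], [17, 4], [25, 1], [30, 0], [28, 2]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = rearrested later

feature, threshold, errors = best_stump(records, labels)
print(feature, threshold, errors)  # finds the split on feature 0 (age)
```

Methods of the kind Berk describes grow many such splits into full trees and ensembles of trees, but the principle is the same: the algorithm, not the analyst, locates the predictive structure in the data.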
Berk, collaborating with Professor Susan Sorenson, Director of the Ortner Center in the School of Social Policy and Practice, has used machine learning to predict which offenders reported in domestic violence incidents would re-offend. Based on this research, Berk and Sorenson have been working with the Philadelphia Police Department to help improve how domestic violence incidents are documented and processed.
In another research study—conducted with Professor Sorenson and Geoffrey Barnes, a lecturer at Cambridge University’s Institute of Criminology—data were collected on over 28,000 arraignment cases in which offenders in Philadelphia faced domestic violence charges. The data were analyzed with machine learning to project which offenders posed a significant threat during the time before their charges were officially resolved. Of particular importance were forecasts of violence in which serious injuries would result.
In a paper published in the Journal of Empirical Legal Studies last March, the researchers posited that using the forecasts could cut in half the number of offenders arrested again for battering their victims.
“The most important decision in an arraignment is whether to release an offender before his or her next court date,” the article stated. “If magistrates used the methods we have developed and released only offenders forecasted not to be arrested for domestic violence within two years after an arraignment, as few as 10 percent might be rearrested.”
Berk has also applied machine learning to recidivism for violent crimes more generally. The Pennsylvania Board of Probation and Parole is using his software to forecast whether inmates released on parole will be rearrested, especially for crimes of violence. This information helps determine whether parole will be granted. The pilot project showed a 20 percent reduction in recidivism. Related work for the Philadelphia Adult Probation and Parole Department now informs the kind of supervision individuals receive once they are placed back into their communities. One key finding is that a large number of individuals on probation pose almost no risk to public safety and can be minimally supervised. This has led to a more efficient use of supervisory resources.
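The supervision decision can be pictured as a simple mapping from a forecasted risk of rearrest into one of a few categories of risk. The sketch below is purely illustrative; the cutoff values are invented and are not the thresholds any agency actually uses.

```python
# Hypothetical sketch: converting a forecasted probability of rearrest
# into one of a few "categories of risk" that drive supervision level.
# The cutoff values here are invented for illustration only.

def supervision_tier(p_rearrest: float) -> str:
    """Map a forecast probability to an illustrative supervision tier."""
    if not 0.0 <= p_rearrest <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_rearrest < 0.10:
        return "minimal"
    if p_rearrest < 0.40:
        return "standard"
    return "intensive"

for p in (0.03, 0.25, 0.70):
    print(f"forecast {p:.2f} -> {supervision_tier(p)} supervision")
```

In practice, where such cutoffs sit reflects policy judgments about the relative costs of over-supervising low-risk individuals and under-supervising high-risk ones; the code only shows the mechanical mapping from forecast to category.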
In addition to collaborating with criminal justice organizations across the country, Berk helps agencies responsible for child welfare, both in the U.S. and abroad. He is starting a project to develop forecasting software for the State of New South Wales in Australia in order to predict which children whose families are currently receiving social services are at risk of life-threatening child abuse. Closer to home, he has consulted with Child Protective Services of the Maryland Department of Human Resources.
Berk frequently collaborates on projects with Penn colleagues across disciplines. “It’s one of the great joys of my work,” he says. “They’re a talented, wonderful bunch. I’m lucky they are all on campus because together we are making real progress that might not happen otherwise.”
In addition to Professor Sorenson, he is working on projects with Aaron Roth, associate professor in the Department of Computer and Information Science; Michael Kearns, Professor and National Center Chair in the Department of Computer and Information Science and founding director of Penn’s Warren Center for Network and Data Sciences; and Cary Coglianese, Edward B. Shils Professor of Law and director of Penn’s Program on Regulation.
Last April, Berk and Coglianese won seed funding from Penn’s Fels Policy Research Initiative to launch a pilot effort examining how machine learning can improve government policymaking. The project, called “Optimizing Government: Policy Challenges in the Machine Learning Age,” involves a series of five seminars—one of which was held last spring—designed to foster collaboration across Penn and to bring leading analysts and government leaders working on policy applications of machine learning to campus.
The seminars will center on the risks and rewards of using machine learning not only in law enforcement, but also in other government agencies, such as those responsible for regulatory decision-making, security and defense, social service delivery, energy management, and economic forecasting. There is some urgency to these discussions; as Berk and Coglianese have noted, “Already some government agencies are exploring limited ways to use machine learning in support of a range of governmental responsibilities.”
It may come as no surprise that introducing these new tools into a variety of policy applications poses challenges. Last July, for example, Bloomberg Media’s website posted a story about machine learning that featured Berk and stated, “Supporters of these tools claim they’ll help solve historical inequities, but their critics say they have the potential to aggravate them, by hiding old prejudices under the veneer of computerized precision.”
Berk acknowledges that “underlying issues about race and gender are real, [but] critics assume the goal is perfection, and that’s silly. We just need to improve current practice.” He notes that judges make decisions “all the time by grouping people together. They look at a person and think, ‘I’ve been on the bench 10 years and I’ve seen lots of offenders, and I know this type.’ They’re estimating risk based on their experience. That’s the current benchmark,” he says.
Berk’s goal is for criminal justice decisions to be more accurate and fair, and he is confident that this can be done. He compares machine learning to traditional methods of research in criminology.
“Social scientists usually create a theoretical model about the way the world works and then test that model with data,” Berk explains. “But what is learned depends on how good the theory is. Sometimes the theory provides insight, and sometimes it is a straitjacket. With big data we can, with no apologies, go where the data take us. Often that is a very good place.”
Fellow social scientist Herbert Smith, a professor in the Department of Sociology and director of the School’s Population Studies Center, agrees. “There is no doubt that big data will have a big influence on how people think about social issues in the future,” he says. “Seeing the future is in part prediction, and this is where it seems to me that big data—the telescoping of scale and the computational revolution that allows us to see connections that we could not see (or imagine) previously—is going to become important in our understanding of social issues.” Given his groundbreaking role in this new wave of social science, Smith notes, “We are fortunate to have Richard Berk at Penn.”