A couple of days ago, Scott Greenfield was writing about some of the complexities of federal sentencing, when a commenter named Jake proposed a crazy solution:
Was there ever a task in the courtroom more ripe for automation?
Well, yes, there’s tons of administrative crap that can be, or has been, automated. However, Jake had a particular problem in mind that he’d like to solve:
As a representative of the ignorant masses, I find comfort in the notion that everyone would be given sentences using the same criteria, and never again subjected to the whimsy of some of the judges I’ve read about on this blog.
This “problem,” that judges impose disparate sentences on seemingly like-situated defendants, has long been a vexing one. It was one of the foundational arguments for the Sentencing Guidelines: to create greater consistency in sentencing across the country, so that a judge sentencing a defendant in a drug conspiracy in Wichita would impose a sentence reasonably the same as one imposed in Brooklyn. Consistency was the goal, and from a substantial distance, the Guidelines appeared to achieve it.
The problem was that it failed miserably to accommodate the myriad personal details that comprise the heart of sentencing. Indeed, it precluded judges from doing so, forcing lawyers into striving mightily to come up with arguments about why their defendant’s circumstances fell outside the “heartland” of the guidelines. Most of the time, these arguments failed. One-size-fits-all sentencing was imposed, and those who feared mercy slept well at night.
Interestingly, a commenter named David hit on the obvious solution almost immediately, which is to use a randomly assembled team of sentencing judges and take the average of their sentences as the final sentencing result (with some complications if the sentences don’t converge sufficiently). Scott dismisses this idea as off-topic, but in fact it directly addresses the exact problem Scott described. Because the judges wouldn’t be working from a strict guideline, they would be free to “accommodate the myriad personal details that comprise the heart of sentencing.” Yet because the sentences would be averaged across a group of judges, there would be less likelihood of imposing “disparate sentences on seemingly like-situated defendants.”
That conclusion falls out of some basic math and statistics. If you have a sample population that exhibits a certain variance between samples — such as judges passing sentences — and you collect the samples into groups, then the variance between the group averages will be smaller than the variance between the individual samples. This is why diversifying a stock portfolio reduces risk, and it’s why people pool their risk by purchasing insurance for disasters they can’t afford. Since each judge’s sentence is averaged in with the others, no single judge indulging his whim can change the sentence too much.
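A quick simulation makes the point concrete. The numbers here are invented for illustration (a hypothetical 60-month average sentence with a 12-month standard deviation per judge), but the statistical effect is general: averaging n independent samples shrinks the standard deviation by a factor of roughly the square root of n.

```python
import random
import statistics

random.seed(1)

# Hypothetical: each judge's sentence for the same case varies around
# a 60-month "true" average with a standard deviation of 12 months.
def one_judge():
    return random.gauss(60, 12)

def panel_of(n):
    # Average the sentences of n randomly drawn judges.
    return statistics.mean(one_judge() for _ in range(n))

solo = [one_judge() for _ in range(10_000)]
panels = [panel_of(9) for _ in range(10_000)]

# The spread of panel averages is roughly 1/sqrt(9) = 1/3 the spread
# of individual judges' sentences.
print(round(statistics.stdev(solo), 1))    # ≈ 12
print(round(statistics.stdev(panels), 1))  # ≈ 4
```

A nine-judge panel is absurd as policy, but the math says a single outlier judge moves the final number by only a ninth of his whim.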
This is, of course, a highly impractical idea that would be difficult to organize and expensive to operate. (Although lots of sporting events use it, and shouldn’t criminal sentencing be at least as orderly as the judging in Olympic ice dancing?) However, it’s still a lot more realistic than the idea of automated sentencing.
Actually, Jake may have been imagining something fairly modest. Perhaps he only meant to automate the calculations. I’m pretty sure that lawyers who have to work with the guidelines already have worksheets and spreadsheets for that. It wouldn’t be much of a step to write some sort of program that asks questions or presents forms to fill out and then calculates the sentencing range, kind of like TurboTax for federal sentencing.
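The calculation part really is that mechanical. Here’s a toy sketch of the idea; the base levels, adjustments, and range formula below are invented placeholders, not the actual Guidelines tables, which map an offense level and criminal history category to a range of months.

```python
# Toy guideline calculator. All numbers are invented placeholders,
# not the real U.S. Sentencing Guidelines.
def offense_level(base, adjustments):
    # Start from a base offense level, then apply enhancements
    # and reductions (e.g. role in the offense, acceptance of
    # responsibility).
    return base + sum(adjustments)

def sentencing_range(level, criminal_history):
    # The real Guidelines use a lookup table keyed on offense level
    # and criminal history category; this fake formula just produces
    # something range-shaped, in months.
    low = max(0, level * 3 + criminal_history * 2)
    return (low, low + level)

level = offense_level(base=14, adjustments=[+2, -3])
print(sentencing_range(level, criminal_history=1))  # → (41, 54)
```

The hard part isn’t this arithmetic; it’s deciding which adjustments apply in the first place, which is exactly what the next paragraphs are about.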
I’m surprised there isn’t an iPad app for that already. I tried looking for one, but all I could find were copies of the guidelines that you could install. There was nothing to help with the calculations. There is the U.S. Federal Sentencing Guidelines calculator website written by Josh Goldfoot, which seems to walk you through a sentencing calculation, but that was just a personal project that appears to no longer be maintained. In any case, it’s certainly a doable project.
But it may not be a worthwhile project. After all, when it comes to automating things on a computer, the calculations are the easy part. The hard part is the work done by the lawyers and judges: Interpreting the guidelines and determining whether or not they apply to a particular case. It’s probably not possible with current technology to teach a computer to think like a lawyer.
But maybe we can cheat. That’s what Google does.
Search engines can do some amazing things these days, but they don’t actually understand what’s written on a web page. The science of natural language understanding hasn’t yet come far enough for computer programs to understand a natural human language the way humans do. What Google does is generate complex statistical information about the words on web pages, and then it observes human behavior in creating and clicking on links to determine which pages have information that is relevant to user queries. Google doesn’t understand (at least not the way a human would) what’s written on a web page, or what a user wants from a query, but that doesn’t stop it from “learning” how to help people find information.
The legal world has already begun to use this kind of machine learning technology during e-discovery to make document review more efficient. If a party to litigation responds to a discovery request with 100,000 documents, the other side will have to have a team of lawyers review the documents to decide which ones are actually relevant to the matter at hand. If those documents are in electronic form, however, it’s possible to use predictive coding to speed up the review process.
The way it works is that the document review team starts by reviewing a representative sample of the document set, scoring each document based on what relevance it might have to the case. The predictive coding software generates statistical summaries of the documents, and it uses those statistical summaries to analyze the choices made by the human document reviewers. This is similar in concept to the way Google looks at how people use links on the web. The software then tries to predict how the human reviewers would score all of the remaining documents. This guess can then be used to prioritize the review of the remaining documents, to try to find the most useful material as soon as possible.
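A toy version of that prioritization step might look like the sketch below. The word-count scoring here is a crude stand-in for the much richer statistics real predictive-coding products compute, and all documents and labels are invented.

```python
from collections import Counter

# Invented training sample: documents a human reviewer has already
# scored as relevant (1) or not relevant (0).
reviewed = [
    ("meeting notes re contract breach and damages", 1),
    ("email scheduling the quarterly golf outing", 0),
    ("draft memo on breach of the supply contract", 1),
    ("cafeteria menu for the week of march 5", 0),
]

# Count how often each word appears in relevant vs. irrelevant docs.
relevant_words = Counter()
irrelevant_words = Counter()
for text, label in reviewed:
    (relevant_words if label else irrelevant_words).update(text.split())

def score(text):
    # Crude relevance score: net count of words seen more often in
    # relevant training documents than in irrelevant ones.
    return sum(relevant_words[w] - irrelevant_words[w] for w in text.split())

# Sort the unreviewed pile so the likeliest-relevant documents
# come first in the review queue.
unreviewed = [
    "lunch order for the deposition team",
    "notes on damages from the contract breach",
]
prioritized = sorted(unreviewed, key=score, reverse=True)
print(prioritized[0])  # the breach-of-contract document surfaces first
```

Nothing in this code “understands” contracts or depositions; it just mimics the choices the human reviewers already made, which is the whole trick.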
In theory, we should be able to build a Sentence-O-Matic 1000™ using the same principles. We would start with a training set of documents from, say, 100,000 criminal cases. We’d input all of it into a machine learning system. Some of the data would be structured values, such as the identity of the laws under which the defendant is being charged, his prior convictions, and demographic data. Much of the data, however, would simply be the text of the documents themselves, along with tags to identify what they are — motions, briefs, arguments, testimony, transcripts, and so on. The data would also have to include the resulting sentence.
We’d then let the system crunch on the data for a while, to find relationships between the structured and unstructured data about the cases and the resulting sentences. It could, for example, discover that certain words in certain documents in certain types of cases are correlated with higher or lower sentences. Once we have a complete set of rules, we run the algorithm the other way around: We feed it documents from a separate test set of, say, 10,000 cases, let it apply the rules to predict the sentences, and score it on the accuracy of the results. We repeat the learn-and-test cycle over and over, tweaking the algorithms each time, until it’s accurate enough for our purposes. The resulting system would respond like a hypothetical average judge.
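The learn-and-test cycle can be sketched in miniature. Everything below is invented — the “model” here just learns the average sentence per charge, standing in for whatever rules a real system would extract — but the shape is the same: fit on training cases, predict held-out cases, score the error.

```python
import statistics
from collections import defaultdict

# Invented miniature training set: (charge, sentence in months).
training = [
    ("fraud", 24), ("fraud", 30), ("fraud", 27),
    ("drug_conspiracy", 60), ("drug_conspiracy", 72),
]

# "Learn": the model is just the average sentence per charge,
# a stand-in for the rules a real system would extract.
by_charge = defaultdict(list)
for charge, months in training:
    by_charge[charge].append(months)
predicted = {c: statistics.mean(v) for c, v in by_charge.items()}

# "Test": predict sentences for held-out cases and score accuracy
# as mean absolute error, in months.
test_set = [("fraud", 28), ("drug_conspiracy", 63)]
errors = [abs(predicted[c] - months) for c, months in test_set]
mae = statistics.mean(errors)
print(round(mae, 1))  # → 2.0
```

In a real system, the tweak-and-retest loop would iterate on far richer features than the charge alone, but the scoring step — how far off were the predictions, on average? — is the same.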
At least that’s the theory.
And it’s a theory that sometimes works. Predictive coding for e-discovery is a real thing, and there’s a reason why so many of the world’s browsers use Google as their home page. But from what I know about predictive analytics, it’s not ready for a task like this. It’s great for supporting a human task — finding websites to read or prioritizing documents for review — but I can’t see it replacing humans at critical tasks. There’s a reason we don’t use analytics engines to replace doctors or engineers, and I can’t see them replacing lawyers or judges either.
(Remember back when a lot of companies tried using automated document searches in place of customer service representatives for emailed support questions? That didn’t work out very well, did it?)
Of course, if we actually did try something like this, you know what would happen, right? All of those annoying SEO “experts” would start offering their Criminal Sentencing Optimization (CSO) services to lawyers, to help them prepare documents that are stuffed full of whatever it takes to game the Sentence-O-Matic. “We’ll show you how to fill your briefs with proven sentence-reducing keywords!”
I don’t think anyone wants to live in that world.