People I trust have been saying good things about Jeff Gamso’s blog, Gamso – For the Defense, and I’ve been meaning to check it out for months now. I finally got around to it, and I’m glad I did, because I discovered a fascinating post called “Hobgoblins of Little Minds.”
It’s about what experts mean when they say a piece of evidence is “consistent”:
The criminalist who did the ballistics comparison wasn’t sure he had a match…The most he could say is that the gun was “consistent with” the one that fired the bullet that killed the young woman. The murder weapon.
“Consistent with.” What the hell does that mean?
It means “might be.” It means “maybe or maybe not.” It means “sure it’s possible.” It means “who knows.” All of which is a way of saying that it means not much of anything at all.
I have no idea what the ballistics expert means by “consistent,” but if he has any scientific integrity, the word “consistent” has a slightly more precise meaning than Gamso is allowing for.
Consider Gamso’s next paragraph:
“He’s not desperately poor.” That’s consistent with the guy who got laid off from the plant and is struggling to get by on unemployment and food stamps and also with Bill Gates and his billions. It tells you nothing.
But it does tell me something. It rules out the possibility that he’s desperately poor. Assuming we have a reasonable definition for “desperately poor,” it tells me he’s not living in the streets, sick and starving.
“Not desperately poor” is an awkward phrase, because it’s the negation of “desperately poor” rather than a positive assertion the way “consistent” is. But that leads us to a clearer understanding of what “consistent” means in ordinary usage: It means not inconsistent. That is, when the expert testifies that the gun he tested is “consistent with” the murder weapon, it means he cannot rule it out.
The only possible results of any test are that it is consistent or inconsistent with the idea being tested. It sounds pretty weak, doesn’t it? Saying you can’t rule something out is a long, long way from saying it’s true. As a matter of philosophy of science, however, this is as good as it gets. Scientific tests never really prove anything is completely true. Our technological civilization is built on scientific theories which have never been proven true, but which have survived countless attempts to prove them false.
“Consistent” means something, and when you have enough consistent results, it comes as close to certainty as science can get.
Gamso quotes from the Federal Rules of Evidence:
Rule 401. Definition of “Relevant Evidence”
“Relevant evidence” means evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.
That definition bothers me for a reason that is probably a bit pedantic. In particular, I’m bothered by the phrase “make the existence of any fact…more probable or less probable”. I think I know what the rules are trying to say, but I believe it is an error in reasoning to say that a fact can be more probable or less probable.
The facts may be unclear, confusing, complex, uncertain, or unknown. But whatever the facts are, they happened. “Probable” has nothing to do with it. There’s no way that evidence or testimony at a trial can somehow reach back in time and change what really happened, or change the probability that something happened. Evidence can’t make reality more probable or less probable, because reality is fixed.
Evidence in science is no different when you examine it carefully. For example, a public health study might be reported on the nightly news as estimating that “10 million Americans have Greenfield’s disease.” A newspaper report might add that the study has an error of “plus or minus 2%.” That sounds like a strict cutoff, but a scientist would explain that it’s really a confidence interval. If you delve into the study, you’ll probably find out that the newspaper reporter used the study’s 95% confidence interval. The scientist would explain that this means there’s a 95% chance that the true number of Americans with Greenfield’s disease is within plus or minus 2% of 10 million.
The scientist would be wrong, however, for the same reason the rules of evidence are wrong. However many Americans have Greenfield’s disease—let’s say it’s 9,982,458—that’s how many have Greenfield’s disease, and there’s no chance or probability involved. What our 95% confidence interval of plus or minus 2% is really saying is that conducting this scientific study has a 95% chance of giving us a result that is within plus or minus 2% of the true number. Or, to put it another way, our result is consistent with the theory that Greenfield’s disease affects about 10 million people.
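If it helps to see that distinction in action, here’s a minimal Python simulation of what a 95% confidence interval actually claims. The numbers are made up (Greenfield’s disease and its prevalence are fictional, just like in the example above): run the study many times, and about 95% of the resulting intervals will contain the fixed, true number.

```python
import random

# A minimal sketch, with made-up numbers, of what a 95% confidence
# interval actually claims: the *procedure* captures the true value
# about 95% of the time. The true value itself never changes.

TRUE_PREVALENCE = 0.0312  # pretend this is the real disease rate
SAMPLE_SIZE = 10_000      # people surveyed in each hypothetical study
TRIALS = 1_000            # number of times we repeat the whole study

covered = 0
for _ in range(TRIALS):
    # Survey SAMPLE_SIZE random people and count who has the disease.
    hits = sum(random.random() < TRUE_PREVALENCE for _ in range(SAMPLE_SIZE))
    p_hat = hits / SAMPLE_SIZE
    # Normal-approximation 95% confidence interval around the estimate.
    margin = 1.96 * (p_hat * (1 - p_hat) / SAMPLE_SIZE) ** 0.5
    if p_hat - margin <= TRUE_PREVALENCE <= p_hat + margin:
        covered += 1

print(f"Intervals that contain the true rate: {covered}/{TRIALS}")
# Typically prints something near 950/1000: about 95% of the *studies*
# succeed, which says nothing probabilistic about reality itself.
```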
Getting back to our ballistics expert, when he says the defendant’s gun is consistent with the murder weapon, he’s not—despite what the Rules of Evidence say—making it more likely that the gun is the murder weapon. Rather, he’s saying that with some degree of scientific confidence, the prosecutor’s theory that the gun is the murder weapon was not disproved by the ballistic examination.
Now let’s look at a simpler example.
Suppose we suspect that a coin has been modified so that when flipped it always comes up heads. We think this modification is subtle and undetectable to the naked eye (and we have no instruments available). How can we prove that the coin has been gimmicked if we can’t detect the modification?
Simple: We flip the coin.
If we flip it once and it comes up heads, that proves almost nothing. The coin will do that half the time even if it’s perfectly legitimate.
So we flip the coin again, and it comes up heads again. With two tests of the coin in our data set, the possibility that it’s a gimmicked coin is slightly higher, because this result will happen by random chance only one time in four. Do a third test, and it’s one time in eight. Four tests will come up all heads only one time in 16 with a fair coin, and so on.
If we keep flipping the coin and we keep getting heads, the possibility that this is a fair coin gets smaller and smaller. Ten heads in a row is only a 1-in-1024 possibility with a fair coin. By the time we get to 20 straight heads in a row, the odds of this being a fair coin are less than one in a million. It’s safe to conclude there’s something wrong with the coin.
(I’ve just made the same mistake the Rules of Evidence made. The coin is either gimmicked or it’s not. The 1-in-a-million probability is really a statement about the accuracy of the testing method. That is, it’s not really that the odds of this being a fair coin are less than 1 in a million. Rather the odds of a fair coin behaving this way are less than 1 in a million.)
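For what it’s worth, the coin-flip arithmetic fits in a few lines of Python:

```python
# A fair coin produces n heads in a row with probability (1/2)**n,
# so every additional flip halves the chance that plain luck is
# fooling us.

for n in (1, 2, 3, 4, 10, 20):
    print(f"{n:2d} straight heads: 1 in {2 ** n:,} for a fair coin")

# 20 straight heads: 1 in 1,048,576 -- the "less than one in a
# million" figure above.
```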
The coin testing process I just described is good science for three basic reasons. First, it puts numbers to its results. Real science almost always involves some math, and real scientific studies usually state their results in the form of probabilities and confidence intervals. Gamso does not report that the ballistics expert gave any probabilities with his conclusions.
Second, and more generally, our conclusion about the coin includes information about the error rate of our testing process: The chances of a coin that is not gimmicked behaving this way are less than 1 in a million. When the ballistics expert testified that the gun was consistent with the murder weapon, did he quantify or even characterize the possibility that it wasn’t the murder weapon? For example, did he explain what percentage of all guns would be consistent with the murder weapon? If it’s 1 in a million, that’s a pretty good sign that you’ve got the right guy. If it’s 1 in 10, the expert’s conclusion is just barely relevant.
Third, our conclusion about the coin is based on a series of independent tests. Each flip of the coin is a test. The results of any single flip indicate very little, because even a fair coin will come up heads (produce a false positive) 50% of the time. However, when we conduct a series of 20 independent tests, we can reduce the false positive rate to one in a million. In general, the more tests we conduct, the more we can reduce the likelihood of a false positive.
This last point is crucial to reaching a conclusion because (in theory, anyway) that’s the logical rationale for how the evidence in a trial builds up to a conclusion. Let me see if I can illustrate this with some data that I totally made up.
Let’s pretend that the ballistic match is a very simple two-step process. First, we match the caliber of the gun, which must be one of 10 possible calibers that occur in equal numbers—i.e., for any given caliber, 10% of all guns are a match. Second, we match the land-and-groove pattern within the barrel, of which there are 10 possible patterns, all occurring in equal numbers. Since each matching step eliminates 90% of the guns, a ballistic match that passes both steps has eliminated 99% of the guns, meaning that only 1 in 100 guns will match.
In addition, we have a witness ID, which we’ll assume is also 90% accurate. Combined with the gun match, this eliminates 90% of the remaining false positives, meaning that only 1 in 1000 gun owners match the criteria. We’re getting somewhere.
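To make the multiplication explicit, here’s the same made-up arithmetic in a few lines of Python:

```python
# When the matching criteria are independent, their false-positive
# rates multiply, and each one shrinks the pool of matching suspects.

P_CALIBER = 1 / 10  # fraction of guns matching the caliber
P_PATTERN = 1 / 10  # fraction matching the land-and-groove pattern
P_WITNESS = 1 / 10  # chance the witness picks a given innocent suspect

combined = P_CALIBER * P_PATTERN * P_WITNESS
print(f"Random person matching every criterion: 1 in {round(1 / combined):,}")
# -> 1 in 1,000
```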
It all goes wrong, however, if there are hidden connections between the criteria. For example, how did the police narrow down the suspect list that they presented to the witness? If they already had the ballistic report, perhaps they did a database search for people who owned guns of the same caliber as the murder weapon, and used the resulting list to build their suspect list.
If so, this means that the witness ID and part of the ballistic examination are correlated and not independent. And to the extent that they’re correlated, we have to factor that out of the calculation. In this case, every suspect presented to the witness was known to have a gun that matched the caliber of the murder weapon, so the ballistic expert’s discovery of this fact adds nothing new. That knocks out the 1-in-10 factor for the caliber match, and we’re back to a 1-in-100 chance of a random person matching the known facts about the murderer.
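In terms of the little calculation above, the correlated factor simply drops out of the multiplication:

```python
# Every suspect shown to the witness already matched the caliber, so
# the caliber factor carries no new information and drops out.

P_PATTERN = 1 / 10
P_WITNESS = 1 / 10

combined = P_PATTERN * P_WITNESS  # no caliber factor this time
print(f"Random person matching: 1 in {round(1 / combined):,}")
# -> 1 in 100, not 1 in 1,000
```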
One of the reasons DNA evidence is considered so good is that scientists have a pretty good understanding of the prevalence of various DNA markers in the human population and of the correlations between them. In fact, DNA testing is explicitly based on statistics, which is why DNA test results usually include an estimate of the chance of a false positive. With a good DNA sample, the chance of a random match is often less than 1 in a billion, and lawyers love to bring that number out at trial because it is so impressive.
By comparison, Gamso’s account of fingerprint experts saying things like “There is no error rate. It’s 100 percent accurate” is infuriating. Only abstractions are perfect. Everything in the real world has an error rate.
Sometimes that error rate is vanishingly small, which allows us to say that something is “error-free” when speaking informally. But if you press for a number, a real scientist should be able to find one.
Jonathan Hansen says
Very good post. As a trained scientist, I appreciate your explication of the manner in which science progresses, and the essential logic behind rejecting hypotheses. Most people never really think about these issues or understand them, yet they are the ones called for juries.
Not directly related, but probability is a part of the core of reality. Even physicists were blown away when the best explanation for quantum phenomena entailed an innate randomness, even today not interpretable in any intuitive manner…
Mark Draughn says
I’ve had some science classes, I’ve read a lot of popular science, and I’ve worked with scientists, but I’m not really a scientist myself, so it’s good to hear I got something right. Thanks.
It’s also good to hear someone say that quantum science is not intuitive, since it sure leaves me clueless…