Mathematics

Steve Marmel posts this infographic he apparently got from Mother Jones magazine that purports to show institutionalized racism in Ferguson, Missouri:

[Image reads: Institutional racism by the numbers. In 2013 in Ferguson: 483 black people were arrested, 36 white people were arrested, 92% of searches and 86% of car stops involved blacks.]

I wonder if either Marmel or MJ realize that these numbers don’t prove a thing about racism. They don’t even hint at racism. Not by themselves. Without knowing the racial makeup of Ferguson, we can’t tell if police stops, searches, and arrests of black people are disproportionately high or disproportionately low. If Ferguson is 93% black, then police are arresting white and black people equally as a proportion of the population.

Now as it turns out, according to 2010 census data, Ferguson is about 70% black and 30% white. (I’m leaving out the 3.3% of the population that identify as some other race, including mixed races.) This means that in 2013 Ferguson police arrested about 0.6% of white residents and 3.4% of black residents, which means the relative risk of arrest for black people was about 5.8 times that for white people. The risk was about 5 times as high for being searched and about 2.6 times as high for having their car stopped.
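
If you want to check the arithmetic, here’s a rough sketch in Python. It assumes a total population of about 21,000 (roughly the 2010 census count) and the rounded 70/30 split above, so the results differ slightly from the figures in the text:

```python
# A rough check of the Ferguson numbers (my sketch, not Mother Jones's math).
# Assumptions: total population of about 21,000 (roughly the 2010 census count)
# and the rounded 70% / 30% split used in the text.

population = 21_000
black_pop = population * 0.70
white_pop = population * 0.30

# Arrest rates implied by the infographic's counts
black_arrest_rate = 483 / black_pop   # about 3.3%
white_arrest_rate = 36 / white_pop    # about 0.6%

# For searches and car stops we only know the black share of incidents,
# so the total population cancels out of the relative risk.
def relative_risk(black_incident_share, black_pop_share=0.70):
    white_incident_share = 1 - black_incident_share
    white_pop_share = 1 - black_pop_share
    return (black_incident_share / black_pop_share) / (white_incident_share / white_pop_share)

print(f"arrests:   {black_arrest_rate / white_arrest_rate:.2f}")  # ~5.75 (5.8 with unrounded census shares)
print(f"searches:  {relative_risk(0.92):.2f}")                    # ~4.93
print(f"car stops: {relative_risk(0.86):.2f}")                    # ~2.63
```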

Now that doesn’t prove that Ferguson police are racists — you’d have to rule out confounding factors such as age, you’d have to find out what fraction of the arrested are from out-of-town, and you’d have to figure out whether the arrests were justified by differential crime rates — but it does strongly suggest that something is going on that merits further investigation.

And it really wouldn’t have been hard to add that information to the infographic.

Over at Reason, Nick Gillespie takes a look at spree-killer (and ex-cop) Christopher Dorner’s “manifesto” and pronounces it useless:

If there is a message buried deep within Dorner’s incoherent litany of recriminations, anger, and random name-checks, it’s this: People who go on shooting sprees typically tell us very little about society at large. They are by definition far, far beyond the range of normal (or even abnormal) behavior and, as such, shouldn’t be used to generalize about larger social forces at work.

That sounds right to me, but let’s put some numbers to it:

Mother Jones magazine (unlikely to downplay shooting statistics) counts 29 people who committed mass shootings in this country in the last decade. In that same period, the United States had 50 Nobel prize winners.

Using an average of 3 shootings per year, and making the unrealistic assumption that each shooter lives out his full U.S. life expectancy of almost 80 years, that works out to a total of 240 mass shooters among us (many in jail, and many only “among us” in theory because of that unrealistic assumption) in a population of over 313 million. That’s less than one in a million of us, even by the most generous assumptions. By comparison, we have over 400 billionaires.
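
Spelled out in code, the back-of-the-envelope arithmetic looks like this (my own sketch, using the same numbers):

```python
# Back-of-the-envelope estimate of how many mass shooters are "among us".
shooters_per_year = 3            # roughly 29 shooters over the last decade
life_expectancy = 80             # generous: assume every shooter lives a full lifespan
population = 313_000_000

living_shooters = shooters_per_year * life_expectancy        # 240
per_million = living_shooters / population * 1_000_000
print(f"{living_shooters} shooters, about {per_million:.2f} per million people")
# -> 240 shooters, about 0.77 per million people
```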

These killers do not represent us. They are nothing like us.

In one of his posts today, New York criminal defense lawyer Scott Greenfield writes about the error rate for drug-sniffing dogs:

More to the point was the dog hits simply aren’t anywhere nearly as worthy of credit as courts have held. Consider whether it would be equally acceptable for a cop to flip a coin in order to establish probable cause to search.  For a dog whose established ability to sniff out drugs runs in the typical 50% range, it’s no more likely to be accurate than a flip of a coin.

I’m guessing the “50% range” figure comes from a Chicago Tribune article a few weeks ago based on an analysis of state drug dog data in Illinois, which found a relatively low accuracy rate:

The dogs are trained to dig or sit when they smell drugs, which triggers automobile searches. But a Tribune analysis of three years of data for suburban departments found that only 44 percent of those alerts by the dogs led to the discovery of drugs or paraphernalia.

That 44% figure for success means that the false-positive ratio is a whopping 56%. Scott was being generous when he rounded down to 50%. However, in comparing dogs to flipping a coin, Scott makes a very common math mistake: he confuses the dog’s false alert ratio (the fraction of its alerts that turn up nothing) with a coin’s total alert ratio (the fraction of the time it comes up “alert” at all), and those are very different numbers.

It helps if we make up some numbers. Suppose the police dogs in some department are used in 1000 sniffs, and the dogs alert in 200 of them, but a search only finds drugs on 88 of those people. This means the other 112 are false positives, and we can calculate the false positive ratio as the number of false alerts divided by the total number of alerts:

fp = 112 / 200 = 56%

To keep the situation simple, let’s assume the dog never misses any drugs, so the 88 drug carriers are all there were in the sample population of 1000. In other words, 8.8% of the people are carrying drugs.

Now we can calculate what would happen if the police officer flipped a coin instead. Out of 1000 people, the coin would be expected to “alert” for 500 of them. Since 8.8% of the people are carrying drugs, we would expect 44 of these people to have drugs, meaning the other 456 are false positives. Thus the false positive ratio would be:

fp = 456 / 500 = 91.2%

That’s a heck of a lot worse than the dog’s 56% ratio. The only way the coin could achieve a false positive ratio as good as the dog’s is if 44% of all the people sniffed are carrying drugs. Then you’d expect the 500 searches to find drugs on 220 people with the other 280 being false positives:

fp = 280 / 500 = 56%

As long as less than 44% of the population is carrying drugs, a dog with a known 56% false positive ratio is performing quite a bit better than a random coin flip.
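
Here’s the same comparison as a short Python sketch, using the same made-up numbers:

```python
# Dog vs. coin, with the made-up numbers from the example above.

def false_positive_ratio(alerts, hits):
    """Fraction of alerts that turn up no drugs."""
    return (alerts - hits) / alerts

# The hypothetical dog: 1000 sniffs, 200 alerts, drugs found on 88 of them.
print(false_positive_ratio(alerts=200, hits=88))    # 0.56

# A coin "alerts" on half of the 1000 people (500), and 8.8% of them (44) carry drugs.
print(false_positive_ratio(alerts=500, hits=44))    # 0.912

# Break-even: the coin's ratio is 1 - p (p = fraction carrying drugs),
# so it only matches the dog's 56% when p reaches 44% (220 of the 500).
print(false_positive_ratio(alerts=500, hits=220))   # 0.56
```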

Not that that’s saying much. And it doesn’t really hurt Scott’s point, either, because the dog is still wrong more than half the time, and each time it’s wrong, some innocent person has to endure the humiliation of a police search.

As is probably often the case, although Scott was wrong, the opposition is even wronger:

Dog-handling officers and trainers argue the canine teams’ accuracy shouldn’t be measured in the number of alerts that turn up drugs. They said the scent of drugs or paraphernalia can linger in a car after drugs are used or sold, and the dogs’ noses are so sensitive they can pick up residue from drugs that can no longer be found in a car.

This might be correct in a narrow sense. Dogs certainly are capable of detecting trace odors left behind by things that are no longer there. It’s a reasonable defense of the dog’s nasal prowess.

But so what? This isn’t about the dog, it’s about whether the search is justified. The only reason the police are allowed to invade your privacy and seize your property is because they have a good reason to believe they will find evidence of a crime. If the police aren’t finding evidence as often as they expect to, it suggests their reason for the search is not as good as they say it is. The cause of their error isn’t as important as the fact that they are in error.

I’m no lawyer, but I’m pretty sure a judge isn’t supposed to grant a search warrant because a location might once have had evidence of a crime. The police are supposed to have reason to believe that the evidence will be there when they search. If that’s a good rule for a judge, it ought to be a good rule for a dog. But it’s clear that in at least 56% of the cases when a dog alerts, the evidence isn’t there.

As if that wasn’t bad enough, the Tribune story gives us good reason to believe that the 56% error rate is optimistic.

The Tribune obtained and analyzed data from 2007 through 2009 collected by the state Department of Transportation to study racial profiling. But the data are incomplete. IDOT doesn’t offer guidance on what exactly constitutes a drug dog alert, said spokesman Guy Tridgell, and most departments reported only a handful of searches based on alerts. At least two huge agencies — the Chicago Police Department and Illinois State Police — reported none.

The Tribune asked both agencies for their data, but state police could not provide a breakdown of how often their dog alerts led to seizures, and Chicago police did not provide any data.

That leaves figures only for suburban departments. Among those whose data are included, just six departments averaged at least 10 alerts per year, with the top three being the McHenry County sheriff’s department, Naperville police and Romeoville police.

In other words, the 56% error rate is for dogs working in departments that were willing to disclose their dogs’ performance statistics. We can only wonder how bad the numbers are in departments that don’t want to reveal how well their dogs are doing. And then there are the departments that apparently don’t even care enough to keep statistics.

The most damning item in the Tribune article, however, is that the dogs’ success rate declines to 27% when the person being sniffed is Hispanic.

This is a reminder that these statistics aren’t a measure of the dog’s performance, they’re a measure of the performance of the dog-and-handler system, and I don’t think it’s the dogs that are likely to be prejudiced against Hispanics.

The most benign explanation for these numbers is that police dog handlers are more likely to expect Hispanics to have drugs, and that they somehow inadvertently cue the dog to alert. For example, if they lead a non-alerting dog around the cars of Hispanic drivers for a longer period of time than other drivers, the dog may learn that he can get his master to stop by doing a drug alert.

This sort of unintentional cueing is sometimes called the Clever Hans effect, after a horse that appeared to be able to accomplish all sorts of amazing mental feats, signalling his answers by stomping his foot. Eventually, scientists figured out that his owner would tense up when the horse was supposed to start answering a question and then relax as soon as he reached the right number of stomps. There is evidence that some drug dogs are doing the same thing.

Other explanations for the high error rate with Hispanics are that the police dog handlers are more likely to misinterpret a dog’s behavior as an alert, are intentionally cueing the dog to alert, or are simply lying about the alert because they want to do a search.

(It might also seem possible that Hispanics and their cars are simply exposed to drugs more often, perhaps due to greater involvement in drug culture, and that the dogs are alerting to drug traces. But I can’t think of an explanation for how Hispanics could have a higher rate of past exposure to drugs without also having a higher rate of actually carrying drugs when searched. It seems to me those two rates should rise and fall together, which would leave the dogs’ error rate unchanged.)

A big part of the problem with drug dogs is the lack of standards:

Experts said police agencies are inconsistent about the level of training they require and few states mandate training or certification. Jim Watson, secretary of the North American Police Work Dog Association, said a tiny minority of states require certification, though neither he nor other experts could say exactly how many.

A federally sponsored advisory commission has recommended a set of best practices, though they are not backed by any legal mandate.

Compare this to the situation with the breath testing devices used by police to detect intoxicated drivers. Those things are calibrated and tested regularly. If you get busted for blowing 0.09 and your lawyer can show that the testing device hasn’t been calibrated and tested according to the proper schedule, there’s a pretty good chance you’ll go free.

But if a dog at the side of the road alerts at your car, the cops are going to search you, and whatever they find will be usable, because the judges always believe the dogs.

Update: Radley Balko is taking on this same topic today.

Scott Greenfield, font of so much that I can riff off of, has a complaint about gang experts who try to paint every action by the defendant as related to his membership in a gang:

If a defendant has a tattoo, the expert will testify that tattoos are “brands” typically worn by gang members.  If the tattoo happens to say “Tiffany”, then the testimony is changed ever so slightly to accommodate, by the expert then saying that gang members typically brand themselves with the names of their girlfriends.  You get the message.  No matter what the evidence, the defendant can’t win.  It’s always connectible to being a gang member, according to the expert.

Aside from Scott’s implication that police gang experts are pulling answers out of their ass and don’t really know much about gangs, I can also see something of a logical problem with the so-called expert’s theory. The cop’s claim that gang members have specific types of tattoos is a conditional statement: If he’s a gang member, then he’ll have a specific tattoo, say of a snarling dog. If we try to get all mathematical, the statement would look like this:

gang member ==> dog tattoo

But that’s not what the prosecutor wants the jury to believe. The prosecutor doesn’t care about the tattoo. He’s trying to prove that the defendant is a gang member. He’s trying to prove the converse:

dog tattoo ==> gang member

The problem is, as a matter of math, the truth of a statement does not imply the truth of its converse. Even if it’s true that all gang members have dog tattoos, it doesn’t mean all people with dog tattoos are gang members. It’s easier to see with a more obvious example: All gang members have noses:

gang member ==> nose

but that doesn’t in any way prove

nose ==> gang member

That is, not all people with noses are gang members.
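
Another way to see why the converse doesn’t follow is to put some made-up numbers on it and apply Bayes’ rule. The figures below are purely illustrative (I have no idea what the real prevalence of gang membership or dog tattoos is), but they show that even if every gang member has the tattoo, a tattooed person can still be very unlikely to be a gang member:

```python
# Purely illustrative numbers -- not real statistics about gangs or tattoos.
p_gang = 0.01                 # suppose 1% of the population are gang members
p_tattoo_given_gang = 1.0     # and literally every gang member has the dog tattoo
p_tattoo_given_other = 0.05   # but 5% of everyone else has one too

# Bayes' rule: P(gang | tattoo) = P(tattoo | gang) * P(gang) / P(tattoo)
p_tattoo = p_tattoo_given_gang * p_gang + p_tattoo_given_other * (1 - p_gang)
p_gang_given_tattoo = p_tattoo_given_gang * p_gang / p_tattoo
print(f"P(gang member | dog tattoo) = {p_gang_given_tattoo:.0%}")   # about 17%
```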

So why is it that “all people with noses are gang members” is obviously wrong, but “all people with dog tattoos are gang members” seems plausible? The answer is a little more complicated because it involves the real world, the statistics of experimental design for testing hypotheses, and the availability heuristic.

Suppose you wanted to test the hypothesis “all people with X are gang members,” where X is either “noses” or “dog tattoos.” You’d do it by setting up an experiment—in this case a survey of the population—to look for counterexamples to the hypothesis. That is, you’d try to find people who have X but are not gang members. Finding even one proves it’s not absolute truth.

The real world is a little fuzzy—especially in the social sciences—and our methods of testing are less than perfect, so real-world hypothesis testing usually involves testing a statistical relationship. In this case, we’d be testing “people with X have a high probability of being gang members” and we’d still be looking for people who have X but are not gang members. The more such counterexamples we find, the lower the probability of a relationship.

A scientific test of this kind of hypothesis would involve conducting random surveys and gathering enough data to reach statistically reliable conclusions. But when it’s not a scientific investigation, when it’s just us trying to figure something out, we don’t do a scientific study. We just try to think of counterexamples.

When X is “noses” it’s easy. We all know lots of people with noses, and nearly all of them are not gang members. Such a large number of counterexamples makes it easy to destroy the hypothesis that all people with noses are gang members.

When the hypothesis is “all people with dog tattoos are gang members,” it’s a little harder to think of counterexamples, simply because dog tattoos are so rare that we may not know of anybody who has one. Our inability to find counterexamples makes the hypothesis seem plausible.

This way of thinking is called the availability heuristic. We assume something is likely because we can easily bring to mind examples. The more examples we can think of, the more likely we believe it to be.

Although the availability heuristic is not as rigorous and generalized as conducting a scientific investigation, in essence it’s a similar process. A scientific investigation gathers data using randomized trials, controlled studies, and careful surveys and then analyzes the data to arrive at results. The availability heuristic does the same kind of analysis, but it works only on the data we have in our heads at that moment. There is no data gathering process.

The availability heuristic is a perfectly valid way of thinking about our day-to-day world, about which we have lots of data but don’t have time to gather more. It tends to fail us, however, when thinking about parts of the world with which we are unfamiliar. That’s why we have science.

What are the practical implications of all this philosophy when it comes to thinking about gang experts? Probably not much. But if I ever find myself on a jury listening to this kind of testimony, I hope I’ll keep a few points in mind.

Basically, any assertion of a general rule—all fish have fins, all cats have fur, all people with dog tattoos are gang members—is equivalent to an assertion that counterexamples do not exist: There are no fish that don’t have fins, there are no cats that don’t have fur, there are no people who have dog tattoos who are not in gangs.

(Actually, the gang expert will likely invoke several indicators of gang membership in combination—gang tattoos, gang hats, gang shoe laces, gang jewelry—and the standard in the courtroom is not that there are absolutely no counterexamples, but rather that counterexamples are rare enough that they do not constitute a cause for reasonable doubt. Nevertheless, an assertion of a general rule is still an assertion about the rarity of counterexamples.)

So if you hear someone say that dog tattoos are a sign of gang membership, you should be wondering why that person believes there are no (or few) counterexamples. Remember, it’s not just about gangs. It’s also about tattoos. If he’s really an expert on gangs, he may very well have observed that gang members have dog tattoos, but how does he know that non-gang-members don’t have dog tattoos as well? He’d have to know a lot about tattoo prevalence in society at large. In addition to being a gang expert, he’d also have to be a tattoo expert.

Or at least he’d have to have received reliable information from a tattoo expert or be aware of a scientific study of some kind that addressed the issue. If I were on the jury, I’d want to hear about that.

Eugene Volokh catches an apparent problem in an article in Oregon State’s Daily Barometer:

According to a press release issued by the Women’s Center, 2,000 rapes occur every five minutes.

This amounts to the claim that, on average, every woman in the United States is raped once every 9 months, which is absurd. Eugene tracks down the actual press release, which says:

About 2,000 rapes are committed daily at the rate of about one every 5 minutes.

That’s completely different from the newspaper quote. But it’s still not right: One every five minutes would only be 288 per day. That’s a seven-fold discrepancy. I don’t know which of those numbers is correct, but that sentence is definitely wrong.
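
Both objections are easy to check with a little arithmetic. Here’s a quick sketch; the female-population figure is my own rough estimate, not something from either article:

```python
# Checking both versions of the statistic (my sketch; the population figure is approximate).
minutes_per_day = 24 * 60
per_day_at_one_every_5_min = minutes_per_day / 5              # 288

# The newspaper's version: 2,000 rapes every five minutes.
per_year = 2000 * per_day_at_one_every_5_min * 365            # about 210 million per year
us_female_population = 155_000_000                            # rough figure
months_between = 12 / (per_year / us_female_population)
print(f"about one rape per woman every {months_between:.1f} months")   # ~8.8, i.e. roughly every 9 months

# The press release's version: "2,000 daily at the rate of about one every 5 minutes."
print(per_day_at_one_every_5_min)                             # 288, not 2,000
print(2000 / per_day_at_one_every_5_min)                      # about 6.9, the ~7-fold discrepancy
```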

Eugene goes on to find some other statistics that make more sense. You can read the rest if you want to.

I’m more interested in one of the comments, by someone called dk35, who I think felt that Eugene was minimizing the problem of rape by focusing on the statistics:

Who doesn’t think rape isn’t a serious problem? What I don’t get is why you need statistics at all to convince people that rape is a serious problem.

What I don’t get is how else dk35 expects people to learn that rape is a serious problem. Divine inspiration? By being raped?

Here’s why you need statistics: I have never been raped. I have never even met someone who told me they were raped. There has been no rape in my life. By my direct observation and by second-hand accounts, the incidence of rape is exactly zero.

It is only through indirect evidence such as reliable statistical reports that I can be aware of the depressing frequency of rape.

I just heard a radio ad for Harper College that was…surprising.

The ad lists off a bunch of pairs of things where one is greater than the other and eventually suggests that I’d be greater with a Harper education than without one. What caught my attention, however, was the very first pair of things listed, which went something like this:

“x to the n is greater than x to the n minus one.”

I suppose that you could read that a couple of ways, but if they mean

x^n > x^(n-1)

then I think they haven’t checked enough cases.

In case you forgot, x^n means x times x times x and so on, until there are n factors of x. For example, if x = 2 and n = 4, we have 2^4 = 2 × 2 × 2 × 2 = 16. In this case, the ad’s assertion is true, because x^(n-1) = 2^(4-1) = 2^3 = 2 × 2 × 2 = 8.

The problem is that for a mathematical statement to be true it has to work for all possible assignments of variables, and that’s not the case here. For example, if x is a fraction between 0 and 1, then the opposite is true:

x^n < x^(n-1)

for 0 < x < 1

The ends of that range are bad too: The number 1 multiplied by itself any number of times is always 1, so if we set x = 1 then both sides of the inequality are equal to 1, meaning neither side can be greater than the other. The same thing is true for x = 0. And if x is a negative number, the sign changes with every multiplication, so one side of the inequality is negative and the other positive, and which side is greater depends on whether n is even or odd.
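
A quick, throwaway check of all these cases (nothing more than plugging in values):

```python
# Compare x^n with x^(n-1) for a few values of x, using n = 4.
n = 4
for x in [2, 0.5, 1, 0, -2]:
    lhs, rhs = x ** n, x ** (n - 1)
    relation = ">" if lhs > rhs else ("<" if lhs < rhs else "=")
    print(f"x = {x}:  x^n = {lhs}  {relation}  x^(n-1) = {rhs}")
# x = 2:    16 > 8          (the ad's claim holds)
# x = 0.5:  0.0625 < 0.125  (flips for fractions between 0 and 1)
# x = 1:    1 = 1
# x = 0:    0 = 0
# x = -2:   16 > -8         (which side wins depends on whether n is even or odd)
```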

I don’t normally expect mathematical rigor from advertising, but when there’s a formula, and it’s an educational institution…

(I tried to email the Harper math department, but the first two people on the department web page bounced. I think I may have found the address of someone who teaches Quantitative Literacy at Harper. I’ll update this if I hear anything back.)

Update: I got a friendly reply from Gary Schmidt thanking me for the contact and telling me (with just a hint of concern) that this doesn’t represent Harper’s mathematics program.

I knew that. Actually, it brought on a pang of sympathy. I spent a lot of time as a student and staff member at the Illinois Institute of Technology, and I can remember some things coming out of our marketing department that made us cringe a bit.

In the spirit of the mathematicians’ Erdos Number, I got the idea to introduce an InstaPundit number. I was going to call it a Reynolds Number because that sounds more scientific, but it turns out that designation is already used by fluid dynamics folks. Actually, it also turns out that Erik Jones at The Bind invented the InstaPundit Number a few months ago, but since he’s apparently stopped maintaining that site, maybe he won’t find out.

My InstaPundit number is 3, using this path:

InstaPundit -> No Watermelons Allowed -> Bill’s Content -> WindyPundit

(There is a webring link from Watermelons to me, but since the owners of sites don’t choose their webring relationships, I don’t think it should count.)