Category Archives: Mathematics

How To Win the Big Powerball Drawing!!!

There are only about a dozen hours left to buy Powerball lottery tickets before the big drawing Wednesday evening. As I write this, the estimated jackpot prize is $1.4 billion, making it the largest jackpot in the history of the game. And because I love my readers so much, I’m going to tell you how to win the Powerball jackpot!

I’m also going to tell you how to make the most possible money playing the Powerball lottery, which, it turns out, is not the same thing. I’ll even have a few thoughts on picking numbers.

But first let’s discuss some misinformation about winning the lottery. Self-proclaimed “lottery expert” Richard Lustig has in the past reportedly offered these tips:

  1. Set a “lottery budget” each month and use it to buy multiple tickets for the same game.
  2. Do not use the “quick pick.” Select random numbers for better odds.
  3. Play the same numbers every week.
  4. Select sequential numbers.
  5. Reinvest your winnings.
  6. Play numbers that haven’t been part of a winning combination in a long time.

Of these suggestions, #1 has no effect on the game, but may help you avoid spending too much. #3 makes no difference, #2 and #4 will reduce how much you earn, and I think so will #6. And Lustig’s #5 tip is the worst idea in the world.

I’ll explain all that later. But first, I’d like to talk about one thing Richard Lustig gets absolutely right. I notice it because Ashley Feinberg made fun of it on Gawker, as did Ryan Grenoble at HuffPo and Jameson Parker at Addicting Info:


Technically, Richard Lustig’s advice as shown in this image is correct. The key is the phrase “increase chance” of winning. The math is straightforward: A Powerball drawing can result in one of 292,201,338 possible number combinations, and you win if you own a ticket with that number. So if you own 1 ticket, your odds of winning are 1 in 292,201,338. If you own 2 tickets with unique number combinations, your odds of winning are 2 in 292,201,338. And so on. The more unique tickets you buy, the more chances you have to win. So if you want to win the jackpot, buy a lot of tickets to get the best odds you can afford.

(The number combinations have to be unique, because two or more tickets with the same number combination don’t increase your chances of winning. Also, in practice you can play more than one set of numbers on a ticket, but in terms of your winnings it’s the equivalent of using that many single-play tickets, so I’m just calling each play a ticket.)

As you buy more and more uniquely numbered tickets, your odds of winning increase linearly all the way up until you buy 292,201,338 tickets with unique number combinations — one for every possible number combination in the game — at which point the Powerball drawing must choose a number combination on one of your tickets.

See? I told you I’d tell you how to win the Powerball jackpot: Just buy all possible tickets!

Unfortunately, just because you win doesn’t mean you’ll make money. Powerball plays cost $2 each, so guaranteeing yourself a win by playing all 292,201,338 combinations will cost you $584,402,676, or just over half a billion dollars. Powerball jackpots are usually much less than that, so normally you’d be losing a lot of money if you bought all the tickets. (Actually, since you bought every ticket, you’d also win all the other prizes, which adds up to about $93.5 million, but you’d still lose a ton of money.)
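For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the Powerball format of 5 white balls drawn from 69 plus 1 red ball drawn from 26, which is not spelled out above but does reproduce the 292,201,338 figure. The names are just for illustration.

```python
from math import comb

# Assumed game format (not stated in the post): 5 of 69 white balls, 1 of 26 red balls.
WHITE_BALLS, WHITE_PICKS, RED_BALLS = 69, 5, 26
TICKET_PRICE = 2  # dollars per play, from the post

combinations = comb(WHITE_BALLS, WHITE_PICKS) * RED_BALLS
cost_of_all = combinations * TICKET_PRICE


def jackpot_odds(unique_tickets: int) -> float:
    """Chance of hitting the jackpot when holding this many unique combinations."""
    return unique_tickets / combinations


print(f"{combinations:,} combinations")
print(f"${cost_of_all:,} to buy them all")
print(f"odds with 2 unique tickets: {jackpot_odds(2):.3e}")
```

The odds grow linearly with the number of unique tickets, so holding every combination gives odds of exactly 1.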

Ah, but if the prize is $1.4 billion, as it is now, wouldn’t you make a net profit of over $800 million? Doesn’t that mean that buying a lottery ticket actually makes sense? Aren’t you guaranteed to make money if you do this?

As you can probably guess, it does not. There are three reasons for this.

The first is that the Powerball lottery is run by the state governments, and they allow themselves to lie to their customers in ways no one else can. The main Powerball jackpot number is always the sum of the amounts paid out in a series of 30 annual payouts. If they say it’s a $300 million jackpot, that means $10 million a year for 30 years. If you take it all in a lump sum, you get a lot less (but you get it all now). I’m pretty sure that no ordinary financial investment or gambling operation would be able to get away with misleading customers like this.

The current Powerball jackpot prize pool is estimated to be $868 million, so if you win it all, you can have that amount right away. (Unless your state is run by morons.) Or you can choose to let the state keep the money for you, in which case they will invest it in an annuity that will pay you about $46.7 million per year for 30 years, for a total of $1.4 billion. You could do the same thing yourself with the lump sum if you could find a way to invest the money that earns 3.7% interest, which is slightly better than the current prime rate.
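The 3.7% figure can be sanity-checked with a short Python sketch. It assumes 30 equal payments with the first one paid right away, which is a simplification (the real payout schedule may differ), and it finds the interest rate by simple bisection. The function names are made up for this example.

```python
def annuity_value(payment: float, rate: float, years: int = 30) -> float:
    """Present value of `years` equal annual payments, the first paid immediately."""
    return sum(payment / (1 + rate) ** t for t in range(years))


def implied_rate(lump_sum: float, payment: float, years: int = 30) -> float:
    """Interest rate at which the lump sum is worth the same as the payment stream."""
    lo, hi = 0.0001, 0.20
    for _ in range(100):
        mid = (lo + hi) / 2
        if annuity_value(payment, mid, years) > lump_sum:
            lo = mid  # payments still worth more than the lump sum: rate is too low
        else:
            hi = mid
    return (lo + hi) / 2


rate = implied_rate(lump_sum=868e6, payment=46.7e6)
print(f"implied annual rate: {rate:.2%}")
```

Under these assumptions the implied rate comes out close to 3.7%.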


The second reason you aren’t guaranteed to make money buying all 292,201,338 possible lottery tickets is because of income taxes. The top bracket of 39.6% kicks in at just under half a million, so if you won the $868 million prize pot, you’d pay $344 million in taxes and get to keep about $524 million. Since it cost you $584 million to buy all the tickets, you’d have a net loss of $60 million.

If you were a professional gambler, you might be able to argue that the $584 million cost of the tickets was a legitimate business expense, so you’d only pay taxes on your net profit of $283.5 million, allowing you to keep about $171 million of the total. Not bad, right? Too bad you don’t actually have $584 million to spend on lottery tickets.
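The two tax scenarios can be laid out side by side in a rough Python sketch. It applies the 39.6% top rate as a flat rate to the whole amount, as the figures above do, and it ignores state taxes and the smaller prizes. The variable names are illustrative.

```python
TOP_RATE = 0.396                # top federal bracket, applied flat as in the post
LUMP_SUM = 868e6                # lump sum jackpot, from the post
ALL_TICKETS_COST = 584_402_676  # cost of every combination at $2 each


def after_tax(taxable: float, rate: float = TOP_RATE) -> float:
    """Amount left after a flat tax on the whole taxable amount."""
    return taxable * (1 - rate)


# Scenario 1: taxed on the whole prize, ticket cost not deductible.
kept = after_tax(LUMP_SUM)
net_ordinary = kept - ALL_TICKETS_COST

# Scenario 2: professional gambler, taxed only on profit after ticket cost.
net_professional = after_tax(LUMP_SUM - ALL_TICKETS_COST)

print(f"kept after tax:       ${kept / 1e6:,.0f} million")
print(f"net, ordinary player: ${net_ordinary / 1e6:,.0f} million")
print(f"net, professional:    ${net_professional / 1e6:,.0f} million")
```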

When you only purchase a fraction of the 292,201,338 possible tickets, you have to think in terms of something called expected value, which is the value of the jackpot prize multiplied by the probability of your actually winning it. So if you buy only 50% of all possible tickets (146,100,669) then you only have a 50% chance of winning, so the expected value of your winnings is half the lump sum payout, or $434 million. If you buy only one ticket, the expected value is 1/292,201,338 of the lump sum payout, which works out to $2.97. Since a ticket only costs $2.00, your expected pre-tax earnings are 97 cents for every ticket you buy.

You’ll have to pay taxes, of course, and since your expense is only $2, you owe taxes on the entire prize, leaving you with $524 million as described above, so the expected value of a ticket is $524,000,000/292,201,338 = $1.79.

That’s the second answer I promised you: Since every ticket you buy gives you an expected loss of 21 cents, the way to make the most money playing the Powerball lottery is to minimize your losses by purchasing exactly zero tickets. You cannot lose if you do not play.
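Here is the expected value calculation as a small Python sketch, using the lump sum and flat tax rate from above. It counts only the jackpot and ignores the smaller prizes; the function name is just for this example.

```python
COMBINATIONS = 292_201_338
TICKET_PRICE = 2.00
LUMP_SUM = 868e6
TOP_RATE = 0.396  # applied flat to the whole prize, as in the post


def expected_winnings(unique_tickets: int, prize: float) -> float:
    """Prize multiplied by the probability of holding the winning combination."""
    return prize * unique_tickets / COMBINATIONS


ev_half = expected_winnings(COMBINATIONS // 2, LUMP_SUM)
ev_pre_tax = expected_winnings(1, LUMP_SUM)
ev_after_tax = expected_winnings(1, LUMP_SUM * (1 - TOP_RATE))

print(f"half of all tickets:   ${ev_half / 1e6:,.0f} million")
print(f"one ticket, pre-tax:   ${ev_pre_tax:.2f}")
print(f"one ticket, after tax: ${ev_after_tax:.2f}")
print(f"expected loss per $2 ticket: ${TICKET_PRICE - ev_after_tax:.2f}")
```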

(This is why Lustig’s advice to “reinvest” your winnings is so bad. If you lose money every time you play, then spending your winnings on playing the lottery just means you’ll lose that money too.)

The good news is that the expected value of each ticket depends on the odds of winning the jackpot, which never change, and the amount of the prize pool, which is creeping up. By my calculations, when the jackpot hits about $968 million you will get to keep just enough after taxes (assuming you live in a state that doesn’t have income tax) to cover the cost of buying all 292,201,338 possible tickets, which means the expected value of a $2 ticket will be 2 dollars, so buying tickets won’t lose you money.
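The break-even point follows from one line of algebra: the after-tax prize has to equal the cost of all the tickets. A minimal Python sketch, assuming the flat 39.6% rate, no state tax, no smaller prizes, and no shared jackpot:

```python
COMBINATIONS = 292_201_338
TICKET_PRICE = 2.00
TOP_RATE = 0.396

# Solve prize * (1 - rate) = price * combinations for the prize.
break_even_lump_sum = TICKET_PRICE * COMBINATIONS / (1 - TOP_RATE)

# At that prize, the after-tax expected value of one ticket equals its price.
ev_at_break_even = break_even_lump_sum * (1 - TOP_RATE) / COMBINATIONS

print(f"break-even lump sum: ${break_even_lump_sum / 1e6:,.0f} million")
print(f"expected value of a ticket there: ${ev_at_break_even:.2f}")
```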

Or it wouldn’t if it weren’t for…

The third reason it’s hard to make money on Powerball is that the grand prize is awarded on a parimutuel basis. That means the prize pool is split among all the jackpot winners. If two people have winning tickets, each one gets only half the prize pool. That’s normally pretty unlikely in Powerball, but when the prize gets very big, a lot of people buy tickets, and that increases the probability that more than one person will pick the winning number combination.

The previous Powerball drawing had a lump sum jackpot of $950 million, and the next one is estimated to be about $500 million higher. Since the jackpot gets about 32.5% of the ticket income, that means the Powerball people are expecting to sell about $1.5 billion worth of tickets. At $2 apiece, that means about 750 million tickets will be sold. Since there are only 292 million number combinations, there’s a pretty good chance that two or maybe three people will win, and that will push the expected value of a ticket below the $2 purchase price, making it a losing proposition.
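To get a feel for how much the split costs you, here is a rough Python sketch. It assumes every other ticket carries an independent, uniformly random combination, so the number of other people sharing your numbers is approximately Poisson. That is a modeling assumption of mine, not something the lottery publishes, and the function names are made up.

```python
from math import exp

COMBINATIONS = 292_201_338


def prob_no_other_winner(other_tickets: float) -> float:
    """Chance that nobody else holds your combination (Poisson approximation)."""
    return exp(-other_tickets / COMBINATIONS)


def expected_share(other_tickets: float) -> float:
    """Expected fraction of the pool you keep if you win.

    For N ~ Poisson(lam) other winners, E[1 / (1 + N)] = (1 - exp(-lam)) / lam.
    """
    lam = other_tickets / COMBINATIONS
    return (1 - exp(-lam)) / lam


tickets_sold = 750e6  # estimate from the post
print(f"chance of winning alone: {prob_no_other_winner(tickets_sold):.1%}")
print(f"expected share if you win: {expected_share(tickets_sold):.1%}")
```

With 750 million other tickets, this model puts your expected share of the pool at a bit over a third, and the chance that nobody else has your numbers at roughly 8%.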

Ethan Wolff-Mann at Time estimates the break-even for the next drawing is at about $1.5 billion lump sum, or about $2.6 billion annuity payout, when adjusted for the probability of multiple winners. Anything less than that leaves you with an expected loss.

Now that I’ve thoroughly destroyed the value proposition of the Powerball lottery, are you still trying to come up with an excuse to buy a ticket? Well let me see if I can help by pointing out a couple of twists in my analysis.

First, it’s not clear that the mathematical expectation (amount won multiplied by the probability of winning) is the correct measure for a lottery player. It depends on your attitude toward risk. If I offered you either $100,000 or a 50/50 chance of winning $200,000, which would you choose? Since both choices have the same expected value ($100,000) most people will probably prefer the choice with less uncertainty and take the flat $100,000 award. But some people will choose to gamble. They have a greater tolerance for risk, and there’s no compelling way to say how much risk tolerance is the correct amount.

Among other things, it depends on your situation. If you need $200,000 to start a business and $100,000 won’t be enough, then you might take a chance on $200,000-or-bust to try to start your business. If you’ll die next week without a life-saving medical procedure that costs $200,000 then the certain $100,000 award is worthless to you. So maybe it makes sense in your personal analysis to risk the almost certain loss of $2 in order to have a really small but non-zero chance of retiring early and living happily ever after.

The second problem with my analysis is that I’m only counting the financial benefits of playing Powerball. There can be other benefits, such as the fun of knowing that for at least a little while there’s a very real (but very small) chance you could win big. Really big. If that fantasy becomes the most important thing in your life, you may have a gambling problem, but otherwise there’s nothing wrong with buying a ticket just for the fun of it. It’s certainly no worse than other ways we spend money to entertain ourselves, such as going to concerts, riding roller coasters, or sitting in the bleachers at Wrigley watching the Cubs lose.

So now that you’ve decided to play anyway, what numbers should you pick? Richard Lustig advised picking numbers that haven’t been winners in a long time, but that’s a classic fallacy: The balls have no memory. There’s no way for them to know that they’re “due” to win. Each lottery drawing is independent, and what happened in past drawings has no effect on what happens in the next drawing.

Mathematically-inclined people will also tell you that the drawing of numbers is completely random, so it doesn’t matter which numbers you pick. Both parts of that statement are wrong. The numbers drawn are not completely random, and it does matter which ones you pick.

The numbers aren’t completely random because they are produced by an imperfect mechanical system. The “identical” balls almost certainly aren’t. Not perfectly. Variations in the process that produces them will cause variations in properties like size, weight, roundness, weight distribution, and surface friction, and those variations will make some balls slightly more likely to be drawn than others. The effect will be small — otherwise it would be quickly detected and the ball set would be replaced — but it will be there, and it might show up in the history of the drawings.

A similar trick can work in casinos that are too cheap to replace expensive equipment like roulette wheels. Alert gamblers could spot when numbers start coming up non-randomly and change their bets to try to beat the house. You probably can’t do that with Powerball because the variations are too small and the lottery has a much bigger house edge than most casino games, but couldn’t you at least limit your losses a teeny bit by betting on the numbers most likely to come up?

Probably not. This version of Powerball has been going on for a few months, and over that period the four most common numbers drawn are 35, 25, 29, and 20, with the fifth most common number being a tie between 11, 31, 45, 49, and 51. The most common “powerball” number is a tie between 7 and 15. These anomalies appear in the data because either (a) there’s something special about those balls, or (b) it’s random chance. If there’s something special going on (if the balls are defective or rigged), then betting those numbers could slightly improve your chances of winning. But if it’s all just random fluctuations, then your chance of winning is unaffected by the numbers you pick, so betting the most common winning numbers can’t possibly hurt. It sounds like win-win.

Except that the lottery jackpot is parimutuel, and thousands of people have read the same statistics I just used. If a hundred of them decide to pick those numbers (a single $20 bill covers all 10 combinations), you could end up splitting the jackpot with all of them. The disadvantage of splitting the purse overwhelms any statistical advantage to picking the most likely numbers.

In fact, the effects of the parimutuel system are the most important factor in choosing lottery numbers, and they are the only factor you can control. Slight variations in the lottery balls notwithstanding, the numbers drawn in the lottery are random, but the numbers that lottery players pick for their tickets are not.

Humans are terrible at picking random numbers. No matter how hard we try, we fall into patterns and choose collections of numbers that badly fail the statistical tests for randomness. And a lot of people don’t even try. They reason, correctly, that all numbers have (very nearly) the same chance of winning, so they pick numbers from birth dates, addresses, and telephone numbers. Those numbers follow patterns (months less than 13, days less than 32) that skew their distribution, causing ticket buyers to collectively cluster around some numbers and avoid others.

I don’t know which numbers people pick and which they avoid, but I can think of at least one way for you to find out: Go around to as many different lottery sales locations as you can, and start collecting discarded tickets from the trash. It doesn’t matter that they’re all losers. Your purpose is not to learn which numbers win, it’s to learn which numbers people pick.

Go to as many different kinds of ticket places as you can, in as many parts of the country as you can. You probably need several thousand tickets to get a good statistical sample, and since about 70-80% of tickets sold are random Quick Picks generated by lottery computers, you’ll have to sift through a lot more to find the ones you need. I think Quick Picks are marked “QP,” but that may not be the case everywhere. If you can’t tell quick picks from hand-selected numbers, then you need even more tickets in your sample to overcome the random statistical noise of the computer’s choices. I’d aim for 10,000 tickets.

Once you’ve got your tickets, tally up how often people pick each number. Find the numbers that everyone else is picking the least, and use those numbers when buying your tickets. The lottery is still (very nearly) random, so you won’t be any more likely to win, but if you do win, you’ll be less likely to have to share it with someone else.
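The tallying step might look something like this Python sketch. The sample tickets are invented purely for illustration, the pool of 69 white balls is my assumption about the game format, and a real analysis would need the thousands of hand-picked tickets described above.

```python
from collections import Counter


def least_picked(tickets, pool_size=69, how_many=5):
    """Return the numbers chosen least often, counting never-chosen numbers as zero."""
    counts = Counter(number for ticket in tickets for number in ticket)
    # Rank by how often each number was picked; break ties by the number itself.
    ranked = sorted(range(1, pool_size + 1), key=lambda n: (counts[n], n))
    return ranked[:how_many]


# Hypothetical hand-picked tickets (white balls only), clustered on date-like numbers.
sample_tickets = [
    (3, 7, 12, 19, 25),
    (1, 7, 12, 24, 31),
    (5, 7, 11, 12, 30),
]

print(least_picked(sample_tickets, pool_size=31))
```

With a sample this small the answer is meaningless, of course. The point is only that the counting itself is the easy part; collecting the tickets is the work.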

Or you could just buy a Quick Pick. It’s a lot more random than if you picked the numbers yourself, so there’s less chance of you and a bunch of other people clustering around a shared group of numbers that force you to split the jackpot, and it’s a lot less work than going through the trash.

(I’ve heard that the trash thing has actually been done successfully on some sweepstakes drawings. The house take was relatively small, the prizes were smaller and easier to win, and they were all awarded from parimutuel pools, so people who played thousands of tickets could expect to win a bunch of times, and the statistical analysis helped them reduce payout splits.)

Since I started writing this, the estimated Powerball jackpot has bumped up another $100 million to reach a total of $1.5 billion, or $930 million as a lump sum. I’m not going to go back and change the numbers in this post to match the new numbers from Powerball. They’re probably just going to go up again, anyway. And besides, my basic conclusions still hold: There are three ways to improve your net lottery winnings (or more realistically, reduce your losses):

  1. Play the fewest tickets you can, ideally none. (At least while the jackpot is below $2.6 billion.)
  2. Play only when the pool is very large. (Because it makes the payout higher.)
  3. Play numbers that are less likely to be played by other people.

So if you’re feeling that lottery fever, this would be a good time to treat yourself to a $2 Quick Pick and fantasize about what you’d do with all that money…

Mother Jones’s Weak Math

Steve Marmel posts this infographic he apparently got from Mother Jones magazine that purports to show institutionalized racism in Ferguson, Missouri:

[Image reads: Institutional racism by the numbers. In 2013 in Ferguson: 483 black people were arrested, 36 white people were arrested, 92% of searches and 86% of car stops involved blacks.]

I wonder if either Marmel or MJ realize that these numbers don’t prove a thing about racism. They don’t even hint at racism. Not by themselves. Without knowing the racial makeup of Ferguson, we can’t tell if police stops, searches, and arrests of black people are disproportionately high or disproportionately low. If Ferguson is 93% black, then police are arresting white and black people equally as a proportion of the population.

Now as it turns out, according to 2010 census data, Ferguson is about 70% black and 30% white. (I’m leaving out the 3.3% of the population that identify as some other race, including mixed races.) This means that in 2013 Ferguson police arrested 0.6% of white residents and 3.4% of black residents, which means the relative risk of arrest for black people was about 5.8 times that for white people. The relative risk was about 5 times higher for being searched and about 2.6 times higher for having their car stopped.
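The relative risk figures can be reproduced with a few lines of Python. This sketch uses the rounded 70/30 population split, and for searches and car stops it assumes everyone who was not black was white, since the infographic only gives the black share. Conveniently, the total population cancels out of the ratio, so it isn’t needed.

```python
def relative_risk(events_a, share_a, events_b, share_b):
    """Ratio of per-capita rates for two groups; total population cancels out."""
    return (events_a / share_a) / (events_b / share_b)


BLACK_SHARE, WHITE_SHARE = 0.70, 0.30  # rounded 2010 census shares from the post

# Arrests are raw counts; searches and stops are percentage shares of the total.
rr_arrest = relative_risk(483, BLACK_SHARE, 36, WHITE_SHARE)
rr_search = relative_risk(92, BLACK_SHARE, 8, WHITE_SHARE)  # assumes the other 8% were white
rr_stop = relative_risk(86, BLACK_SHARE, 14, WHITE_SHARE)   # assumes the other 14% were white

print(f"arrests:   {rr_arrest:.2f}")
print(f"searches:  {rr_search:.2f}")
print(f"car stops: {rr_stop:.2f}")
```

With the rounded split this gives roughly 5.75, 4.9, and 2.6, in line with the figures above.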

Now that doesn’t prove that Ferguson police are racists — you’d have to rule out confounding factors such as age, you’d have to find out what fraction of the arrested are from out-of-town, and you’d have to figure out whether the arrests were justified by differential crime rates — but it does strongly suggest that something is going on that merits further investigation.

And it really wouldn’t have been hard to add that information to the infographic.

On the Significance of Mass Shooters

Over at Reason, Nick Gillespie takes a look at spree-killer (and ex-cop) Christopher Dorner’s “manifesto” and pronounces it useless:

If there is a message buried deep within Dorner’s incoherent litany of recriminations, anger, and random name-checks, it’s this: People who go on shooting sprees typically tell us very little about society at large. They are by definition far, far beyond the range of normal (or even abnormal) behavior and, as such, shouldn’t be used to generalize about larger social forces at work.

That sounds right to me, but let’s put some numbers to it:

Mother Jones magazine (unlikely to downplay shooting statistics) counts 29 people who committed mass shootings in this country in the last decade. In that same period, the United States had 50 Nobel prize winners.

Using an average of 3 shootings per year, and making the unrealistic assumption that each shooter lives out his full U.S. life expectancy of almost 80 years, that works out to a total of 240 mass shooters among us, many in jail (and many only among us in theory by my unrealistic assumption) in a population of over 313 million. That’s less than one in a million of us, even by the most generous assumptions. By comparison, we have over 400 billionaires.
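The back-of-the-envelope estimate, written out as a tiny Python sketch with the same deliberately generous assumptions:

```python
SHOOTERS_PER_YEAR = 3        # rounded up from 29 per decade
LIFE_EXPECTANCY_YEARS = 80   # every shooter assumed to live a full lifespan
US_POPULATION = 313e6

living_shooters = SHOOTERS_PER_YEAR * LIFE_EXPECTANCY_YEARS
per_million = living_shooters / US_POPULATION * 1e6

print(f"{living_shooters} mass shooters, or {per_million:.2f} per million people")
```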

These killers do not represent us. They are nothing like us.

Some Dog-Sniffing Math

In one of his posts today, New York criminal defense lawyer Scott Greenfield writes about the error rate for drug-sniffing dogs:

More to the point was the dog hits simply aren’t anywhere nearly as worthy of credit as courts have held. Consider whether it would be equally acceptable for a cop to flip a coin in order to establish probable cause to search.  For a dog whose established ability to sniff out drugs runs in the typical 50% range, it’s no more likely to be accurate than a flip of a coin.

I’m guessing the “50% range” figure comes from a Chicago Tribune article a few weeks ago based on an analysis of state drug dog data in Illinois, which found a relatively low accuracy rate:

The dogs are trained to dig or sit when they smell drugs, which triggers automobile searches. But a Tribune analysis of three years of data for suburban departments found that only 44 percent of those alerts by the dogs led to the discovery of drugs or paraphernalia.

That 44% figure for success means that the false-positive ratio is a whopping 56%. Scott was being generous when he rounded down to 50%. However, in comparing dogs to flipping a coin, Scott makes a very common math mistake by confusing the dog’s false alert ratio with the dog’s total alert ratio.

It helps if we make up some numbers. Suppose the police dogs in some department are used in 1000 sniffs, and the dogs alert in 200 of them, but a search only finds drugs on 88 of those people. This means the other 112 are false positives, and we can calculate the false positive ratio as the number of false alerts divided by the total number of alerts:

fp = 112 / 200 = 56%

To keep the situation simple, let’s assume the dog never misses any drugs, so the 88 drug carriers are all there were in the sample population of 1000. In other words, 8.8% of the people are carrying drugs.

Now we can calculate what would happen if the police officer flipped a coin instead. Out of 1000 people, the coin would be expected to “alert” for 500 of them. Since 8.8% of the people are carrying drugs, we would expect 44 of these people to have drugs, meaning the other 456 are false positives. Thus the false positive ratio would be:

fp = 456 / 500 = 91.2%

That’s a heck of a lot worse than the dog’s 56% ratio. The only way the coin could achieve a false positive ratio as good as the dog’s is if 44% of all the people sniffed are carrying drugs. Then you’d expect the 500 searches to find drugs on 220 people with the other 280 being false positives:

fp = 280 / 500 = 56%

As long as less than 44% of the population is carrying drugs, a dog with a known 56% false positive ratio is performing quite a bit better than a random coin flip.
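The dog-versus-coin comparison is easy to reproduce in Python. This sketch uses the made-up numbers from above, including the simplifying assumption that the dog never misses any drugs; the function names are only for illustration.

```python
def false_alert_ratio(alerts: float, hits: float) -> float:
    """Fraction of alerts that turn up nothing."""
    return (alerts - hits) / alerts


def coin_false_alert_ratio(population: int, carriers: int, p_alert: float = 0.5) -> float:
    """Expected false alert ratio when a coin decides who gets searched."""
    alerts = population * p_alert  # the coin "alerts" on a random share of everyone
    hits = carriers * p_alert      # and catches the same share of the carriers
    return false_alert_ratio(alerts, hits)


dog = false_alert_ratio(alerts=200, hits=88)
coin = coin_false_alert_ratio(population=1000, carriers=88)
coin_break_even = coin_false_alert_ratio(population=1000, carriers=440)

print(f"dog:  {dog:.1%}")
print(f"coin: {coin:.1%}")
print(f"coin, if 44% carry drugs: {coin_break_even:.1%}")
```

Notice that the coin’s false alert ratio is just one minus the fraction of people carrying drugs, no matter how the coin is weighted. A random search can never do better than the base rate.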

Not that that’s saying much. And it doesn’t really hurt Scott’s point, either, because the dog is still wrong more than half the time, and each time it’s wrong, some innocent person has to endure the humiliation of a police search.

As is probably often the case, although Scott was wrong, the opposition is even wronger:

Dog-handling officers and trainers argue the canine teams’ accuracy shouldn’t be measured in the number of alerts that turn up drugs. They said the scent of drugs or paraphernalia can linger in a car after drugs are used or sold, and the dogs’ noses are so sensitive they can pick up residue from drugs that can no longer be found in a car.

This might be correct in a narrow sense. Dogs certainly are capable of detecting trace odors left behind by things that are no longer there. It’s a reasonable defense of the dog’s nasal prowess.

But so what? This isn’t about the dog, it’s about whether the search is justified. The only reason the police are allowed to invade your privacy and seize your property is because they have a good reason to believe they will find evidence of a crime. If the police aren’t finding evidence as often as they expect to, it suggests their reason for the search is not as good as they say it is. The cause of their error isn’t as important as the fact that they are in error.

I’m no lawyer, but I’m pretty sure a judge isn’t supposed to grant a search warrant because a location might once have had evidence of a crime. The police are supposed to have reason to believe that the evidence will be there when they search. If that’s a good rule for a judge, it ought to be a good rule for a dog. But it’s clear that in at least 56% of the cases when a dog alerts, the evidence isn’t there.

As if that wasn’t bad enough, the Tribune story gives us good reason to believe that the 56% error rate is optimistic.

The Tribune obtained and analyzed data from 2007 through 2009 collected by the state Department of Transportation to study racial profiling. But the data are incomplete. IDOT doesn’t offer guidance on what exactly constitutes a drug dog alert, said spokesman Guy Tridgell, and most departments reported only a handful of searches based on alerts. At least two huge agencies — the Chicago Police Department and Illinois State Police — reported none.

The Tribune asked both agencies for their data, but state police could not provide a breakdown of how often their dog alerts led to seizures, and Chicago police did not provide any data.

That leaves figures only for suburban departments. Among those whose data are included, just six departments averaged at least 10 alerts per year, with the top three being the McHenry County sheriff’s department, Naperville police and Romeoville police.

In other words, the 56% error rate is for dogs working in departments that were willing to disclose their dogs’ performance statistics. We can only wonder how bad the numbers are in departments that don’t want to reveal how well their dogs were doing. And then there are the departments that apparently don’t even care enough to keep statistics.

The most damning item in the Tribune article, however, is that the dogs’ success rate declines to 27% when the person being sniffed is Hispanic.

This is a reminder that these statistics aren’t a measure of the dog’s performance, they’re a measure of the performance of the dog-and-handler system, and I don’t think it’s the dogs that are likely to be prejudiced against Hispanics.

The most benign explanation for these numbers is that police dog handlers are more likely to expect Hispanics to have drugs, and that they somehow inadvertently cue the dog to alert. For example, if they lead a non-alerting dog around the cars of Hispanic drivers for a longer period of time than other drivers, the dog may learn that he can get his master to stop by doing a drug alert.

This sort of unintentional cueing is sometimes called the Clever Hans effect, after a horse that appeared to be able to accomplish all sorts of amazing mental feats, signalling his answers by stomping his foot. Eventually, scientists figured out that his owner would tense up when the horse was supposed to start answering a question and then relax as soon as he reached the right number of stomps. There is evidence that some drug dogs are doing the same thing.

Other explanations for the high error rate with Hispanics are that the police dog handlers are more likely to misinterpret a dog’s behavior as an alert, are intentionally cueing the dog to alert, or are simply lying about the alert because they want to do a search.

(It might also seem possible that Hispanics and their cars are simply exposed to drugs more often–perhaps due to greater involvement in drug culture–and that the dogs are alerting to drug traces. But I can’t think of an explanation for how Hispanics could have increased the rate at which they have had drugs without also increasing the rate for which they have drugs when searched. It seems to me those statistics should rise and fall together, which would not affect the dogs’ error rate.)

A big part of the problem with drug dogs is the lack of standards:

Experts said police agencies are inconsistent about the level of training they require and few states mandate training or certification. Jim Watson, secretary of the North American Police Work Dog Association, said a tiny minority of states require certification, though neither he nor other experts could say exactly how many.

A federally sponsored advisory commission has recommended a set of best practices, though they are not backed by any legal mandate.

Compare this to the situation with the breath testing devices used by police to detect intoxicated drivers. Those things are calibrated and tested regularly. If you get busted for blowing 0.09 and your lawyer can show that the testing device hasn’t been calibrated and tested according to the proper schedule, there’s a pretty good chance you’ll go free.

But if a dog at the side of the road alerts at your car, the cops are going to search you, and whatever they find will be usable, because the judges always believe the dogs.

Update: Radley Balko is taking on this same topic today.

Mathematics In Gangland

Scott Greenfield, font of so much that I can riff off of, has a complaint about gang experts who try to paint every action by the defendant as related to his membership in a gang:

If a defendant has a tattoo, the expert will testify that tattoos are “brands” typically worn by gang members.  If the tattoo happens to say “Tiffany”, then the testimony is changed ever so slightly to accommodate, by the expert then saying that gang members typically brand themselves with the names of their girlfriends.  You get the message.  No matter what the evidence, the defendant can’t win.  It’s always connectible to being a gang member, according to the expert.

Aside from the implication by Scott that police gang experts don’t really know much about gangs, I can also see something of a logical problem with the so-called expert’s theory. The cop’s statement that gang members have specific types of tattoos is a logical statement: If he’s a gang member, then he’ll have a specific tattoo, say of a snarling dog. If we try to get all mathematical, the statement would look like this:

gang member ==> dog tattoo

But that’s not what the prosecutor wants the jury to believe. The prosecutor doesn’t care about the tattoo. He’s trying to prove that the defendant is a gang member. He’s trying to prove the converse:

dog tattoo ==> gang member

The problem is, as a matter of math, the truth of a statement does not imply the truth of its converse. Even if it’s true that all gang members have dog tattoos, it doesn’t mean all people with dog tattoos are gang members. It’s easier to see with a more obvious example: All gang members have noses:

gang member ==> nose

but that doesn’t in any way prove

nose ==> gang member

That is, not all people with noses are gang members.
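If it helps to see the difference between a statement and its converse concretely, here is a toy Python sketch. The four-person population is entirely hypothetical and exists only to provide a counterexample.

```python
# Hypothetical toy population: every gang member has a dog tattoo,
# but one non-member has a dog tattoo too.
people = [
    {"name": "A", "gang_member": True,  "dog_tattoo": True,  "nose": True},
    {"name": "B", "gang_member": True,  "dog_tattoo": True,  "nose": True},
    {"name": "C", "gang_member": False, "dog_tattoo": True,  "nose": True},
    {"name": "D", "gang_member": False, "dog_tattoo": False, "nose": True},
]


def implies(population, p, q):
    """True if everyone in the population with property p also has property q."""
    return all(person[q] for person in population if person[p])


print(implies(people, "gang_member", "dog_tattoo"))  # the expert's statement
print(implies(people, "dog_tattoo", "gang_member"))  # the converse the prosecutor wants
print(implies(people, "gang_member", "nose"))
print(implies(people, "nose", "gang_member"))
```

In this population the expert’s statement holds, but person C is enough to sink the converse.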

So why is it that “all people with noses are gang members” is obviously wrong, but “all people with dog tattoos are gang members” seems plausible? The answer is a little more complicated because it involves the real world, the statistics of experimental design for testing hypotheses, and the availability heuristic.

Suppose you wanted to test the hypothesis “all people with X are gang members,” where X is either “noses” or “dog tattoos.” You’d do it by setting up an experiment—in this case a survey of the population—to look for counterexamples to the hypothesis. That is, you’d try to find people who have X but are not gang members. Finding even one proves it’s not absolute truth.

The real world is a little fuzzy—especially in the social sciences—and our methods of testing are less than perfect, so real-world hypothesis testing usually involves testing a statistical relationship. In this case, we’d be testing “people with X have a high probability of being gang members” and we’d still be looking for people who have X but are not gang members. The more such counterexamples we find, the lower the probability of a relationship.
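To make "the more counterexamples, the lower the probability" concrete, here is a rough sketch. Every number in it is invented for illustration (a made-up town with 200 gang members), and the function name `share_who_are_members` is hypothetical; nothing here comes from a real survey.

```python
def share_who_are_members(members_with_x, nonmembers_with_x):
    """Fraction of people with trait X who are gang members."""
    return members_with_x / (members_with_x + nonmembers_with_x)


# Invented figures: assume all 200 gang members in town have the dog tattoo.
members_with_tattoo = 200

# Vary the number of counterexamples (people with the tattoo who are NOT members).
for counterexamples in (0, 50, 200, 1800):
    share = share_who_are_members(members_with_tattoo, counterexamples)
    print(f"{counterexamples:5d} counterexamples -> {share:.0%} of tattooed people are members")

# Noses: all 200 members have one, and so do the other 99,800 (invented) residents.
nose_share = share_who_are_members(200, 99_800)
print(f"noses -> {nose_share:.1%} of people with noses are members")
```

Note that "gang member ==> dog tattoo" holds in every row of this sketch. The only thing that changes is how many non-members also have the tattoo, and that number alone decides how much the tattoo tells you.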

A scientific test of this kind of hypothesis would involve conducting random surveys and gathering enough data to reach statistically reliable conclusions. But when it’s not a scientific investigation, when it’s just us trying to figure something out, we don’t do a scientific study. We just try to think of counterexamples.

When X is “noses” it’s easy. We all know lots of people with noses, and nearly all of them are not gang members. Such a large number of counterexamples makes it easy to destroy the hypothesis that all people with noses are gang members.

When the hypothesis is “all people with dog tattoos are gang members,” it’s a little harder to think of counterexamples, simply because dog tattoos are so rare that we may not know of anybody who has one. Our inability to find counterexamples makes the hypothesis seem plausible.

This way of thinking is called the availability heuristic. We assume something is likely because we can easily bring to mind examples. The more examples we can think of, the more likely we believe it to be.

Although the availability heuristic is not as rigorous and generalized as conducting a scientific investigation, in essence it’s a similar process. A scientific investigation gathers data using randomized trials, controlled studies, and careful surveys and then analyzes the data to arrive at results. The availability heuristic does the same kind of analysis, but it works only on the data we have in our heads at that moment. There is no data gathering process.

The availability heuristic is a perfectly valid way of thinking about our day-to-day world, about which we have lots of data but don’t have time to gather more. It tends to fail us, however, when thinking about parts of the world with which we are unfamiliar. That’s why we have science.

What are the practical implications of all this philosophy when it comes to thinking about gang experts? Probably not much. But if I ever find myself on a jury listening to this kind of testimony, I hope I’ll keep a few points in mind.

Basically, any assertion of a general rule—all fish have fins, all cats have fur, all people with dog tattoos are gang members—is equivalent to an assertion that counterexamples do not exist: There are no fish that don’t have fins, there are no cats that don’t have fur, there are no people who have dog tattoos who are not in gangs.

(Actually, the gang expert will likely invoke several indicators of gang membership in combination—gang tattoos, gang hats, gang shoe laces, gang jewelry—and the standard in the courtroom is not that there are absolutely no counterexamples, but rather that counterexamples are rare enough that they do not constitute a cause for reasonable doubt. Nevertheless, an assertion of a general rule is still an assertion about the rarity of counterexamples.)

So if you hear someone say that dog tattoos are a sign of gang membership, you should be wondering why that person believes there are no (or few) counterexamples. Remember, it’s not just about gangs. It’s also about tattoos. If he’s really an expert on gangs, he may very well have observed that gang members have dog tattoos, but how does he know that non-gang-members don’t have dog tattoos as well? He’d have to know a lot about tattoo prevalence in society at large. In addition to being a gang expert, he’d also have to be a tattoo expert.

Or at least he’d have to have received reliable information from a tattoo expert or be aware of a scientific study of some kind that addressed the issue. If I were on the jury, I’d want to hear about that.

Knowing About Rape

Eugene Volokh catches an apparent problem in an article in Oregon State’s Daily Barometer:

According to a press release issued by the Women’s Center, 2,000 rapes occur every five minutes.

This amounts to the claim that, on average, every woman in the United States is raped once every 9 months, which is absurd. Eugene tracks down the actual press release, which says:

About 2,000 rapes are committed daily at the rate of about one every 5 minutes.

That’s completely different from the newspaper quote. But it’s still not right: One every five minutes would only be 288 per day. That’s a seven-fold discrepancy. I don’t know which of those numbers is correct, but that sentence is definitely wrong.
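The arithmetic above is easy to check. This is only a sketch: the helper `per_day` is a name I made up, and the figure of 150 million women in the United States is a round number I am assuming for illustration, not a statistic from either article.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def per_day(events, every_minutes):
    """Convert 'events per every_minutes minutes' into events per day."""
    return events * MINUTES_PER_DAY / every_minutes


# Press release: "one every 5 minutes"
one_every_five = per_day(1, 5)
# Newspaper: "2,000 rapes occur every five minutes"
newspaper_rate = per_day(2000, 5)
# Press release's other figure (2,000 daily) versus its own 5-minute rate
discrepancy = 2000 / one_every_five

# ASSUMPTION: roughly 150 million women in the U.S. (round illustrative figure).
assumed_women = 150_000_000
months_between = assumed_women / (newspaper_rate * 365) * 12

print(f"one every 5 minutes   = {one_every_five:.0f} per day")
print(f"2,000 every 5 minutes = {newspaper_rate:,.0f} per day")
print(f"2,000 per day is {discrepancy:.2f} times the 5-minute rate")
print(f"newspaper rate implies once every {months_between:.1f} months per woman")
```

Under that assumed population, the newspaper's version works out to somewhere between eight and nine months per woman, which is in line with the "once every 9 months" reading, and the press release's two figures differ by a factor of just under seven.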

Eugene goes on to find some other statistics that make more sense. You can read the rest if you want to.

I’m more interested in one of the comments, by someone called dk35, who I think felt that Eugene was minimizing the problem of rape by focusing on the statistics:

Who doesn’t think rape isn’t a serious problem? What I don’t get is why you need statistics at all to convince people that rape is a serious problem.

What I don’t get is how else dk35 expects people to learn that rape is a serious problem. Divine inspiration? By being raped?

Here’s why you need statistics: I have never been raped. I have never even met someone who told me they were raped. There has been no rape in my life. By my direct observation and by second-hand accounts, the incidence of rape is exactly zero.

It is only through indirect evidence such as reliable statistical reports that I can be aware of the depressing frequency of rape.