17 Aug 2005 - 20:32

more on false positives

Another version of the False Positives challenge. This one ran in John Kay's column in the Financial Times yesterday. (Probably only available to subscribers.)

...intuition does not correspond to the mathematics of probability. One person in a 1,000 suffers from a rare disease. A friend has just tested positive for this illness and the test gives a correct diagnosis in 99 per cent of cases. How likely is it that your friend has the disease? Not at all likely. In random groups of 1,000 people an average of 10 would display false positives and only one would be correctly diagnosed with the disease. But most people, including most doctors, think otherwise. “The human mind,” said science writer Stephen Jay Gould, “did not evolve to deal with probabilities.”

Hmmm. Let's see. This problem does give us false negatives, right???

OK, let me think.

[pause]

Good grief. Not only can the human mind not intuit Bayesian probability; apparently the human mind equally cannot produce consistently lucid prose. (Nothing wrong with Mr. Kay's lucidity on a normal day.)

Kay's example, too, appears to assume a false negative rate of 0.

As far as I can tell.
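Kay's "random groups of 1,000 people" can be simulated directly. This is only a sketch, and it makes an assumption Kay doesn't spell out: that the 99 per cent figure holds for sick and healthy people alike, meaning a 1% false positive rate and a 1% false negative rate. Setting `fn_rate=0.0` gives the zero-false-negative reading. The function name `simulate` and the fixed seed are just for illustration.

```python
import random

def simulate(n_people, prevalence=0.001, fp_rate=0.01, fn_rate=0.01, seed=1):
    """Count true and false positives in a simulated population.

    fp_rate: chance a healthy person tests positive (assumed 1%).
    fn_rate: chance a sick person tests negative (assumed 1%).
    """
    rng = random.Random(seed)
    true_pos = false_pos = 0
    for _ in range(n_people):
        sick = rng.random() < prevalence
        if sick:
            if rng.random() >= fn_rate:   # the test catches the disease
                true_pos += 1
        elif rng.random() < fp_rate:      # a healthy person is flagged anyway
            false_pos += 1
    return true_pos, false_pos

n = 1_000_000           # 1,000 random groups of 1,000 people
groups = n / 1000
for fn in (0.01, 0.0):  # 1% false negatives, then Kay's apparent 0%
    tp, fp = simulate(n, fn_rate=fn)
    print(f"fn_rate={fn}: per 1,000 people {tp / groups:.2f} true positives, "
          f"{fp / groups:.2f} false positives; "
          f"{tp / (tp + fp):.1%} of positives are sick")
```

Either way, roughly 10 false positives turn up per 1,000 people against about one true positive, so only around 9% of the people who test positive actually have the disease. The false negative rate barely matters here.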

update

This is funny. I was skimming Amazon reviews of Stephen Jay Gould's The Mismeasure of Man, and I found this:

As Oxford academician Richard Dawkins says (see Bryson, "A Short History of Nearly Everything", pp. 330-332) "If only Stephen Gould could think as clearly as he writes!"

It's a Core Principle in the Writing Biz (& definitely in the Writing Instruction Biz) that you can't write clearly without thinking clearly. (True in my experience; that's for sure.)


low birth weight paradox (& Monty Hall)
Monty Hall, part 2
Monty Hall, part 3
false positives
false positives, part 2
Doug Sundseth on Monty Hall
John Kay: We are likely to get probability wrong (subscription only)
Monty Hall diagram from Curious Incident
probability question from Saxon 8/7






Comments



The false negatives do change the answer, but only slightly when the rate is 1%. (He's also assuming 10 false positives, when the problem as stated gives 9.99 false positives per 1,000 people.)

Using the same equation as in the other post, 9.99/(9.99 + 0.99) ≈ 90.98% of the positive results would be false. If you assume a false negative rate of 0, the result would be 9.99/(9.99 + 1) ≈ 90.90%.
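The same numbers worked exactly rather than by counting, as a minimal sketch of Bayes' rule. The function name is hypothetical; the 1-in-1,000 prevalence and the 1% rates are the problem's figures as read above. The last loop doubles one error rate at a time, which bears on the point that the false positive rate is what drives the result.

```python
def share_of_positives_false(prevalence, fp_rate, fn_rate):
    """Fraction of positive test results that belong to healthy people."""
    false_pos = (1 - prevalence) * fp_rate  # healthy people who test positive
    true_pos = prevalence * (1 - fn_rate)   # sick people who test positive
    return false_pos / (false_pos + true_pos)

print(f"{share_of_positives_false(0.001, 0.01, 0.01):.2%}")  # 90.98%
print(f"{share_of_positives_false(0.001, 0.01, 0.0):.2%}")   # 90.90%

# Double one error rate at a time and compare.
for fp_rate, fn_rate in [(0.01, 0.02), (0.02, 0.01)]:
    share = share_of_positives_false(0.001, fp_rate, fn_rate)
    print(f"fp={fp_rate:.0%}, fn={fn_rate:.0%}: {share:.2%} of positives are false")
```

Doubling the false negative rate nudges the answer to about 91%; doubling the false positive rate pushes it to about 95%.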

In comments to the other post, I used an unrealistically high 50% false positive rate to illustrate why you needed to know the rate, but the false positive rate in low-probability cases like this is the primary driver of the counter-intuitive result.

I've actually seen this analysis done more on things like face-recognition software used to identify possible terrorists. The problem there is even worse, since the false positive rate is higher than 1% (ISTR 5% being advertised) and the probability that any given person is a terrorist is rather a lot lower than 1 in 1000. Since terrorists would be doing their best to increase the false negative rate, the value of such systems is at best problematic.

If we assume 1 terrorist per 100,000 people checked, a 5% false positive rate, and a 50% false negative rate, you'll get approximately 10,000 false positives per real positive, each of which will have to be carefully checked by a human. (All numbers here are WAGs.)
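A quick check of that back-of-the-envelope figure, using the same guessed numbers (they are WAGs, not measurements). The helper name is made up for the example.

```python
def false_per_true_positive(prevalence, fp_rate, fn_rate):
    """Expected number of false positives for each genuine positive."""
    false_pos = (1 - prevalence) * fp_rate
    true_pos = prevalence * (1 - fn_rate)
    return false_pos / true_pos

# Guesses from the comment: 1 terrorist per 100,000 screened,
# 5% false positives, 50% false negatives.
ratio = false_per_true_positive(1 / 100_000, 0.05, 0.50)
print(round(ratio))  # 10000

# Per 100,000 people screened: about 5,000 false alarms for half a real hit.
print(f"{99_999 * 0.05:.0f} false alarms, {1 * 0.5} expected real hits")
```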

-- DougSundseth - 17 Aug 2005


If we assume 1 terrorist per 100,000 people checked, a 5% false positive rate, and a 50% false negative rate, you'll get approximately 10,000 false positives per real positive, each of which will have to be carefully checked by a human. (All numbers here are WAGs.)

Wow.

-- CatherineJohnson - 17 Aug 2005


Veering off on a tangent…

I used to work with some speaker recognition technology. There are myriad variations, but it’s basically a security measure in which a speaker claims to be a certain person, and that claim is checked by comparing the speaker’s utterance of some password to the person’s “voiceprint.” In evaluating speaker recognition technology, the common figure of merit is the Equal Error Rate (EER). That is the point at which you have adjusted your acceptance threshold so that the rate of false positives is equal to the rate of false negatives. The EER can only be determined when you are using a known database, so that you can tell true speakers from imposters.
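A rough sketch of how an EER can be read off a labeled test database: sweep the acceptance threshold and find where the two error rates meet. The scores below are invented, and the helper names are hypothetical. With real data the two rates usually never match exactly, so this picks the threshold where they are closest and averages them, which is a common approximation rather than necessarily what any particular vendor does.

```python
def error_rates(genuine, impostor, threshold):
    """Error rates when every score >= threshold is accepted.

    False positive: an imposter is accepted.
    False negative: a true speaker is rejected.
    """
    fp = sum(s >= threshold for s in impostor) / len(impostor)
    fn = sum(s < threshold for s in genuine) / len(genuine)
    return fp, fn

def equal_error_rate(genuine, impostor):
    """Return (threshold, eer) at the point where the two rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fp, fn = error_rates(genuine, impostor, t)
        if best is None or abs(fp - fn) < best[0]:
            best = (abs(fp - fn), t, (fp + fn) / 2)
    _, threshold, eer = best
    return threshold, eer

# Hypothetical match scores: true speakers vs. imposters.
genuine = [0.6, 0.7, 0.8, 0.9, 0.95]
impostor = [0.1, 0.2, 0.3, 0.4, 0.65]
print(equal_error_rate(genuine, impostor))  # (0.65, 0.2)
```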

In practice, no one would ever attempt to operate such a system at the EER operating point. Depending on the consequences of errors, you would try to set a reasonable operating point. For something like protecting a telephone long distance calling card, you would bias your levels to make false negatives very rare. You don’t want paying customers to be denied service. As a consequence, the rate of false positives would rise, but an imposter would still have to work hard to obtain the password, plus fool the system with his voice to get through. It’s too inefficient to try to profit from making imposter calls. So, it’s okay if 10% of imposters with correct passwords succeed if there are effectively zero such imposters.
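The calling-card case in code, as a sketch: instead of the EER point, choose the strictest threshold that keeps false negatives at or below a target, then see what that costs in false positives. The scores and the helper name are hypothetical.

```python
def threshold_for_max_fn(genuine, impostor, max_fn_rate):
    """Highest threshold whose false negative rate stays at or below
    max_fn_rate. Returns (threshold, fp_rate, fn_rate)."""
    chosen = None
    for t in sorted(set(genuine) | set(impostor)):
        fn = sum(s < t for s in genuine) / len(genuine)
        if fn > max_fn_rate:
            break  # fn never decreases as the threshold rises
        fp = sum(s >= t for s in impostor) / len(impostor)
        chosen = (t, fp, fn)
    return chosen

# Hypothetical match scores: true speakers vs. imposters.
genuine = [0.45, 0.7, 0.8, 0.9, 0.95]
impostor = [0.1, 0.2, 0.5, 0.6, 0.75]

print(threshold_for_max_fn(genuine, impostor, 0.0))  # (0.45, 0.6, 0.0)
print(threshold_for_max_fn(genuine, impostor, 0.2))  # (0.7, 0.2, 0.2)
```

With these toy scores, insisting on zero false negatives lets 60% of imposters through, while tolerating 20% false negatives brings that down to 20%.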

No one would ever use current speaker verification technology as the only security measure controlling access to, say, a launch site for nuclear weapons. If you did, though, you would be far more worried about the rate of false positives. I think similar logic applies to other biometric measures like thumbprint analysis and retina scanning.

-- DanK - 18 Aug 2005


Question: why do you start out with the EER as a standard?

-- CatherineJohnson - 18 Aug 2005


So are we nowhere near being able to use any kind of biometric measures for security purposes?

-- CatherineJohnson - 18 Aug 2005


I know nothing about the accuracy of thumbprints or any other biometrics. I only know where speaker verification was several years back. My contacts in that area haven't given me any indication that there have been breakthroughs, but I haven't really been asking a lot.

Just because a biometric can't provide adequate accuracy BY ITSELF doesn't mean it has no use. You know that knowledge of information like your mother's maiden name is often used for security. The security of that word would be enhanced if we also performed speaker verification when you uttered it. So the imposter would have to know some word that is unlikely to be known, plus he would have to be able to fool the verifier with it. If your credit card company expects to be spoofed on the mother’s maiden name question, say, 2% of the time, then applying speaker verification might reduce that percentage by an order of magnitude.
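The layering argument in numbers, as a sketch. It assumes the two checks fail independently, which real attackers may not oblige. The 2% comes from the comment; the 10% pass rate for imposters who know the answer is borrowed from the calling-card example above.

```python
def spoof_rate(p_knows_answer, p_fools_voice):
    """Chance an imposter passes both checks, assuming they are independent."""
    return p_knows_answer * p_fools_voice

before = 0.02                    # spoofed on the maiden-name question alone
after = spoof_rate(0.02, 0.10)   # plus a voiceprint check 10% of imposters pass
print(f"{before:.1%} -> {after:.1%}")  # 2.0% -> 0.2%
```

A 10% pass rate at the second check is exactly the order-of-magnitude reduction described.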

One problem with speaker verification is the distribution of the errors it does make. There tends to be a "sheep and goats" phenomenon. Suppose your technology is good enough to get the false negative rate down to 2%. That wouldn't be too bad; people might need to make a second attempt one time out of fifty, right? Well, that's not how it works out. It isn't that every speaker will succeed 98% of the time. The vast majority of the population are "sheep," whose voices work well with the system. They would see error rates below 1%. But sprinkled in will be a few "goats." They make up maybe 3% of the population, but they might see error rates over 50%. What if some of the key people with security clearance are goats? Do you have to set up an alternative identity check for them? If so, how useful is that speech-based system? I don’t know how this issue affects other biometric systems.
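A small illustration of how a 2% average can hide the goats. The split (97% sheep at 0.5%, 3% goats at 50%) is an assumption chosen to match the rough figures above, and repeat attempts are treated as independent, which may not hold in practice. The overall rate comes out to about 2%, yet a goat still fails one try in two.

```python
# Assumed mix: 97% "sheep" at a 0.5% error rate, 3% "goats" at 50%.
population = {"sheep": (0.97, 0.005), "goats": (0.03, 0.50)}

overall = sum(share * rate for share, rate in population.values())
print(f"overall false negative rate: {overall:.1%}")

for name, (share, rate) in population.items():
    # Rejected twice running, treating the two attempts as independent.
    print(f"{name}: fails one try {rate:.1%}, fails two in a row {rate ** 2:.4%}")
```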

Finally, getting back to EER. It is used as an objective point of comparison. If vendor A says, “when we allow for 4% false positives, we see 6% false negatives,” and vendor B says “we allowed 5% false positives, and saw only 3% false negatives,” whose system is better? If they both determine their EER, then we’re comparing apples to apples. The smaller EER is better. It’s just that when you apply the system in a real-world setting, you set the thresholds so that it operates as an orange, not an apple.
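Why the two vendor claims can't be ranked directly: neither operating point is better on both error rates. A tiny sketch, with points written as (false positive rate, false negative rate) and a hypothetical `dominates` helper.

```python
def dominates(a, b):
    """True if operating point a is no worse than b on both error rates
    and differs from b. Points are (fp_rate, fn_rate)."""
    return all(x <= y for x, y in zip(a, b)) and a != b

vendor_a = (0.04, 0.06)
vendor_b = (0.05, 0.03)
print(dominates(vendor_a, vendor_b), dominates(vendor_b, vendor_a))  # False False
```

Since each vendor wins on one axis, a shared reference point such as the EER is needed to compare them.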

-- DanK - 19 Aug 2005


Re biometric devices:

<ocd>Do you really want to have to put your thumb on a sensor that hundreds of other people have put their icky thumbs on?</ocd>

-- KtmGuest - 21 Dec 2005


Do you really want to have to put your thumb on a sensor that hundreds of other people have put their icky thumbs on?

I love it!

Christopher's social studies teacher, who is a STICKLER FOR NEATNESS, sprays all the kids' desks with Fantastik and wipes them all down after each class.

Poor thing.

-- CatherineJohnson - 21 Dec 2005