17 Aug 2005 - 15:58

## Bayes Watch

A couple of days ago, Carolyn explained the difference between frequentist and Bayesian statistics. She's a Bayesian, she said.

Well, that explained a lot, because it turns out I'm a Bayesian, too. I just didn't know it. Obviously, that's why Carolyn and I constantly find ourselves traveling the exact same thought path, even though we've never met, and didn't know each other until a year ago.

Of course, a real Bayesian (that would be a Bayesian who knew what she was doing, which would not be me) would probably not conclude that the reason she likes a person well enough to start a vast time-gobbling math-ed web site with her is that you both subscribe to the same school of statistical thought. I'll have to ask Carolyn.

I'm a Bayesian aspirant.

I'm having quite a little midlife run of Self-Discovery here, I must say. First I find out I'm Scots-Irish; next I'm hearing I'm a Bayesian.

I just hope no one's gonna tell me I'm adopted.

### I have a question

My question concerns a passage in a terrific book called *What the Numbers Say: A Field Guide to Mastering Our Numerical World* by Derrick Niederman & David Boyum. Boyum, it turns out, majored in applied mathematics at Harvard--I didn't know there was such a thing as a major in applied mathematics at Harvard!

Or anywhere else, for that matter.

I wish I'd known that when I was 17.

'Bayes Watch' is Niederman & Boyum's title for this passage:

Years ago a study asked the following question of students and doctors at Harvard Medical School:

> If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5 percent, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person's symptoms or signs?

Ed and I both understand the answer now (neither of us got it right), but we still have a question about the precise calculations. (Don't hit this link unless you want to see the answer.)

### update

I've just checked Niederman & Boyum. They do not specify a zero rate for false negatives. They say nothing about false negatives one way or the other. (Neither does John Kay in false positives, part 2, assuming I'm understanding him correctly).

### Bayes & God

I actually bought this book a couple of years ago, though I haven't read it yet:

I believe it's intended to be a Bayesian proof of the existence of God, although I don't know how the word 'proof' is used either in the book or in the context of Bayesian statistics.

- low birth weight paradox (& Monty Hall)
- Monty Hall, part 2
- Monty Hall, part 3
- false positives
- false positives, part 2
- Doug Sundseth on Monty Hall
- John Kay: We are likely to get probability wrong (subscription only)
- Monty Hall diagram from Curious Incident
- Bayes & the human mind
- Bayesian reasoning, intuition, & the cognitive unconscious
- most bell curves have thick tails
- ECONOMIST explanation of Bayesian statistics
- Bayesian certainty scale
- probability question from Saxon 8/7




The answer to the original question is actually underspecified. We need to also know the false negative probability.

If we assume that the latter probability is 0 (unlikely, but simplifying), there will be 999*.05 false positives (49.95, not 50, because there are only 999 people without your notional disease in the sample) and 1 true positive. The probability that any given person with a positive test result will be without the disease is 49.95/(49.95+1).

The numbers above assume that all numbers are accurate to an arbitrary precision, however. If we assume single-digit accuracy (the accuracy indicated by the numbers used in the hypothetical), the probability of a false positive is 1, or more usefully the probability of a true positive is 2%. More precision isn't warranted by the available data.
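Doug's arithmetic above can be checked in a few lines of Python (a sketch, assuming his zero-false-negative simplification):

```python
# Population of 1,000: 1 person has the disease, 999 do not.
population = 1000
prevalence = 1 / 1000
false_positive_rate = 0.05

sick = population * prevalence                   # 1 true case
healthy = population - sick                      # 999 people without the disease
false_positives = healthy * false_positive_rate  # 999 * 0.05 = 49.95
true_positives = sick                            # assumes zero false negatives

p_sick_given_positive = true_positives / (true_positives + false_positives)
print(p_sick_given_positive)                     # ~0.0196, i.e. about 2%
```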

-- DougSundseth - 17 Aug 2005

It does not seem necessary to me to find the probability of the false negative to solve THIS problem. I certainly could be missing something though.

Take a look at this picture:

It shows 1,000 asterisks. The rectangular area shows the number of people who took the test and got a positive result for the disease. And the poor schlub who actually has it is circled.

The noncircled people in the rectangular area are "false positives," thus, there are 50 of them out of 1,000.

The probability that one tests positive for the disease and actually has it is the ratio:

number of people with disease/number of people who tested positive

So a person has approximately 1 chance in 51, or a 2% chance, of actually having the disease if he or she tests positive.

Percent is simply a ratio of a number to 100, so the percent here is simply renaming the probability 1/51--like solving the proportion 1/51 = x/100.
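J.D.'s picture translates directly into arithmetic (a sketch using his rounded figure of 50 false positives):

```python
false_positives = 50     # 5% of 1,000, as in the picture
true_positives = 1       # the one circled person who actually has the disease

p = true_positives / (true_positives + false_positives)  # 1/51
percent = p * 100        # solving the proportion 1/51 = x/100
print(round(percent, 1))                                 # about 2.0 percent
```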

I'll have to do a post about percents, rates, ratios, and fractions--not enough time here.

-- JdFisher - 17 Aug 2005

Doug--sorry: there are no false negatives. That was specified! (I didn't include it.)

-- CatherineJohnson - 17 Aug 2005

oops--have to answer the phone--back later--

-- CatherineJohnson - 17 Aug 2005

Back briefly...I'll have to read Doug's answer more closely (since it's new to me...)

J.D.'s answer is exactly what Ed & I were seeing (although I'm pretty sure it's the case that you have to know the false negative rate, since the book specifies that--and since the FT had an example like this one in the paper yesterday, which I'll post later on. In that example they specified a false negative rate above 0.)

-- CatherineJohnson - 17 Aug 2005

"It does not seem necessary to me to find the probability of the false negative to solve THIS problem."

The problem with 0 false negatives gives a result of approximately 98.0373% of the positive results being false (1.9627% of the positive results being correct). If you have (say) 50% false negatives, the one person who is actually sick has only a 50% chance of being in the population of people with a positive result.

The original question asked, "what is the chance that a person found to have a positive result actually has the disease". The resulting equation ends up as 49.95/(49.95+.5) = 99.0089% of the population with a positive result is not sick, or 0.9911% of the population with a positive result is sick.

This would be about half as effective as assuming a 0 false negative rate, hence its importance in calculating the probability.
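Doug's point can be sketched as a small function of the false negative rate (an illustration added here, not part of the original thread):

```python
def p_sick_given_positive(fn_rate, prevalence=1/1000, fp_rate=0.05, n=1000):
    """Expected fraction of positive-testers who are actually sick."""
    sick = n * prevalence
    healthy = n - sick
    true_positives = sick * (1 - fn_rate)    # sick people the test catches
    false_positives = healthy * fp_rate      # 999 * 0.05 = 49.95
    return true_positives / (true_positives + false_positives)

print(p_sick_given_positive(0.0))   # ~0.0196  (Doug's 1.9627%)
print(p_sick_given_positive(0.5))   # ~0.0099  (Doug's 0.9911%)
```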

In case anyone was wondering about the last paragraph in my original comment, 1 + 0.02 = 1 for sufficiently small values of 1. It's kind of like transfinites, but smaller.

8-)

-- DougSundseth - 17 Aug 2005


Doug--I'm just starting to read through your answer, and this raises a question that was also in my mind, as I was trying to create a 'frequentist' model.

> there will be 999*.05 false positives (49.95, not 50, because there are only 999 people without your notional disease in the sample) and 1 true positive

I had wondered about this, too: at first it seemed wrong to me to say that, within 1000 people, there were 50 false positives & 1 real positive, i.e. 51 positives altogether.

But then, thinking it over, it seemed right except that I didn't think you could really do this kind of representation using 1000 people.

You'd need to use 1001 people.

Except then you're throwing the percentages off (ever so slightly, of course)....

So am I correct in concluding that a 'frequentist' representation of this problem is never 'truly' correct?

Each category (false positives & true positives) is compared against 1000, but when you try to combine the two categories, and compare both against 1000, you can't do it.

Or, rather, you can't do it and maintain the whole numbers.

Or am I getting this wrong?

-- CatherineJohnson - 17 Aug 2005

Well, whole numbers (other than 0 and 1, of course) are kind of hard to come by for probabilities. 8-)

I think the issue is that there is quite a bit of implicit (and entirely justified, btw) rounding going on. If you have one sick person per 1000 and ten false positives per thousand, the probability of a false positive isn't 1%, it's 10/999, or approximately 1.001%. Since getting four significant digits in medicine is probably impossible (two significant digits is pretty unlikely, much less four), this is treated as 1%.

I started discussing the limits of the calculation since you'd indicated an interest in whether 1/50 or 1/51 was more accurate. But actually, that difference is almost certainly down in the (copious) noise.

I think the concept of the limits of accuracy is a very important one that is poorly taught, at least until college-level science classes. As an aside, my best-taught overview of the subject was in an astronomy class, where the available numbers were thought to be accurate only to about +/-30%. It's entertaining to be told (in a serious science class) that pi can be treated as equal to 3 without affecting the accuracy of the calculations. (It's also occasionally useful to know that there are approximately pi*10^7 seconds in a year -- astronomy trivia.)
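Doug's astronomy trivia is easy to check (a quick sketch, using a 365.25-day year):

```python
import math

seconds_per_year = 365.25 * 24 * 60 * 60   # 31,557,600
approximation = math.pi * 10**7            # ~31,415,927

relative_error = abs(seconds_per_year - approximation) / seconds_per_year
print(f"{relative_error:.2%}")             # under half a percent
```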

-- DougSundseth - 17 Aug 2005

8:40--I'm going to come back tomorrow & read through carefully!

You're right, of course, about the limits of accuracy.....and that there's a lot of rounding going on--although I definitely needed to be told that.

I'm used to seeing whole numbers, since I'm both teaching and re-learning elementary math, so that's the operative category for me right now.

I do have at least some 'common sense' notion of the limits of accuracy, but I still wanted to know how these math problems are set up, i.e. what is being compared to what......

Thanks!

(Back tomorrow.)

-- CatherineJohnson - 18 Aug 2005

The car ride didn't help much, but I'm still stuck on this point:

If we have a false positive ratio of 5%, then 50 of the 1,000 people who take the test will test positive but will not, in fact, have the disease.

We need make no assumptions about false negatives to solve the problem as it was written. Even if the false negative ratio was 100%--that is, every time a person tested negative he actually had the disease--it wouldn't matter, because what we are dealing with is the question of the probability of having the disease if one tests positive.

No?

-- JdFisher - 18 Aug 2005

I'll use your example of a false negative rate of 100%. In this case, if you test positive, you are guaranteed to not have the disease (people with the disease never test positive and aren't in the group). Therefore, the ratio of (people with the disease in the group of people who test positive) to (people who don't have the disease in that group) is 0 (the inverse ratio is infinite).

The original question was, "what is the chance [probability] that a person found to have a positive result actually has the disease". In the case of a 100% false negative ratio, the probability is 0.

-- DougSundseth - 18 Aug 2005

"In this case, if you test positive, you are guaranteed to not have the disease."

I disagree with this statement. The false negative ratio of 100% tells me only that every person who tests negative HAS the disease.

With no other information than this, I cannot make any assumption about what the POSITIVE test tells me.

Is this not correct? Sorry to be a pain.

-- JdFisher - 18 Aug 2005

5:36 pm, and I am now officially too zapped to follow this, so I will start fresh in the morning!

-- CatherineJohnson - 18 Aug 2005

False positive: Person who gets a positive result in error. That is, a person who does not have the disease, but for whom the test says he does have the disease.

False positive rate: Ratio of (false positives) to (total of people who took the test and didn't have the disease).

False negative: Person who gets a negative result in error. That is, a person who does have the disease, but for whom the test says he does not have the disease.

False negative rate: Ratio of (false negatives) to (total of people who took the test and had the disease).

If you have a false negative rate of 100%, then all the people who take the test and have the disease will test as not having the disease. In that situation, you can only test as positive if you don't have the disease (i.e., your test must be a false positive).

-- DougSundseth - 19 Aug 2005

Interesting. (I missed this whole thread when it first came out -- I was on vacation!)

I think of this problem a bit differently.

False positive rate: the probability that you will test positive for a disease, given that you don't have it.

False negative rate: the probability that you will test negative for a disease, given that you do have it.

If you use Bayes' theorem to calculate the probability of having the disease given that you tested positive for it, you will need to know both these rates. The formula explicitly calls out the probability that you test positive if you have the disease, which is 1 minus the false negative rate (in this case 1, since the false negative rate is assumed to be zero).
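Carolyn's formulation maps directly onto Bayes' theorem; a minimal sketch (variable names are mine, not hers):

```python
def p_disease_given_positive(prevalence, fp_rate, fn_rate):
    """Bayes' theorem: P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|~D)P(~D)]."""
    sensitivity = 1 - fn_rate                       # P(test positive | disease)
    p_pos_and_disease = sensitivity * prevalence
    p_pos_and_healthy = fp_rate * (1 - prevalence)  # fp_rate = P(+ | no disease)
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)

# The original question, with the false negative rate assumed to be zero:
print(p_disease_given_positive(1/1000, 0.05, 0.0))  # ~0.0196, about 2%
```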

-- CarolynJohnston - 11 Sep 2005

not related to the probability issues --
others have covered that ground very well --
catherine should be aware that, for better or worse,
"applied mathematics" is used in mathematics departments
as code for "partial differential equations".
there are some fairly good historical reasons
for this, but the terminology is badly dated
(to say the least).

-- VlorbikDotCom - 11 Sep 2005

hmmmm

What is a 'partial differential equation'???

-- CatherineJohnson - 12 Sep 2005

the so called "differential calculus" concerns itself
with rates of change (think of the "slope
of a line" for a motivating example if you've studied
enough elementary algebra for the phrase
to actually mean something to you; otherwise
think of, say, the speed of an object
[in "miles per hour", say] -- the rate of change
in its position with respect to time).

the next level of the game -- or anyway, one such--
after one has become comfortable with this concept
is to consider the "rate of change of the rate of change"--
the so-called acceleration (how fast
is the needle of my speedometer moving?).
and so on ... studying rates of change
turns out to've been a pivotal point in history:
the beginning of the "scientific revolution".

equations involving such rates of change (position
and speed and acceleration, say)
are called "differential equations" (because a rate
is the quotient of two differences; think of "miles per hour" again ... how far
[from beginning to end; a subtraction problem]
divided by how long [in time; another]).

the "order" of a given ("ordinary") differential equation
is the number of times the operation "find the rate"
has been invoked (hence in the example of
"position, speed, and acceleration", we have
a second-order ODE (ordinary diff-e-q, natch)--
speed is the "first derivative" of position
(its rate of change) and acceleration is the
"second derivative"
of position (the "rate of change in its
rate of change" -- i.e., the first
derivative of the speed).
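in symbols (notation added here, not vlorbik's): if x(t) is position at time t, then

```latex
v(t) = \frac{dx}{dt}, \qquad
a(t) = \frac{dv}{dt} = \frac{d^2 x}{dt^2}
```

position x, speed v (its first derivative), acceleration a (its second derivative).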

one of newton's (many) great achievements was to have
demonstrated that by making some very simple
assumptions (the "law of gravity": F = G*m1*m2/d^2),
one can derive much of what had
been found out by centuries of painstaking observation --
that, for example, planetary orbits are ellipses.

finally PDE (partial diff-e-q) is the study
of differential equations in several ("independent")
variables (for example, the weather at some
point on the map might depend simultaneously
on the humidity, the temperature, and the barometric pressure).

i myself ran screaming -- majored in "pure" mathematics.
but PDE ("applied mathematics") is undoubtedly
rich and strange. if i could've continued
in grad school forever, id've come around to it eventually.
i hope this is helpful ...

-- VlorbikDotCom - 12 Sep 2005

Vlorbik -- well done. Thanks!

-- CarolynJohnston - 12 Sep 2005

V, I know exactly where you're coming from. As a student I spurned probability and statistics as grubby... now I find it just fascinating and wish I'd specialized in it.

-- CarolynJohnston - 12 Sep 2005