23 Jan 2006 - 15:45

## Bayes and the bell curve(s)

I'm still cruising Edge's Annual Question, 2006.

I can't possibly form an educated opinion of Bart Kosko's 'dangerous idea.'

And yet, after reading the opening paragraphs, I'm convinced he's right. Bayes strikes again. (Have I mentioned I'm an early adopter?)

I'm going to be needing some Bayesian Rules Of Thumb pretty soon here.

When is it OK to trust your priors, and when is it a really bad idea?

Most bell curves have thick tails

Any challenge to the normal probability bell curve can have far-reaching consequences because a great deal of modern science and engineering rests on this special bell curve. Most of the standard hypothesis tests in statistics rely on the normal bell curve either directly or indirectly. These tests permeate the social and medical sciences and underlie the poll results in the media. Related tests and assumptions underlie the decision algorithms in radar and cell phones that decide whether the incoming energy blip is a 0 or a 1. Management gurus exhort manufacturers to follow the "six sigma" creed of reducing the variance in products to only two or three defective products per million in accord with "sigmas" or standard deviations from the mean of a normal bell curve. Models for trading stock and bond derivatives assume an underlying normal bell-curve structure. Even quantum and signal-processing uncertainty principles or inequalities involve the normal bell curve as the equality condition for minimum uncertainty. Deviating even slightly from the normal bell curve can sometimes produce qualitatively different results.

The proposed dangerous idea stems from two facts about the normal bell curve.

First: The normal bell curve is not the only bell curve. There are at least as many different bell curves as there are real numbers. This simple mathematical fact poses at once a grammatical challenge to the title of Charles Murray's IQ book The Bell Curve. Murray should have used the indefinite article "A" instead of the definite article "The." This is but one of many examples that suggest that most scientists simply equate the entire infinite set of probability bell curves with the normal bell curve of textbooks. Nature need not share the same practice. Human and non-human behavior can be far more diverse than the classical normal bell curve allows.

Second: The normal bell curve is a skinny bell curve. It puts most of its probability mass in the main lobe or bell while the tails quickly taper off exponentially. So "tail events" appear rare simply as an artifact of this bell curve's mathematical structure. This limitation may be fine for approximate descriptions of "normal" behavior near the center of the distribution. But it largely rules out or marginalizes the wide range of phenomena that take place in the tails.

Again most bell curves have thick tails. Rare events are not so rare if the bell curve has thicker tails than the normal bell curve has. Telephone interrupts are more frequent. Lightning flashes are more frequent and more energetic. Stock market fluctuations or crashes are more frequent. How much more frequent they are depends on how thick the tail is — and that is always an empirical question of fact. Neither logic nor assume-the-normal-curve habit can answer the question. Instead scientists need to carry their evidentiary burden a step further and apply one of the many available statistical tests to determine and distinguish the bell-curve thickness.

[ed.: this is where I fall off the cliff] One response to this call for tail-thickness sensitivity is that logic alone can decide the matter because of the so-called central limit theorem of classical probability theory. This important "central" result states that some suitably normalized sums of random terms will converge to a standard normal random variable and thus have a normal bell curve in the limit. So Gauss and a lot of other long-dead mathematicians got it right after all and thus we can continue to assume normal bell curves with impunity.

That argument fails in general for two reasons.

etc.
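[ed.: Kosko's "thicker tails" claim is easy to see in plain numbers. The standard normal and the standard Cauchy (my example, not his) are both bell-shaped curves, but the Cauchy — whose tails are so thick its variance is infinite — puts roughly 20% of its probability beyond three units from center, where the normal puts about 0.3% beyond three standard deviations. A stdlib-only sketch:]

```python
import math

def normal_two_sided_tail(k):
    """P(|X| > k) for a standard normal, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))

def cauchy_two_sided_tail(k):
    """P(|X| > k) for a standard Cauchy: also bell-shaped, but so thick-tailed
    its variance is infinite."""
    return 1 - (2 / math.pi) * math.atan(k)

for k in (2, 3, 5):
    print(f"|x| > {k}:  normal {normal_two_sided_tail(k):.6f}"
          f"   cauchy {cauchy_two_sided_tail(k):.4f}")
```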

I should probably use this article as a baseline for Progress in Understanding Statistics, once I actually take a course in statistics.

What courses would I have to take — what would I have to know — to Read The Whole Thing?

on the other hand

Asking a bunch of Big Brains what their 'dangerous idea' is is a dangerous idea, as far as I'm concerned. This exercise reminds me of all the 800-lb. gorillas in Hollywood — movie directors mostly — whose work invariably collapsed the instant they became powerful enough to do what they wanted, instead of answering to the studios giving them the money to do it.

There's a lot of claptrap in this year's WORLD QUESTION CENTER......there's so much claptrap, I'm thinking maybe I should revise my flash-judgment that Wow! Yes! The standard bell curve has a thicker tail than we think! Cool!

The tails over at the WORLD QUESTION CENTER aren't seeming too thick at the moment.

But maybe I'm wrong.

"Competing bell curves"

(website may not always respond)

these don't look like wide tails to me

I'm confused

- Bayes statistics & false positives
- does human mind use Bayesian reasoning?
- Bayesian reasoning, intuition, & the cognitive unconscious
- most bell curves have thick tails
- ECONOMIST explanation Bayesian statistics
- Bayesian certainty scale

Bayesianprobability

-- CatherineJohnson - 23 Jan 2006


I gather that thick tails on bell curves tell us 'sh** happens' a lot more often than we think, right?

-- CatherineJohnson - 23 Jan 2006

also, I guess people hit the jackpot more often, too

-- CatherineJohnson - 23 Jan 2006

or am I completely at sea?

-- CatherineJohnson - 23 Jan 2006

What courses would I have to take — what would I have to know — to Read The Whole Thing?

Just a regular prob and stat course -- you'd need calculus first to be able to get the bit about the central limit theorem.

All 'bell curves' pretty much look alike. The ones with thick tails generally don't look any different to the naked eye.

-- CarolynJohnston - 23 Jan 2006

All 'bell curves' pretty much look alike. The ones with thick tails generally don't look any different to the naked eye.

oh!

Great, thanks.

-- CatherineJohnson - 23 Jan 2006

I seem to always get in on the end of conversations, so maybe no one will read this, but here goes:

If my memory serves, I think you need a bit of analysis (advanced Calculus) to do the proof of the Central Limit Theorem, however you can do your own demonstration of the CLT (like you might see in an Intro Probability and Statistics course).

Flip a coin (or do the computer equivalent). Record the proportion of heads (1 or 0). Do this a bunch of times and make a histogram of your results. Now do the same thing except flip the coin 10 times and record the proportion of heads. Do this a bunch of times and make a histogram of the results. Do the same process using 100 flips instead of 10. Your histogram of proportions should look much more bell shaped.
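[ed.: Matt's coin demonstration, as a stdlib Python sketch — the trial counts and bin widths below are my own choices:]

```python
import random
from collections import Counter

def proportion_of_heads(flips):
    """Flip a fair coin `flips` times; return the proportion of heads."""
    return sum(random.random() < 0.5 for _ in range(flips)) / flips

def histogram(flips, trials=10_000, bins=20):
    """Repeat the experiment `trials` times and print a crude histogram
    of the resulting proportions."""
    counts = Counter(int(proportion_of_heads(flips) * bins) for _ in range(trials))
    print(f"--- {flips} flip(s) per trial ---")
    for b in range(bins + 1):
        print(f"{b / bins:4.2f} {'#' * (counts.get(b, 0) // 100)}")

histogram(1)    # two spikes, at 0 and 1
histogram(10)   # lumpy: only every other bin is occupied
histogram(100)  # a tidy mound around 0.5
```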

As for Laplace, that's a pretty big name in Math/Engineering, so there are quite a few different things named after him.

Given that we live in a finite world, I'm not sure that I am convinced that it makes sense to talk about real world things having infinite variance. However, I'm happy to concede that if the models work better by considering distributions with infinite variance, then it's better to go with what works. I've also always (since I learned about it in graduate school, anyway) been a little unimpressed with the reasoning that things like height are normally distributed because they involve the sum of a bunch of different (independent) factors (and hence the CLT applies, the reasoning goes). It seems much more reasonable to me to use the CLT in results where sampling distributions are involved. (That is, samples are taken from a population, and we're interested in the distribution of the sample mean.)

-- MattGoff - 24 Jan 2006

Of course, I'd expect a sample of heights in the population to be quite complex rather than a true bell.

If I were to take a sample of the heights of (say) 25 y.o. males, I'd expect a normal distribution. If I were to take a sample of 25 y.o. males and females, I'd expect a bi-modal distribution, with separate peaks for median male and median female heights. If I were to choose a sample that isn't age-restricted, I'd expect multiple peaks, some of which would be hard to detect, with the result that the tail on the short side would be lumpy and much thicker than the tail on the tall side.

(I'm not sure how germane this is to the discussion, but you sometimes have to choose your samples pretty carefully to get a normal distribution.)

-- DougSundseth - 24 Jan 2006

from The Myth that Schools Shortchange Women by Judith Kleinfeld:

-- CatherineJohnson - 24 Jan 2006

I'm not sure how germane this is to the discussion, but you sometimes have to choose your samples pretty carefully to get a normal distribution.

interesting....

-- CatherineJohnson - 24 Jan 2006

Catherine -- that diagram exactly illustrates the quote I left the other day about more men at both ends of the curve.

-- GoogleMaster - 24 Jan 2006

Yeah, it illustrates (though greatly exaggerated) the fat tail of male IQ and why the few males at the top have dominated most endeavors throughout history though clearly other factors are also in play.

-- KDeRosa - 24 Jan 2006

Ken

Yeah, it illustrates (though greatly exaggerated) the fat tail of male IQ and why the few males at the top have dominated most endeavors throughout history though clearly other factors are also in play.

It really is a terrific way of hammering home the to-most-folks-highly-abstract point of what 'different variability' is.

I thought Murray's point about women dropping out of the very highest realms of achievement the more abstract the undertaking became intriguing....

The highest-achieving women are in literature, which, he claims, is the most 'concrete' & closest-to-reality art form.

I think it's a little hard to make that claim vis a vis painting (and I've forgotten how he justified it)....but I thought there was more than a grain of truth....

-- CatherineJohnson - 24 Jan 2006

I'm a STRONG believer in the power of 'stereotype threat' and the unconscious, having experienced it myself firsthand, but I have a bias toward believing biological explanations for the 'two curves,' entirely on the basis of the radical over-representation of boys amongst autistic people.

-- CatherineJohnson - 24 Jan 2006

Doug- It is true that your sample may show lumps, but if you take lots of (large enough, see below) samples and calculate the mean, the distribution of the means will not be lumpy, it will approach a Normal distribution (assuming finite variance in the population, anyway) with center at the true population mean and standard deviation equal to the population standard deviation divided by the square root of the sample size. That's what the Central Limit Theorem is saying. The further the original population is from a Normal distribution, the larger the samples need to be in order to be 'close' to a Normal distribution. For example, in the coin flipping demonstration I mentioned previously, the effect is much more pronounced for samples of 1000 flips than samples of 10 flips. It's another question whether the mean of a population is actually a measure that is useful and/or interesting.

-- MattGoff - 24 Jan 2006

If all you're interested in is a population mean, you can get that, but this discussion has been (at least mostly) about specific curve shapes. (You note the likely lack of significance of the population mean yourself.)

When you look at the curve shapes, some distributions are bimodal or multi-modal, and increasing the sample size won't smooth them out, except in the sense that the lumps in the curve will only represent real lumps in the population distribution rather than statistical artifacts. There is no guarantee of a peak at or near the mean.

For an obvious example, take a sample of the heights of people in a kindergarten at noon. Regardless of how large the sample, you'll see a strong peak at around 5'6" and another at around 3'6". You will never see a normal distribution, because you are sampling two distinct populations (adults and kindergarteners).
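[ed.: Doug's kindergarten-at-noon example is easy to simulate. The means, spreads, and headcounts below are invented numbers, purely for illustration:]

```python
import random
from collections import Counter

random.seed(1)

# A made-up mixed population: lots of children, a few adults.
kids   = [random.gauss(42, 2) for _ in range(400)]  # children, centered near 3'6"
adults = [random.gauss(66, 3) for _ in range(80)]   # teachers and aides, near 5'6"
sample = kids + adults

# Crude text histogram in 2-inch bins: two clear peaks with a long empty
# valley between them -- no single bell, no matter how big the sample.
bins = Counter(int(h) // 2 * 2 for h in sample)
for b in sorted(bins):
    print(f'{b:3d}-{b + 1} in  {"#" * (bins[b] // 4)}')
```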

(FWIW, I suspect we're talking past each other, but there may be other interested readers of the discussion.)

-- DougSundseth - 24 Jan 2006

I think you're right, we're not really disagreeing. The issue you raise regarding the Normal distribution (or lack thereof) of a population is what I was referring to in my original comment when I mentioned being unimpressed with the argument that things like height (or intelligence, perhaps) are Normally distributed because they are the sum of a bunch of independent factors. I guess it makes analysis easier to make the assumption that a population is Normal, but it seems to me to be pretty clearly an assumption. I'm not sure why people seek to justify it by trotting out the CLT.

Where I see the CLT most commonly used is in constructing confidence intervals and/or doing significance testing for population parameters (means, regression coefficients, etc.). In the essay by Bart Kosko, I got the impression that he was suggesting the Normal distribution may not be applicable even in these cases (perhaps I am misreading him). That is, maybe we shouldn't trust poll results that use Normal distribution theory to get margins of error. I think the theory is pretty solid for using CLT here, although it depends strongly on correct sampling (which can be a very difficult thing to achieve).
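[ed.: the poll case Matt describes is the textbook Normal-approximation confidence interval for a proportion. A sketch, with a hypothetical poll; z = 1.96 is the standard 95% value:]

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """CLT-based margin of error for a sample proportion
    (z = 1.96 gives roughly 95% confidence)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A hypothetical poll: 1,000 respondents, 52% say yes.
moe = margin_of_error(0.52, 1000)
print(f"52% +/- {100 * moe:.1f} points")  # about +/- 3.1 points
```

This is where the "depends strongly on correct sampling" caveat bites: the formula assumes a simple random sample, and no amount of arithmetic repairs a biased one.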

-- MattGoff - 24 Jan 2006

No argument at all.

I suspect that assuming a normal distribution as a first approximation is reasonable for most sorts of population data. But it should be treated as only a first approximation and discarded if the data exhibit a different shape.

I don't understand the theory well enough to pontificate on the effect of a discontinuous data set on measures like standard deviation, but my intuition is that they are not intended for those cases and probably not especially useful. I'm sure a stats pro could address such questions easily.

Is there one in the house?

-- DougSundseth - 24 Jan 2006

You can certainly calculate the variance (the square root of which is standard deviation) for discrete distributions. However, the standard deviation is most meaningful/useful for Normal distributions (standard deviation is not even all that helpful for many other continuous distributions). Where the standard deviation of these non-Normal distributions comes into play is when looking at the distribution of the sample mean.

Suppose you draw samples from a population with mean 'mu' and standard deviation 'sigma' (assumed to be finite). If you repeatedly take samples of size 'n' from the population, the distribution of the mean of these samples will be approximately Normal with mean mu and standard deviation sigma/sqrt(n). The larger the sample, the closer the approximation.
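[ed.: a quick stdlib check of the sigma/sqrt(n) claim, using an exponential population (my choice — it's decidedly non-Normal, with mu = sigma = 1):]

```python
import math
import random
import statistics

random.seed(2)

def sample_mean(n):
    """Mean of n draws from an exponential population (mu = sigma = 1)."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

for n in (4, 25, 100):
    means = [sample_mean(n) for _ in range(5000)]
    print(f"n={n:3d}  mean of sample means {statistics.mean(means):.3f}  "
          f"sd {statistics.stdev(means):.3f}  (sigma/sqrt(n) = {1 / math.sqrt(n):.3f})")
```

The observed spread of the sample means shrinks like 1/sqrt(n), just as the comment above says, even though the population itself is skewed and nothing like a bell.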

-- MattGoff - 25 Jan 2006

Oh, good; I was right.

8-)

-- DougSundseth - 25 Jan 2006
