Entries from StatisticsAbuse



LyingWithStatisticsInCalifornia 07 Jul 2005 - 11:04 CarolynJohnston


I had a letter from Cathy Carlson the other day. Cathy is a founder of a group called "Accuracy in School Accountability" in Thousand Oaks, California, and I expect that she is an expert on the use of statistics as a weapon in marketing. She writes:

I see you started with quotes. I have a favorite from Samuel Clemens: There are lies, there are damn lies, and then there are statistics! I see that Catherine Johnson has a similar line in her July 2 info. Does she know if it came from a book character of Mark Twain's or if it was in a speech by Samuel Clemens? I've never known the context. I've used it frequently in my own speeches about our local School Board in Thousand Oaks, California regarding their exaggerated claims of greatness.

The Conejo Valley Unified School District spent quite a bit of money distributing 26 pages in the newspaper about how "great" the 29 schools were doing. They bragged that the 3 high schools had 30% of the students at the California level of Advanced or Proficient. The public didn't understand the inverse. That performance was pathetic for our "excellent" district. That meant that 70% of those teenagers were in the 3 lower groups of the 5 levels: Advanced, Proficient, Average, Basic, Far Below Basic. 7 out of 10 high school students here were NOT even Proficient.

The District fools the public by this omission. One of your writers today also had some cogent remarks on statistics that are omitted.

Another interesting statistic here is that a couple of years ago a third of the CVUSD schools failed to make the minimum target of 800 points, which is only 75% of the API (Academic Performance Index.) The API starts at 200 and goes to 1000, so there are 800 points available, not 1000. Every 80 points translates to 10%. This further confuses the public. Many do not understand when I explain that the true top 10% is really 920 points. It is the empirical 10% that is important, not the artificial 10th decile. In our state the kids' scores are so bad that in the first few years of the API there were high schools that scored only 726 but were ranked in the "top ten". Yeah, decile, not empirical. Every year ONE OUT OF EVERY TEN California schools gets to brag that they are a "10", often with scores more than 150 points below 920, the true cut off for 90%, because the cutoffs for the deciles continue to be down in the basement.
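
For anyone who wants to check the arithmetic in Cathy's letter, here is a minimal Python sketch of the conversion between an API score and its share of the 800 available points. The function names are made up for illustration; the only facts taken from the letter are the 200 to 1000 range and the scores she mentions.

```python
API_MIN = 200   # lowest possible API score, per the letter
API_MAX = 1000  # highest possible API score, per the letter

def percent_of_scale(score):
    """Percent of the 800 available points that a score represents."""
    return 100 * (score - API_MIN) / (API_MAX - API_MIN)

def score_at_percent(percent):
    """API score sitting at a given percent of the available points."""
    return API_MIN + (API_MAX - API_MIN) * percent / 100

print(percent_of_scale(800))   # 75.0
print(score_at_percent(90))    # 920.0
print(percent_of_scale(726))   # 65.75
```

On this reading, a high school scoring 726 had earned about 66% of the available points while still holding a decile rank of 10.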

This really isn't a math wars issue, precisely; it's just good marketing in the face of bad statistics. It's amazing that while mathematics, including statistics, is a discipline with very clean edges that would not appear to admit much potential for fudging, nevertheless it's so easy to mislead people using statistical language.

Not to lie, though; because it's definitely true that, every year, one out of every ten high schools is in the top ten percent of high schools. But what if 90% of high schools are failing miserably? That remaining 10% could lie anywhere in the range from excellence down to barely-crawling-along. So the fact that a school is in the top ten percent tells you very little.

In an academic world that is benchmarked with standardized tests such as the California API (and the Colorado CSAP), the ability to Lie with Statistics is more valuable than ever. That doesn't mean that standardized tests should go away -- quite the contrary. It just means that we'll continue to need watchdog groups like Cathy's to keep pointing out the real meaning behind the marketing.



FallaciesAndCausalInference 08 Aug 2005 - 05:20 CarolynJohnston


I went looking up some stuff about the correlation-implies-causation fallacy after reading Catherine's post on causation implications in a multivariate study of college success and high school curricula.

The correlation-implies-causation fallacy is the idea that whenever two variables are correlated -- i.e., when they change together, and so appear to be related -- one must necessarily be the cause of the other. Famous examples:

High stock prices and short skirt lengths tend to occur together. Do short skirts cause high stock prices? No -- another explanation is that they are both likely to be symptoms of an exuberant public mood.

High chocolate consumption and acne tend to go together. This is easily explained by the fact that teenagers are both big chocolate consumers and the biggest acne sufferers; but the idea that chocolate causes acne persists (and it helped ruin my own adolescence, since I couldn't with a clear conscience solace my pain over my zits with a nice bar of chocolate).

But in the example Catherine gives, it's harder to tease apart correlation and causation, because of the timing of the data being studied; the one variable, whether the student had a college-prep curriculum, occurs several years before the other one, the student's graduating from college. Any correlation in data like that strongly suggests causation, but not reliably.

Here's another example, more loaded: birth defects are correlated with mothers who drink alcohol during pregnancy. Of course, the correlation by itself doesn't prove causation; but we certainly would like to know NOW whether drinking is the cause of the birth defects. How can we tell whether a correlation relationship is actually a causal one?

Not surprisingly, there are generally accepted methods for making causal inferences (I should mention that I am now officially way out of my realm of expertise, but here goes anyway). Here are some conditions under which you're permitted to make causal inferences, culled out of some medical literature I found online (obviously teasing out correlation and causation is critical in medicine).

1. A correlation is present.

2. The relationship is statistically significant, i.e. very unlikely to be due to chance.

3. The presence of one factor predates the other (e.g., the drinking happened before the birth defects; the college prep courses came before the college graduation).

4. Evidence from other experiments or statistical studies proves that it is unlikely that the relationship is due to a third variable.

It's item 4 that I have my doubts about, with respect to the high school courses and college graduation relationship. How did they eliminate all those 3rd factors that might be determining success in both college and high school?


statistics question
low birth weight paradox
how good are our best students?





LowBirthWeightParadox 04 Aug 2005 - 23:52 CarolynJohnston


Statistics is a really tricky tool, easy to lie with and easy to misunderstand: but how it sharpens your thinking. When I was looking around online, I came across the following mind-bender on Wikipedia, called the low birth-weight paradox.

Babies weighing less than 2500 gms at birth are said to have a low birth weight (remember that figure). Low birth weight babies in a given population have a higher mortality rate than normal babies.

Smoking mothers are more likely to have low birth weight babies, and children of smoking mothers are more likely to die at birth. No surprise there either.

However, low birth weight babies of smoking mothers have a lower mortality rate than low birth weight babies of non-smoking mothers. How can this be?

The reason is the arbitrary choice of cut-off (2500 gms) for the definition of low birth-weight babies. Smoking causes the overall distribution of birth weights of babies to decrease, pushing more otherwise healthy babies into the low birth weight category. If we move the cut-off downward, to agree with the average decrease in birth weight for babies of smoking mothers, we find that (as expected) the death rate of babies below the new cutoff for smoking mothers is higher than the same rate for non-smokers.

And, speaking of apparent statistical paradoxes, does anybody remember the big flap about Marilyn vos Savant and the Monty Hall game? Does anybody want to? I love that one.


low birth weight paradox (& Monty Hall)
Monty Hall, part 2
Monty Hall, part 3
false positives
false positives, part 2
Doug Sundseth on Monty Hall
John Kay: We are likely to get probability wrong (subscription only)
Monty Hall diagram from Curious Incident
probability question from Saxon 8/7





FalsePositives 12 Sep 2005 - 03:15 CatherineJohnson


A couple of days ago, Carolyn explained the difference between frequentist statistics and Bayesian. She's a Bayesian, she said.

Well, that explained a lot, because it turns out I'm a Bayesian, too. I just didn't know it. Obviously, that's why Carolyn and I constantly find ourselves traveling the exact same thought path, even though we've never met, and didn't know each other until a year ago.

Of course, a real Bayesian (that would be a Bayesian who knew what she was doing, which would not be me) would probably not conclude that the reason she likes a person well enough to start a vast time-gobbling math-ed web site with her is that you both subscribe to the same school of statistical thought. I'll have to ask Carolyn.

I'm a Bayesian aspirant.

I'm having quite a little midlife run of Self-Discovery here, I must say. First I find out I'm Scots-Irish; next I'm hearing I'm a Bayesian.

I just hope no one's gonna tell me I'm adopted.

I have a question

My question concerns a passage in a terrific book called What the Numbers Say: A Field Guide to Mastering Our Numerical World by Derrick Niederman & David Boyum. Boyum, it turns out, majored in applied mathematics at Harvard--I didn't know there was such a thing as a major in applied mathematics at Harvard!

Or anywhere else, for that matter.

I wish I'd known that when I was 17.

'Bayes Watch' is Niederman & Boyum's title for this passage:

Years ago a study asked the following question of students and doctors at Harvard Medical School:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5 percent, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person's symptoms or signs?

Ed and I both understand the answer now (neither of us got it right), but we still have a question about the precise calculations. (Don't hit this link unless you want to see the answer.)

update

I've just checked Niederman & Boyum. They do not specify a zero rate for false negatives. They say nothing about false negatives one way or the other. (Neither does John Kay in false positives, part 2, assuming I'm understanding him correctly).

Bayes & God

I actually bought this book a couple of years ago, though I haven't read it yet:

[image: book cover]


I believe it's intended to be a Bayesian proof of the existence of God, although I don't know how the word 'proof' is used either in the book or in the context of Bayesian statistics.


low birth weight paradox (& Monty Hall)
Monty Hall, part 2
Monty Hall, part 3
false positives
false positives, part 2
Doug Sundseth on Monty Hall
John Kay: We are likely to get probability wrong (subscription only)
Monty Hall diagram from Curious Incident
Bayes & the human mind
Bayesian reasoning, intuition, & the cognitive unconscious
most bell curves have thick tails
ECONOMIST explanation Bayesian statistics
Bayesian certainty scale
probability question from Saxon 8/7

Bayesianprobability






FalsePositivesPart2 21 Dec 2005 - 15:31 CatherineJohnson


Another version of the False Positives challenge. This one ran in John Kay's column in the Financial Times yesterday. (Probably only available to subscribers.)

...intuition does not correspond to the mathematics of probability. One person in a 1,000 suffers from a rare disease. A friend has just tested positive for this illness and the test gives a correct diagnosis in 99 per cent of cases. How likely is it that your friend has the disease? Not at all likely. In random groups of 1,000 people an average of 10 would display false positives and only one would be correctly diagnosed with the disease. But most people, including most doctors, think otherwise. “The human mind,” said science writer Stephen Jay Gould, “did not evolve to deal with probabilities.”

Hmmm. Let's see. This problem does give us false negatives, right???

OK, let me think.

[pause]

Good grief. Not only can the human mind not intuit Bayesian probability; apparently the human mind equally cannot produce consistently lucid prose. (Nothing wrong with Mr. Kay's lucidity on a normal day.)

Kay's example, too, appears to assume a false negative rate of 0.

As far as I can tell.
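
Kay's numbers can be checked with Bayes' rule. The sketch below is only an illustration under two assumptions that the column does not state: that "a correct diagnosis in 99 per cent of cases" means a 1 percent false positive rate, and that the false negative rate is 0 (the `sensitivity=1.0` default). The function name is hypothetical.

```python
def prob_disease_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """Bayes' rule: P(disease | positive test).

    sensitivity is P(positive | disease); 1.0 means no false negatives,
    which is an assumption, not something the column specifies.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Kay's example: 1 person in 1,000 has the disease, 1% false positives (assumed).
kay = prob_disease_given_positive(prevalence=1 / 1000, false_positive_rate=0.01)
print(f"{kay:.3f}")  # 0.091
```

That is close to the 1 in 11 implied by Kay's count of ten false positives for every correct diagnosis. Passing a `sensitivity` below 1.0 shows how the answer would move if the test also produced false negatives.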

update

This is funny. I was skimming Amazon reviews of Stephen Jay Gould's Mismeasure of Man, and I found this:

As Oxford academician Richard Dawkins says (see Bryson, "A Short History of Nearly Everything", pp. 330-332) "If only Stephen Gould could think as clearly as he writes!"

It's a Core Principle in the Writing Biz (& definitely in the Writing Instruction Biz) that you can't write clearly without thinking clearly. (True in my experience; that's for sure.)







MontyHallPart2 17 Aug 2005 - 21:49 CatherineJohnson


Here is Kay on the Monty Hall problem:

The Monty Hall problem is named after the host of a 1970s quiz show, Let’s Make a Deal. The successful contestant chooses from three closed boxes. One contains the keys to a car and the other two a picture of a goat. The choice made, Monty opens one of the other doors to reveal – a goat. He taunts the guest to change the decision. Should the guest switch to the other closed box?

When the solution was published in an American magazine, thousands of readers – including professors of statistics – alleged an error. Paul Erdős, the great mathematician, reputedly died still musing on the Monty Hall problem. But the answer is, indeed, yes: you should change.

I'm happy to hear that Paul Erdős stumbled over Monty Hall, seeing as how I still don't understand it.







StatisticsAndMiscarriageOfJustice 18 Aug 2005 - 01:13 CatherineJohnson


Here is the heart of John Kay's column about the counterintuitive nature of Bayesian statistics:

Last month, the General Medical Council struck off Professor Sir Roy Meadow, the paediatrician, from the medical register. He had given misleading evidence in the criminal prosecution of Sally Clarke, whose two infants died in their cots. When Mrs Clarke was charged with their murder, Sir Roy told the jury that the chances of two successive cot deaths in the one family was “one in 73m”.

But although the disciplinary committee heard evidence from distinguished statisticians, it does not appear that they understood the application of probability theory to such cases any better than Sir Roy. The committee found that he had underestimated the incidence of cot deaths, and that he had not taken account of genetic and environmental factors that mean a household that experiences one cot death is more likely than average to suffer another. But even if you recognise these effects, his key conclusion remains valid. It is unlikely that such an accident would have happened at all. It is very unlikely indeed that such an accident could have happened twice in the same family.

Of course it is unlikely. The events that give rise to criminal cases are always unlikely, otherwise the courts would be unable to deal with the backlog. If Osama bin Laden is ever brought to justice, the question will not be “is it likely that two aircraft hit the World Trade Center on September 11?” – to which the answer is no – but “given that two aircraft did hit the World Trade Center on September 11, is it likely that bin Laden was responsible?” Confusion of these two separate issues has become known as “the prosecutor’s fallacy”.

A cot death in a family increases the probability that there will be another, but a murder in a family may well increase the probability of another murder by even more: wicked parents may continue to be wicked. Sir Roy might have been right to conclude that two cot deaths were more suspicious than one. But the Court of Appeal, releasing Mrs Clarke, was certainly right to have concluded that this statistical evidence could never, on its own, establish guilt beyond reasonable doubt.

You should not trust doctors, or lawyers, with probabilities; and be very hesitant about trusting yourself. Adversarial legal proceedings are a bad forum for unravelling technical issues. And we cannot expunge collective responsibility for mistakes by excoriating selected individuals.

The business and financial system, more than Bernie Ebbers and Henry Blodget, was to blame for the dotcom boom and bust. Failures in legal processes, rather than over-confident professors, led to the unjust conviction of women such as Sally Clarke.



I'm going to add this story to my collection of Cautionary Tales illustrating Why People Should Learn Math.

Until today it hadn't occurred to me people should learn math so they don't get their license to practice medicine yanked when they bumble a statistics and probability question in court.

Let that be a lesson to us.







DougSundsethOnMontyHall 18 Aug 2005 - 19:52 CatherineJohnson


Doug Sundseth (welcome, Doug!) posted an explanation of the Monty Hall problem, which I have never been able to understand:

It took me a long time to understand it, too.

The model that finally worked for me was something like this:

You have a 1/3 chance of being right to start with, and a 2/3 chance of being wrong. If you guessed wrong originally, Monty's pick will unambiguously determine the correct choice (he never picks the good door).

There are nine pairs of (your pick):(correct pick), A:A, A:B, A:C, B:A, B:B, B:C, C:A, C:B, and C:C. In three of those, you picked correctly, Monty's information isn't useful, and you shouldn't switch. In the other six, you picked incorrectly and Monty told you which of the other picks was correct; thus you should switch.

If you never switch, you have three chances in nine of being correct. If you always switch, you have six chances in nine of being correct and three chances in nine of switching off the correct choice.

Note that the latter possibility (choosing correctly at random then switching to an incorrect choice) may be more psychologically painful than just guessing wrong and not switching. This may have an undue effect on the choices of contestants.
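
Doug's nine pairs are easy to count by machine. This is a minimal sketch of his enumeration, not a simulation of the show; the variable names are illustrative, and it assumes, as he does, that all nine pairs are equally likely and that Monty never opens the door with the prize.

```python
from fractions import Fraction
from itertools import product

doors = "ABC"
# Every (your pick, correct pick) pair: A:A, A:B, ..., C:C
pairs = list(product(doors, repeat=2))

# Staying wins only when the first pick was already correct.
stay_wins = sum(1 for pick, correct in pairs if pick == correct)
# Switching wins whenever the first pick was wrong, because Monty's
# reveal leaves the correct door as the only other closed one.
switch_wins = sum(1 for pick, correct in pairs if pick != correct)

p_stay = Fraction(stay_wins, len(pairs))
p_switch = Fraction(switch_wins, len(pairs))
print(p_stay, p_switch)  # 1/3 2/3
```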

Doug, thank you!

OK, I've just sat down and quickly thought this through.

[pause]

On my initial reading, I think it makes sense to me. What's particularly useful, for me, is the information that, yes, you could already have chosen the correct door, in which case, if you change your choice, you have moved to the incorrect door.

I think people who haven't studied probability get hung up on the 'what if I'm already right' issue.....and then, when math-savvy people try to explain Monty Hall without addressing, as Doug has, the issue foremost in their minds, the explanation doesn't 'take.'

metacognition again

I mentioned a while back that metacognition is a huge issue amongst constructivists, both the radical ones and the peer-reviewed, department-of-psychology, cognitive-science ones.

One of the main reasons for thinking about metacognition as you teach is that students may very well bring quite wrong ideas to the classroom, which they then 'build upon' as they acquire new knowledge. There's a lovely example of this in the National Research Council's book on learning. Many children, when told that the earth is round, picture it as a disk, not a sphere. (more t/k--I need to go take a look at these pages.)

In any case, Doug has addressed an aspect of metacognition that I haven't seen mentioned, which is to tell a student what it is they already know that's right, but incomplete.

I was having the same experience yesterday, puzzling through the 'false positives' problem. The objection both Ed and I were bumping our heads against--if it's 1 in 1000 and 50 in 1000, how can you ever have 1000???--was right; we just weren't seeing what to do about it.

I wonder how often it's the case that an incomplete right answer is the problem, as opposed to a Total Crackpot Misconception that has to be stomped out, obliterated, and disappeared without a trace before a person can learn Thing One about math? (And does this wording give you a feel for the challenge involved in attempting to re-learn elementary math in midlife?)

Or, as Steve H says, A little knowledge goes way too far.

a new question

This sentence confuses me:

You have a 1/3 chance of being right to start with, and a 2/3 chance of being wrong. If you guessed wrong originally, Monty's pick will unambiguously determine the correct choice (he never picks the good door).

[pause]

hmmm. Interesting. Reading this again, it makes sense.

I'm going to take a paper and pencil break, and see what I come up with working through Doug's explanation myself.

online Monty Hall simulation!

I love it!

Monty Hall dilemma


back again

OK, paper and pencil session complete.

I do understand this explanation, with one question: the funky, counterintuitive odds are created by the fact that Monty always opens the wrong door, correct?

That's why you shouldn't go with the 50-50 answer everyone automatically does go with--yes?
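
One way to test that hunch is to compare two hosts: one who knows where the prize is and never reveals it, and one who opens one of the other two doors blindly. The sketch below works out the exact odds for both. It assumes a host who knows picks either losing door with equal chance when he has two to choose from; the blind host is a hypothetical variant, not something from the show, and the function name is made up.

```python
from fractions import Fraction
from itertools import product

def switch_win_chance(host_knows):
    """Exact chance that switching wins, given the opened door showed no prize."""
    doors = range(3)
    no_prize_shown = Fraction(0)
    switch_wins = Fraction(0)
    for prize, pick in product(doors, repeat=2):
        others = [d for d in doors if d != pick]
        # A host who knows avoids the prize; a blind host may open either door.
        openable = [d for d in others if d != prize] if host_knows else others
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            if opened == prize:
                continue  # prize revealed, so the switch offer never happens
            no_prize_shown += weight
            remaining = next(d for d in others if d != opened)
            if remaining == prize:
                switch_wins += weight
    return switch_wins / no_prize_shown

print(switch_win_chance(host_knows=True))   # 2/3
print(switch_win_chance(host_knows=False))  # 1/2
```

Under these assumptions the 50-50 answer is right only for the blind host. The 2/3 comes from the host knowing where the prize is and steering around it.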

Carolyn was telling me the other day that a lot of Bayesian statistical results are counterintuitive (hey! just like the Bayesian proof of the existence of God!).

That's for sure.

other explanations

Here's a strictly mathematical explanation that will work for some people (and actually works OK for me....although frankly Doug's list helps move me a bit towards 'getting' the Monty Hall problem at a more intuitive level...):

After you pick but before you open any doors, there's a 1/3 chance that you've picked correctly, and a 2/3 chance that you've picked wrong. Assuming that the host can open doors, but can not move prizes, nothing that the host does will change the probabilities described above.

Now the host opens one of the doors, and there's nothing behind it. There's still a 1/3 chance that you've picked correctly, and a 2/3 chance that you've picked wrong. This means that the remaining door has a 2/3 chance of being correct.

This explanation helps me formulate exactly what it is that goes wrong for people: the chance to change your pick seems like a second event, with a second set of probabilities attached.

question: So how often does this happen in life?

How often do we perceive second events where we ought to perceive a continuation of the first?

update: an intuitive approach to Monty Hall that might work

I'm going to have to live with the Monty Hall problem for a while....

But here's an interesting approach to rendering the answer intuitively correct:

It was a while ago that I accepted the idea that switching doors was the correct play every time because it improves your chances of winning, but I had trouble convincing my friends that it was the correct answer. However, a friend of mine just came up with this explanation that I think should really make it obvious.

Let's say that you choose your door (out of 3, of course). Then, without showing what's behind any of the doors, Monty says you can stick with your first choice or you can have both of the two other doors. I think most everyone would then take the two doors collectively.

Unfortunately, I don't think this works for me...

update: Keith Devlin's better version

OK, I think what the person above was trying to say was this:

...one last attempt at an explanation. Back to the three door version now. When Monty has opened one of the three doors and shown you there is no prize behind, and then offers you the opportunity to switch, he is in effect offering you a TWO-FOR-ONE switch. You originally picked door A. He is now saying "Would you like to swap door A for TWO doors, B and C ... Oh, and by the way, before you make this two-for-one swap I'll open one of those two doors for you (one without a prize behind it)."

I agree. Anyone told at the outset that he can pick one door or he can pick two doors would pick the two.

I give up

from Keith Devlin:

... suppose you are playing a seven door version of the game. You choose three doors. Monty now opens three of the remaining doors to show you that there is no prize behind it. He then says, "Would you like to stick with the three doors you have chosen, or would you prefer to swap them for the one other door I have not opened?" What do you do? Do you stick with your three doors or do you make the 3 for 1 swap he is offering?

OK, I'm switching doors.

But I'm doing so purely on the basis of 4/7 being greater than 3/7. Nothing common sense about it.

Of course, given that my family motto is no common sense-y, it's easy to dump my first pick and jump to Door Number Seven!







DanKOnIncomeDistribution 25 Aug 2005 - 16:32 CatherineJohnson


Just noticed a terrific comment from Dan K:

I don’t trust statistics about household income. In tough times (depression), there are fewer households. The head of household has not only his/her spouse and children living at home, but Granny can’t afford that retirement condo, either. Even after they begin earning, the kids may stay at home. So, this one household has the income of mom, dad, kids, and Granny’s Social Security check. Income per household looks pretty solid, even though the reality is that times are tough. If the economy is strong, there are more households. Granny lives in that retirement condo; the oldest son and daughter have their own apartments. Sure, their individual incomes are higher, because the economy is good, but with four separate households, the average income per household is actually lower. This is why challenger candidates for public office will raise stagnant household income as an argument against the incumbent. His/her policies are so bad that household income has remained stagnant for 30 years!

So, how might this affect the statistics on college attendance that you cite? Well, if the second and third quartiles are made up disproportionately of single person households, empty nesters, or families with only young children, why would we expect them to have students in college?

Further, I would guess that the top quartile has more two-income households, where the earners are far enough along in their careers to near the maximum annual incomes of their lifetimes. This would probably occur when the breadwinners are in their fifties. At what age do most people send their children to college? When they are in their fifties.

Thomas Sowell always cautions us to not think of income quintiles or quartiles as bins in which people spend their whole lives. Many twenty-somethings in the bottom half will eventually rise to the top quartile later in their careers. Now, I understand that upward mobility may really be diminishing because of the meritocracy effect, but I don’t think it’s been eliminated by any means.

Let’s do a thought experiment. We will use made-up numbers. Let’s assume that the breakpoints between the income quartiles are at $20,000/yr, $40,000/yr, and $80,000/yr. Now assume one of those households at the bottom is a law student scraping by through school on $15,000/yr, and after graduation he accepts an offer to start at $60,000/yr. What does this do to our distribution? He moves from the fourth quartile to the second. The aggregate income of the bottom quartile loses his $15,000/yr, but replaces it with $20,001/yr from the lowest earner in the third quartile. Net effect on fourth quartile: +$5,001. The third quartile loses that person making $20,001/yr, but gains someone dropping in from the second quartile earning $40,001/yr. Net effect on third quartile: +$20,000. The second quartile drops that person making $40,001/yr, but gains our lawyer making $60,000/yr. Net effect on second quartile: +$19,999. In this example, the top quartile is unaffected. So, what we have observed is what looks like very desirable upward mobility, but the statistics show the second and third quartiles each gaining four times as much (or roughly $15,000 more) than the fourth quartile does. Is this really bad?

It’s even more extreme when people from low-income grad student-like households move all the way into the top quartile over the course of a few years. Because anytime one migrates into that top quartile, there’s no limit on how much that person can earn to boost the income of the top quartile. If some poor soul making $12,000/yr won the lottery, it wouldn’t appear as much increased income for the bottom quartile; the winner would move into the top quartile, where he would contribute a healthy gain to the average. So, a poor guy gets rich, but the statistics say the rich got richer.

I’m not saying that there’s no trend. I am saying, though, that:

  • Household income is a slippery statistic.
  • The age of the heads of households is probably better suited at higher income levels for having kids in college.
  • Quoting statistics by quartile or quintile that assume static sets of people in each group is misleading.
  • If you want to measure income mobility, measure it. Don’t try to infer it from college household statistics.
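
Dan's thought experiment with the law student can be replayed in a few lines of Python. The eight incomes below are invented so that the boundary earners match his numbers ($20,001 and $40,001); everything else about this tiny population is an assumption, and `quartile_totals` is just an illustrative helper.

```python
def quartile_totals(incomes):
    """Total income in each quartile, bottom quartile first."""
    ordered = sorted(incomes)
    size = len(ordered) // 4  # equal head count per quartile
    return [sum(ordered[i * size:(i + 1) * size]) for i in range(4)]

# Hypothetical households, two per quartile; the first is the law student.
before = [15_000, 18_000, 20_001, 30_000, 40_001, 70_000, 90_000, 120_000]
# After graduation the student earns $60,000; nobody else changes.
after = [60_000 if income == 15_000 else income for income in before]

changes = [a - b for a, b in zip(quartile_totals(after), quartile_totals(before))]
print(changes)  # [5001, 20000, 19999, 0]
```

Only one person's income changed, yet three of the four quartile totals moved, because the people sitting at the boundaries were reshuffled into the quartile below.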

This makes me feel better.

I had been vaguely thinking, yesterday, 'But do we know that the middle 50% of households also have 50% of the kids?'

That's as far as I got with the question, and since I didn't know the answer I felt it was OK to assume even distribution of children across income quartiles, which, after all, are quite large.... (that assumption may be wrong, but I felt I could make it).

But I was asking myself the wrong question.

We aren't talking about 'number of kids.' We're talking about 'number of college kids.' Dan is absolutely right; people who have kids in college are also, generally speaking, at the top of their earning years & thus more likely to be in the top quartile of the population.

Thanks, Dan!



JapanPopulation 26 Aug 2005 - 13:45 CatherineJohnson


The WALL STREET JOURNAL comes up with all kinds of fun stuff in August:

China's age crisis is shared across Asia, particularly in Japan. Its population of 127.7 million is expected to fall to little more than 100 million by 2050, the study says, barring a rise in fertility rates or an influx of immigrants. Indeed, if fertility and mortality rates from 2001 continue, researchers say, Japan's population would drop to one person by the year 3300.



source: U.S. Birth Rates Remain High





SorryWrongNumber 15 Nov 2005 - 02:04 CatherineJohnson



via Science News, I've just discovered Number Watch:

The "Number Watch" Web site focuses on "misleading" numbers that appear in the media and are often used to promote specific causes, as compiled by retired engineering professor John Brignell of the University of Southampton. Brignell also offers online resources on statistics and statistics education. The FAQ section includes answers to such questions as "Is there such a thing as average global temperature?" and "What is the Normal Distribution and what is so normal about it?"


The FAQ page includes this question:

Is it true that there was a scare in the 1970s about global cooling?

Turns out the answer is yes.


cooling.gif



I'm starting to get a picture of the Engineering-type Personality.....


update: the Number Watch FAQ page

Questions:

What has the weakest link to do with fallacies in medical statistics?

Is it true that there was a scare in the 1970s about global cooling?

What is so important and dangerous about feedback?

Is there such a thing as average global temperature?

So, what is so wrong about the BMA report on smoking and fertility?

Why do you contend that the "grey goo" theory is untenable?

What is the background to the increasing power cuts?

What are the complications of averages?

What are the implications of data smoothing?

What is the Uncertainty Principle and how is it relevant?

What is the second law of thermodynamics and why so important?

What is a data dredge and why so condemnatory?

What is meant by RR?

What is statistical significance and what has P to do with it?

What is a Trojan Number?

What is the Poisson distribution and why is it so important?

What is the Normal Distribution and what is so normal about it?

The Binomial Distribution seems fundamental; where does it come from?

But what exactly do you mean by a distribution?

What is so dangerous about computer models?

What exactly is the Greenhouse Effect?

What is the extreme value fallacy?

Is it possible to quantify the effects of publication bias?

What are the dangers of fitting linear trends to data?


♦ ♦ ♦ ♦ ♦


I can't wait to dig into this stuff.




NumberWatchPart2 12 Nov 2005 - 21:42 CatherineJohnson




Unjustified statistics are like smiling cats - not to be trusted.


6.4.jpg



source: Number Watch





DataWarehousing 07 Oct 2006 - 22:10 CatherineJohnson



Our school district is now using 'data warehousing.'

The couple who came to dinner Friday night — both employed in math-related fields — were highly unenthusiastic about this development.

My neighbor, the statistician, had the same reaction when she read about it.

The Friday-night-couple said data warehousing is the same thing as data mining.....which I think I favor.

Is that wrong?

I'm certain they're right, though, that data mining will allow the district to flummox parents with whatever statistics they decide to pull out.

Although.....so far district efforts to flummox parents, namely me, have been unimpressive to say the least. These efforts consist of the Assistant Superintendent sending me one letter and one email telling me 'scores have gone up' since we purchased TRAILBLAZERS.

I pointed out that scores went up all over the state and that, furthermore, 'scores went up' is raw data, and we left it at that.

Color me Not flummoxed.

Then they shut down my Singapore Math course.



not flummoxed now & don't plan to be in the future

What do I need to start learning in order to not get flummoxed down the line?

Apart from real knowledge, comprehension, & procedural skills, I could use some lingo, just so I sound like I know what I'm talking about.

If the District is going to blow smoke-with-data, I need to be able to blow my own smoke, which I can do just through language. (Have I mentioned how ruthless I am lately?)



whose data is it, anyway?

What I fear — because we've hit this brick wall many, many times in special ed — is that parents won't get to see data because parents seeing data will represent an invasion of other parents' privacy.

Maybe things won't go that way, but the fact that they've always gone that way for us in the past, and that Bush & c. had to pass a huge, major, revolutionary law just to get schools to disaggregate and publish their data some place where parents could find it, tells my Bayesian mind to count on it.

So maybe I should be familiarizing myself with the FOIA, right?



Wal-Mart has a warehouse for data, too

No idea whether this book would be useful or not.


6556819.gif



-- CatherineJohnson - 16 Jan 2006



BayesAndMedicalBreakthroughs 16 Mar 2006 - 20:09 CatherineJohnson



Ed spotted this op-ed in the Financial Times today:

A dose of realism exposes the heart of the matter ($)
By Robert Matthews

A study of patients with heart disease has found that high doses of a cholesterol-lowering drug known as a statin can break down the potentially fatty deposits lining the arteries. Over three-quarters of the patients showed some improvement, with the most severe cases showing the biggest reductions.

Announced at a meeting of the American College of Cardiology this week, the results have been hailed as a big breakthrough in the fight against this killer disease.

[snip]

AstraZeneca’s share price duly rose a couple of per cent, and may benefit again when the results of the trial are published in a leading medical journal next month.

[snip]

But there is ... tell-tale sign of DSS – damp squib syndrome – and one routinely found in supposed breakthroughs in many other fields. Ironically, it centres on the level of surprise that leads to such findings making the media spotlight in the first place.

In the case of Crestor, even the researchers themselves admit to being stunned by the results of the trial. While statins were already known to be capable of slowing the build-up of arterial deposits, few expected them to produce a reduction. The increase in “healthy” cholesterol levels was also a surprise. The research team leader summed up the results as “shockingly positive”.

In other words, the results fly in the face of previous experience with these drugs. Considering there is no shortage of that, and that the new results come from a small trial, the smart money is on this “breakthrough” falling victim to DSS.

That may sound glib, but it has its basis in sophisticated techniques for making sense of new findings. Known collectively as Bayesian methods, they allow new findings to be assessed in the light of extant knowledge – with often salutary consequences.



The op-ed's concluding paragraphs are terrifically well-put:


Take the case of anistreplase, a clot-busting drug hailed in the early 1990s as a breakthrough in the treatment of heart attacks. A small trial conducted in Scotland suggested that early administration of the drug could cut death-rates by an astonishing 50 per cent. This again flew in the face of experience, which had suggested a much more modest level of benefit. Using Bayesian methods to combine that extant knowledge with the trial results, statisticians predicted the real improvement would be about 20 per cent. This has now been confirmed by much larger studies.

Despite their obvious value in making sense of new claims, Bayesian methods are still regarded by some as esoteric. The blame for this must lie with statisticians, who have done a dismal job of making these powerful techniques accessible to a much wider audience – including the business community. After all, even the least mathematical can appreciate the central message: extraordinary claims demand extraordinary evidence.



We can thank the Reverend Bayes for giving us a statistical method to demonstrate that if something seems too good to be true, that's because it is.
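The anistreplase story in the op-ed can be sketched in a few lines of Python. This is a textbook normal-normal update (a precision-weighted average of prior and data), not the statisticians' actual analysis, and every number in it is hypothetical: a prior belief of a 15% reduction in deaths, held fairly firmly, and a small trial reporting 50% with a wide margin of error.

```python
def combine_normal(prior_mean, prior_sd, data_mean, data_sd):
    """Posterior mean and sd for a normal prior combined with a normal estimate."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / data_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + data_mean * data_precision) / post_precision
    post_sd = post_precision ** -0.5
    return post_mean, post_sd

# Hypothetical inputs, invented for illustration only.
prior_mean, prior_sd = 0.15, 0.05   # extant knowledge: modest benefit, well established
trial_mean, trial_sd = 0.50, 0.20   # small trial: dramatic benefit, very uncertain

post_mean, post_sd = combine_normal(prior_mean, prior_sd, trial_mean, trial_sd)
print(f"posterior reduction in death rate: {post_mean:.1%} (sd {post_sd:.1%})")
```

With inputs like these the posterior sits much closer to the prior than to the trial, because a small trial carries little weight against a large body of earlier experience. That is the op-ed's point in arithmetic form.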





Bayesian statistics & false positives
Bayes & the human mind
Bayesian reasoning, intuition, & the cognitive unconscious
most bell curves have thick tails
ECONOMIST explanation Bayesian statistics
Bayesian certainty scale
Bayes and medical breakthroughs



-- CatherineJohnson - 15 Mar 2006



SchoolsAndGroceryStores 02 Jun 2006 - 14:15 CatherineJohnson



I've been collecting essays I can have Christopher read and summarize.

Edspresso links to one of the clearest columns on economics I've ever seen:

Government K-12 schools, as now run everywhere in the U.S., will never excel at educating students. The reason is that each school gets its students and its budget without having to compete for them. Imagine if, say, supermarkets were run the same way we run schools. Everyone in my county would pay taxes to fund the county supermarket system; each one of us would then be assigned one specific county supermarket at which we are allowed to shop.

Of course, once in our assigned store, all the groceries that each of us gets are "free" -- meaning, we don't have to pay for them on the spot. If the products and services supplied by the supermarket are of poor quality, we're not allowed to switch to other county markets; we must, instead, complain to politicians.

The managers of the supermarkets will agree that their stores offer abysmal service and undesirable products; they will assert that this sad fact is caused by underfunding. We will be warned that only by paying higher taxes will we have any possibility of getting better supermarkets.

So our taxes will rise and funding for supermarkets will increase. But quality will remain poor -- and the excuses offered by the government-employed managers of the supermarkets will remain that they need yet more funding.




Author Donald J. Boudreaux, chair of George Mason's economics department, also has a nice passage on productivity in France and America:

Average worker productivity will be higher in those economies cursed by heavy government intervention into the labor market. Although at first this prediction might sound counterintuitive, it makes perfect sense. When government artificially raises the cost of hiring workers -- by mandating high minimum wages or by increasing the amount of red tape firms must endure in order to fire workers -- the workers that remain unhired are those who are least productive. Think about it: If the French minimum wage is the equivalent of $10 per hour, then French workers who can produce no more than $9 per hour of revenue for employers will not be hired, while in the U.S. such workers will be hired.

By pricing the lowest-skilled workers out of the labor market, European regulations ensure that only relatively high-skilled workers get jobs. So measures of average worker productivity will tend to show that workers in restrictive countries such as France and Germany are more productive than are workers in America. But this statistical outcome is a deceptive artifact of lamentable labor-market regulations whose burdens fall disproportionately on Europe's poorest peoples.
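Boudreaux's argument is a selection effect, and a toy calculation makes it concrete. The Python sketch below uses six invented workers with invented hourly output. The only thing that changes between the two cases is the wage floor that decides who gets hired.

```python
from statistics import mean

# Hypothetical hourly output in dollars for six workers; invented for illustration.
productivity = [6, 8, 9, 12, 15, 20]

def average_of_hired(workers, wage_floor):
    """Average output of workers whose output is at least the wage floor."""
    hired = [p for p in workers if p >= wage_floor]
    return mean(hired)

# Same workforce, two different floors.
low_floor_avg = average_of_hired(productivity, wage_floor=5)    # everyone is hired
high_floor_avg = average_of_hired(productivity, wage_floor=10)  # the three lowest are priced out

print(f"low floor:  {low_floor_avg:.2f} per hour")
print(f"high floor: {high_floor_avg:.2f} per hour")
```

No worker became more productive between the two cases. The average rose only because the least productive workers dropped out of the measurement.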



Can't wait to share that one with Ed, who has more than once told me French workers are more productive than American workers.

I've been wanting Christopher to go to George Mason ever since I read Alexander Tabarrok and Peter Boettke's column about it in SLATE.

Now I'm sure.


-- CatherineJohnson - 31 May 2006



TheReverendBayesStrikesAgain 17 Jun 2006 - 11:24 CatherineJohnson




I've been a mother for 19 years.

Throughout all of that time I've been reading articles, news items, and studies telling me daycare is not only harmless but actively good for children; daycare is the superior choice, better than being cooped up alone with your crazy mother. Quality time, not quantity time. The culture spent years banging on about that one.

I heard it all.

Children raised in daycare had better immune systems, better social skills, were better prepared for Kindergarten - you name it, some researcher somewhere had found it and I had read it.

It was endless.

My favorite moment was the day, probably 16 or 17 years ago, I read an article in the New York Times characterizing boys raised in daycare as less sensitive and less responsive to adult direction than boys raised at home. My ears pricked up at that one. Whoa, I thought. A study showing daycare is bad for children! Heads must be spinning out there at the Times.

But no.

That wasn't what the study showed at all.

The study showed daycare was good for children.

Stay-at-home moms were the bad thing. Our boys were sensitive little wimps. If you wanted a manly boy, you had to go with daycare.



Pretty early on, I decided all of this stuff was likely to be bunk.

I used my Bayes brain to figure it out.



So what do I find in today's issue of the New York Times?

Gosh, it's an article on the very bad effects of daycare!

What a surprise!

Who would have thought!


Starting in 1997, the Quebec Family Policy subsidized day care for 4-year-olds at government-approved centers around the province. By 2000, the program had expanded to cover any child not old enough for kindergarten, all the way down to infants....

Centers from downtown Montreal to Hudson Bay were flooded with applications....

Almost a decade after the family policy started, however, there was still a big mystery about it. Nobody had done the work to find out how it had affected children. The province was spending $1.4 billion a year on a grand social experiment, yet no one had bothered to look at the results.

So three economists took up the challenge a few years ago, realizing that the program offered an excellent way to examine a much-debated topic....

When they finished last year, the answer seemed clear. "Across almost everything we looked at," said Mr. Gruber, an M.I.T. professor, "the policy led to much worse outcomes for kids."

Young children in Quebec are more anxious and aggressive than they were a decade ago, even though children elsewhere in Canada did not show big changes. Quebec children also learn to use a toilet, climb stairs and count to three at later ages, on average, than they once did. The effects weren't so great for parents, either. More of them reported being depressed, and they were less satisfied with their marriages — which also didn't happen in other provinces.

Before you dismiss the researchers as just three more men starting a new assault in the mommy wars, listen to Jane Waldfogel, a leading child-policy researcher and the author of the book, "What Children Need" (Harvard University Press). "This is a very high-quality paper by high-quality guys," she said. "They're very careful. This is a paper that's going to stand."



Quelle surprise.

When you read these findings in the stark language of the paper's abstract (pdf file) it's even worse:

The growing labor force participation of women with small children in both the U.S. and Canada has led to calls for increased public financing for childcare. The optimality of public financing depends on a host of factors, such as the “crowd-out” of existing childcare arrangements, the impact on female labor supply, and the effects on child well-being. The introduction of universal, highly-subsidized childcare in Quebec in the late 1990s provides an opportunity to address these issues. We carefully analyze the impacts of Quebec’s “$5 per day childcare” program on childcare utilization, labor supply, and child (and parent) outcomes in two parent families. We find strong evidence of a shift into new childcare use, although approximately one third of the newly reported use appears to come from women who previously worked and had informal arrangements. The labor supply impact is highly significant, and our measured elasticity of 0.236 is slightly smaller than previous credible estimates. Finally, we uncover striking evidence that children are worse off in a variety of behavioral and health dimensions, ranging from aggression to motor-social skills to illness. Our analysis also suggests that the new childcare program led to more hostile, less consistent parenting, worse parental health, and lower-quality parental relationships.


I'm waiting for an apology.




tee hee

The day they find out helicopter moms are good for children is gonna be fun.




last but not least

I hope I'm not upsetting our working moms. I don't mean to. In truth, I'm a working mom myself, though I've managed to work at home, which my Bayes brain thinks is a good idea.

I don't know what I would have done if I'd had a job or career that I couldn't do at home. I probably would have worked. My point: I didn't throw up this post to criticize working moms, but to complain about stupid research while showing off my Extreme Bayesian Brain in the process.

Last but not least, my friends who worked from the time their kids were infants have terrific grown kids. We (re-)met them all at Christmas, so we know.

I'm just hoping Christopher & Andrew turn out as well.




update

Good grief.

I hadn't read the whole article when I wrote this post.

They're still doing it:

The picture is murkier for toddlers and preschoolers. The stimulation they get at day care tends to make them better prepared for school than children who are home with a parent full time. Yet those who spend too many hours in day care or attend poor-quality programs also seem to be at greater risk of obesity and behavior problems.

Naturally this was the passage Ed chose to read aloud to me over lunch. "The picture is murkier for toddlers and preschoolers," he said.

I guess that apology's not going to be coming any time soon.

Bayes speaks: "The stimulation they get at day care tends to make them better prepared for school than children who are home with a parent full time" is bunk.

Mark my words.

I can even say why it's bunk.

For years there's been a heaping load of research showing that firstborns and singletons are slightly smarter than later-borns, a phenomenon attributed to the fact that firstborns and singletons spend more time in the company of adults and less time in the company of other children, namely their siblings.

Daycare means more time with other kids, less time with adults.




update update

Thinking it over, I realized that the daycare parents used in the Canada study was government sponsored and government staffed.

There was a time I would have thought that was good.

After lo these many years in the public schools, that time is gone.



WALWHC.jpg



-- CatherineJohnson - 14 Jun 2006



SundemTierneyUnifiedCelebrityTheory 19 Sep 2006 - 15:59 CatherineJohnson




Sometime in my youth, in high school I think, I came up with my first writer idea.

I wanted to write a Dear Abby column with numbers.

The plan was to do a Math Trailblazers-like counting job on social pain.

Basically, my plan was to figure out how long it took to get over things.

How long did it take to get over being dumped?

How long did it take to get over someone dying? (Two years, I figured.)*

etc.

Then people could write in, tell me what bad thing had just happened to them, and I could write back telling them how long before they felt OK again.**

At the time, I hadn't (really) heard of probability & statistics — or, rather, I'd heard of statistics and probability, but I had no idea how it worked.

I just knew about counting.




Geek Logik

Today I learn from John Tierney ($) that a fellow named Garth Sundem has actually gone out and done a geek version of my high school kid concept:

I wish no ill to Brangelina, Tom and Katie, or Pamela Anderson and Kid Rock. Like any mortal, I revere the romances on Olympus. I thrilled to hear of Pam’s secret wedding and agonized at reports of Angelina’s reluctance to marry (or is Brad dragging his feet?). When I finished poring over Vanity Fair’s photo spread of Tom Cruise and Katie Holmes with their daughter, my only bitter thought was: Why just 22 pages?

But we inquiring minds must be realistic. Remember your crazy joy at past celebrity marriages — Jessica and Nick, Julia and Lyle, Uma and Ethan?

[snip]

[Y]ou were sure this one was for the ages — until the day their publicist put out the statement about an “amicable” decision to pursue “separate lives.” Amicable! How could the couple of the century bear to be apart? You felt deceived, used, discarded. You stared at their photo and thought: I don’t even know you anymore.

I can’t bear any more of these breakups, so I have turned to science to steel my heart. I went to Garth Sundem, the wickedly ingenious author of “Geek Logik,” a new book of mathematical formulas for deciding questions like whether you should sleep with a co-worker, whether you should join a gym or see a therapist, and whether you can wear a Speedo without frightening small children.


Sundem's formula predicting the likelihood that a celebrity marriage will last:

Tierney450.gif




Sundem's odds



     Geek Logik



* Amazingly enough, two years turned out to be a pretty good estimate. At least, it's a good estimate for me.

**This would be your Midwest farmer's concept of self-help.


-- CatherineJohnson - 19 Sep 2006



DeathByData 07 Oct 2006 - 21:56 CatherineJohnson




Ed said the other day that here in Irvington we are going to experience death by data.

He was right.

KDerosa predicted as much lo these many months ago. Or perhaps it was Steve H, or Doug Sundseth, or basically anyone who's ever read Kitchen Table Math.


[pause]


Ah.

It was Steve H, Smartest Tractor, and Stephanie O.

hmm.....I'm tilting slightly in the guys-do-stats-ish direction here in vague recollection-land.

oh well

Round up the usual suspects, and keep scrolling.



Let's review.

1. Irvington Union Free School District has a strategic plan.

2. The strategic plan has 5 goals.

3. Data warehousing, differentiated instruction, and portfolio assessment are goal #2.

4. A couple of weeks ago, at the School Board meeting, I got a look at data warehousing in action.


pop quiz

Can a 61% drop in scores of 4 between grades 4 and 8 in a class of 158 children be explained by 18 kids with scores of <4 moving into the district and a roughly similar number of students with indeterminate scores moving out?

Answer: no.
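The arithmetic behind that "no" can be sketched in Python. The quiz does not say how many children scored a 4 in grade 4, so the starting counts below are hypothetical. The sketch also takes the most generous case for the turnover explanation: every one of the 18 students who moved out had a 4, and none of the 18 who moved in did.

```python
import math

def max_drop_from_turnover(initial_fours, movers_out):
    """Largest fractional drop in 4s that turnover alone could produce."""
    lost = min(initial_fours, movers_out)
    return lost / initial_fours

def largest_group_turnover_could_explain(movers_out, observed_drop):
    """Largest starting count of 4s for which turnover alone reaches the observed drop."""
    return math.floor(movers_out / observed_drop)

movers_out = 18        # "a roughly similar number" to the 18 who moved in
observed_drop = 0.61   # the 61% drop in scores of 4

# Hypothetical starting counts of 4s in the class of 158.
for initial_fours in (30, 60, 90):
    drop = max_drop_from_turnover(initial_fours, movers_out)
    print(f"{initial_fours} fours in grade 4: turnover explains at most {drop:.0%}")

limit = largest_group_turnover_could_explain(movers_out, observed_drop)
print(f"turnover alone works only if at most {limit} children started with a 4")
```

Even under the most generous assumption, turnover accounts for the whole drop only if fewer than one in five of the 158 children had a 4 to begin with.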



moving right along

So today I receive word via edline that the Irvington Middle School Site Committee plans a Parent Information Night.

To prepare for Parent Information Night the committee has posted a survey at SurveyMonkey.

I had the same first thought about a Site Committee survey that I did about data warehousing —

Oh, good!

We'll learn something!

The administration will find out what parents want!

Things will improve!

Then I took the survey.


Irvingtonpushpullhalf.jpg



There are 10 questions in all.

1. Would you like to attend an information night about New York State Assessments in the middle school?

2. How interested are you in obtaining information about stress and anxiety strategies for your middle school child?

3. How interested are you in obtaining information about the Grade 6 New York Math Assessment

4. How interested are you in obtaining information about the grade 7 New York Math Assessment?

5. How interested are you in obtaining information about the grade 8 New York Math Assessment?

6. How interested are you in obtaining information about the grade 8 Social Studies State Assessment

7. How interested are you in obtaining information about the grade 8 Science State Assessment?

8. How interested are you in obtaining information about the grade 6 ELA State Assessment?

9. How interested are you in obtaining information about the grade 7 ELA State Assessment?

10. How interested are you in obtaining information about the grade 8 ELA State Assessment?



Nope.

Nothing problematic about the item construction here.



So naturally, after taking the survey myself (you may be able to view my responses here), I felt compelled to sit down and write a memo to the Site Committee explaining why, for Item #2, How interested are you in obtaining information about stress and anxiety strategies for your middle school child? I selected:


Wild horses could not drag me to a Parent Information Night at which I could obtain stress and anxiety strategies for my middle school child.


It's pretty much the same old same old:

I’ve selected “not at all interested” for the question, “How interested are you in obtaining information about stress and anxiety strategies for your middle school child?”

I’ve selected “not at all interested” because, for my family, this is the wrong question.

For us the right question would be something along the lines of, “How interested are you in obtaining information about strategies your school can use to create a supportive, motivating, and productive learning environment for your middle school child?”

etc., etc.

blah, blah

yadda, yadda



You guys could write the rest.

In your sleep.



data mining at Wikipedia
Statistical Data Mining Tutorials
Two Crows Data Mining

dog of helicopter mom




-- CatherineJohnson - 07 Oct 2006
