KTM User Pages
04 Aug 2005 - 04:15
Catherine's post on causation implications in a multivariate study of college success and high school curricula.
The correlation-implies-causation fallacy is the idea that whenever two variables are correlated -- i.e., when they change together, and so appear to be related -- one must necessarily be the cause of the other.
Famous examples:
* High stock prices and short skirt lengths tend to occur together. Do short skirts cause high stock prices? No -- another explanation is that they are both likely to be symptoms of an exuberant public mood.
* High chocolate consumption and acne tend to go together. This is easily explained by the fact that teenagers are both big chocolate consumers and the biggest acne sufferers; but the idea that chocolate causes acne persists (and it helped ruin my own adolescence, since I couldn't with a clear conscience solace my pain over my zits with a nice bar of chocolate).
But in the example Catherine gives, it's harder to tease apart correlation and causation, because of the timing of the data being studied; the one variable, whether the student had a college-prep curriculum, occurs several years before the other one, the student's graduating from college. Any correlation in data like that strongly suggests causation, but not reliably.
Here's another example, more loaded: birth defects are correlated with mothers who drink alcohol during pregnancy. Of course, the correlation by itself doesn't prove causation; but we certainly would like to know NOW whether drinking is the cause of the birth defects.
How can we tell whether a correlation relationship is actually a causal one? Not surprisingly, there are generally accepted methods for making causal inferences (I should mention that I am now officially way out of my realm of expertise, but here goes anyway).
Here are some conditions under which you're permitted to make causal inferences, culled out of some medical literature I found online (obviously teasing out correlation and causation is critical in medicine).
1. A correlation is present.
2. The relationship is statistically significant, i.e. very unlikely to be due to chance.
3. The presence of one factor predates the other (e.g., the drinking happened before the birth defects; the college prep courses came before the college graduation).
4. Evidence from other experiments or statistical studies proves that it is unlikely that the relationship is due to a third variable.
It's item 4 that I have my doubts about, with respect to the high school courses and college graduation relationship. How did they eliminate all those 3rd factors that might be determining success in both college and high school?
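To make the "third variable" worry in item 4 concrete, here is a minimal simulation sketch of the chocolate-and-acne example. Everything in it is an assumption for illustration: the numbers, the 50/50 split between teenagers and adults, and the function names are invented, and chocolate is deliberately given no effect on acne at all.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=20000, seed=0):
    """Hypothetical population: age drives both chocolate and acne."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        teen = rng.random() < 0.5
        # Both variables depend only on age; chocolate never feeds into acne.
        chocolate = rng.gauss(5.0 if teen else 2.0, 1.0)
        acne = rng.gauss(6.0 if teen else 1.0, 1.0)
        people.append((teen, chocolate, acne))
    return people

people = simulate()
overall = pearson([c for _, c, _ in people], [a for _, _, a in people])
# Holding the third variable fixed: correlate within each age group.
within_teens = pearson(*zip(*[(c, a) for t, c, a in people if t]))
within_adults = pearson(*zip(*[(c, a) for t, c, a in people if not t]))

print(f"overall r = {overall:.2f}")
print(f"teens only r = {within_teens:.2f}, adults only r = {within_adults:.2f}")
```

In this made-up population the overall correlation is strong, yet it should all but vanish once teenagers and adults are looked at separately. That is the kind of check item 4 asks for, and it only works for third variables someone thought to measure.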
Hmmm. I don't know. I'm not sure whether you can ever infer causation from completely statistical parameters or not. A quick Google search turned up contradictory examples:
1) First, let me give a short answer to the question "When does correlation imply causation?" The short answer is: When the data from which the correlation was computed were obtained by experimental means with appropriate care to avoid confounding and other threats to the internal validity of the experiment.
This author seems to be strongly saying that pure statistical means are never adequate. He goes on to say:
I have frequently encountered this delusion, the belief that it is the type of statistical analysis done, not the method by which the data are collected, which determines whether or not one can make causal inferences with confidence. Several times I have had to explain to my colleagues that two-group t tests and ANOVA are just special cases of correlation/regression analysis. One was a senior colleague who taught statistics, research methods, and experimental psychology in a graduate program. When I demonstrated to him that a test of the null hypothesis that a point biserial correlation coefficient is zero is absolutely equivalent to an independent samples (pooled variances) two-groups t test, he was amazed.
2) On the other hand, we have this very clear statement that under the right circumstances pure statistical tools can be used to infer causality:
One of the most exciting things about Bayes nets is that they can be used to put discussions about causality on a solid mathematical basis. One very interesting question is: can we distinguish causation from mere correlation? The answer is "sometimes", but you need to measure the relationships between at least three variables; the intuition is that one of the variables acts as a "virtual control" for the relationship between the other two, so we don't always need to do experiments to infer causality. See the following books for details.
* "Causality: Models, Reasoning and Inference", Judea Pearl, 2000, Cambridge University Press.
* "Causation, Prediction and Search", Spirtes, Glymour and Scheines, 2001 (2nd edition), MIT Press.
* "Cause and Correlation in Biology", Bill Shipley, 2000, Cambridge University Press.
* "Computation, Causation and Discovery", Glymour and Cooper (eds), 1999, MIT Press.
Needless to say, I'm eager to read those books now.
3) This reference points out that many statistical tests for correlation are often used in contexts in which they are not valid. It also gives some heuristic (but non-statistical) criteria similar to Carolyn's for concluding that an observed correlation is indeed a causation.
Finally, one should note that philosophers from Hume on have raised considerable doubt about the very existence of causation at all. I remember reading a particularly pellucid argument given by Bertrand Russell. In hindsight, though, it strikes me that the arguments against are examples of the sorites fallacy. In any case, causation seems to me like a reasonable hypothesis to keep in mind for everyday life. -- BernieJohnston - 04 Aug 2005
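The equivalence claimed in the first quotation (a test that a point-biserial correlation is zero is the same as a pooled-variance two-group t test) can be checked numerically. The sketch below is only an illustration: the two groups of scores are made up, and the function names are hypothetical. It computes t directly from the two groups, then again from the correlation r between a 0/1 group indicator and the scores, using t = r * sqrt((n - 2) / (1 - r^2)).

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def t_from_point_biserial(a, b):
    """t statistic for H0: r = 0, with group membership coded 1 (a) or 0 (b)."""
    indicator = [1.0] * len(a) + [0.0] * len(b)
    scores = list(a) + list(b)
    r = pearson(indicator, scores)
    n = len(scores)
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up scores for two groups, purely for illustration.
group_a = [12.0, 15.0, 11.0, 18.0, 14.0]
group_b = [9.0, 10.0, 13.0, 8.0, 11.0, 7.0]

t_direct = pooled_t(group_a, group_b)
t_via_r = t_from_point_biserial(group_a, group_b)
print(t_direct, t_via_r)
```

The two numbers agree up to floating-point rounding, which supports the quoted author's point: the choice between "a t test" and "a correlation" says nothing about whether a causal reading is justified. That depends on how the data were collected.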
In the field of education, I keep hearing people repeat the statistic about how much more college graduates earn than high school graduates do. It is invariably presented as a reason to go to college or an argument for how higher education helps the economy. The assumption is that the college degree is the causal factor for the increased earnings. It seems to me, though, that more intelligent people will tend to earn more money (with or without a degree), and more intelligent people are more likely to earn a college degree. In extreme cases, I believe people like Bill Gates and Michael Dell are college drop-outs. -- DanK - 04 Aug 2005
Actually, I just remembered a conversation I had on this subject with my neighbor, who is a statistician. In fact, statisticians do have methods of identifying causality that I would certainly be inclined to trust. Most of the field of medicine is predicated on statistics, not on controlled studies. For instance, probably most of us believe that smoking cigarettes causes cancer. That link, in humans, is strictly correlational and always will be (although I believe controlled research has been done with animals). The twin studies are also interpreted, by researchers, to indicate that genetics cause behavioral traits. Controlled studies of twins separated at birth will never be done. And....I'm now remembering Sarah McClanahan (I may have misspelled her name.) She was a single mother at Princeton who read an article in THE NEW YORKER attributing black children's failure to single motherhood and got angry enough to decide to prove it was wrong. Her plan was to show that the problem for black children with single mothers was that they were poor, not that they had single mothers. Her research--none of it controlled, obviously--showed exactly the opposite. Wealthy kids with single mothers were also doing very badly. Caroline Hoxby's research fascinates me, because she creates 'found experiments,' identifying schools or towns, etc. that are identical in every way except for the factor she wants to measure. So obviously statisticians have figured out ways to identify causality, but I need to know lots more about this to be able to read these studies at all.
Dan, I have a zillion charts on the college degree business, and as far as I can tell the correlation is strong, and holds across all different people, different SES's, different races, etc. I'm leaving town today, but one of the charts I need to get up was an interesting one showing growth in jobs.
There is huge growth in jobs requiring 2 years of college, minimal growth in jobs requiring only on-the-job training, and practically no growth in jobs requiring a high school degree. I now see the college-prep-for-everyone issue completely differently. What you hear, constantly, is 'not everyone goes to college.' That is always said as a protest against creating more rigorous high school curricula. Of course, that's true; not everyone goes to college. But in fact, the nature of the jobs we have demands some college education, and certainly a rigorous high school education. To say that 'not everyone goes to college' is to say nothing about what kind of curriculum high schools need to provide. Kids who aren't going to go to college also need college prep. And it appears that when you give the kids who, everyone agrees, 'aren't going to go to college' a college-prep curriculum, many of them then do go to college and graduate! -- CatherineJohnson - 04 Aug 2005
Bayesian nets were part of the PowerPoint explanation I read yesterday as well. -- CatherineJohnson - 04 Aug 2005
The idea that, logically speaking, there's no such thing as causality is what I call logic-mongering. Logic, if you stick with it, will lead to nonsensical positions (and I'm sure there's a logical explanation for this). I have a 'core' belief in implicit knowledge, which is the knowledge we have that we don't (necessarily) consciously have. (Think Alan Greenspan!) I should find the Alan Greenspan op-ed the WSJ just ran. Wonderful. In any case, I trust logic only to a point, and that point's not too far when the person using logic isn't an expert in the field he or she is reasoning about. Actually, it would be interesting to know what kind of logic the 'cognitive unconscious' uses. It may be plain old logic, only with more variables thrown in. This is probably making zero sense; what I'm trying to say is that working memory, which is consciousness, can hold only a tiny number of variables at a time. The cognitive unconscious seems to be able somehow to combine many variables.....or to know which variables out of an infinity of variables to focus on.....something like that. I need to know more about this. But, also, I need some guidelines for understanding when I'm looking at a statistical correlation that probably does indicate causality! Great post, Carolyn--that's incredibly helpful. One last thing: 'converging lines of evidence' are VERY important in scientific research. When you start having different scientists in different fields coming up with the same data, everyone assumes that's a strong indication that we're getting close to an answer. I think that's probably what item number 4 is talking about. -- CatherineJohnson - 04 Aug 2005
Catherine, I'm all for college prep high school for everyone. Or, perhaps a better term for it is just more rigorous high school. In fact, if high school were better, many of those jobs wouldn't require two years of college. High school would be enough. Still, my point about college is that it isn't the big difference maker, really. Sure, kids learn stuff in college, and a lot of it is directly related to high-paying jobs. But consider the high school valedictorian and the kid in the bottom third of the class. If neither goes to college, would you expect them to both have the same earnings prospects? I would expect the valedictorian to do much better. Perhaps after college he will earn three times as much as the lower ranked kid. Even without college, though, he would earn double. So, if you convince the lower third kid to go to college, will he now be expected to earn the same as the valedictorian? I don't think so. Colleges get to cream off the top n% of high schoolers. The fact that those n% outearn the other (100-n)% can't be a big surprise. Some colleges might be like country clubs. I don't have hard stats, but I bet country club members have significantly higher average earnings than non-members. Would that mean that country club membership prepares one to earn higher wages? In fact, it might cause a small boost, but that's a tiny fraction of the difference in earnings. -- DanK - 04 Aug 2005
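The "creaming off the top n%" argument can be sketched as a toy simulation. All the figures are hypothetical assumptions chosen for illustration, not real earnings data: ability is a uniform random number, earnings without college rise with ability, the top 30% by ability attend college, and college multiplies earnings by a modest 1.2.

```python
import random

def naive_vs_true(n=10_000, college_boost=1.2, top_fraction=0.3, seed=1):
    """Compare the raw graduate/non-graduate earnings ratio with the true boost."""
    rng = random.Random(seed)
    # Rank a hypothetical population by ability, best first.
    abilities = sorted((rng.random() for _ in range(n)), reverse=True)
    cutoff = int(n * top_fraction)
    grads, non_grads = [], []
    for rank, ability in enumerate(abilities):
        base = 20_000 + 40_000 * ability  # assumed earnings without college
        if rank < cutoff:
            grads.append(base * college_boost)  # colleges take the top slice
        else:
            non_grads.append(base)
    naive_ratio = (sum(grads) / len(grads)) / (sum(non_grads) / len(non_grads))
    return naive_ratio, college_boost

naive_ratio, true_boost = naive_vs_true()
print(f"graduates out-earn non-graduates by a factor of {naive_ratio:.2f}")
print(f"but college itself multiplies earnings by only {true_boost:.2f}")
```

Under these assumptions the raw comparison comes out close to a factor of two even though college adds only 20% for any individual. The rest of the gap is selection, which is the country-club effect described above. It says nothing about how large the real boost from college is, only that the headline comparison cannot tell us.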
Catherine, Sorry, I'm highly skeptical of your post. "most of us believe that smoking cigarettes causes cancer" may be true, but most of us believing it doesn't make it so. If most of the American people believe that Echinacea cures colds or most of the German populace of the '30's believed Jews were the spawn of the devil, does that make it true? I believe that smoking causes cancer because, in addition to statistical correlation, it makes perfect sense within my model of biological reality. But that still doesn't prove it. The fact that researchers are frequently using correlation to infer causation does not imply that their methods are sound. Statistics is very tricky. As the first link above shows, even trained professional statisticians can sometimes be completely misled about what they are actually finding. When people find results that are the opposite of what they expect they are more believable because it seems less likely that their biases are determining their results. But that still doesn't make it true. It is always possible that there was some other factor they overlooked. Overlooking things is often a greater danger than outright bias. "So obviously statisticians have figured out ways to identify causality"--that's not obvious to me. I'm going to have to study this thoroughly before I am convinced one way or the other. "Caroline Hoxby's research fascinates me, because she creates 'found experiments,' identifying schools or towns, etc. that are identical in every way except for the factor she wants to measure." That isn't possible. The very fact that the two towns are sitting in two different locations and populated with different people means that they aren't identical in every way. The problem in science is knowing exactly which variables are relevant and without controlled experiment you cannot know definitively. I'm not being argumentative here. A simple example is weather. Weather affects people greatly. 
Most revolutions occur in the warm part of the year; people get hot under the collar when it's hot. Snow brings out other attitudes, etc. The weather alone will necessarily be different in the two towns and will undoubtedly have effects on behavior. (I refer you to Feynman's discussion of rat-running experiments below.) I believe in implicit knowledge too, but I am painfully aware that it is often wrong. There are many situations where our intuition is completely wrong, where the things we think we know are completely wrong. A few hundred years ago most people believed there were witches controlling things. Some people know that the number 13 is unlucky. The Germans knew the Jews were evil. Other examples where what is known by common sense is wrong are quantum mechanics or the theory of random walks. Lots of people have a hard time believing the results of the Monty Hall problem. I refer you to Feynman's wonderful commencement speech. Here are a few relevant quotes. "During the Middle Ages there were all kinds of crazy ideas, such as that a piece of rhinoceros horn would increase potency. Then a method was discovered for separating the ideas -- which was to try one to see if it worked, and if it didn't work, to eliminate it. This method became organized, of course, into science. And it developed very well, so that we are now in the scientific age. It is such a scientific age, in fact, that we have difficulty in understanding how witch doctors could ever have existed, when nothing that they proposed ever really worked -- or very little of it did. But even today I meet lots of people who sooner or later get me into a conversation about UFO's, or astrology, or some form of mysticism, expanded consciousness, new types of awareness, ESP, and so forth. And I've concluded that it's not a scientific world." ... This one is relevant to KTM: "But then I began to think, what else is there that we believe?
(And I thought then about the witch doctors, and how easy it would have been to check on them by noticing that nothing really worked.) So I found things that even more people believe, such as that we have some knowledge of how to educate. There are big schools of reading methods and mathematics methods, and so forth, but if you notice, you'll see the reading scores keep going down -- or hardly going up -- in spite of the fact that we continually use these same people to improve the methods. There's a witch doctor remedy that doesn't work. It ought to be looked into; how do they know that their method should work? Another example is how to treat criminals. We obviously have made no progress -- lots of theory, but no progress -- in decreasing the amount of crime by the method that we use to handle criminals." ... This one is most relevant to our discussion: "When I was at Cornell, I often talked to the people in the psychology department. One of the students told me she wanted to do an experiment that went something like this -- it had been found by others that under certain circumstances, X, rats did something, A. She was curious as to whether, if she changed the circumstances to Y, they would still do A. So her proposal was to do the experiment under circumstances Y and see if they still did A. I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person -- to do it under condition X to see if she could also get result A, and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control. She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time.
This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happened." ... On controlled experiments: "there have been many experiments running rats through all kinds of mazes, and so on -- with little clear result. But in 1937 a man named Young did a very interesting one. He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before. The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and still the rats could tell. He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell. Now, from a scientific standpoint, that is an A-number-one experiment. That is the experiment that makes rat-running experiments sensible, because it uncovers the clues that the rat is really using -- not what you think it's using.
And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat-running. I looked up the subsequent history of this research. The next experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running the rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn't discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic example of cargo cult science." As for logic, it is only misleading when our premises are wrong. And what's the alternative? Illogic doesn't seem to get us anywhere at all. -- BernieJohnston - 04 Aug 2005
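The Monty Hall problem mentioned in this comment is one place where intuition can be checked by simply trying it, in the spirit of the Feynman quotes. The following is a minimal simulation sketch under the usual assumptions, which the comment itself does not spell out: three doors, one car, and a host who always opens a door that is neither the contestant's pick nor the car. The function names are illustrative.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won over many simulated games."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With these rules, staying should win about one game in three and switching about two in three, which is the result many people find hard to believe.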
trying to understand "causation" is probably a philosophical dead-end . . . indeed, it seems to lead straight into the old "free will versus determinism" tarpit. just a feeling, you understand . . . maybe it's interesting, maybe it's even useful. but as far as i can see, it's metaphysics & shouldn't distract us from just pragmatically fiddling around with our mental models until we find something that works for our purposes. to the rat, it looks like choosing the right door "causes" the food to become available (i refer, of course, to the properly controlled experiment) -- and that's fine for the rat. the psychologist believes that the food's available "because" she put it there . . . fine for her. but maybe there's some higher level being that's secretly experimenting on the psychologist ("god", say) in which case there's no better basis for the p'gist to think "i fed the rat" than for the rat to think "i earned these food pellets here by my own unaided labor". who knows? statistics is powerless against such questions. owen still refusing to sign up on th' blookie. -- KtmGuest - 06 Aug 2005
Owen, why not join up? Don't you want to be cool? My dog is not clear on what causes food to happen, or what causes the toy to Get Thrown. Unfortunately, the latter problem is a real problem (for her), since she simultaneously wants to play keepaway with the toy, and wants it to be thrown again. This is the problem with bad ed research. We all want to push our educational agendas, and to see the kids do better at their studies. Best to try to get really clear on what causes the latter to happen. -- CarolynJohnston - 06 Aug 2005
"I'm not sure whether you can ever infer causation from completely statistical parameters o[r] not. A quick Google search turned up contradictory examples: 1) First, let me give a short answer to the question 'When does correlation imply causation?' The short answer is: When the data from which the correlation was computed were obtained by experimental means with appropriate care to avoid confounding and other threats to the internal validity of the experiment." This seems to me to be a ridiculous conclusion, first and foremost because correlation can always IMPLY causality. Secondly, the criteria this person lists (experimental means, avoid confounding) are attributes of an EXPERIMENT, not a correlation. So, in a strict sense, the author is correct in saying that a correlation can give us credible evidence of causality when that correlation is, in fact, not a correlation at all. "This author seems to be strongly saying that pure statistical means are never adequate." What the words say to me is that if a person uses some experimental methods to obtain a correlation, then he or she can have greater confidence in the entirely unreliable and fickle determination of causality that a correlation provides. "One very interesting question is: can we distinguish causation from mere correlation? The answer is "sometimes," but you need to measure the relationships between at least three variables; the intu[i]tion is that one of the variables acts as a "virtual control" for the relationship between the other two, so we don't always need to do experiments to infer causality." Again, from a scientific standpoint, the words here are complete crap. Mathematical analysis of the variables can yield great insights (just ask Einstein), but the experiments must still be done to really determine causation for scientists (just ask all the physicists who came after Einstein). -- JdFisher - 08 Aug 2005