Sunday, June 15, 2008

Does correlation equal causation?

"Critical Flop", the consistently funny satirist, poses an important question in the comments on the most recent Tuesday Statisticz:

"Not to sound like a stuffy statistics professor, but even if there was a correlation, that's just a correlation, right? It doesn't mean one is the cause of the other."

I have lots and lots of thoughts on this. The general idea from Science (with a capital S) is that you can never prove anything to be true; you can only show that the alternatives are false, or at least very unlikely. This is why scientists conduct experiments, and why Jared Diamond sought natural experiments in his research for Guns, Germs, and Steel.

Unfortunately, brushing aside correlations has gotten people into trouble from time to time. R. A. Fisher, the famous statistician, wrote in a letter to Nature in 1957 that:

"The curious associations with lung cancer found in relation to smoking habits do not, in the minds of some of us, lend themselves easily to the simple conclusion that the products of combustion reaching the surface of the bronchus induce, though after a long interval, the development of a cancer."

Essentially, he was saying that we couldn't assume that cigarettes cause cancer, because although there was a "curious association", correlation does not equal causation. In case you haven't been forwarded the e-mail yet, he was wrong.

At the same time, we can't look at r² values and make a tidal wave out of surf wake. Even if the r² had been higher, we shouldn't conclude that more Tarantino and Lynch will cause people to invent more patentable stuff. There's likely to be a third variable at play, such as a country's economic mobility, that drives both.

To build a case for causation, start ruling out these other possible third variables. For example, check whether the type of government influences patent applications, and check whether anything else correlates with both of the variables you care about. Once you start ruling other variables out, your explanation will look better and better.

Bottom Line: Sometimes. Correlation is a good start, and if you keep building up evidence and ruling out alternative explanations along the way, you can make a good case that there is a causal relationship.