On correlation and causation
- October 18th, 2009
It strikes me how some topics come back in cycles. I remember an interesting conversation on correlation/causation and predictive models back in 2005. Yes, I’m an old man and quite silent, but I’m working on it.
Jonathan Lewis’ post on correlation brought back a flood of memories. This is why I often find Jonathan’s posts so stimulating: they are not only very informative but also food for thought, and they suggest new avenues to explore.
Correlation implies cause
For a lot of people, this is intuitively right. When two measurements are both related to a third variable, say time, and both change over time in a very similar fashion, it looks as though they share a common cause.
Constant rates of change in both variables indicate a linear relationship. Okay, let me nit-pick and give some definitions:
The linear correlation coefficient r (aka Pearson’s coefficient) measures the strength and direction of a linear relationship (or association) between two variables x and y.
When r is close to 1, the linear relationship between the two variables is strong and positive: when values of x go up, values of y also go up.
On the other hand, when r is close to -1, there is also a strong linear relationship, but now when x goes up, y goes down.
Finally, when r is close to 0, there is little or no linear relationship, although a non-linear relationship between the variables may still exist.
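To make the three cases concrete, here is a minimal sketch in plain Python. The helper name pearson_r and the sample data are my own illustrative assumptions, not taken from any real measurement. It computes r for a rising line, a falling line, and a parabola, where y depends entirely on x and yet r comes out at zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, chosen only for illustration
xs = [-3, -2, -1, 0, 1, 2, 3]

r_up = pearson_r(xs, [2 * x + 1 for x in xs])       # y rises with x
r_down = pearson_r(xs, [-2 * x + 1 for x in xs])    # y falls as x rises
r_parabola = pearson_r(xs, [x * x for x in xs])     # y = x^2, symmetric around 0

print(r_up, r_down, r_parabola)
```

The parabola is the case worth remembering: y is fully determined by x, but because the relationship is not linear, r says nothing about it.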
How “strong” a correlation is depends on the kind of data. This is why scientific data generally need a higher correlation coefficient (generally above 0.8) to be called “strong” than medical/social/psychological data. It is well accepted today that the interpretation of a correlation coefficient depends on the context of the data.
The coefficient of determination R^2 gives the proportion of the variance of one variable that is predictable from the other. It tells us how much confidence we can place in a prediction made from a given model.
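As a rough sketch of what R^2 means in practice, the code below fits a least-squares line to a handful of made-up points and computes R^2 as one minus the ratio of residual variance to total variance. The function names and the data are illustrative assumptions only. For a simple linear fit with one predictor, R^2 equals the square of Pearson’s r, which the last lines check.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys):
    """Share of the variance of y explained by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def pearson_r(xs, ys):
    """Pearson's r, repeated here so the example is self-contained."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up, nearly linear data (not real measurements)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson_r(xs, ys)
r2 = r_squared(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r2:.4f}")
```

A high R^2 here only says the line describes these points well. It says nothing about why x and y move together.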
So, back to the original discussion: the sentence “correlation does not mean causation” doesn’t necessarily mean that correlation can’t point to potential causal relationships. It just says that a strong correlation is not sufficient to establish a causal relationship. Period.
Haven’t you seen Dr House? The whiteboard? Yes, it’s fiction, but illustrative.
So far, I have found it useful to correlate some wait events from statspack measurements with other non-db measurements. In one particular case, many years ago, it helped me to find the root cause of a really odd performance issue. The problem turned out to be related to the NFS client configuration on a Solaris server using NAS storage and Oracle 8.0.6.
Personally, I would be very careful about building predictive models for anything in Oracle. One thing I’ve learned over all these Oracle versions is that one size doesn’t fit all, and of course, the more I know, probably the more I miss. The only steady ground I have is the scientific method: hypothesis, test, prediction.
That means working with test cases, with representative test data, and from a well-known baseline state.
As Ian Anderson sings, “life is a song”…
A Zen master and his disciple:
- Disciple, what gives your tea its sweetness, the sugar or the spoon?
- Master, since you ask me the question, I suppose I should answer the spoon. But then, what is the sugar for?
- To know how long to stir with the spoon.
Dear Virgile,
First of all, thank you for this great paradox, which has certainly troubled the meditation of generations of Zen Buddhist monks.
Let me draw your attention to the following fact: it is very unlikely that a Zen monk would use a Western metal spoon. He would most likely use a bamboo chasen.
Here is another, fairly well-known paradox:
A prisoner is waiting for his fate to be decided.
His jailer comes to him one morning and announces:
“On one of the next 30 days, you will be executed. But the day of your execution will be a surprise.”
Perhaps you’d be interested in joining The BAAG Party?
Thanks, Alex. I almost thought you were referring to BAARF :)