Pearson's Correlation Coefficient, r
Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example, age and blood pressure. Pearson's correlation coefficient (r) is a measure of the strength of the linear association between the two variables.
The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity. The correlation coefficient should not be calculated if the relationship is not linear. For correlation purposes alone, it does not matter on which axis each variable is plotted. However, by convention, the independent (or explanatory) variable is plotted on the x-axis (horizontally) and the dependent (or response) variable is plotted on the y-axis (vertically).
The nearer the scatter of points is to a straight line, the higher the strength of association between the variables. The measurement units used do not affect the value of r.
Values of Pearson's correlation coefficient
Pearson's correlation coefficient (r) for continuous (interval level) data ranges from -1 to +1.
Positive correlation indicates that both variables increase or decrease together, whereas negative correlation indicates that as one variable increases, the other decreases, and vice versa.
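As a rough illustration of the points above, the following Python sketch computes r from its usual textbook definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the age and blood pressure figures are hypothetical, chosen only for illustration; they are not taken from any data set discussed here.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from its textbook definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (assumed values, not from the source)
age_years = [25, 32, 41, 47, 53, 60, 68]
systolic_mmhg = [118, 121, 127, 131, 138, 142, 150]

# Both variables rise together, so r is positive
r_pos = pearson_r(age_years, systolic_mmhg)

# A variable that falls as age rises gives a negative r of the same size
falling = [200 - v for v in systolic_mmhg]
r_neg = pearson_r(age_years, falling)

# Changing units (years to months, mmHg to kPa) leaves r unchanged
age_months = [12 * a for a in age_years]
systolic_kpa = [0.133322 * v for v in systolic_mmhg]
r_units = pearson_r(age_months, systolic_kpa)

print(f"r (rising)  = {r_pos:.3f}")
print(f"r (falling) = {r_neg:.3f}")
print(f"r (other units) = {r_units:.3f}")
```

The sketch only computes the coefficient; it does not check linearity, so a scatter plot should still be inspected first.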
Tip: the square of the correlation coefficient indicates the proportion of variation of one variable 'explained' by the other (see Campbell & Machin, 1999 for more details).
Significance
The t-test is used to establish whether the correlation coefficient is significantly different from zero and, hence, whether there is evidence of an association between the two variables. The underlying assumption is that the data are a random sample from a normal distribution. If this is not true, the conclusions may well be invalidated, and it is better to use Spearman's coefficient of rank correlation (for non-parametric variables). See Campbell & Machin (1999) appendix A12 for calculations and more discussion of this.
It is interesting to note that with larger samples, a low strength of correlation, for example r = 0.3, can be highly statistically significant (i.e. p < 0.01). However, is this an indication of a meaningful strength of association?
NB: Just because two variables are related, it does not necessarily mean that one directly causes the other!
Worked example
Nine students held their breath, once after breathing normally and relaxing for one minute, and once after hyperventilating for one minute. The table indicates how long (in sec) they were able to hold their breath. Is there an association between the two variables?
The chart shows the scatter plot (drawn in MS Excel) of the data, indicating the reasonableness of assuming a linear association between the variables. Hyperventilating times are considered to be the dependent variable, so are plotted on the vertical axis.
Output from SPSS and Minitab is shown below:
Minitab: Pearson correlation of Normal and Hypervent = 0.966
In conclusion, the printouts indicate that the strength of association between the variables is very high (r = 0.966), and that the correlation coefficient is very highly significantly different from zero (P < 0.001). Also, we can say that 93% (0.966² ≈ 0.933) of the variation in hyperventilating times is explained by normal breathing times.
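The significance result can be checked from r and the sample size alone. The sketch below assumes the standard test statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, and obtains a two-sided p-value by numerically integrating the t density with Simpson's rule. The function names are hypothetical, and the sample sizes of 100 and 20 used for the r = 0.3 comparison are assumptions chosen for illustration. A statistics package would normally be used instead.

```python
import math

def t_statistic(r, n):
    """t statistic for testing a zero true correlation (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central_area)

# Worked example from the text: r = 0.966 for n = 9 students
r, n = 0.966, 9
t_worked = t_statistic(r, n)          # roughly 9.9
p_worked = two_sided_p(t_worked, n - 2)
r_squared = r * r                     # proportion of variation 'explained'

# Same modest r = 0.3 at two assumed sample sizes
p_large = two_sided_p(t_statistic(0.3, 100), 98)
p_small = two_sided_p(t_statistic(0.3, 20), 18)

print(f"t = {t_worked:.2f}, p < 0.001: {p_worked < 0.001}")
print(f"r squared = {r_squared:.3f}")
print(f"r = 0.3, n = 100: p < 0.01 is {p_large < 0.01}")
print(f"r = 0.3, n = 20:  p < 0.05 is {p_small < 0.05}")
```

Under these assumptions the worked example gives p well below 0.001, while r = 0.3 is significant at the 1% level with 100 observations but not even at the 5% level with 20. This shows why statistical significance on its own says little about how strong an association is.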