The linear correlation coefficient is one of the fundamental concepts behind the interpretation of regression models. To understand the mathematics and ideas behind it, first try to solve the following problem using what you already know. If you're encountering this concept for the first time, read through this guide for a step-by-step walkthrough.

Problem 3

You are interested in the relationship between the weather and tourism levels. To investigate, you collect data from the tourist centre of a city during one month in the summer, counting the number of people who arrive at the square at the same time every day. Given the data set below, what is the correlation between temperature and tourism? Interpret the correlation and name a few other reasons why these two variables are or are not related.

| Temperature | Number of Visitors |
|---|---|
| 12 | 87 |
| 21 | 150 |
| 20 | 110 |
| 25 | 90 |
| 17 | 85 |
| 15 | 70 |
| 13 | 90 |

 

What is the Correlation Coefficient?

The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is one of the most widely used statistics for describing how two variables are related. Be careful not to confuse it with the coefficient of determination, which is also known as the “R squared” value.

The correlation coefficient is a statistic that measures the strength and direction of the linear relationship between two variables. A linear relationship between two variables means that when they are plotted against each other, the points fall along a straight line. In other words, an increase in one variable goes together with a proportional increase (or, for a negative relationship, a proportional decrease) in the other variable.

You should be careful not to confuse correlation with causation. Simply because two variables exhibit a strong linear correlation doesn’t mean one causes the other. A classic example is the strong linear relationship between shark attacks and ice cream sales. As ice cream sales increase, there is a corresponding increase in shark attacks as well. This does not mean that an increase in ice cream sales causes an increase in shark attacks.

Correlation simply signals a relationship between two variables. Those two variables might share an underlying relationship with a third variable, which explains why they are related in the first place. In this example, ice cream sales and shark attacks can exhibit a strong relationship because of hot weather: the hotter it is, the more people buy ice cream and swim in the ocean.
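The ice cream and shark attack example can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic, the coefficients and noise levels are made up, and the names `pearson`, `ice_cream_sales` and `shark_attacks` are hypothetical. Both series are generated from temperature alone, so neither one influences the other.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (assumption): temperature drives both series,
# and neither series appears in the formula for the other.
temperature = [rng.uniform(15, 35) for _ in range(500)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
shark_attacks = [0.5 * t + rng.gauss(0, 2) for t in temperature]  # arbitrary units

r_sales_attacks = pearson(ice_cream_sales, shark_attacks)
print(f"ice cream sales vs shark attacks: r = {r_sales_attacks:.2f}")
```

The printed correlation is clearly positive even though, by construction, the only link between the two series is the shared dependence on temperature.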

Derivation of Formula

The formula for the correlation coefficient is the following.

\[ \rho_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_{x} \sigma_{y}} \]

While this formula may seem confusing at first, it is actually quite simple to understand when breaking down each element of the formula.

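| Symbol | Meaning |
|---|---|
| \(\rho_{xy}\) | Pearson product-moment correlation |
| \(\mathrm{Cov}(x,y)\) | Covariance between x and y |
| \(\sigma_{x}\) | Standard deviation of x |
| \(\sigma_{y}\) | Standard deviation of y |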

Let’s take the first element, which is the covariance. The covariance of two variables measures the direction of the relationship between them. In other words, the covariance measures how two variables move together.

Next, let’s look at the two elements in the denominator of the correlation coefficient. The standard deviation is a statistic that measures how far the values of a variable are spread out from its mean. The sample formulas for all three elements can be seen below.

\[ \mathrm{Cov}(x,y) = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \]
\[ \sigma_{x} = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}{n-1}} \]
\[ \sigma_{y} = \sqrt{\frac{\sum_{i=1}^{n}(y_{i}-\bar{y})^2}{n-1}} \]
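As a minimal sketch, the three building blocks can be computed directly from their definitions. The example uses the temperature and visitor data from the problem statement; the function names `mean`, `sample_covariance` and `sample_std` are hypothetical helpers introduced for illustration, not part of any library.

```python
import math

# Data from the problem statement
temperature = [12, 21, 20, 25, 17, 15, 13]
visitors = [87, 150, 110, 90, 85, 70, 90]


def mean(values):
    """Arithmetic mean of a sequence."""
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sum of products of deviations, divided by n - 1."""
    mean_x, mean_y = mean(xs), mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def sample_std(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


cov_xy = sample_covariance(temperature, visitors)
std_x = sample_std(temperature)
std_y = sample_std(visitors)

print(f"covariance:             {cov_xy:.3f}")
print(f"std. dev. of temperature: {std_x:.3f}")
print(f"std. dev. of visitors:    {std_y:.3f}")
```

The covariance comes out positive here, so the two variables tend to move in the same direction. Its size depends on the units of both variables, which is why it is divided by the two standard deviations to obtain the correlation coefficient.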

As you can see, these three elements are what go into the correlation formula. In the numerator, you have the measure of the direction of the relationship between two variables. This relationship can be either positive or negative. If the covariance is positive, a decrease in one variable tends to go together with a decrease in the other, and an increase with an increase.

On the other hand, a negative covariance means that a decrease in one variable tends to go together with an increase in the other, and vice versa. The denominator is the product of the standard deviations of both variables. The standard deviation of a variable is a measure of dispersion: it measures the spread of a variable around its mean.

To derive the correlation coefficient formula, first plug the three elements into it.

\[ \rho_{xy} = \frac{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}}{\sqrt{\frac{\sum(x_{i}-\bar{x})^2}{n-1}} \cdot \sqrt{\frac{\sum(y_{i}-\bar{y})^2}{n-1}}} \]

Recall that in mathematics, the square root of a fraction is simply the square root of the numerator divided by the square root of the denominator.

\[ \sqrt{\frac{a}{b}} = \frac{\sqrt{a}}{\sqrt{b}} \]

Applying this to each standard deviation, the formula becomes:

\[ \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \div \left( \frac{\sqrt{\sum(x_{i}-\bar{x})^2}}{\sqrt{n-1}} \cdot \frac{\sqrt{\sum(y_{i}-\bar{y})^2}}{\sqrt{n-1}} \right) \]

Recall that a square root times itself is simply the original number. For example, take the number 3.

\[ \sqrt{3} \cdot \sqrt{3} = 3 \]

Also, keep in mind that when multiplying fractions, they become one fraction where the numerator is the product of the two numerators and the denominator is the product of the two denominators. For example, take the fraction one-third multiplied by one-fourth.

\[ \frac{1}{3} \cdot \frac{1}{4} = \frac{1 \cdot 1}{3 \cdot 4} = \frac{1}{12} \]

Putting these two properties together, we can see that the denominator of the correlation coefficient formula becomes the following.
\[ \frac{\sqrt{\sum(x_{i}-\bar{x})^2}}{\sqrt{n-1}} \cdot \frac{\sqrt{\sum(y_{i}-\bar{y})^2}}{\sqrt{n-1}} \]

\[ = \frac{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}}{\sqrt{n-1} \cdot \sqrt{n-1}} \]

\[ = \frac{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}}{n-1} \]

When plugging this expression back into the full formula, remember that dividing by a fraction is the same thing as multiplying by the reciprocal of that fraction. Taking the same example from above, this means that one-third divided by one-fourth is the same thing as one-third multiplied by four over one.

\[ \frac{1}{3} \div \frac{1}{4} = \frac{1}{3} \cdot \frac{4}{1} = \frac{4}{3} \]

Applied to the correlation coefficient, this gives:

\[ \rho_{xy} = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \cdot \frac{n-1}{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}} \]

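The \(n-1\) terms appear in both the numerator and the denominator, so they cancel out. Each sum of deviations can then be rewritten in terms of raw sums, using the fact that \(\bar{x} = \sum x / n\) and \(\bar{y} = \sum y / n\):

\[ \sum(x_{i}-\bar{x})(y_{i}-\bar{y}) = \frac{n \sum xy - \sum x \sum y}{n} \]

\[ \sum(x_{i}-\bar{x})^2 = \frac{n \sum x^2 - (\sum x)^2}{n} \]

The same holds for y. Substituting these and simplifying both the numerator and denominator, we get:

\[ \frac{n \sum xy - \sum x \sum y}{n} \cdot \frac{n}{\sqrt{ (n \sum x^2 - (\sum x)^2) ( n \sum y^2 - (\sum y)^2) }} \]

\[ \rho_{xy} = \frac{n \sum xy - \sum x \sum y}{\sqrt{ (n \sum x^2 - (\sum x)^2) ( n \sum y^2 - (\sum y)^2) }} \]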
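To check that the simplified formula really is the same quantity as the one built from deviations, both forms can be evaluated on the same data. This is a minimal sketch using the temperature and visitor data from the problem statement; the variable names (`r_deviation`, `r_shortcut` and the `sum_*` totals) are hypothetical.

```python
import math

# Data from the problem statement
x = [12, 21, 20, 25, 17, 15, 13]   # temperature
y = [87, 150, 110, 90, 85, 70, 90]  # number of visitors
n = len(x)

# Form 1: sums of deviations from the mean (after the n - 1 terms cancel)
mean_x = sum(x) / n
mean_y = sum(y) / n
dev_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
dev_xx = sum((a - mean_x) ** 2 for a in x)
dev_yy = sum((b - mean_y) ** 2 for b in y)
r_deviation = dev_xy / math.sqrt(dev_xx * dev_yy)

# Form 2: raw sums only, as in the final formula
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r_shortcut = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(r_deviation, r_shortcut)
# The two forms should agree up to floating-point rounding
print(math.isclose(r_deviation, r_shortcut))
```

The raw-sum form is convenient for hand calculation because it needs only five running totals, while the deviation form tends to be easier to interpret.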

Interpretation of Correlation Coefficient

The correlation coefficient always lies between -1 and 1. Its sign gives the direction of the relationship and its absolute value gives the strength, as summarized by the table below.

| Value | Direction | Strength | Interpretation |
|---|---|---|---|
| -1 | Negative | Very strong | Perfect negative correlation |
| -0.3 | Negative | Weak | Weak negative correlation |
| 0 | None | None | No correlation |
| 0.3 | Positive | Weak | Weak positive correlation |
| 1 | Positive | Very strong | Perfect positive correlation |
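The table can be turned into a small helper that labels any value of the coefficient. This is only a sketch: the function name `describe_correlation` is hypothetical, and the cutoffs of 0.3 and 0.7 are assumptions chosen for illustration. The table above only fixes the labels at -1, -0.3, 0, 0.3 and 1, and different textbooks draw the intermediate boundaries differently.

```python
def describe_correlation(r, weak_cutoff=0.3, strong_cutoff=0.7):
    """Return a rough verbal label for a correlation coefficient.

    The cutoffs are illustrative assumptions, not a standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"

    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= strong_cutoff:
        strength = "strong"
    elif size > weak_cutoff:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


for value in (-1, -0.3, 0, 0.3, 1):
    print(value, "->", describe_correlation(value))
```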
 

Step by Step Solution

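The steps of the calculation are laid out in the table below. Note that this worked example uses a data set of happiness scores and work hours rather than the temperature and visitor data from the problem; the same steps apply to any pair of variables.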

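| Observation | Happiness Score (x) | Work Hours (y) | x - mean of x | y - mean of y | Product of deviations | Squared x deviation | Squared y deviation |
|---|---|---|---|---|---|---|---|
| 1 | 89.0 | 30.0 | 21.3 | -11.7 | -248.9 | 455.1 | 136.1 |
| 2 | 90.0 | 35.0 | 22.3 | -6.7 | -148.9 | 498.8 | 44.4 |
| 3 | 54.0 | 40.0 | -13.7 | -1.7 | 22.8 | 186.8 | 2.8 |
| 4 | 60.0 | 35.0 | -7.7 | -6.7 | 51.1 | 58.8 | 44.4 |
| 5 | 73.0 | 40.0 | 5.3 | -1.7 | -8.9 | 28.4 | 2.8 |
| 6 | 40.0 | 70.0 | -27.7 | 28.3 | -783.9 | 765.4 | 802.8 |
| Average | 67.7 | 41.7 | | | | | |
| Total | | | | | -1116.7 | 1993.3 | 1033.3 |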

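Plugging the three totals into the formula, where the n - 1 terms have already cancelled, we get:

\[ r_{xy} = \frac{-1116.7}{\sqrt{1993.3 \times 1033.3}} \approx -0.78 \]

This is a fairly strong negative correlation: in this data set, higher happiness scores tend to go together with fewer work hours.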
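For the temperature and visitor data given in the problem statement, the same steps can be sketched in Python. The code below builds the deviation columns row by row, sums them, and plugs the totals into the formula. The variable names are hypothetical and the layout of the printed table is just one possible choice.

```python
import math

# Data from the problem statement
temperature = [12, 21, 20, 25, 17, 15, 13]
visitors = [87, 150, 110, 90, 85, 70, 90]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(visitors) / n

# One row per observation: x, y, both deviations, their product, and their squares
rows = []
for x, y in zip(temperature, visitors):
    dx = x - mean_x
    dy = y - mean_y
    rows.append((x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

total_products = sum(row[4] for row in rows)
total_dx2 = sum(row[5] for row in rows)
total_dy2 = sum(row[6] for row in rows)

r = total_products / math.sqrt(total_dx2 * total_dy2)

print(f"{'x':>4} {'y':>5} {'dx':>7} {'dy':>7} {'dx*dy':>9} {'dx^2':>8} {'dy^2':>9}")
for x, y, dx, dy, prod, dx2, dy2 in rows:
    print(f"{x:>4} {y:>5} {dx:>7.1f} {dy:>7.1f} {prod:>9.1f} {dx2:>8.1f} {dy2:>9.1f}")
print(f"totals: {total_products:.1f}, {total_dx2:.1f}, {total_dy2:.1f}")
print(f"r = {r:.2f}")
```

For this data the coefficient comes out at roughly 0.45, a moderate positive correlation: warmer days tend to see more visitors, but temperature alone leaves much of the variation unexplained. With only seven observations, a single unusual day such as the one with 150 visitors has a large influence on the result.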

Emma
