Chapters

What is Regression?
Simple Linear Regression
SLR Interpretation
Problem 1
Solution to Problem 1
Problem 2
Solution to Problem 2
Problem 3
Solution to Problem 3


What is Regression?

Take a look at the following table, which is a dataset from a sample of high school students.

Hours Spent on Phone	Hours Spent Outside
2	0.5
1	0.6
5	0.2
3	0.4

With descriptive statistics, which measure the centre and spread of the data, we could calculate the mean number of hours the students spent on their phones or outside. We could also calculate the variance of each variable or plot the hours spent on the phone in a bar chart.
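As a small illustration of these descriptive statistics, the sketch below computes the mean and variance of both columns of the table using Python's standard `statistics` module. The variable names are illustrative, and the use of the population variance (dividing by n) rather than the sample variance is an assumption, since the guide does not say which one it has in mind.

```python
from statistics import mean, pvariance

# Data from the table above (hours per student).
phone_hours = [2, 1, 5, 3]
outside_hours = [0.5, 0.6, 0.2, 0.4]

# Centre of the data: the arithmetic mean of each column.
mean_phone = mean(phone_hours)
mean_outside = mean(outside_hours)

# Spread of the data: pvariance divides by n (population variance).
# statistics.variance would divide by n - 1 (sample variance) instead.
var_phone = pvariance(phone_hours)
var_outside = pvariance(outside_hours)

print(f"Mean hours on phone: {mean_phone:.3f}, variance: {var_phone:.4f}")
print(f"Mean hours outside:  {mean_outside:.3f}, variance: {var_outside:.4f}")
```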

While descriptive statistics are very powerful, inferential statistics can help us predict what is not included in our dataset. Regression analysis is one of the tools of inferential statistics, which models the linear relationship between two or more variables. Take a look at the image below, which you’ll be able to interpret by the end of this guide.

Simple Linear Regression

Simple linear regression is a form of linear regression in which there is only one independent variable and one dependent variable. These variables appear in the sample SLR equation:

$y = b_0 + b_1 x$

This equation closely follows the equation of a line, $y = mx + c$, with $b_1$ playing the role of the slope $m$ and $b_0$ the role of the intercept $c$.

The SLR equation is composed of four main components, which are explained in the table below.

Component	Definition	Interpretation
$y$	Response variable	The variable that increases or decreases in response to changes in x
$x$	Explanatory variable	The variable that explains the variation in y
$b_0$	Constant (intercept)	The value of y when x is zero
$b_1$	Slope	The change in y (an increase if positive, a decrease if negative) following a 1-unit increase in x
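To make the constant and the slope more concrete, the sketch below fits an SLR equation to the four-student table from the start of this guide. It assumes the usual least-squares formulas for the slope and constant, and the helper name `fit_slr` is hypothetical.

```python
def fit_slr(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # constant (intercept)
    return b0, b1


# Explanatory variable: hours on phone. Response variable: hours outside.
phone_hours = [2, 1, 5, 3]
outside_hours = [0.5, 0.6, 0.2, 0.4]

b0, b1 = fit_slr(phone_hours, outside_hours)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Under these assumptions, the fitted constant is about 0.7 and the slope about -0.1: a student who spends zero hours on the phone is predicted to spend about 0.7 hours outside, and each extra hour on the phone lowers the predicted time outside by about 0.1 hours.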

SLR Interpretation

To interpret an SLR model, you should know that, besides the four components mentioned above, a regression model typically reports two more quantities, summarized below.

Component	Definition	Interpretation
R-squared	Proportion of the variance of the response variable that is explained by the explanatory variable	A high R-squared indicates that the regression model explains the variance in y well
Standard error of the regression	The typical distance between the observed data points and the values predicted by the model	A low standard error of the regression means that the data points and predicted values are close together
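For reference, these two quantities are commonly written as follows. The notation is an assumption, since this guide does not define it: $\hat{y}_i$ is the value the model predicts for observation $i$, $\bar{y}$ is the mean of y, and $n$ is the number of observations. The $n - 2$ divisor is the usual convention for simple linear regression, where two coefficients are estimated.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad SE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$$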

Problem 1

You are interested in studying the relationship between income level and energy consumption. To do this, you are given a data set that includes the variables income and energy consumption. Income is measured in thousands of dollars, while energy consumption is measured in megawatt hours (MWh).

Calculate the covariance and correlation coefficient of these variables. Using this information, interpret the graph of both variables, which is given below.

Income	Energy Consumption
35	9
46	10
52	11
60	12
85	16

Solution to Problem 1

In this problem, you were asked to calculate the correlation coefficient and then interpret the variables in the graph using this information. To calculate the correlation coefficient, you first calculate the mean of x and the mean of y. Next, you subtract the corresponding mean from each observation and plug the resulting deviations into the formula.
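The formula itself is not reproduced in this section. A standard form of the correlation coefficient, consistent with the totals used in the calculation that follows, is:

$$r(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$

The covariance uses the same numerator: dividing it by $n - 1$ gives the sample covariance, and dividing it by $n$ gives the population covariance. The guide does not state which of the two it intends.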

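Income ($x$)	Energy Consumption ($y$)	$x - \bar{x}$	$y - \bar{y}$	$(x - \bar{x})(y - \bar{y})$	$(x - \bar{x})^2$	$(y - \bar{y})^2$
35	9	-20.6	-2.6	53.56	424.36	6.76
46	10	-9.6	-1.6	15.36	92.16	2.56
52	11	-3.6	-0.6	2.16	12.96	0.36
60	12	4.4	0.4	1.76	19.36	0.16
85	16	29.4	4.4	129.36	864.36	19.36
Mean = 55.6	Mean = 11.6		Total	202.2	1413.2	29.2

$r(x,y) = \dfrac{202.2}{\sqrt{1413.2 \times 29.2}} \approx 0.995$

A correlation coefficient this close to 1 indicates a very strong positive linear relationship: energy consumption rises as income rises.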
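The problem also asks for the covariance, which the worked solution does not show. The sketch below recomputes the totals from the raw data and derives both the covariance and the correlation coefficient. Using the sample covariance (dividing by n - 1) is an assumption, and the variable names are illustrative.

```python
from math import sqrt

income = [35, 46, 52, 60, 85]   # thousands of dollars
energy = [9, 10, 11, 12, 16]    # MWh

n = len(income)
x_bar = sum(income) / n
y_bar = sum(energy) / n

# Totals of the three right-hand columns of the table.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, energy))
s_xx = sum((x - x_bar) ** 2 for x in income)
s_yy = sum((y - y_bar) ** 2 for y in energy)

# Assumption: sample covariance, i.e. divide by n - 1.
sample_cov = s_xy / (n - 1)

# Pearson correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"Means: x = {x_bar:.1f}, y = {y_bar:.1f}")
print(f"Totals: {s_xy:.1f}, {s_xx:.1f}, {s_yy:.1f}")
print(f"Sample covariance = {sample_cov:.2f}, r = {r:.3f}")
```

The positive covariance only says that the two variables move in the same direction. The correlation coefficient rescales it to the range from -1 to 1, which is what makes the strength of the relationship comparable across data sets.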

Problem 2

In the previous problem, you were asked to explore the relationship between the two variables of energy consumption and income level using the covariance and correlation coefficient. Now, you want to see if there is another factor in determining energy consumption. You are given a data set that, for the same energy consumption observations, has data on the average temperature in that region.

Given the following graph of the two variables, calculate their correlation coefficient and compare it with the one found in Problem 1. In other words, find out whether income level and energy consumption are more strongly or more weakly correlated than average temperature and energy consumption.

Average Temperature	Energy Consumption
20	9
19	10
10	11
4	12
28	16

Solution to Problem 2

To compare the two pairs of variables, we need to find the correlation between average temperature and energy consumption.

Average Temperature ($x$)	Energy Consumption ($y$)	$x - \bar{x}$	$y - \bar{y}$	$(x - \bar{x})(y - \bar{y})$	$(x - \bar{x})^2$	$(y - \bar{y})^2$
20	9	3.8	-2.6	-9.88	14.44	6.76
19	10	2.8	-1.6	-4.48	7.84	2.56
10	11	-6.2	-0.6	3.72	38.44	0.36
4	12	-12.2	0.4	-4.88	148.84	0.16
28	16	11.8	4.4	51.92	139.24	19.36
Mean = 16.2	Mean = 11.6		Total	36.4	348.8	29.2

$r(x,y) = \dfrac{36.4}{\sqrt{348.8 \times 29.2}} \approx 0.361$

Income and energy consumption ($r \approx 0.995$) are more strongly correlated than average temperature and energy consumption ($r \approx 0.361$).
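The comparison can be checked with a short sketch that computes both correlation coefficients from the raw data. The helper name `pearson_r` is hypothetical, and comparing absolute values is a choice made here because the strength of a correlation does not depend on its sign.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


energy = [9, 10, 11, 12, 16]        # MWh
income = [35, 46, 52, 60, 85]       # thousands of dollars
temperature = [20, 19, 10, 4, 28]   # average temperature

r_income = pearson_r(income, energy)
r_temperature = pearson_r(temperature, energy)

# Strength is judged by the absolute value of r.
if abs(r_income) > abs(r_temperature):
    stronger = "income"
else:
    stronger = "average temperature"

print(f"r(income, energy)      = {r_income:.3f}")
print(f"r(temperature, energy) = {r_temperature:.3f}")
print(f"More strongly correlated with energy consumption: {stronger}")
```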

Problem 3

You have now determined which variables are more strongly correlated. To use this information, you need to model energy consumption from the data. That is, you need to build an SLR model with the data that you have. Recall that there are two main elements you need to calculate in order to build an SLR model: the constant and the regression coefficient. You can find these formulas below.

$b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

$b_0 = \bar{y} - b_1 \bar{x}$

Find the SLR model using the data of the most strongly correlated variables. Next, perform an interpolation and an extrapolation using any values. Recall that interpolation is when you predict y using an x value that lies within the range of your data set. Extrapolation, on the other hand, is when you predict y using an x value that lies outside the range of your data. For example, if the observed x values run from 35 to 85, predicting y at x = 50 is an interpolation, while predicting y at x = 100 is an extrapolation.

Solution to Problem 3

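To build a regression model, we must find the values of $b_0$ and $b_1$. Recall the quantities we already calculated for income ($x$) and energy consumption ($y$), shown here unrounded.

$\bar{x}$	55.6
$\bar{y}$	11.6
$\sum (x - \bar{x})(y - \bar{y})$	202.2
$\sum (x - \bar{x})^2$	1413.2
$\sum (y - \bar{y})^2$	29.2

$b_1$	$= \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
	= $\dfrac{202.2}{1413.2}$ ≈ 0.14

$b_0$	$= \bar{y} - b_1 \bar{x}$
	= $11.6 - 0.143 \times 55.6$ ≈ 3.6

The SLR model is therefore $\hat{y} = 3.6 + 0.14x$, where $x$ is income in thousands of dollars and $\hat{y}$ is the predicted energy consumption in MWh.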
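The problem also asks for an interpolation and an extrapolation, which this solution does not show. A minimal sketch follows, assuming the usual least-squares formulas for the slope and constant. The x values 50 and 100 are chosen arbitrarily for illustration, and the helper names `predict` and `kind_of_prediction` are hypothetical.

```python
income = [35, 46, 52, 60, 85]   # thousands of dollars
energy = [9, 10, 11, 12, 16]    # MWh

n = len(income)
x_bar = sum(income) / n
y_bar = sum(energy) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, energy))
s_xx = sum((x - x_bar) ** 2 for x in income)

b1 = s_xy / s_xx           # regression coefficient (slope)
b0 = y_bar - b1 * x_bar    # constant


def predict(x):
    """Predicted energy consumption (MWh) for an income of x thousand dollars."""
    return b0 + b1 * x


def kind_of_prediction(x):
    """Label a prediction by whether x lies inside the observed income range."""
    if min(income) <= x <= max(income):
        return "interpolation"
    return "extrapolation"


print(f"Model: y = {b0:.2f} + {b1:.3f} * x")
for x_new in (50, 100):   # arbitrary illustration values
    print(f"x = {x_new}: predicted y = {predict(x_new):.2f} ({kind_of_prediction(x_new)})")
```

With the unrounded coefficients, this gives roughly 10.8 MWh at an income of 50 and roughly 18.0 MWh at an income of 100. The second figure deserves more caution, because no observed income is that high and the linear pattern may not hold outside the observed range.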


Emma

I am passionate about travelling and currently live and work in Paris. I like to spend my time reading, gardening, running, learning languages and exploring new places.
