Chapters

The Interquartile Range
Outliers
Influential Observation
How to Identify Outliers
Practice Problem 1
Problem 2
Problem 3
Solution Problem 1
Solution Problem 2
Solution Problem 3

In previous sections of this guide to descriptive statistics, we introduced you to the fundamental concepts underlying measures of central tendency and variability. Namely, we walked you through the formulas of these measures as well as some intermediate applications of them. Here, we’ll expand upon these topics, presenting the concept of outliers as well as providing you with some practice problems.


The Interquartile Range

In the previous sections, you were introduced to quartiles and the interquartile range, otherwise known as the IQR. To briefly recap, the interquartile range is the distance between the first and third quartiles. The interval between these two quartiles contains the median and the middle 50% of the data. Recall the image below, used as an example illustration of the IQR.

While the IQR has many applications, including ones tied to the discussion on outliers explained further on in this section, what is important to note is how the measures of central tendency play into the IQR. This is easiest to see when looking at data plotted on a boxplot.

Boxplots are an effective way of displaying the IQR because they can show several measures of central tendency and variability at once. The mean and median can be seen in both plots: in the boxplot on the left, the mean is greater than the median, while in the boxplot on the right, the median and mean are equal.

The spread of the data, that is, how the values are distributed, can be read from the IQR. In the boxplot on the left, the first quartile is closer to the median than the third quartile is. In the boxplot on the right, on the other hand, the median and mean are equidistant from quartiles 1 and 3.

These differences in where the measures lie on the boxplot reflect differences in the underlying distributions. The boxplot on the right is consistent with a symmetric distribution, such as the normal distribution, while the one on the left signals a skewed distribution. We’ll go into more detail on distributions later. For now, you can find a recap of the measures of central tendency and variability you can observe from boxplots in the table below.

Measure	Location on Boxplot	Interpretation
Mean	Typically located above or below the median and within the IQR, although there are exceptions	The average of the data
Median	Located at quartile 2	Half the data fall above this point and half fall below it (the 50% mark)
Minimum	Located at Q0	The lowest value of the data set
Maximum	Located at Q4	The highest value of the data set
Interquartile Range	Between Q1 and Q3	Holds 50% of the data, the median and information about the centre 50% of the data set
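
As a rough illustration of the measures in the table, the sketch below computes them for a small made-up data set using Python's standard `statistics` module. The data values and the helper name `boxplot_summary` are hypothetical, and the quartiles use the "inclusive" convention, which is only one of several in common use.

```python
import statistics


def boxplot_summary(values):
    """Return the measures a boxplot displays, plus the mean."""
    ordered = sorted(values)
    # method="inclusive" is one quartile convention among several;
    # other conventions can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],    # Q0
        "q1": q1,
        "median": q2,         # Q2
        "q3": q3,
        "max": ordered[-1],   # Q4
        "iqr": q3 - q1,
        "mean": statistics.mean(ordered),
    }


# Hypothetical right-skewed data: a few large values pull the mean upward.
skewed = [1, 2, 2, 3, 3, 4, 5, 8, 12]
summary = boxplot_summary(skewed)
print(summary)
```

For this data the mean ends up above the median, and the first quartile sits closer to the median than the third quartile does, which is the pattern described for a skewed boxplot.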

Outliers

Even if you’ve never heard of outliers in a mathematical or statistical setting, you’re bound to have heard the term used in other disciplines. This is mainly because the definition of an outlier is broad and can therefore be applied to situations beyond mathematics.

An outlier is defined as a point that diverges from the typical pattern. In other words, an outlier is different from the rest of the data set.

Influential Observation

It’s easy to confuse outliers with influential observations. However, it can be easier to separate the two by thinking of outliers as a concept belonging mainly to descriptive statistics, while influential observations typically come up in inferential statistics.

An influential observation is a data point, or a group of points, that noticeably changes the slope of a regression line when it is included. Reserving the details of regression for our guide on inferential statistics, you can get a basic understanding of the difference between these two statistical concepts from the images below.

As you can see, the regression line on the left is not affected by the inclusion of the red point, whereas on the right, we can see that the regression line changes significantly with the inclusion of the pink point. This suggests the red point is an outlier and the pink point is an influential observation.
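
The same contrast can be sketched numerically. The example below is only an illustration under assumed data: the points, the helper name `least_squares_slope`, and the choice of an ordinary least-squares fit are all hypothetical and are not taken from the images.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical base data lying exactly on the line y = 2x.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 6, 8, 10]

slope_base = least_squares_slope(base_x, base_y)

# An unusual point that still follows the trend: the slope barely moves.
slope_outlier = least_squares_slope(base_x + [20], base_y + [40])

# An unusual point that breaks the trend: it drags the fitted line toward itself.
slope_influential = least_squares_slope(base_x + [20], base_y + [5])

print(slope_base, slope_outlier, slope_influential)
```

In this sketch the first added point leaves the slope at about 2, while the second flattens the line almost completely, which is what marks it as influential.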

How to Identify Outliers

In statistics, there are many ways to identify whether or not a point is an outlier. Two basic methods are summarized in the table below.

Method	Description	Example
Standard Deviation Method	If the data are approximately normally distributed, we can use the 68-95-99.7 rule to flag outliers. The cutoff is a convention rather than a fixed rule, and is typically set at 3 $\sigma$ or more from the mean.	If we set the cutoff at 3 $\sigma$, any point that lies 3 $\sigma$ or more away from the mean can be considered an outlier.
Interquartile Range Method	If the data are not normally distributed, we can use the IQR as a benchmark for outliers, since it spans the middle 50% of the data. The limits are, again, set by convention at $k$ * IQR below the first quartile (25th percentile) and above the third quartile (75th percentile), where $k$ is typically 1.5.	If Q3 is 10 and Q1 is 3, the IQR is 10 - 3 = 7. The distance from each quartile to its limit is then 7*1.5 = 10.5, so any point below 3-10.5 = -7.5 or above 10+10.5 = 20.5 could be considered an outlier.
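
The interquartile range method in the table can be sketched in a few lines of Python. The function names `iqr_fences` and `flag_outliers`, the parameter name `k`, and the sample values are assumptions made for illustration; the quartiles 3 and 10 and the multiplier 1.5 come from the example in the table.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits beyond which a point may be flagged."""
    iqr = q3 - q1
    distance = k * iqr  # how far each limit sits from its quartile
    return q1 - distance, q3 + distance


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the IQR limits."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles from the table's example: Q1 = 3, Q3 = 10.
lower, upper = iqr_fences(q1=3, q3=10)
print(lower, upper)  # -7.5 20.5

# Hypothetical values checked against those limits.
flagged = flag_outliers([-9, 0, 4, 12, 21], q1=3, q3=10)
print(flagged)  # [-9, 21]
```

Keeping the multiplier as a parameter reflects that 1.5 is a convention, not a fixed rule, so a stricter or looser limit can be tried without changing the function.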

Practice Problem 1

Calculate the following descriptive statistics from the data given in the table below:

Median
Mean
Interquartile Range

Observation	Value
1	5
2	16
3	24
4	28
5	30
6	31
7	32
8	35
9	95

Problem 2

You are trying to decide whether or not you have an outlier in your data set. Use the standard deviation method in order to determine if there are any outliers in your data, given in the data table below.

Observation	Value
1	4
2	6
3	3
4	9
5	60
Mean	16.4
Standard Deviation	24.5

Problem 3

Interpret the chart below.

Solution Problem 1

Observation	Value
1	5
2	16
3	24
4	28
5	30
6	31
7	32
8	35
9	95
Total	296

The mean is calculated as,

$\bar{x} = \dfrac{296}{9} \approx 32.9$

The median is the midpoint of the data set. Because our data is already ordered from least to greatest, we simply need to find the middle value. In this case, it is the 5th observation, which has a value of 30.

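The interquartile range is found by splitting the ordered data into fourths. Here, the median is counted in both the lower and upper halves when locating Q1 and Q3; other quartile conventions can give slightly different values. Doing this gives us the following quartiles: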

Q0 = 5
Q1 = 24
Q2 = 30
Q3 = 32
Q4 = 95

Next, the IQR can be calculated as,

IQR = Q3 - Q1 = 32-24 = 8
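
If you would like to check these results by computer, one possible way is Python's standard `statistics` module. This is only a sketch: the variable names are arbitrary, and `method="inclusive"` is chosen here because it happens to reproduce the quartiles above. Other quartile conventions can return different values for Q1 and Q3.

```python
import statistics

# Data from Practice Problem 1, already ordered from least to greatest.
values = [5, 16, 24, 28, 30, 31, 32, 35, 95]

mean = statistics.mean(values)
median = statistics.median(values)

# The three cut points that split the data into fourths: Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(round(mean, 1), median, q1, q3, iqr)
```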

Solution Problem 2

Find the step-by-step solution below.

Observation	Value
1	4
2	6
3	3
4	9
5	60
Mean	16.4
Standard Deviation	24.5

Using the standard deviation method to identify an outlier can be done by standardizing the data point. We suspect the fifth observation may be an outlier.

$z_{i} = \dfrac{60-16.4}{24.5} \approx 1.78$

This means that 60 is about 1.8 $\sigma$ away from the mean. While this is still well within the 3 $\sigma$ normally used for finding outliers in the standard deviation method, you may want to consider setting the limit at a lower multiple of $\sigma$ since the sample size is small: with only five observations, no value can lie more than about 1.79 sample standard deviations from the mean.
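
The standardization step can also be sketched in Python. The snippet assumes the standard deviation in the table is the sample standard deviation (computed with an n - 1 denominator), which is what `statistics.stdev` returns; the variable names and the cutoff variable `threshold` are illustrative choices.

```python
import statistics

# Data from Problem 2.
values = [4, 6, 3, 9, 60]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# Standardize every observation: distance from the mean in standard deviations.
z_scores = [(v - mean) / sd for v in values]

threshold = 3  # conventional cutoff; an arbitrary choice
flagged = [v for v, z in zip(values, z_scores) if abs(z) >= threshold]

print(round(mean, 1), round(sd, 1), round(z_scores[-1], 2), flagged)
```

With the cutoff at 3, nothing is flagged, which matches the conclusion above that 60 does not reach the usual limit.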

Solution Problem 3

Quartile	Interpretation
Q0	The minimum, located at 0
Q1	25% of the data is below 35
Q2	50% of the data is below 50 and 50% is above it
Q3	75% of the data is below 65
Q4	The maximum, located at 100


Emma
