A Guide to Statistics

In previous sections, you learned about the concepts involved in descriptive statistics. Specifically, we showed you the measures of central tendency and variability, as well as how to calculate each. We also walked you through the types of variables used in statistics and the kinds of analysis and visualization you can carry out with data. Here, we’ll help you review everything related to descriptive statistics.

 

What are Descriptive Statistics?

The field of statistics is generally divided into two branches: descriptive and inferential statistics. Descriptive statistics is, luckily, exactly what it sounds like: it involves describing and summarizing the data you have. If this sounds confusing, let’s contrast it with inferential statistics in the table below.

 

| Descriptive Statistics | Inferential Statistics |
| --- | --- |
| Makes statements about what is within the data | Makes predictions about data points outside the data set by using the information within the data |
| Conveys information through measures like mean and standard deviation | Conveys information through predictive models |
| Visualizations generally include bar charts, pie charts, histograms and line graphs | Visualizations generally include line graphs and scatterplots |

 

While this general information is by no means exhaustive, it can be a great starting point for understanding the differences between the two branches of statistics. The goal of descriptive statistics is to either summarize the characteristics of a data set or to analyse a data set by utilizing its descriptive properties.

 

Population

The units used in descriptive statistics can be anything. People using descriptive statistics can strive to measure things like:

  • Rainfall
  • Trees in parks
  • Tourists at a beach

The analysis that can be done using descriptive statistics alone is not only very diverse, it also makes up the majority of the statistics many people use. The units being measured, however, need to be clearly defined in order to properly understand any data.

In statistics, the elements people want to study are split into a population and a sample. A population is the actual group of elements that you want to study. A population could be anything and take on any form. In the previous examples, the population would take the following form.

 

| Elements | Population |
| --- | --- |
| Rainfall | Total rain produced |
| Trees in a park | All the trees in a park |
| Tourists at a beach | Total number of tourists at a beach |

 

While this may seem simple, and it is, populations are notoriously hard to measure. Surveying the total number of trees in a park might be an easy task if it involves a local city park, but imagine the same task applied to a national forest. Often, there are not enough financial resources or time to measure an entire population. That is why in statistics you’ll often encounter samples.

 

Sample

A sample is a subset of a population, made up of the same kind of elements. A sample is drawn from a population in order to make the data collection process cheaper and more time-efficient. Returning to the previous examples, let’s take a look at the differences between a population and a sample.

 

| Population | Sample |
| --- | --- |
| Total rain produced | Rainfall produced in an hour in one location of a city |
| All the trees in a park | Number of trees measured in a one-kilometre radius |
| Total number of tourists at a beach | Number of tourists arriving at the beach at three specific times in a day |

 

As you can guess, samples tend to include a fraction of the elements that are included in a population. There are many different methods for drawing a sample, which include:

  • Simple Random Sampling
  • Stratified Sampling
  • Cluster Sampling
  • Quota Sampling

As you can imagine, each sampling method has its advantages and disadvantages. The sampling method that is desired in most cases is simple random sampling, also known as SRS.

The reason is that it involves a completely random selection of elements from a population: every element has the same chance of being chosen, which helps avoid selection bias in the estimation of statistical measures. An SRS can be conducted with or without replacement.

Because the true population measure, or the measure we would have calculated had we measured the entire population, is unknown, measures calculated from samples are always treated as estimates of the corresponding population measures. A measure from a population is called a “parameter” while a measure from a sample is called a “statistic.”
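To make the ideas of sampling, parameters and statistics concrete, here is a minimal Python sketch. The population of tree heights is invented purely for illustration, and names such as `population`, `sample_without` and `sample_with` are assumptions rather than anything from this guide. It uses the standard library's `random.sample` for an SRS without replacement and `random.choices` for an SRS with replacement.

```python
import random
import statistics

# Hypothetical population: heights (in metres) of 1,000 trees in a park.
random.seed(42)  # fixed seed so the sketch is reproducible
population = [round(random.uniform(2.0, 30.0), 1) for _ in range(1000)]

# Simple random sample WITHOUT replacement: no tree can be picked twice.
sample_without = random.sample(population, k=50)

# Simple random sample WITH replacement: the same tree may be picked again.
sample_with = random.choices(population, k=50)

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a sample and used as an estimate of the parameter.
x_bar = statistics.mean(sample_without)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In a real study only the sample would be available, so the sample mean is all you could report. Here the population is simulated so that the two values can be compared side by side.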

 

Measures of Central Tendency

Measures of central tendency is a long name for something simple: measuring the centre. People like to measure the centre point of a data set because it generally indicates what the most “typical” value of the data looks like.

There are three basic measures of central tendency: the mean, median and mode. Some rules of thumb for remembering when each of them is used are:

  • When the data includes extreme values or outliers, the median is better
  • When the data doesn’t include outliers and you want to measure the average, use the mean
  • When you want to know the value or category with the highest frequency, use the mode
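The first two rules of thumb can be seen in a small Python sketch. The salary figures and colour labels are made up for illustration only. The point is how much a single extreme value moves the mean compared with the median.

```python
import statistics

# Hypothetical annual salaries without any extreme values.
salaries = [24_000, 26_000, 27_000, 29_000, 31_000]

# The same data with one extreme value (an outlier) added.
with_outlier = salaries + [250_000]

mean_before = statistics.mean(salaries)        # 27400
mean_after = statistics.mean(with_outlier)     # 64500
median_before = statistics.median(salaries)    # 27000
median_after = statistics.median(with_outlier) # 28000

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)

# The mode also works for categories, where a mean or median makes no sense.
favourite_colours = ["red", "blue", "red", "green"]
print("mode:  ", statistics.mode(favourite_colours))
```

The outlier more than doubles the mean, while the median moves by only 1,000. That is why the median is the better choice when extreme values are present.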

Below are the formulas for each measure.

| Measure | Sample | Population |
| --- | --- | --- |
| Mean | \[ \bar{x} = \frac{\Sigma x_{i}}{n} \] | \[ \mu = \frac{\Sigma x_{i}}{N} \] |
| Median | Midpoint of ordered data points, or the average of the two midpoint values if there is an even number of values | Calculated the same as the sample |
| Mode | The value or category with the highest frequency | Calculated the same as the sample |
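The three measures above can be written out by hand in a few lines of Python. This is only a sketch of the definitions in the table. The function names `mean`, `median` and `mode` and the sample data are assumptions chosen for illustration, and the results are checked against Python's built-in `statistics` module.

```python
import statistics
from collections import Counter

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Midpoint of the ordered values (average of the two middle ones if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Value with the highest frequency (first one seen if there is a tie)."""
    return Counter(values).most_common(1)[0][0]

data = [4, 8, 6, 5, 3, 8]  # hypothetical observations

print("mean:  ", mean(data))
print("median:", median(data))   # ordered data: 3, 4, 5, 6, 8, 8
print("mode:  ", mode(data))

# The standard library gives the same answers.
assert median(data) == statistics.median(data)
assert mode(data) == statistics.mode(data)
```

The same functions apply whether the data are a sample or a whole population. Only the notation for the mean changes.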

 

Measures of Variability

Unlike measures of central tendency, measures of variability strive to capture how the data are spread around the centre values. The two most basic measures of variability are variance and standard deviation. Other common measures include:

  • Coefficient of Variation
  • Covariance
  • Standard Error

The spread of a data set is how closely together or how far apart the data lie around the centre. While variance is used throughout statistics, standard deviation tends to be preferred when describing the spread of a data set because it is expressed in the same units as the data, which makes it easy to interpret.

Below you’ll find the formulas for standard deviation and variance for populations and samples.

Sample Population
| Measure | Sample | Population |
| --- | --- | --- |
| Variance | \[ s^2 = \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} \] | \[ \sigma^2 = \frac{\Sigma(x_{i}-\mu)^2}{N} \] |
| Standard Deviation | \[ s = \sqrt{ \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} } \] | \[ \sigma = \sqrt{ \frac{\Sigma(x_{i}-\mu)^2}{N} } \] |

Notice that the standard deviation is simply the square root of the variance.
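As a rough illustration of these formulas, the Python sketch below computes the variance both ways: dividing by the number of values for a population and by one less than that for a sample. The helper names and the data set are hypothetical, and the results are compared with the standard library's `statistics.variance` and `statistics.pvariance`.

```python
import math
import statistics

def sample_variance(values):
    """Squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Squared deviations from the population mean, divided by the count."""
    count = len(values)
    mu = sum(values) / count
    return sum((x - mu) ** 2 for x in values) / count

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

s2 = sample_variance(data)           # 32 / 7
sigma2 = population_variance(data)   # 32 / 8 = 4.0

# The standard deviation is the square root of the variance.
s = math.sqrt(s2)
sigma = math.sqrt(sigma2)            # 2.0

print(f"sample:     variance = {s2:.3f}, standard deviation = {s:.3f}")
print(f"population: variance = {sigma2:.3f}, standard deviation = {sigma:.3f}")

# Cross-check against the standard library.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))
```

For the same data, the sample variance is always slightly larger than the population variance because it divides by a smaller number.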

 

Notation of Measures of Central Tendency and Variability

As you may have noticed, the measures for the population and sample have different notations. These notations are standard throughout statistics, so you will encounter them everywhere from your textbooks to computer programs. Below, we’ve summarized the notations of the mean, standard deviation and variance.

| Measure | Sample | Population |
| --- | --- | --- |
| Mean | \[ \bar{x} \] | \[ \mu \] |
| Standard Deviation | \[ s \] | \[ \sigma \] |
| Variance | \[ s^2 \] | \[ \sigma^2 \] |

 

Types of Variables

There are many variable types, all used in different statistical analyses. The most common distinction is made between two types of variables: qualitative and quantitative variables, also known as categorical and numerical variables.

Qualitative variables are those that involve categories. They are called qualitative because they describe a variable’s characteristics, or qualities. These include variables like:

  • Colour
  • Shape
  • Gender

Quantitative variables, on the other hand, measure quantities of something. These include variables like:

  • Height
  • Age
  • Weight

Quantitative and qualitative variables can be further broken down into sub-groups. Below you’ll find a summary.

Data: a collection of observations, measurements or ideas on specific variables

  • Quantitative: numeric information about a place, person or thing
  • Qualitative: descriptive information about a place, person or thing
  • Ordinal: ordered based on a specific scale
  • Nominal: not ordered on a scale
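One way to see the difference between nominal and ordinal data is to look at what you can sensibly do with each. The Python sketch below is only an illustration. The colour list, the three-level satisfaction scale and all variable names are assumptions made up for the example.

```python
from collections import Counter

# Nominal: categories with no inherent order, so counting is the natural summary.
colours = ["red", "blue", "red", "green", "blue", "red"]
colour_counts = Counter(colours)
print("most common colour:", colour_counts.most_common(1)[0][0])

# Ordinal: categories that follow a defined scale (hypothetical survey answers).
scale = ["low", "medium", "high"]
rank = {level: position for position, level in enumerate(scale)}
responses = ["high", "low", "medium", "high", "medium"]

# Because the scale defines an order, the responses can be sorted...
ordered = sorted(responses, key=lambda level: rank[level])
print("ordered responses:", ordered)

# ...and a middle (median) response can be read off the ordered list.
median_response = ordered[len(ordered) // 2]
print("median response:", median_response)
```

Sorting the colours alphabetically would be possible too, but the resulting order would carry no meaning, which is exactly what makes them nominal.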

 

Data Visualization

Data visualization is an integral part of descriptive statistics: it is the practice of displaying information visually. The most common visualizations in descriptive statistics include:

  • Bar charts
  • Pie charts
  • Line graphs
  • Histograms
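As a very small illustration of the idea behind a bar chart, the sketch below counts how often each category occurs and draws one text bar per category. It uses only the Python standard library. The function name `text_bar_chart` and the tree data are hypothetical, and a plotting library would normally be used for real charts.

```python
from collections import Counter

def text_bar_chart(values):
    """Return a simple text bar chart: one line per category, one '#' per item."""
    counts = Counter(values)
    lines = []
    for category, count in sorted(counts.items()):
        lines.append(f"{category:<8}{'#' * count} ({count})")
    return "\n".join(lines)

# Hypothetical sample: species of six trees observed in a park.
trees = ["oak", "pine", "oak", "birch", "oak", "pine"]
print(text_bar_chart(trees))
```

Each bar's length is the frequency of its category, which is the same information a bar chart or pie chart conveys graphically.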

Emma

I am passionate about travelling and currently live and work in Paris. I like to spend my time reading, gardening, running, learning languages and exploring new places.