What is Statistics?
Statistics is a set of tools designed to analyze data and deduce information about a population
from a given sample.
• It is a three-step process:
1. Sampling and design of the experiment: take a sample (or many) from the population, make
observations about the sample, and turn them into numerical data.
2. Descriptive statistics: analyze the data to get information about the sample.
3. Statistical inference: from the data, deduce information about the whole population.
• Context is crucial in this process.
A population is a set of individuals (people, cases, etc.) that we want to analyze.
A sample is a subset of the population.
A variable is an aspect or characteristic of the population that we want to study.
Choosing a sample is a delicate task: the sample must be representative of the population (context
• How big should a sample be? It depends on the situation. A common rule of thumb is that a
sample should have at least 30 individuals.
When choosing a sample, there are several strategies: random sampling, stratified sampling,
cluster sampling, etc.
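To make two of these strategies concrete, here is a minimal Python sketch. The population, the "region" attribute, and the function names are hypothetical and chosen only for illustration; the stratified version assumes proportional allocation, which is one common choice among several.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 individuals, 70% "north" and 30% "south".
population = [
    {"id": i, "region": "north" if i < 700 else "south"}
    for i in range(1000)
]

def simple_random_sample(pop, n, rng):
    """Draw n individuals without replacement; each is equally likely."""
    return rng.sample(pop, n)

def stratified_sample(pop, n, key, rng):
    """Sample each stratum in proportion to its share of the population."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may not add up to n in general.
        k = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample

srs = simple_random_sample(population, 50, rng)
strat = stratified_sample(population, 50, "region", rng)
north_in_strat = sum(p["region"] == "north" for p in strat)
```

With simple random sampling the share of "north" individuals varies from draw to draw, while the stratified sample reproduces the 70/30 split of the population by construction.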
The variables are just questions that we ask the population.
The variables should be neutral. Compare two loaded wordings of the same question:
• Are you in favor of the illegal one‐sided declaration of independence of the autonomous region of
Catalunya from the great Spanish nation?
• Are you in favor of the historically legitimated declaration of independence of the great nation of
Catalunya from the oppressive Spanish state?
• The variables must serve a purpose: if I am interested in the income per family, is it necessary to
ask about music preferences?
• The possible answers of a variable must be very clear from the beginning.
• There are two types of variables, depending on the kind of answer:
• Quantitative or numerical: the answer is a number.
• Qualitative or categorical: the answer is a label (category).
• Quantitative variables can be of two types:
• Discrete: the answers are obtained by counting.
• Continuous: the answers are obtained by measuring.
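The classification above can be sketched in Python. The survey variables and the classify function are hypothetical, and the rule used (labels, integers, other numbers) is only a rough proxy: a postal code, for instance, is stored as a number but is really a qualitative variable, so the final decision always depends on context.

```python
from enum import Enum

class VariableType(Enum):
    QUALITATIVE = "qualitative (categorical)"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"

def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def classify(values):
    """Naive heuristic based on how the answers are stored."""
    if all(isinstance(v, str) for v in values):
        return VariableType.QUALITATIVE   # answers are labels
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return VariableType.DISCRETE      # answers obtained by counting
    if all(is_number(v) for v in values):
        return VariableType.CONTINUOUS    # answers obtained by measuring
    raise ValueError("mixed labels and numbers: answers are not well defined")

# Hypothetical survey answers from three individuals.
survey = {
    "favorite_genre": ["rock", "jazz", "rock"],
    "siblings": [0, 2, 1],
    "height_cm": [172.5, 160.0, 181.2],
}
types = {name: classify(values) for name, values in survey.items()}
```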
The sample together with the variable produces some data (that is, the values that the variable
takes on each individual of the sample). Now we have to analyze these data.
We have some tools at our disposal:
- Frequency tables (organize the data)
- Graphic representations of data (visually represent the data)
- Descriptive statistics (measure features of the data)
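As an illustration of the first and third tools, the following Python sketch builds a frequency table and computes a few descriptive statistics using only the standard library. The data (number of siblings for 10 sampled individuals) and the frequency_table helper are made up for this example.

```python
from collections import Counter
import statistics

# Hypothetical data: number of siblings reported by 10 sampled individuals.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

def frequency_table(data):
    """Rows of (value, absolute frequency, relative frequency), sorted by value."""
    counts = Counter(data)
    n = len(data)
    return [(value, count, count / n) for value, count in sorted(counts.items())]

table = frequency_table(siblings)
print(f"{'value':>5} {'abs':>5} {'rel':>6}")
for value, count, rel in table:
    print(f"{value:>5} {count:>5} {rel:>6.2f}")

# Descriptive statistics: measures of center and spread.
mean = statistics.mean(siblings)
median = statistics.median(siblings)
mode = statistics.mode(siblings)
spread = statistics.stdev(siblings)  # sample standard deviation
```

The absolute frequencies add up to the sample size and the relative frequencies add up to 1, which is a quick sanity check on any frequency table.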