Inferential Statistics

Dr. Peng Xiao

What Is the Difference Between Probability and Statistics?

Probability vs Statistics

Probability Lingo

Statistics Lingo

What Does a Distribution Look Like (Binomial Distribution)




Let \(X\) denote the number of heads when tossing 3 fair coins.

\(X\sim\) Binomial(n=3, p=0.5)

The probability mass function of \(X\) is

\[f_X(x) = \binom{3}{x}\cdot (0.5)^x \cdot (0.5)^{3-x}, \text{ where } x=0,1,2,3 \]
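
As a quick check of the formula above, the four probabilities can be computed directly. This is a minimal Python sketch using only the standard library; the function name `binomial_pmf` is chosen here for illustration and does not come from the slides.

```python
from math import comb

def binomial_pmf(x, n=3, p=0.5):
    """P(X = x) for X ~ Binomial(n, p), following the formula above."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probabilities for 0, 1, 2 and 3 heads in three fair coin tosses
pmf = [binomial_pmf(x) for x in range(4)]
for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob}")
```

The probabilities are symmetric around 1.5 heads and sum to 1, as any probability distribution must.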

What Does a Distribution Look Like (Normal Distribution)


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters for the normal distribution (mean and standard deviation)
mean_height = 170  # mean height in centimeters
std_dev = 10        # standard deviation in centimeters

# Generate a range of heights for plotting (e.g., from 130 to 210 cm)
heights = np.linspace(130, 210, 1000)

# Calculate the probability density function (PDF) of the normal distribution
pdf_heights = norm.pdf(heights, loc=mean_height, scale=std_dev)

# Plotting the normal distribution curve
plt.figure(figsize=(10, 6))
plt.plot(heights, pdf_heights, color='blue', label='Normal Distribution')

# Highlight the mean and ±1 standard deviation range
plt.axvline(mean_height, color='red', linestyle='--', label='Mean Height')
plt.axvline(mean_height - std_dev, color='green', linestyle='--', label='Mean - 1 Std Dev')
plt.axvline(mean_height + std_dev, color='green', linestyle='--', label='Mean + 1 Std Dev')

# Adding labels, title, and legend
plt.xlabel('Height (cm)')
plt.ylabel('Probability Density')
plt.title('Normal Distribution of Adult Human Heights')
plt.legend()

# Show the plot
plt.show()

Note - Actual human height data may vary and could exhibit deviations from a perfect normal distribution due to factors such as genetic diversity, environmental influences, and sampling variability. However, the normal distribution is a useful theoretical model for describing and analyzing continuous variables like height in statistical contexts.



Let \(X\) denote the height of an adult human.

If \(X\sim N(\mu,\sigma)\), then its probability density function is

\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\cdot e^{-\frac{(x-\mu)^2}{2\cdot \sigma^2}}, \text{ where } x\in (-\infty,\infty ) \]
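
To connect the density formula to numbers, the sketch below evaluates it by hand and compares the result with Python's built-in `statistics.NormalDist`. The helper name `normal_pdf` is an illustrative choice, and the values \(\mu = 170\), \(\sigma = 10\) are taken from the plotting code earlier.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written out from the formula above."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 170, 10  # mean and standard deviation of height in cm

density_at_mean = normal_pdf(170, mu, sigma)
density_one_sd = normal_pdf(180, mu, sigma)
print(f"density at 170 cm: {density_at_mean:.5f}")
print(f"density at 180 cm: {density_one_sd:.5f}")

# The hand-written formula should agree with the library implementation
print(abs(density_one_sd - NormalDist(mu, sigma).pdf(180)) < 1e-12)
```

Note that these are densities, not probabilities: probabilities come from areas under the curve.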

What if we know the Population Distribution?

The population distribution function describes the probabilities associated with every possible value of the random variable within the population.

Example (Hypokalemia)

Hypokalemia is diagnosed when blood potassium levels are below \(3.5\) mEq/L. Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution \(N(\mu = 3.8,\sigma = 0.2)\). If only one measurement is made, what is the probability that this patient will be misdiagnosed with Hypokalemia?

mosaic::xpnorm(3.5, mean = 3.8, sd = 0.2)

## [1] 0.0668072
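
The same value can be obtained by standardizing the threshold. The step below is a worked illustration added for clarity, with \(Z\) denoting a standard normal variable:

\[P(X < 3.5) = P\left(Z < \frac{3.5 - 3.8}{0.2}\right) = P(Z < -1.5) \approx 0.0668\]

So roughly 6.7% of single measurements would fall below the diagnostic threshold even though the patient's true mean level is normal.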

What if we don’t have Population Distribution

Collect a Representative Sample

Visualize the Data


library(RColorBrewer)  # provides brewer.pal()

hist(VADeaths, breaks = 10, col = brewer.pal(3, "Set3"), main = "Set3 3 colors")

hist(VADeaths, breaks = 3, col = brewer.pal(3, "Set2"), main = "Set2 3 colors")

hist(VADeaths, breaks = 7, col = brewer.pal(3, "Set1"), main = "Set1 3 colors")

hist(VADeaths, breaks = 2, col = brewer.pal(8, "Set3"), main = "Set3 8 colors")

hist(VADeaths, col = brewer.pal(8, "Greys"), main = "Greys 8 colors")

hist(VADeaths, col = brewer.pal(8, "Greens"), main = "Greens 8 colors")

Select an Estimation Method

Statistical Inference

Methods for drawing conclusions about a population from sample data are called Statistical Inference.

Population Distribution vs. Sampling Distribution

Rather than directly estimating the population distribution, we estimate the sampling distribution of a statistic.

Sampling Distribution

Instead of describing the population itself, a sampling distribution describes how a statistic (such as the sample mean) varies from sample to sample.

The sampling distribution is the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population.

Sampling Distribution of \(\overline{X}\)

Sampling Distributions

Mean of \(\overline{X}\) = Population Mean \(\mu\)

\[\sigma_{\overline{X}} = \sigma/\sqrt{n}\]
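
Both properties (the mean of \(\overline{X}\) equals \(\mu\), and its standard deviation equals \(\sigma/\sqrt{n}\)) can be checked by simulation. The sketch below is an approximation, since it draws many random samples rather than all possible ones; the population values \(\mu = 50\), \(\sigma = 10\), the sample size \(n = 25\), and the seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)                   # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 10.0, 25    # arbitrary illustration values
num_samples = 20_000

# Draw many samples of size n and keep only each sample's mean
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
print(f"mean of sample means: {mean_of_means:.2f} (theory: {mu})")
print(f"SD of sample means:   {sd_of_means:.2f} (theory: {sigma / sqrt(n)})")
```

The simulated values should land close to 50 and 2, with small differences due to random variation.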

Example (IQ Scores)

In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign.

The distribution of the sample mean IQ is:

  1. Exactly normal, mean 112, standard deviation 20
  2. Approximately normal, mean 112, standard deviation 20
  3. Approximately normal, mean 112, standard deviation 1.414
  4. Approximately normal, mean 112, standard deviation 0.1

Population: \(\mu=112\), \(\sigma=20\)
Sampling distribution of \(\overline{X}\) for \(n=200\): approximately \(N(\mu=112,\sigma/\sqrt{n}=1.414)\), so option 3 is correct.
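
The standard deviation in the answer comes from a single division, shown here as a short Python check (variable names are illustrative):

```python
from math import sqrt

sigma = 20   # population standard deviation of IQ
n = 200      # number of adults sampled

standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
print(round(standard_error, 3))
```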

Central Limit Theorem

When randomly sampling from any population with mean \(\mu\) and standard deviation \(\sigma\), if \(n\) is large enough, the sampling distribution of \(\overline{X}\) is approximately normal: \(\overline{X} \sim N (\mu, \sigma/\sqrt{n})\).

“Randomly” – every individual in the population has an equal chance of being selected and every possible subset of a given size has an equal chance of being chosen.
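
The theorem can be illustrated with a population that is clearly not normal. The sketch below is a simulation under stated assumptions, not a proof: it uses a right-skewed exponential population with mean 1 and standard deviation 1, a sample size of \(n = 50\), and a fixed seed, all chosen here for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)            # fixed seed so the run is repeatable
n = 50                    # sample size (assumed large enough for this population)
num_samples = 10_000
mu, sigma = 1.0, 1.0      # mean and SD of an Exponential(rate=1) population

# Sample means from a strongly right-skewed population
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

standard_error = sigma / sqrt(n)
# If the sample means are roughly normal, about 95% lie within 1.96 standard errors of mu
within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means) / num_samples

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"SD of sample means:   {stdev(sample_means):.3f} (theory: {standard_error:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Rerunning with a much smaller `n` (for example 3) should show a visibly worse match, which is one way to explore what "large enough" means for a given population.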

Large enough?

Example (Hypokalemia)

Hypokalemia is diagnosed when blood potassium levels are below \(3.5\) mEq/L. Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution \(N(\mu = 3.8,\sigma = 0.2)\). If only one measurement is made, what is the probability that this patient will be misdiagnosed with Hypokalemia?

Instead, if measurements are taken on 4 separate days, what is the probability of a misdiagnosis?

We can first look at the graph of the Population Distribution and the Sampling Distribution

The sampling distribution is narrower than the population distribution by a factor of \(\sqrt{n}\)


When we only use one measurement

mosaic::xpnorm(3.5, 3.8, 0.2)

## [1] 0.0668072

When we use 4 measurements

mosaic::xpnorm(3.5, 3.8, 0.2 / sqrt(4))

## [1] 0.001349898
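
For readers working outside R, the two probabilities can be reproduced with Python's standard library. This is a sketch that assumes, as the example does, normally distributed daily measurements; the helper name `p_below_cutoff` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, cutoff = 3.8, 0.2, 3.5  # values from the hypokalemia example

def p_below_cutoff(n):
    """P(mean of n measurements < cutoff) when each measurement is N(mu, sigma)."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)

p_one = p_below_cutoff(1)    # single measurement
p_four = p_below_cutoff(4)   # average of four measurements
print(f"one measurement:   {p_one:.7f}")
print(f"four measurements: {p_four:.9f}")
```

Averaging four measurements halves the standard deviation, which moves the threshold from 1.5 to 3 standard deviations below the mean and cuts the misdiagnosis probability by a factor of about 50.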

Statistical Confidence

Although the sample mean, \(\overline{x}\), is a unique number for any particular sample, if you pick a different sample you will probably get a different sample mean.

In fact, you could get many different values for the sample mean, and virtually none of them would actually equal the true population mean, \(\mu\).

But the sampling distribution of \(\overline{X}\) is narrower than the population distribution, by a factor of \(\sqrt{n}\)

Thus, the estimates gained from our samples tend to be relatively close to the population parameter \(\mu\).

Within 2 standard deviations of the mean


95% of all sample means will be within roughly 2 standard deviations (\(2\times \sigma/\sqrt{n}\)) of the population parameter \(\mu\).


This implies that the population mean \(\mu\) must be within roughly 2 standard deviations (\(2\times \sigma/\sqrt{n}\)) of the sample average \(\overline{x}\) in \(95\%\) of all samples.
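
The "95% of all samples" claim can be checked by repeated sampling. The sketch below borrows \(\mu = 3.8\), \(\sigma = 0.2\), and \(n = 4\) from the potassium example; the number of trials and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt

random.seed(2)                # fixed seed so the run is repeatable
mu, sigma, n = 3.8, 0.2, 4    # values borrowed from the potassium example
margin = 2 * sigma / sqrt(n)  # 2 standard deviations of the sample mean
trials = 10_000

hits = 0
for _ in range(trials):
    x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Count the sample if mu lies within the margin around its sample mean
    if abs(x_bar - mu) <= margin:
        hits += 1

coverage = hits / trials
print(f"share of samples with mu within the margin: {coverage:.3f}")
```

The share should come out slightly above 0.95, because exactly 2 standard deviations corresponds to about 95.4%, while the rounder 95% figure corresponds to 1.96.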

Confidence Interval

A confidence interval is a range of values with an associated confidence level. The confidence level quantifies how often intervals constructed this way contain the true population mean.

Population Distribution - \(N(3.8,0.2)\)
Sample Size = \(4\)
Sample mean = \(3.7\)

\(95\%\) confidence interval : \(3.7 \pm 1.96 \times 0.2/\sqrt{4}\) = \(3.7 \pm 0.196\) = \((3.504, 3.896)\)

mosaic::xpnorm(c(-1.96, 1.96))

## [1] 0.0249979 0.9750021

We are \(95\%\) confident that the actual value of \(\mu\) is in \((3.504, 3.896)\).
In \(95\%\) of all samples, the actual value of \(\mu\) will be within \(0.196\) units of the value of \(\overline{x}\).
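
The interval above can be reproduced programmatically. This is a minimal sketch assuming a known population standard deviation; the function name `z_confidence_interval` and its parameters are illustrative and not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """Confidence interval for mu when the population SD sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=3.7, sigma=0.2, n=4)
print(round(low, 3), round(high, 3))
```

Passing a different `level`, such as 0.99, widens the interval, which reflects the trade-off between confidence and precision.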

Note –

Interpretation of Confidence Intervals

Cautions about using \(\overline{x}\pm z^*\times \sigma/\sqrt{n}\)

Reasoning of Significance Tests

You are in charge of quality control in a food company. You randomly sample four packs of tomatoes, each labeled 1/2 lb. (227 g).
The average weight of your four packs is 222 g. Obviously, we cannot expect boxes filled with whole tomatoes to all weigh exactly half a pound.

There are two possibilities:

One way to think about this is that we want a measure of how extreme the observed event (222 g) is. Can probability help us measure “how extreme”?

Some Assumptions Needed to Estimate the Sampling Distribution

Reasoning of Significance Tests

After carefully checking the assumptions, we can frame this problem as a probability problem:

\[P(\overline{X} < 222 \mid \text{Assumptions})\]

If the assumptions are satisfied,

\[\overline{X} \sim N(227,5/\sqrt{4})\]

Therefore,

\[P(\overline{X}<222) = P\left(Z<\frac{222-227}{5/\sqrt{4}}\right)=P(Z<-2) = 0.0228\]

mosaic::xpnorm(222, 227, 5 / sqrt(4))

## [1] 0.02275013

If the true mean weight were 227 g, there would be only a 2.28% chance of drawing four packs with an average weight of 222 g or less.
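
The calculation above can be written as a short Python sketch. It assumes, as the formula does, a population standard deviation of 5 g and a normal sampling distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 227    # labeled mean weight in grams
sigma = 5     # population standard deviation used in the example
n = 4         # number of packs sampled
x_bar = 222   # observed average weight

# Standardize the observed sample mean
z = (x_bar - mu_0) / (sigma / sqrt(n))
# Probability of a sample mean at least this low when the true mean is mu_0
p_value = NormalDist().cdf(z)
print(f"z = {z}, P-value = {p_value:.4f}")
```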

Is it an extreme event?

Unusual event happened!

Null and Alternative Hypotheses

The purpose of hypothesis testing is to assess whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

The null hypothesis is a very specific statement about a parameter of the population(s). It is labeled \(H_o\).

\[ H_o : \mu = 227 \]


\[ H_a : \mu <227 \]


Steps for Test of Significance

  1. State the null hypothesis \(H_o\) and the alternative hypothesis \(H_a\).
  2. Calculate the value of the test statistic.
  3. Determine the P-value for the observed data.
  4. State a conclusion.

Significance Level \(\alpha\)

The significance level, \(\alpha\), is the largest P-value tolerated for rejecting a true null hypothesis (how much evidence against \(H_o\) we require). This value is decided before conducting the test.

If the P-value is equal to or less than \(\alpha\) (\(P \le \alpha\)), then we reject \(H_o\). If the P-value is greater than \(\alpha\) (\(P > \alpha\)), then we fail to reject \(H_o\).
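
The decision rule is simple enough to state as code. In the sketch below, the function name `decide` is illustrative, the P-value is the one from the tomato example, and the two significance levels (0.05 and 0.01) are common conventions assumed here rather than values given in the slides.

```python
def decide(p_value, alpha):
    """Apply the rule: reject H_o when P <= alpha, otherwise fail to reject."""
    return "reject H_o" if p_value <= alpha else "fail to reject H_o"

p_value = 0.0228  # P-value from the tomato example
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same data lead to different conclusions at the two levels, which is why \(\alpha\) has to be fixed before the test is carried out.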

When choosing the significance level \(\alpha\)

The power of a test

The power of a test of hypothesis with fixed significance level \(\alpha\) is the probability that the test will reject the null hypothesis when the alternative is true.
In other words, power is the probability that the data gathered in an experiment will be sufficient to reject a wrong null hypothesis.
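
Power can only be computed against a specific alternative. The sketch below reuses the tomato example (\(\mu_0 = 227\), \(\sigma = 5\), \(n = 4\)) and adds two assumptions that are not in the slides: a significance level of 0.05 and a hypothetical true mean of 222 g.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 227, 5, 4   # values from the tomato example
alpha = 0.05                 # assumed significance level
mu_alt = 222                 # hypothetical true mean under the alternative

se = sigma / sqrt(n)         # standard deviation of the sample mean

# The one-sided test (H_a: mu < mu_0) rejects when the sample mean falls below this cutoff
critical_mean = NormalDist(mu_0, se).inv_cdf(alpha)

# Power: probability of landing in the rejection region when the true mean is mu_alt
power = NormalDist(mu_alt, se).cdf(critical_mean)
print(f"reject H_o when the sample mean is below {critical_mean:.2f} g")
print(f"power against mu = {mu_alt}: {power:.3f}")
```

Under these assumptions the power is well below 1, so a true shortfall of 5 g would often go undetected with only four packs; increasing `n` in the sketch shows how a larger sample raises the power.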

Knowing the power of your test is important:

Type I and II Errors