
Central Limit Theorem

 

Central Limit Theorem (CLT) – Explained Simply

The Central Limit Theorem (CLT) is a fundamental concept in statistics that states:

When you repeatedly draw random samples of a sufficiently large size n from a population with a finite mean and variance, the distribution of the sample means approximates a normal (Gaussian) distribution, regardless of the shape of the population's original distribution.

The approximation improves as the sample size grows. In practice, n ≥ 30 is a common rule of thumb, although heavily skewed populations can need larger samples.


1. Key Components of CLT

  • Population Distribution: The original data can have any distribution (e.g., uniform, skewed, exponential).

  • Random Sampling: We take multiple random samples of size n.

  • Sample Means Distribution: If we plot the means of these samples, the resulting distribution will be approximately normal.

  • Larger Samples Improve Approximation: As n increases, the sample means' distribution becomes closer to a true normal distribution.


2. Mathematical Formulation

Let:

  • $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$.

  • The sample mean is:

    $$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

According to the Central Limit Theorem:

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0,1)$$

where:

  • $\bar{X}$ is the sample mean, which is approximately normally distributed when $n$ is large,

  • $\mu$ is the true mean of the population,

  • $\sigma / \sqrt{n}$ is the standard error (how much the sample means vary).

As $n \to \infty$, the distribution of the standardized mean converges to $N(0,1)$. Equivalently, for large $n$, $\bar{X}$ is approximately distributed as $N(\mu, \sigma^2/n)$.
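The standardization above can be checked numerically. The following is a minimal sketch, assuming an exponential population with scale 2 (so that μ = 2 and σ = 2); the seed, the sample size n = 50, and the number of samples are arbitrary illustrative choices. If the CLT approximation is good, the standardized means should have a mean near 0, a standard deviation near 1, and about 95% of their values within ±1.96.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) for reproducibility

mu, sigma = 2.0, 2.0   # an exponential distribution with scale 2 has mean 2 and std 2
n = 50                 # size of each sample (illustrative assumption)
num_samples = 20_000   # number of samples drawn (illustrative assumption)

# Each row is one sample of size n; take the mean of every row
samples = rng.exponential(scale=mu, size=(num_samples, n))
x_bar = samples.mean(axis=1)

# Standardize the sample means as in the CLT statement
z = (x_bar - mu) / (sigma / np.sqrt(n))

z_mean = z.mean()
z_std = z.std(ddof=1)
# Fraction of standardized means within 1.96 of zero (about 0.95 for N(0,1))
within_196 = np.mean(np.abs(z) < 1.96)

print(f"mean of z: {z_mean:.3f}")
print(f"std of z:  {z_std:.3f}")
print(f"fraction with |z| < 1.96: {within_196:.3f}")
```

The sketch samples directly from the exponential distribution rather than from a finite population array, which keeps it short; the conclusion is the same either way.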


3. Why is CLT Important?

Justifies Normal Assumptions: Even if the data isn't normal, sample means approximately follow a normal distribution when the sample size is large enough.
Foundation for Confidence Intervals & Hypothesis Testing: Used in t-tests, z-tests, and regression analysis.
Simplifies Analysis: Allows us to make inferences about a population even if we don’t know its true distribution.
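As a rough illustration of the confidence-interval point, the sketch below builds a normal-based 95% interval for the mean from many samples of a skewed population and counts how often the interval contains the true mean. The exponential population, the sample size n = 100, the number of trials, and the seed are all assumptions made for this example. If the CLT approximation is adequate, the empirical coverage should land close to 0.95.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (arbitrary) for reproducibility

true_mean = 2.0      # mean of the exponential population (scale = 2)
n = 100              # size of each sample (illustrative assumption)
num_trials = 5_000   # number of repeated experiments (illustrative assumption)
z_crit = 1.96        # two-sided 95% critical value of the standard normal

# Each row is one experiment: a sample of size n from the skewed population
samples = rng.exponential(scale=true_mean, size=(num_trials, n))
means = samples.mean(axis=1)
# Estimated standard error of each sample mean: s / sqrt(n)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)

lower = means - z_crit * std_errs
upper = means + z_crit * std_errs

# Fraction of intervals that contain the true population mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Empirical coverage of the 95% interval: {coverage:.3f}")
```

With a skewed population the coverage is typically a little below the nominal 95% and moves closer to it as n grows.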


4. Visualization of CLT

Imagine a skewed distribution (e.g., exponential). If we repeatedly draw samples and compute their means, the distribution of those sample means is still skewed for small sample sizes and looks increasingly normal as the sample size grows.

Let's visualize the Central Limit Theorem (CLT) using Python! 🚀

We'll start with a highly skewed distribution (Exponential Distribution) and take multiple random samples of different sizes. We'll then compute the sample means and plot their distribution to see how it gradually becomes normal.


🔢 Steps to Simulate CLT:

  1. Generate a skewed population (e.g., exponential distribution).

  2. Take many random samples of different sizes (e.g., n=5, n=30, n=100).

  3. Compute the sample means for each case.

  4. Plot the histogram of sample means to observe the shape.


📜 Python Code for CLT Visualization

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Seeded generator so the figure is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# Step 1: Create a highly skewed population (exponential distribution with mean 2.0)
population = rng.exponential(scale=2.0, size=100_000)

# Draw num_samples samples of the given size (with replacement) and return their means
def sample_means(sample_size, num_samples):
    samples = rng.choice(population, size=(num_samples, sample_size), replace=True)
    return samples.mean(axis=1)

# Step 2: Take samples of different sizes
sample_sizes = [5, 30, 100]  # Small, medium, large sample sizes
num_samples = 1000  # Number of samples taken for each size

# Step 3: Plot results (one panel for the population, one per sample size)
fig, axes = plt.subplots(1, len(sample_sizes) + 1, figsize=(20, 5))

# Plot original skewed population
sns.histplot(population, bins=50, kde=True, ax=axes[0], color="red")
axes[0].set_title("Original Skewed Population")

# Plot histograms of sample means for different sample sizes
for ax, size in zip(axes[1:], sample_sizes):
    sns.histplot(sample_means(size, num_samples), bins=30, kde=True, ax=ax, color="blue")
    ax.set_title(f"Sample Size = {size}")
    ax.set_xlabel("Sample mean")

plt.tight_layout()
plt.show()

🔍 What Will You See?

  1. The first plot (left) shows the original skewed distribution.

  2. The next three plots show the distribution of sample means:

    • Small sample size (n=5) → Still skewed.

    • Medium sample size (n=30) → More symmetric.

    • Large sample size (n=100) → Approaching a perfect normal distribution!
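The visual impression of "still skewed" versus "more symmetric" can also be put into numbers. The sketch below is one way to do that, assuming the same exponential distribution (scale 2) and sampling from it directly rather than from the finite population array. The helper `skewness` is a hypothetical name introduced here for illustration. For the mean of n independent exponential values, the theoretical skewness is 2/√n, so it should shrink toward 0 (the skewness of a normal distribution) as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed (arbitrary) for reproducibility

def skewness(x):
    """Sample skewness: third central moment divided by std cubed (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5

num_samples = 20_000  # number of samples per size (illustrative assumption)
results = {}
for n in (5, 30, 100):
    # Means of num_samples samples, each of size n
    means = rng.exponential(scale=2.0, size=(num_samples, n)).mean(axis=1)
    results[n] = skewness(means)
    theory = 2 / np.sqrt(n)
    print(f"n={n:>3}: skewness of sample means = {results[n]:.3f}, theory = {theory:.3f}")
```

A skewness near 0.9 for n=5 matches the visibly lopsided histogram, while a value near 0.2 for n=100 is hard to see by eye.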


📌 Key Takeaways from the Visualization

✅ Even though the original population is not normal, the distribution of the sample means becomes approximately normal as the sample size increases.
✅ Larger sample sizes reduce the variability of the sample mean (its standard error is σ/√n), making estimates more reliable.
✅ This property allows us to use normal-based statistical methods (e.g., confidence intervals, hypothesis testing) even when the raw data isn’t normally distributed.
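The reduction in variability can be checked against the standard error formula σ/√n. This is a minimal sketch under the same assumptions as the simulation (exponential distribution with scale 2, so σ = 2; sample sizes 5, 30, and 100); the seed and the number of samples are arbitrary. It compares the empirical standard deviation of the sample means with the theoretical value for each sample size.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed (arbitrary) for reproducibility

sigma = 2.0           # std of an exponential distribution with scale 2
num_samples = 20_000  # number of samples per size (illustrative assumption)

standard_errors = {}
for n in (5, 30, 100):
    # Means of num_samples samples, each of size n
    means = rng.exponential(scale=2.0, size=(num_samples, n)).mean(axis=1)
    empirical = means.std(ddof=1)     # observed spread of the sample means
    theoretical = sigma / np.sqrt(n)  # standard error predicted by sigma / sqrt(n)
    standard_errors[n] = (empirical, theoretical)
    print(f"n={n:>3}: empirical SE = {empirical:.3f}, sigma/sqrt(n) = {theoretical:.3f}")
```

Because the standard error scales with 1/√n, going from n=5 to n=100 (20 times more data per sample) shrinks it by a factor of about 4.5, not 20.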

Here is the output visualization of the Central Limit Theorem (CLT) in action! 📊

  • Leftmost Plot (Red) → Original skewed population (Exponential distribution).

  • Next Three Plots (Blue) → Distributions of sample means for different sample sizes:

    • n=5 → Still skewed.

    • n=30 → Starts looking more normal.

    • n=100 → Nearly a perfect normal distribution!

This demonstrates how, under the CLT, the distribution of sample means becomes approximately normal, even if the original data isn't normal. 🎯

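[Figure: output of the CLT simulation, showing the original skewed population and the sample-mean histograms for n=5, n=30, and n=100]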

