Random Variables and Distributions - Complete Guide
Let me explain this comprehensively, building from the basics to more advanced concepts.
Part 1: What is a Random Variable?
The Intuitive Idea
A random variable is NOT actually a "variable" in the traditional sense - it's a function that assigns numerical values to outcomes of a random experiment.
Think of it like this:
- You perform a random experiment (flip coins, roll dice, measure height)
- Different outcomes can happen
- A random variable assigns a NUMBER to each possible outcome
Formal Definition
A random variable is a function that maps outcomes from a sample space to real numbers.
Random Variable: X : S → ℝ
Where S = Sample Space (all possible outcomes)
Simple Example: Coin Flips
Experiment: Flip a coin twice
Sample Space: S = {HH, HT, TH, TT}
Random Variable X = "Number of heads"
- X(HH) = 2
- X(HT) = 1
- X(TH) = 1
- X(TT) = 0
The random variable X assigns numbers to outcomes!
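To make this concrete, here is a minimal Python sketch (the names are purely illustrative) that enumerates the sample space and applies X = "number of heads" to each outcome:

```python
from itertools import product

# Sample space for two coin flips: ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')
sample_space = list(product("HT", repeat=2))

# The random variable X maps each outcome to its number of heads
def X(outcome):
    return outcome.count("H")

for outcome in sample_space:
    print("".join(outcome), "->", X(outcome))
# HH -> 2, HT -> 1, TH -> 1, TT -> 0
```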
Part 2: Types of Random Variables
There are two main types:
1. Discrete Random Variables
Definition: Can take on countably many values (you can list them: 0, 1, 2, 3, ...)
Examples:
- Number of heads in 10 coin flips: {0, 1, 2, ..., 10}
- Number of customers per hour: {0, 1, 2, 3, ...}
- Number of defective items in a batch: {0, 1, 2, ..., n}
- Score on a die roll: {1, 2, 3, 4, 5, 6}
Key characteristic: There are gaps between possible values.
2. Continuous Random Variables
Definition: Can take on any value within an interval (uncountably infinite)
Examples:
- Height of a person: any value between 0 and 8 feet
- Time until next phone call: any value ≥ 0
- Temperature: any real number
- Weight of a product: any positive real number
Key characteristic: No gaps - every value in an interval is possible.
Part 3: Probability Distributions
A probability distribution describes how probabilities are distributed over the values of a random variable.
For Discrete Random Variables: PMF
PMF = Probability Mass Function
The PMF gives the probability that X equals a specific value:
P(X = x) = probability that random variable X equals value x
Properties:
- Non-negative: P(X = x) ≥ 0 for all x
- Sums to 1: Σ P(X = x) = 1 (sum over all possible values)
Example: Die Roll
X = outcome of rolling a fair die
| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| P(X=x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
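As a quick sanity check, here is a small Python sketch (using exact fractions; all names are illustrative) that stores this PMF as a dictionary and verifies the two properties above:

```python
from fractions import Fraction

# PMF of a fair die as a dictionary: value -> probability
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())  # non-negative
assert sum(pmf.values()) == 1             # sums to 1
print(float(pmf[3]))                      # P(X = 3) = 1/6 ≈ 0.1667
```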
For Continuous Random Variables: PDF
PDF = Probability Density Function
For continuous variables, P(X = x) = 0 for any single value x (there are uncountably many possibilities!)
Instead, we use the PDF f(x) and calculate probabilities over intervals:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
Properties:
- Non-negative: f(x) ≥ 0 for all x
- Integrates to 1: ∫₋∞^∞ f(x) dx = 1
Important: f(x) itself is NOT a probability! It's a density. The area under the curve gives probability.
Example: Uniform Distribution on [0, 1]
f(x) = 1 for 0 ≤ x ≤ 1
f(x) = 0 otherwise
P(0.2 ≤ X ≤ 0.5) = ∫ 1 dx from 0.2 to 0.5 = 0.5 - 0.2 = 0.3
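The same interval probability can be checked by numerical integration; a minimal sketch, assuming SciPy is installed:

```python
from scipy.integrate import quad

# PDF of the Uniform[0, 1] distribution
def f(x):
    return 1.0 if 0 <= x <= 1 else 0.0

# Numerically integrate the density over [0.2, 0.5]
prob, _ = quad(f, 0.2, 0.5)
print(prob)  # 0.3
```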
Part 4: Cumulative Distribution Function (CDF)
The CDF works for BOTH discrete and continuous random variables.
Definition:
F(x) = P(X ≤ x)
The CDF gives the probability that X is less than or equal to x.
Properties:
- Non-decreasing: If x₁ < x₂, then F(x₁) ≤ F(x₂)
- Limits: lim(x→-∞) F(x) = 0, lim(x→∞) F(x) = 1
- Right-continuous: F(x) is continuous from the right
For Discrete Variables:
F(x) = Σ P(X = k) for all k ≤ x
For Continuous Variables:
F(x) = ∫₋∞ˣ f(t) dt
And the PDF is the derivative of CDF:
f(x) = dF(x)/dx
Example: Die Roll CDF
| x | x < 1 | 1 ≤ x < 2 | 2 ≤ x < 3 | 3 ≤ x < 4 | 4 ≤ x < 5 | 5 ≤ x < 6 | x ≥ 6 |
|---|---|---|---|---|---|---|---|
| F(x) | 0 | 1/6 | 2/6 | 3/6 | 4/6 | 5/6 | 1 |
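The die-roll CDF is a step function. A short sketch reproduces the table using SciPy's discrete uniform distribution (scipy.stats.randint(1, 7) covers {1, ..., 6}):

```python
from scipy.stats import randint

die = randint(1, 7)  # discrete uniform on {1, 2, 3, 4, 5, 6}

for x in [0.5, 1, 2.5, 3, 5.9, 6]:
    print(f"F({x}) = {die.cdf(x):.4f}")
# F(0.5) = 0, F(1) = 1/6 ≈ 0.1667, F(2.5) = 2/6 ≈ 0.3333, ...
```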
Part 5: Expected Value (Mean)
The expected value E[X] is the long-run average value of the random variable.
For Discrete Random Variables
E[X] = μ = Σ x · P(X = x)
Sum of (each value × its probability)
Example: Die Roll
E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
= (1+2+3+4+5+6)/6
= 21/6
= 3.5
For Continuous Random Variables
E[X] = μ = ∫₋∞^∞ x · f(x) dx
Properties of Expected Value:
- Linearity: E[aX + b] = aE[X] + b
- Linearity for sums: E[X + Y] = E[X] + E[Y]
- For independent variables: E[XY] = E[X]·E[Y]
Part 6: Variance and Standard Deviation
Variance measures how spread out the distribution is around the mean.
Definition
Var(X) = σ² = E[(X - μ)²]
Alternative formula (easier to compute):
Var(X) = E[X²] - [E[X]]²
For Discrete Random Variables
Var(X) = Σ (x - μ)² · P(X = x)
Or:
Var(X) = [Σ x² · P(X = x)] - μ²
For Continuous Random Variables
Var(X) = ∫₋∞^∞ (x - μ)² · f(x) dx
Standard Deviation
σ = √Var(X)
The standard deviation has the same units as X, making it more interpretable.
Properties of Variance:
- Var(aX + b) = a²Var(X) (shifting by a constant b doesn't change the spread; scaling by a multiplies variance by a²)
- For independent X, Y: Var(X + Y) = Var(X) + Var(Y)
- Always non-negative: Var(X) ≥ 0
Example: Die Roll
E[X²] = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6)
= (1+4+9+16+25+36)/6
= 91/6
Var(X) = 91/6 - (3.5)²
≈ 15.17 - 12.25
≈ 2.92
σ = √2.92 ≈ 1.71
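The whole die-roll calculation (mean, E[X²], variance, standard deviation) fits in a few lines of Python; this sketch simply replays the arithmetic above:

```python
import math

pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())     # E[X] = 3.5
ex2 = sum(x**2 * p for x, p in pmf.items())   # E[X²] = 91/6 ≈ 15.1667
var = ex2 - mean**2                            # ≈ 2.9167
std = math.sqrt(var)                           # ≈ 1.7078

print(mean, ex2, var, std)
```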
Part 7: Common Discrete Distributions
1. Bernoulli Distribution
Models: Single yes/no trial
Parameters: p (probability of success)
PMF: P(X = 1) = p, P(X = 0) = 1-p
Mean: E[X] = p
Variance: Var(X) = p(1-p)
2. Binomial Distribution
Models: Number of successes in n independent Bernoulli trials
Parameters: n (trials), p (success probability)
PMF: P(X = k) = C(n,k) · p^k · (1-p)^(n-k)
Mean: E[X] = np
Variance: Var(X) = np(1-p)
3. Poisson Distribution
Models: Number of events in a fixed interval (time/space)
Parameters: λ (average rate)
PMF: P(X = k) = (λ^k · e^(-λ)) / k!
Mean: E[X] = λ
Variance: Var(X) = λ
4. Geometric Distribution
Models: Number of trials until first success
Parameters: p (success probability)
PMF: P(X = k) = (1-p)^(k-1) · p
Mean: E[X] = 1/p
Variance: Var(X) = (1-p)/p²
5. Negative Binomial Distribution
Models: Number of trials until r-th success
Parameters: r (successes), p (success probability)
PMF: P(X = k) = C(k-1, r-1) · p^r · (1-p)^(k-r), for k = r, r+1, ...
Mean: E[X] = r/p
Variance: Var(X) = r(1-p)/p²
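All of these are implemented in scipy.stats. Here is a minimal sketch (assuming SciPy is installed, with arbitrarily chosen parameters) that checks a few of the mean/variance formulas above; note that scipy's geom counts trials until the first success, matching the convention used here:

```python
from scipy.stats import binom, poisson, geom

n, p, lam = 10, 0.3, 4.0

# Binomial: mean np, variance np(1-p)
assert abs(binom(n, p).mean() - n * p) < 1e-9
assert abs(binom(n, p).var() - n * p * (1 - p)) < 1e-9
# Poisson: mean and variance both λ
assert abs(poisson(lam).mean() - lam) < 1e-9
assert abs(poisson(lam).var() - lam) < 1e-9
# Geometric (trials until first success): mean 1/p, variance (1-p)/p²
assert abs(geom(p).mean() - 1 / p) < 1e-9
assert abs(geom(p).var() - (1 - p) / p**2) < 1e-9
print("all formulas check out")
```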
Part 8: Common Continuous Distributions
1. Uniform Distribution
Models: Equal probability over an interval [a, b]
PDF: f(x) = 1/(b-a) for a ≤ x ≤ b
Mean: E[X] = (a+b)/2
Variance: Var(X) = (b-a)²/12
2. Exponential Distribution
Models: Time until next event (waiting times)
Parameters: λ (rate parameter)
PDF: f(x) = λe^(-λx) for x ≥ 0
CDF: F(x) = 1 - e^(-λx)
Mean: E[X] = 1/λ
Variance: Var(X) = 1/λ²
Memoryless property: P(X > s+t | X > s) = P(X > t)
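The memoryless property is easy to check by simulation; a sketch with NumPy (the rate λ = 0.5 and the values s, t are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, s, t = 0.5, 1.0, 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

# P(X > s + t | X > s) should match P(X > t)
lhs = np.mean(x[x > s] > s + t)
rhs = np.mean(x > t)
print(lhs, rhs)  # both ≈ e^(-λt) = e^(-1) ≈ 0.3679
```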
3. Normal (Gaussian) Distribution
Models: Many natural phenomena (heights, test scores, errors)
Parameters: μ (mean), σ² (variance)
PDF: f(x) = (1/(σ√(2π))) · e^(-(x-μ)²/(2σ²))
Notation: X ~ N(μ, σ²)
Mean: E[X] = μ
Variance: Var(X) = σ²
68-95-99.7 Rule:
- 68% of data within μ ± σ
- 95% of data within μ ± 2σ
- 99.7% of data within μ ± 3σ
Standard Normal: N(0, 1) - mean 0, variance 1
Z-score transformation: Z = (X - μ)/σ
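Both the 68-95-99.7 rule and the z-score transformation can be verified with scipy.stats.norm; a short sketch (the μ = 100, σ = 15 example values are just for illustration):

```python
from scipy.stats import norm

# 68-95-99.7 rule: P(μ - kσ ≤ X ≤ μ + kσ) for k = 1, 2, 3
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
# ≈ 0.6827, 0.9545, 0.9973

# Z-score: standardizing X ~ N(μ, σ²) reduces it to N(0, 1)
mu, sigma, x = 100.0, 15.0, 130.0
z = (x - mu) / sigma
print(z, norm.cdf(z), norm(mu, sigma).cdf(x))  # the two CDF values agree
```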
4. Gamma Distribution
Models: Sum of exponential random variables
Parameters: α (shape), β (rate)
Mean: E[X] = α/β
Variance: Var(X) = α/β²
5. Beta Distribution
Models: Probabilities and proportions (values between 0 and 1)
Parameters: α, β (shape parameters)
Support: 0 ≤ x ≤ 1
Mean: E[X] = α/(α+β)
Variance: Var(X) = αβ/((α+β)²(α+β+1))
Part 9: Joint Distributions
When you have multiple random variables, you need joint distributions.
Joint PMF (Discrete)
P(X = x, Y = y) = probability that X=x AND Y=y
Properties:
- ΣΣ P(X = x, Y = y) = 1 (sum over all x and y)
Joint PDF (Continuous)
P((X,Y) ∈ A) = ∬ₐ f(x,y) dx dy
Properties:
- ∬ f(x,y) dx dy = 1 (integral over entire plane)
Marginal Distributions
To get the distribution of just X from a joint distribution:
Discrete:
P(X = x) = Σ P(X = x, Y = y) for all y
Continuous:
f_X(x) = ∫ f(x,y) dy
Independence
X and Y are independent if:
Discrete: P(X = x, Y = y) = P(X = x) · P(Y = y) for all x, y
Continuous: f(x,y) = f_X(x) · f_Y(y) for all x, y
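A joint PMF can be stored as a 2-D array. This sketch builds one for two independent fair coin indicators (X, Y ∈ {0, 1}), recovers the marginals, and confirms the independence factorization:

```python
import numpy as np

# Joint PMF of two independent fair coin indicators: rows = x, cols = y
joint = np.array([[0.25, 0.25],
                  [0.25, 0.25]])

assert np.isclose(joint.sum(), 1.0)  # probabilities sum to 1

p_x = joint.sum(axis=1)  # marginal of X: sum over y
p_y = joint.sum(axis=0)  # marginal of Y: sum over x
print(p_x, p_y)          # [0.5 0.5] [0.5 0.5]

# Independence: joint PMF equals the outer product of the marginals
assert np.allclose(joint, np.outer(p_x, p_y))
```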
Part 10: Covariance and Correlation
Covariance
Measures how two variables vary together:
Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)]
= E[XY] - E[X]·E[Y]
Properties:
- Cov(X,X) = Var(X)
- Cov(X,Y) = Cov(Y,X)
- If X and Y are independent: Cov(X,Y) = 0 (but Cov(X,Y) = 0 does not imply independence)
- Cov(aX + b, cY + d) = ac·Cov(X,Y)
Interpretation:
- Cov(X,Y) > 0: Positive relationship (both increase together)
- Cov(X,Y) < 0: Negative relationship (one increases, other decreases)
- Cov(X,Y) = 0: No linear relationship
Correlation
Pearson correlation coefficient:
ρ(X,Y) = Cov(X,Y) / (σ_X · σ_Y)
Properties:
- -1 ≤ ρ ≤ 1
- ρ = 1: Perfect positive linear relationship
- ρ = -1: Perfect negative linear relationship
- ρ = 0: No linear relationship
Advantage over covariance: Scale-independent!
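NumPy computes both directly. A small sketch with simulated data (the linear relationship y = 2x + noise is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
y = 2 * x + rng.standard_normal(100_000)  # positively related to x

cov = np.cov(x, y)[0, 1]       # ≈ Cov(X,Y) = 2
rho = np.corrcoef(x, y)[0, 1]  # ≈ 2/√5 ≈ 0.894
print(cov, rho)

# Rescaling y changes the covariance but not the correlation
print(np.cov(x, 10 * y)[0, 1], np.corrcoef(x, 10 * y)[0, 1])
```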
Part 11: Transformations of Random Variables
For Single Variable
If Y = g(X), how do we find the distribution of Y?
Method 1: CDF Method
- Find F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)
- Differentiate to get f_Y(y) = dF_Y(y)/dy
Method 2: Jacobian Method (for continuous)
If Y = g(X) and g is monotonic with inverse X = h(Y):
f_Y(y) = f_X(h(y)) · |dh(y)/dy|
Example: If X ~ N(0,1) and Y = X²
Then Y follows a Chi-square distribution with 1 degree of freedom.
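A quick simulation confirms this. The sketch below (assuming NumPy and SciPy are available) compares the empirical CDF of X² against scipy's chi-square CDF:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
y = rng.standard_normal(1_000_000) ** 2  # Y = X² with X ~ N(0, 1)

for q in (0.5, 1.0, 2.0):
    # Empirical P(Y ≤ q) vs. chi-square CDF with 1 degree of freedom
    print(q, np.mean(y <= q), chi2.cdf(q, df=1))  # each pair should match
```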
For Multiple Variables
If we have transformations:
- U = g₁(X,Y)
- V = g₂(X,Y)
We use the Jacobian determinant:
f_{U,V}(u,v) = f_{X,Y}(x(u,v), y(u,v)) · |J|
Where |J| is the absolute value of the Jacobian determinant of the inverse transformation (x and y written in terms of u and v).
Part 12: Moment Generating Functions (MGF)
The MGF is a powerful tool for characterizing distributions.
Definition:
M_X(t) = E[e^(tX)]
For discrete:
M_X(t) = Σ e^(tx) · P(X = x)
For continuous:
M_X(t) = ∫ e^(tx) · f(x) dx
Why useful?
- Uniqueness: Each distribution has a unique MGF
- Moments: The n-th derivative at t=0 gives the n-th moment
M_X^(n)(0) = E[X^n]
- Sums: If X and Y are independent:
M_{X+Y}(t) = M_X(t) · M_Y(t)
Example: Bernoulli Distribution
X ~ Bernoulli(p)
M_X(t) = E[e^(tX)] = e^(t·0)·(1-p) + e^(t·1)·p
= (1-p) + pe^t
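Here is a numerical check (using a simple central-difference approximation, with p = 0.3 chosen arbitrarily) that the MGF's first derivative at t = 0 recovers the mean E[X] = p:

```python
import math

p = 0.3
M = lambda t: (1 - p) + p * math.exp(t)  # Bernoulli MGF

h = 1e-6
first_moment = (M(h) - M(-h)) / (2 * h)  # numerical M'(0)
print(first_moment)                      # ≈ 0.3 = E[X] = p
```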
Part 13: Central Limit Theorem (CLT)
One of the most important theorems in statistics!
Statement: If X₁, X₂, ..., X_n are independent, identically distributed random variables with mean μ and variance σ², then as n → ∞:
(X̄ - μ) / (σ/√n) → N(0, 1)
Or equivalently:
X̄ ~ N(μ, σ²/n) approximately
Where X̄ = (X₁ + X₂ + ... + X_n)/n is the sample mean.
In plain English: The average of many random variables (regardless of their original distribution) follows a normal distribution!
Practical implications:
- Works for ANY distribution (as long as it has finite mean and variance)
- Larger n = better approximation
- Rule of thumb: n ≥ 30 is usually sufficient
Example: Roll a die 100 times and take the average. Even though individual rolls are uniformly distributed, the average will be approximately normal with mean 3.5 and variance 2.92/100 ≈ 0.029.
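This sketch simulates exactly that experiment: 10,000 averages of 100 die rolls each, then compares their mean and variance to the CLT prediction:

```python
import numpy as np

rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=(10_000, 100))  # 10,000 experiments of 100 rolls
means = rolls.mean(axis=1)                       # sample mean of each experiment

print(means.mean())  # ≈ 3.5 (μ)
print(means.var())   # ≈ 0.0292 (σ²/n = 2.9167/100)
# A histogram of `means` would look approximately normal.
```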
Part 14: Law of Large Numbers (LLN)
Weak Law of Large Numbers:
As n → ∞, the sample mean X̄ converges in probability to the population mean μ:
P(|X̄ - μ| > ε) → 0 as n → ∞
For any ε > 0, no matter how small.
In plain English: If you repeat an experiment many times, the average result gets closer and closer to the expected value.
Difference from CLT:
- LLN: Sample mean converges to population mean
- CLT: Sample mean has an approximately normal distribution
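A running-average simulation makes the LLN visible; as n grows, the cumulative mean of die rolls settles toward 3.5:

```python
import numpy as np

rng = np.random.default_rng(4)
rolls = rng.integers(1, 7, size=100_000)
# Running mean after each roll: cumulative sum divided by count
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for n in (10, 100, 1_000, 100_000):
    print(n, running_mean[n - 1])  # drifts toward μ = 3.5
```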
Part 15: Practical Applications
1. Quality Control
- Binomial: Number of defective items in a batch
- Poisson: Number of defects per unit area
2. Finance
- Normal: Stock returns
- Exponential: Time between trades
- Lognormal: Stock prices
3. Insurance
- Poisson: Number of claims
- Exponential/Gamma: Claim amounts
4. Reliability Engineering
- Exponential: Component lifetimes
- Weibull: Product failure times
5. Queuing Theory
- Poisson: Customer arrivals
- Exponential: Service times
6. Machine Learning
- Bernoulli: Binary classification
- Multinomial: Multi-class classification
- Normal: Regression residuals
Part 16: Summary Table
| Distribution | Type | Parameters | Mean | Variance | Use Case |
|---|---|---|---|---|---|
| Bernoulli | Discrete | p | p | p(1-p) | Single trial |
| Binomial | Discrete | n, p | np | np(1-p) | n trials, count successes |
| Poisson | Discrete | λ | λ | λ | Events in time/space |
| Geometric | Discrete | p | 1/p | (1-p)/p² | Trials until success |
| Uniform | Continuous | a, b | (a+b)/2 | (b-a)²/12 | Equal probability |
| Exponential | Continuous | λ | 1/λ | 1/λ² | Waiting times |
| Normal | Continuous | μ, σ² | μ | σ² | Natural phenomena |
| Gamma | Continuous | α, β | α/β | α/β² | Sum of exponentials |
| Beta | Continuous | α, β | α/(α+β) | αβ/((α+β)²(α+β+1)) | Proportions |
The Bottom Line
Random Variables are functions that assign numbers to random outcomes. They come in two types:
- Discrete: Countable values (PMF)
- Continuous: Uncountable values (PDF)
Distributions describe how probabilities are spread over possible values:
- PMF/PDF: Probability at each point
- CDF: Cumulative probability
- Expected Value: Long-run average
- Variance: Spread around the mean
Key Theorems:
- CLT: Averages approach normality
- LLN: Averages converge to expected value
Understanding random variables and distributions is fundamental to statistics, probability, data science, machine learning, and countless real-world applications!