Random Variables and Distributions - Complete Guide


Random Variables: A random variable is a function that maps outcomes of random experiments to numerical values, not a traditional "variable" but a mathematical transformation. For example, in coin flipping, we might assign heads=1 and tails=0. Two types exist: discrete random variables take countable values (0,1,2,3...) like number of customers or dice rolls, while continuous random variables take any value within intervals (height, time, temperature). Random variables are characterized by their probability distributions, expected value (long-run average), and variance (measure of spread). They provide the mathematical foundation for modeling uncertainty in real-world phenomena, enabling statistical analysis, predictions, and decision-making across virtually all quantitative fields including science, engineering, finance, and machine learning.

Probability Distribution Functions: These mathematical functions describe how probabilities are distributed across possible values of a random variable. Three main types exist: PMF (Probability Mass Function) for discrete variables provides exact probabilities P(X=x) for each value, like P(die=3)=1/6. PDF (Probability Density Function) for continuous variables describes probability density where area under the curve gives probability over intervals; exact point probabilities equal zero. CDF (Cumulative Distribution Function) works for both types, giving F(x)=P(X≤x), the probability X doesn't exceed x. All distribution functions must be non-negative and sum/integrate to one. They characterize random variables completely, enabling calculation of probabilities, means, variances, and supporting statistical inference across science, engineering, finance, and data science applications. Common examples: Binomial (coin flips), Poisson (events over time), Normal (bell curve).

Discrete Distributions: Discrete distributions describe random variables that take countable values (0,1,2,3...) with gaps between possible outcomes. Key distributions include: Bernoulli (single yes/no trial with probability p), Binomial (counting successes in n independent trials), Poisson (events occurring in fixed time/space intervals at average rate λ), Geometric (trials until first success), and Negative Binomial (trials until r-th success). Each uses a Probability Mass Function (PMF) giving exact probabilities P(X=x) for each value, where all probabilities are non-negative and sum to one. Common applications include quality control (defective items), customer arrivals, test questions answered correctly, and disease occurrences. These distributions model real-world scenarios involving counting and enable probability calculations, hypothesis testing, and statistical inference.

Expectation and Variance: Expectation (or expected value) E[X] is the long-run average value of a random variable, calculated as Σx·P(X=x) for discrete or ∫x·f(x)dx for continuous variables. It represents the "center" of the distribution. Variance Var(X)=E[(X-μ)²] measures spread or variability around the mean, indicating how dispersed values are from expectation. Computed as E[X²]-(E[X])², variance is always non-negative. Standard deviation σ=√Var(X) provides spread in original units. Key properties: E[aX+b]=aE[X]+b (linearity), Var(aX+b)=a²Var(X) (scaling), and for independent variables, E[X+Y]=E[X]+E[Y] and Var(X+Y)=Var(X)+Var(Y). These fundamental measures characterize distributions and enable statistical inference, risk assessment, and decision-making across all quantitative fields.

Let me explain this comprehensively, building from the basics to more advanced concepts.


Part 1: What is a Random Variable?

The Intuitive Idea

A random variable is NOT actually a "variable" in the traditional sense - it's a function that assigns numerical values to outcomes of a random experiment.

Think of it like this:

  • You perform a random experiment (flip coins, roll dice, measure height)
  • Different outcomes can happen
  • A random variable assigns a NUMBER to each possible outcome

Formal Definition

A random variable is a function that maps outcomes from a sample space to real numbers.

Random Variable: X : S → ℝ
Where S = Sample Space (all possible outcomes)

Simple Example: Coin Flips

Experiment: Flip a coin twice

Sample Space: S = {HH, HT, TH, TT}

Random Variable X = "Number of heads"

  • X(HH) = 2
  • X(HT) = 1
  • X(TH) = 1
  • X(TT) = 0

The random variable X assigns numbers to outcomes!
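
To make the function idea concrete in code, here is a minimal Python sketch (the name X is just illustrative) that maps each outcome of two coin flips to its number of heads:

```python
# A random variable is a function: outcome -> number.
sample_space = ["HH", "HT", "TH", "TT"]

def X(outcome):
    """X = number of heads in the outcome."""
    return outcome.count("H")

for outcome in sample_space:
    print(f"X({outcome}) = {X(outcome)}")
# Output: X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0
```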


Part 2: Types of Random Variables

There are two main types:

1. Discrete Random Variables

Definition: Can take on countable values (you can list them: 0, 1, 2, 3, ...)

Examples:

  • Number of heads in 10 coin flips: {0, 1, 2, ..., 10}
  • Number of customers per hour: {0, 1, 2, 3, ...}
  • Number of defective items in a batch: {0, 1, 2, ..., n}
  • Score on a die roll: {1, 2, 3, 4, 5, 6}

Key characteristic: There are gaps between possible values.

2. Continuous Random Variables

Definition: Can take on any value within an interval (uncountably infinite)

Examples:

  • Height of a person: any value between 0 and 8 feet
  • Time until next phone call: any value ≥ 0
  • Temperature: any real number
  • Weight of a product: any positive real number

Key characteristic: No gaps - every value in an interval is possible.


Part 3: Probability Distributions

A probability distribution describes how probabilities are distributed over the values of a random variable.

For Discrete Random Variables: PMF

PMF = Probability Mass Function

The PMF gives the probability that X equals a specific value:

P(X = x) = probability that random variable X equals value x

Properties:

  1. Non-negative: P(X = x) ≥ 0 for all x
  2. Sums to 1: Σ P(X = x) = 1 (sum over all possible values)

Example: Die Roll

X = outcome of rolling a fair die

x        1     2     3     4     5     6
P(X=x)   1/6   1/6   1/6   1/6   1/6   1/6

For Continuous Random Variables: PDF

PDF = Probability Density Function

For continuous variables, P(X = exact value) = 0, because probability is spread over uncountably many possible values!

Instead, we use the PDF f(x) and calculate probabilities over intervals:

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx

Properties:

  1. Non-negative: f(x) ≥ 0 for all x
  2. Integrates to 1: ∫₋∞^∞ f(x) dx = 1

Important: f(x) itself is NOT a probability! It's a density. The area under the curve gives probability.

Example: Uniform Distribution on [0, 1]

f(x) = 1  for 0 ≤ x ≤ 1
f(x) = 0  otherwise

P(0.2 ≤ X ≤ 0.5) = ∫ 1 dx over [0.2, 0.5] = 0.5 - 0.2 = 0.3
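
If SciPy is available, the same interval probability can be checked from the CDF, since P(a ≤ X ≤ b) = F(b) - F(a):

```python
from scipy.stats import uniform

# Uniform on [0, 1]: loc is the left endpoint, scale is the width.
U = uniform(loc=0, scale=1)

# P(0.2 <= X <= 0.5) = F(0.5) - F(0.2)
print(U.cdf(0.5) - U.cdf(0.2))  # 0.3
```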


Part 4: Cumulative Distribution Function (CDF)

The CDF works for BOTH discrete and continuous random variables.

Definition:

F(x) = P(X ≤ x)

The CDF gives the probability that X is less than or equal to x.

Properties:

  1. Non-decreasing: If x₁ < x₂, then F(x₁) ≤ F(x₂)
  2. Limits: lim(x→-∞) F(x) = 0, lim(x→∞) F(x) = 1
  3. Right-continuous: F(x) is continuous from the right

For Discrete Variables:

F(x) = Σ P(X = k) for all k ≤ x

For Continuous Variables:

F(x) = ∫₋∞ˣ f(t) dt

And the PDF is the derivative of CDF:

f(x) = dF(x)/dx

Example: Die Roll CDF

x      x < 1   1 ≤ x < 2   2 ≤ x < 3   3 ≤ x < 4   4 ≤ x < 5   5 ≤ x < 6   x ≥ 6
F(x)   0       1/6         2/6         3/6         4/6         5/6         1
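
Because a discrete CDF is just a running sum of the PMF, a short NumPy sketch reproduces this table:

```python
import numpy as np

faces = np.arange(1, 7)      # die faces 1..6
pmf = np.full(6, 1/6)        # fair die: each face has probability 1/6

cdf = np.cumsum(pmf)         # F(x) = P(X <= x): running sum of the PMF
for x, F in zip(faces, cdf):
    print(f"F({x}) = {F:.4f}")   # 0.1667, 0.3333, ..., 1.0000
```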

Part 5: Expected Value (Mean)

The expected value E[X] is the long-run average value of the random variable.

For Discrete Random Variables

E[X] = μ = Σ x · P(X = x)

Sum of (each value × its probability)

Example: Die Roll

E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
     = (1+2+3+4+5+6)/6
     = 21/6
     = 3.5

For Continuous Random Variables

E[X] = μ = ∫₋∞^∞ x · f(x) dx

Properties of Expected Value:

  1. Linearity: E[aX + b] = aE[X] + b
  2. Linearity for sums: E[X + Y] = E[X] + E[Y]
  3. For independent variables: E[XY] = E[X]·E[Y]

Part 6: Variance and Standard Deviation

Variance measures how spread out the distribution is around the mean.

Definition

Var(X) = σ² = E[(X - μ)²]

Alternative formula (easier to compute):

Var(X) = E[X²] - [E[X]]²

For Discrete Random Variables

Var(X) = Σ (x - μ)² · P(X = x)

Or:

Var(X) = [Σ x² · P(X = x)] - μ²

For Continuous Random Variables

Var(X) = ∫₋∞^∞ (x - μ)² · f(x) dx

Standard Deviation

σ = √Var(X)

The standard deviation has the same units as X, making it more interpretable.

Properties of Variance:

  1. Var(aX + b) = a²Var(X) (the constant b shifts values without changing spread; only the scale factor a affects variance!)
  2. For independent X, Y: Var(X + Y) = Var(X) + Var(Y)
  3. Always non-negative: Var(X) ≥ 0

Example: Die Roll

E[X²] = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6)
      = (1+4+9+16+25+36)/6
      = 91/6

Var(X) = 91/6 - (3.5)²
       ≈ 15.17 - 12.25
       ≈ 2.92

σ = √2.92 ≈ 1.71
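
A few lines of NumPy (assuming it is installed) verify this hand computation:

```python
import numpy as np

faces = np.arange(1, 7)
pmf = np.full(6, 1/6)

mean = np.sum(faces * pmf)        # E[X] = 3.5
ex2 = np.sum(faces**2 * pmf)      # E[X²] = 91/6 ≈ 15.17
var = ex2 - mean**2               # Var(X) = E[X²] - (E[X])² ≈ 2.92
print(mean, var, np.sqrt(var))    # 3.5  2.9167  1.7078
```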

Part 7: Common Discrete Distributions

1. Bernoulli Distribution

Models: Single yes/no trial

Parameters: p (probability of success)

PMF: P(X = 1) = p, P(X = 0) = 1-p

Mean: E[X] = p

Variance: Var(X) = p(1-p)

2. Binomial Distribution

Models: Number of successes in n independent Bernoulli trials

Parameters: n (trials), p (success probability)

PMF: P(X = k) = C(n,k) · p^k · (1-p)^(n-k)

Mean: E[X] = np

Variance: Var(X) = np(1-p)

3. Poisson Distribution

Models: Number of events in a fixed interval (time/space)

Parameters: λ (average rate)

PMF: P(X = k) = (λ^k · e^(-λ)) / k!

Mean: E[X] = λ

Variance: Var(X) = λ

4. Geometric Distribution

Models: Number of trials until first success

Parameters: p (success probability)

PMF: P(X = k) = (1-p)^(k-1) · p

Mean: E[X] = 1/p

Variance: Var(X) = (1-p)/p²

5. Negative Binomial Distribution

Models: Number of trials until r-th success

Parameters: r (successes), p (success probability)

Mean: E[X] = r/p

Variance: Var(X) = r(1-p)/p²
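
All of these distributions are implemented in scipy.stats; a short sketch with illustrative parameter values shows the PMFs, means, and variances matching the formulas above:

```python
from scipy.stats import binom, poisson, geom

# Binomial(n=10, p=0.3): successes in 10 independent trials
B = binom(n=10, p=0.3)
print(B.pmf(4))              # P(X = 4) = C(10,4)·0.3⁴·0.7⁶ ≈ 0.2001
print(B.mean(), B.var())     # np = 3.0, np(1-p) = 2.1

# Poisson(λ = 2): events per interval (SciPy calls λ "mu")
print(poisson(mu=2).pmf(3))  # λ³·e^(-λ)/3! ≈ 0.1804

# Geometric(p = 0.5): trial number of the first success
print(geom(p=0.5).pmf(3))    # (1-p)²·p = 0.125
```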


Part 8: Common Continuous Distributions

1. Uniform Distribution

Models: Equal probability over an interval [a, b]

PDF: f(x) = 1/(b-a) for a ≤ x ≤ b

Mean: E[X] = (a+b)/2

Variance: Var(X) = (b-a)²/12

2. Exponential Distribution

Models: Time until next event (waiting times)

Parameters: λ (rate parameter)

PDF: f(x) = λe^(-λx) for x ≥ 0

CDF: F(x) = 1 - e^(-λx)

Mean: E[X] = 1/λ

Variance: Var(X) = 1/λ²

Memoryless property: P(X > s+t | X > s) = P(X > t)
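
The memoryless property is easy to check by simulation; a sketch with an illustrative rate λ = 0.5 and thresholds s = 2, t = 3:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5
x = rng.exponential(scale=1/lam, size=1_000_000)  # NumPy takes scale = 1/λ

s, t = 2.0, 3.0
lhs = np.mean(x[x > s] > s + t)   # estimate of P(X > s+t | X > s)
rhs = np.mean(x > t)              # estimate of P(X > t)
print(lhs, rhs)                   # both ≈ e^(-λt) ≈ 0.2231
```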

3. Normal (Gaussian) Distribution

Models: Many natural phenomena (heights, test scores, errors)

Parameters: μ (mean), σ² (variance)

PDF: f(x) = (1/(σ√(2π))) · e^(-(x-μ)²/(2σ²))

Notation: X ~ N(μ, σ²)

Mean: E[X] = μ

Variance: Var(X) = σ²

68-95-99.7 Rule:

  • 68% of data within μ ± σ
  • 95% of data within μ ± 2σ
  • 99.7% of data within μ ± 3σ

Standard Normal: N(0, 1) - mean 0, variance 1

Z-score transformation: Z = (X - μ)/σ
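
Both the 68-95-99.7 rule and the z-score can be checked with scipy.stats.norm (the μ = 175, σ = 10 below are made-up example values):

```python
from scipy.stats import norm

# P(μ - kσ < X < μ + kσ) is the same for every normal, so use N(0, 1).
for k in (1, 2, 3):
    print(f"within ±{k}σ:", norm.cdf(k) - norm.cdf(-k))
# 0.6827, 0.9545, 0.9973

# Z-score: x = 190 with μ = 175, σ = 10 is 1.5 standard deviations up.
z = (190 - 175) / 10
print(z, norm.cdf(z))   # 1.5, P(X ≤ 190) ≈ 0.9332
```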

4. Gamma Distribution

Models: Sum of exponential random variables

Parameters: α (shape), β (rate)

Mean: E[X] = α/β

Variance: Var(X) = α/β²

5. Beta Distribution

Models: Probabilities and proportions (values between 0 and 1)

Parameters: α, β (shape parameters)

Support: 0 ≤ x ≤ 1


Part 9: Joint Distributions

When you have multiple random variables, you need joint distributions.

Joint PMF (Discrete)

P(X = x, Y = y) = probability that X=x AND Y=y

Properties:

  • ΣΣ P(X = x, Y = y) = 1 (sum over all x and y)

Joint PDF (Continuous)

P((X,Y) ∈ A) = ∬_A f(x,y) dx dy

Properties:

  • ∬ f(x,y) dx dy = 1 (integral over entire plane)

Marginal Distributions

To get the distribution of just X from a joint distribution:

Discrete:

P(X = x) = Σ P(X = x, Y = y) for all y

Continuous:

f_X(x) = ∫ f(x,y) dy

Independence

X and Y are independent if:

Discrete: P(X = x, Y = y) = P(X = x) · P(Y = y) for all x, y

Continuous: f(x,y) = f_X(x) · f_Y(y) for all x, y
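
A small NumPy sketch makes marginals and the independence check concrete (the joint PMF entries are made up for illustration):

```python
import numpy as np

# Joint PMF of X (rows) and Y (columns); all entries sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)   # marginal of X: sum over y
p_y = joint.sum(axis=0)   # marginal of Y: sum over x
print(p_x, p_y)           # [0.3 0.7]  [0.4 0.6]

# Independence would require joint[i, j] = p_x[i]·p_y[j] everywhere.
print(np.allclose(joint, np.outer(p_x, p_y)))   # False for this table
```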


Part 10: Covariance and Correlation

Covariance

Measures how two variables vary together:

Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)]
         = E[XY] - E[X]·E[Y]

Properties:

  • Cov(X,X) = Var(X)
  • Cov(X,Y) = Cov(Y,X)
  • If X and Y are independent: Cov(X,Y) = 0
  • Cov(aX + b, cY + d) = ac·Cov(X,Y)

Interpretation:

  • Cov(X,Y) > 0: Positive relationship (both increase together)
  • Cov(X,Y) < 0: Negative relationship (one increases, other decreases)
  • Cov(X,Y) = 0: No linear relationship

Correlation

Pearson correlation coefficient:

ρ(X,Y) = Cov(X,Y) / (σ_X · σ_Y)

Properties:

  • -1 ≤ ρ ≤ 1
  • ρ = 1: Perfect positive linear relationship
  • ρ = -1: Perfect negative linear relationship
  • ρ = 0: No linear relationship

Advantage over covariance: Scale-independent!
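
NumPy estimates both quantities from data. The sketch below uses a synthetic linear relationship to show that rescaling changes covariance but leaves correlation untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)    # y depends linearly on x, plus noise

print(np.cov(x, y)[0, 1])            # sample covariance, ≈ 2
print(np.corrcoef(x, y)[0, 1])       # correlation, ≈ 0.89 (= 2/√5)

# Rescaling x multiplies the covariance but not the correlation.
print(np.cov(10 * x, y)[0, 1])       # ≈ 20
print(np.corrcoef(10 * x, y)[0, 1])  # still ≈ 0.89
```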


Part 11: Transformations of Random Variables

For Single Variable

If Y = g(X), how do we find the distribution of Y?

Method 1: CDF Method

  1. Find F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)
  2. Differentiate to get f_Y(y) = dF_Y(y)/dy

Method 2: Jacobian Method (for continuous)

If Y = g(X) and g is monotonic with inverse X = h(Y):

f_Y(y) = f_X(h(y)) · |dh(y)/dy|

Example: If X ~ N(0,1) and Y = X²

Then Y follows a Chi-square distribution with 1 degree of freedom.
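
A quick simulation (assuming NumPy and SciPy) supports this: quantiles of squared standard-normal draws track the χ²(1) quantiles:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
y = rng.normal(size=100_000) ** 2     # Y = X² with X ~ N(0, 1)

qs = [0.25, 0.5, 0.75, 0.95]
print(np.quantile(y, qs))             # empirical quantiles of Y
print(chi2(df=1).ppf(qs))             # theoretical χ²(1) quantiles
print(y.mean(), chi2(df=1).mean())    # both ≈ 1 (the mean of χ²(k) is k)
```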

For Multiple Variables

If we have transformations:

  • U = g₁(X,Y)
  • V = g₂(X,Y)

We use the Jacobian determinant:

f_{U,V}(u,v) = f_{X,Y}(x(u,v), y(u,v)) · |J|

Where |J| is the absolute value of the determinant of the Jacobian matrix of the inverse transformation (the partial derivatives of x and y with respect to u and v).


Part 12: Moment Generating Functions (MGF)

The MGF is a powerful tool for characterizing distributions.

Definition:

M_X(t) = E[e^(tX)]

For discrete:

M_X(t) = Σ e^(tx) · P(X = x)

For continuous:

M_X(t) = ∫ e^(tx) · f(x) dx

Why useful?

  1. Uniqueness: Each distribution has a unique MGF
  2. Moments: The n-th derivative at t=0 gives the n-th moment
M_X^(n)(0) = E[X^n]
  3. Sums: If X and Y are independent:
M_{X+Y}(t) = M_X(t) · M_Y(t)

Example: Bernoulli Distribution

X ~ Bernoulli(p)

M_X(t) = E[e^(tX)] = e^(t·0)·(1-p) + e^(t·1)·p
       = (1-p) + pe^t
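
With SymPy, the moment-extraction property can be verified symbolically for this MGF:

```python
import sympy as sp

t, p = sp.symbols('t p')
M = (1 - p) + p * sp.exp(t)        # MGF of Bernoulli(p)

m1 = sp.diff(M, t).subs(t, 0)      # E[X] = M'(0) = p
m2 = sp.diff(M, t, 2).subs(t, 0)   # E[X²] = M''(0) = p
print(m1, m2)
print(sp.simplify(m2 - m1**2))     # Var(X) = p - p² = p(1 - p)
```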

Part 13: Central Limit Theorem (CLT)

One of the most important theorems in statistics!

Statement: If X₁, X₂, ..., X_n are independent, identically distributed random variables with mean μ and variance σ², then as n → ∞:

(X̄ - μ) / (σ/√n) → N(0, 1)

Or equivalently:

X̄ ~ N(μ, σ²/n) approximately

Where X̄ = (X₁ + X₂ + ... + X_n)/n is the sample mean.

In plain English: The average of many random variables (regardless of their original distribution) follows a normal distribution!

Practical implications:

  • Works for ANY distribution (as long as it has finite mean and variance)
  • Larger n = better approximation
  • Rule of thumb: n ≥ 30 is usually sufficient

Example: Roll a die 100 times and take the average. Even though individual rolls are uniform, the average will be approximately normal with mean 3.5 and variance (2.92/100).
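
A simulation of exactly this experiment (10,000 repetitions, a number chosen just for illustration) shows the CLT at work:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 experiments, each averaging 100 fair-die rolls.
rolls = rng.integers(1, 7, size=(10_000, 100))   # upper bound is exclusive
means = rolls.mean(axis=1)

print(means.mean())   # ≈ 3.5, the population mean
print(means.var())    # ≈ 0.0292 ≈ 2.92/100, as predicted
# A histogram of `means` looks bell-shaped even though each roll is uniform.
```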


Part 14: Law of Large Numbers (LLN)

Weak Law of Large Numbers:

As n → ∞, the sample mean X̄ converges in probability to the population mean μ:

P(|X̄ - μ| > ε) → 0 as n → ∞

For any ε > 0, no matter how small.

In plain English: If you repeat an experiment many times, the average result gets closer and closer to the expected value.
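
A running-average simulation of fair coin flips (a minimal sketch) makes the convergence visible:

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=100_000)   # fair coin: 0 or 1, so μ = 0.5

# Sample mean after the first n flips, for growing n.
running = np.cumsum(flips) / np.arange(1, len(flips) + 1)
for n in (10, 100, 1_000, 100_000):
    print(n, running[n - 1])               # drifts toward 0.5 as n grows
```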

Difference from CLT:

  • LLN: Sample mean converges to population mean
  • CLT: Sample mean has an approximately normal distribution

Part 15: Practical Applications

1. Quality Control

  • Binomial: Number of defective items in a batch
  • Poisson: Number of defects per unit area

2. Finance

  • Normal: Stock returns
  • Exponential: Time between trades
  • Lognormal: Stock prices

3. Insurance

  • Poisson: Number of claims
  • Exponential/Gamma: Claim amounts

4. Reliability Engineering

  • Exponential: Component lifetimes
  • Weibull: Product failure times

5. Queuing Theory

  • Poisson: Customer arrivals
  • Exponential: Service times

6. Machine Learning

  • Bernoulli: Binary classification
  • Multinomial: Multi-class classification
  • Normal: Regression residuals

Part 16: Summary Table

Distribution   Type         Parameters   Mean       Variance             Use Case
Bernoulli      Discrete     p            p          p(1-p)               Single trial
Binomial       Discrete     n, p         np         np(1-p)              n trials, count successes
Poisson        Discrete     λ            λ          λ                    Events in time/space
Geometric      Discrete     p            1/p        (1-p)/p²             Trials until success
Uniform        Continuous   a, b         (a+b)/2    (b-a)²/12            Equal probability
Exponential    Continuous   λ            1/λ        1/λ²                 Waiting times
Normal         Continuous   μ, σ²        μ          σ²                   Natural phenomena
Gamma          Continuous   α, β         α/β        α/β²                 Sum of exponentials
Beta           Continuous   α, β         α/(α+β)    αβ/((α+β)²(α+β+1))   Proportions

The Bottom Line

Random Variables are functions that assign numbers to random outcomes. They come in two types:

  • Discrete: Countable values (PMF)
  • Continuous: Uncountable values (PDF)

Distributions describe how probabilities are spread over possible values:

  • PMF/PDF: Probability at each point
  • CDF: Cumulative probability
  • Expected Value: Long-run average
  • Variance: Spread around the mean

Key Theorems:

  • CLT: Averages approach normality
  • LLN: Averages converge to expected value

Understanding random variables and distributions is fundamental to statistics, probability, data science, machine learning, and countless real-world applications!

