1. Probability
2. What is an Event
3. Three axioms of probability
4. Bayesian probability
5. Discrete Probability Distributions (Bernoulli Distribution, Binomial Distribution and Poisson Distribution)
Experiment
An experiment is any process or action that has a result that is not known in advance with certainty. It's the overall activity you're conducting.
The experiment is "Flipping a coin twice."
Trial
A trial is a single performance of the experiment. In experiments that consist of multiple actions, each individual action is a trial.
"Each individual flip of the coin" is a trial. The first flip is one trial, and the second flip is another.
Outcome
An outcome is the specific result of an experiment.
For example, HT (Heads on the first flip, Tails on the second) is one specific outcome of the two-flip experiment.
Sample Space
The sample space is the complete list of all possible outcomes of an experiment. It's every single result that could possibly happen.
The sample space is {HH, HT, TH, TT}. This is the complete set of results you can get from flipping a coin twice.
Event
An event is a specific outcome or a collection of outcomes that you are interested in. It's a subset of the sample space.
The event is "Getting at least one head." This is a compound event because it includes multiple outcomes from the sample space: {HH, HT, TH}.
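To make the counting concrete, here is a minimal Python sketch (variable names are purely illustrative) that enumerates the sample space for two coin flips and counts the outcomes in the event "getting at least one head":

```python
from itertools import product

# Sample space for flipping a coin twice: all ordered pairs of H/T
sample_space = [''.join(flips) for flips in product('HT', repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# Event: "at least one head" = every outcome containing an H
event = [outcome for outcome in sample_space if 'H' in outcome]
print(event)                           # ['HH', 'HT', 'TH']
print(len(event) / len(sample_space))  # 0.75
```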
Probability Definition
1. Probability is a mathematical framework that quantifies uncertainty by assigning numerical values between 0 and 1 to possible outcomes, enabling us to make rational decisions and predictions when facing incomplete information.
2. The probability of an event is the measure of how likely that event is to occur. For a finite sample space, it is calculated as the ratio of the number of favorable outcomes to the total number of possible outcomes.
In simpler terms, probability is a number between 0 and 1 (or 0% and 100%) that tells you the chance of something happening. The formula is:
P(Event) = Number of favorable outcomes / Total number of possible outcomes
Example: Imagine you roll a standard six-sided die 🎲.
The total number of possible outcomes is 6 (you can roll a 1, 2, 3, 4, 5, or 6).
Let's say our desired event is "rolling an even number."
The number of favorable outcomes for this event is 3 (the numbers 2, 4, and 6).
Using the formula, the probability of rolling an even number is 3/6 = 0.5, or 50%.
What "Incomplete Information" Means
Incomplete information means you don't know everything you'd need to know to be 100% certain about what will happen.
Examples That Show The Difference
Complete Information (No Probability Needed):
- You drop a ball while standing on Earth → It WILL fall down (we know gravity exists)
- You put ice in a hot oven → It WILL melt (we know ice melts when heated)
- You have 5 apples and eat 2 → You WILL have 3 left (basic math)
In these cases, you know all the relevant facts, so you can be certain about the outcome.
Incomplete Information (Probability Helps):
Rolling a die: You don't know which face will land up because you can't know:
- The exact force of your throw
- The tiny air currents in the room
- The precise angle it hits the table
- Microscopic imperfections in the die
Weather tomorrow: The weather service doesn't know:
- Every air particle's movement right now
- All temperature variations across the region
- Exact moisture levels everywhere
- How all these billions of factors will interact
Will your friend like a movie? You don't know:
- Their exact mood that day
- Every movie they've ever seen
- All their personal preferences
- What mindset they'll be in when watching
Why We Usually Have Incomplete Information
There are three main reasons:
- Too much to measure - To predict a coin flip perfectly, you'd need to measure the force, angle, air resistance, tiny vibrations, surface imperfections... it's practically impossible!
- Hidden information - In a card game, you can't see other players' cards. When taking a test, you don't know what questions will be asked.
- Future unknowns - Will your favorite team win tonight? You don't know if a player might get injured, make an amazing play, or have an off day.
How Probability Helps
When you have incomplete information, probability gives you a way to:
- Make the best guess possible with what you DO know
- Put a number on how confident you should be
- Make smart decisions despite not knowing everything
Think of it this way:
Without probability: "Will it rain tomorrow? I don't know, maybe?"
With probability: "There's a 70% chance of rain tomorrow based on current conditions, so I should probably bring an umbrella."
A Simple Analogy
Imagine you're trying to guess what's in a wrapped present:
- Complete information: Someone tells you exactly what's inside → No guessing needed
- Incomplete information: You can only shake it, feel its weight, see its size → You need probability
You might think: "Based on the size and weight, there's a high probability it's a book, medium chance it's a board game, low chance it's clothing."
The Key Point
Incomplete information doesn't mean NO information - it means you know some things but not everything. Probability helps you make the best use of the partial information you have.
For instance, if you know a basketball player makes 80% of their free throws, you have incomplete information about their next shot (you don't know if it will go in), but you have enough information to say there's an 80% probability they'll make it.
Different approaches to probability
a) Classical Probability
b) Frequentist Probability
c) Axiomatic Probability
Classical Probability
Classical probability, also called Laplacian probability (after Pierre-Simon Laplace) and grounded in the "principle of insufficient reason," is the oldest formal approach to probability.
Imagine you have a bag with 10 marbles: 3 red, 5 blue, and 2 green. If you close your eyes and pick one marble, what's the chance it's blue?
Classical probability is like the simplest recipe for finding chances. You just count! There are 5 blue marbles out of 10 total marbles, so the probability is 5 out of 10, or 1/2.
The main idea: When everything has an equal chance of happening, you just divide what you want by the total number of possibilities.
Think about rolling a regular die. Each number (1, 2, 3, 4, 5, 6) has the same chance of appearing because the die is fair. Want to know the chance of rolling a 4? It's 1 out of 6 possibilities, so 1/6.
When it works best: This method is perfect for things like:
Drawing cards from a shuffled deck
Picking names from a hat
Rolling dice or flipping coins
Choosing a random student in your class
The catch: Everything needs to have an equal chance. If someone put weights inside the die to make it land on 6 more often, this method wouldn't work anymore!
Fundamental Principle:
If an experiment has n equally likely outcomes, and event A consists of m of these outcomes, then P(A) = m/n. This is the "favorable outcomes over total outcomes" formula taught in elementary probability.
Key Assumptions:
The classical approach requires two critical assumptions:
a. All outcomes must be equally likely (equiprobability)
b. The sample space must be finite
For a standard die, each face has probability 1/6 because we assume perfect symmetry—no reason to favor any face over another. For a deck of cards, each card has probability 1/52 of being drawn.
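As a quick sketch of the m/n counting rule, the snippet below revisits the marble bag from earlier (3 red, 5 blue, 2 green), assuming each individual marble is equally likely to be drawn; the helper function name is just for illustration:

```python
from fractions import Fraction

# Bag from the earlier example: 3 red, 5 blue, 2 green marbles,
# with each individual marble assumed equally likely to be drawn.
bag = ['red'] * 3 + ['blue'] * 5 + ['green'] * 2

def classical_probability(event_color, outcomes):
    """P(A) = m/n: favorable outcomes over total outcomes."""
    favorable = sum(1 for marble in outcomes if marble == event_color)
    return Fraction(favorable, len(outcomes))

print(classical_probability('blue', bag))   # 1/2
print(classical_probability('green', bag))  # 1/5
```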
The Principle of Insufficient Reason:
When we have no information suggesting one outcome is more likely than another, we assign equal probabilities to all outcomes. This philosophical principle, while intuitive, can lead to paradoxes. Consider the cube-factory paradox (a Bertrand-style problem): "A factory produces cubes with side lengths between 1 and 3 meters. What's the probability a randomly selected cube has side length less than 2?" Depending on whether we assume a uniform distribution over side length, surface area, or volume, we get different answers.
Strengths and Weaknesses:
Classical probability works beautifully for games of chance with natural symmetries—dice, cards, roulette wheels, lottery drawings. The calculations are straightforward and intuitive.
However, it fails when outcomes aren't equally likely (a weighted die), when the sample space is infinite (selecting a random real number between 0 and 1), or when we have partial information suggesting unequal probabilities. Real-world phenomena rarely exhibit the perfect symmetry required for classical probability.
Historical Importance:
Despite its limitations, classical probability was crucial in probability theory's development. It provided the first systematic approach to calculating probabilities and remains the foundation for teaching basic probability concepts. Many combinatorial probability problems still use classical methods when the equal likelihood assumption is justified.
Frequentist Probability
Frequentist probability interprets probability as the long-run frequency of events in repeated experiments. This approach dominated statistical thinking through much of the 20th century.
Let's say you're curious about whether your basketball shots go in more often from the left side or right side of the hoop. How would you figure this out?
A frequentist would say: "Shoot 100 times from the left side and count how many go in. If 40 shots go in, then your probability is 40/100 or 40%."
The main idea: Probability is what actually happens when you repeat something many, many times. It's like keeping score over lots of games to find your true average.
Imagine your friend claims their lucky pencil helps them get better test scores. A frequentist would say: "Use that pencil for 20 tests, use a regular pencil for 20 other tests, and compare the results. The pencil that gives better scores more often is the better one."
Real-life examples:
A baseball player's batting average (hits divided by times at bat)
How often the school bus is late (count late days over the whole year)
The chance your favorite YouTuber posts on Monday (check the last 50 Mondays)
The limitation: You need to be able to repeat things. You can't use this method to find the probability that your teacher will give a pop quiz tomorrow (since tomorrow only happens once), but you could track how often pop quizzes happen on Tuesdays over the whole school year.
Core Concept:
In the frequentist view, probability represents the limiting relative frequency as the number of trials approaches infinity. If we flip a coin infinitely many times, the probability of heads is the proportion of flips that result in heads as n → ∞.
Mathematically: P(A) = lim(n→∞) [n(A)/n], where n(A) is the number of times event A occurs in n trials.
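A small simulation illustrates the limiting-frequency idea: as the number of simulated flips n grows, the observed proportion of heads settles near the true probability. This is only an illustrative sketch (the seed and sample sizes are arbitrary):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def relative_frequency(n_flips):
    """Estimate P(heads) as n(A)/n after n_flips simulated fair-coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
# The estimates wander for small n and settle near 0.5 as n grows.
```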
Key Characteristics:
Frequentist probability requires repeatability—we must be able to imagine repeating the experiment indefinitely under identical conditions. This creates challenges for unique events. What does it mean to say there's a 30% chance of rain tomorrow? The frequentist interprets this as: "In the long run, among all days with atmospheric conditions identical to today's, rain occurs on 30% of them."
Frequentists treat probability as an objective property of the physical world, not dependent on anyone's knowledge or beliefs. A die has a 1/6 probability of showing 6, regardless of what anyone believes or knows.
Limitations:
This approach struggles with single-case probabilities (the probability that a specific historical figure wrote a particular document), and with scenarios that can't be repeated (the probability of nuclear war next year). Frequentists often avoid assigning probabilities to hypotheses or parameters, viewing them as fixed but unknown values rather than random variables.
Axiomatic Probability
Axiomatic probability, developed by Andrei Kolmogorov in 1933, provides the mathematical foundation for modern probability theory. Rather than defining what probability "means" philosophically, it establishes a rigorous mathematical framework through axioms.
Instead of telling you HOW to find probabilities, it's like the official rulebook that ALL probabilities must follow, no matter how you calculate them.
Think of it like the rules of a board game. The rules don't tell you how to win, but they tell you what moves are legal and what's not allowed.
Axiomatic Probability - The Three Rules of Chance
What Are Axioms?
Imagine you're inventing a brand new game with your friends. Before you can play, you need to agree on some basic rules that everyone has to follow - like "you can't go outside the boundaries" or "everyone gets a turn." These super important rules that everything else is built on are called axioms. They're like the foundation of a house - everything else stands on top of them!
Axiomatic probability is just three simple rules that ALL probabilities must follow, no matter what. It's like the rules of being fair when talking about chances.
The Three Magic Rules
Rule 1: No Negative Chances (Non-negativity)
The Rule: You can never have less than a 0% chance of something happening.
What it means: Imagine your friend says "There's a negative 10% chance I'll share my candy." That doesn't make any sense! You can have zero chance (impossible), small chance, big chance, or certain - but never less than zero. It's like saying you have negative 5 cookies - you can't have less than zero cookies!
Real example:
- Chance of finding a dinosaur in your backyard: 0% ✓
- Chance of it raining tomorrow: 30% ✓
- Chance of scoring a goal: 50% ✓
- Chance of breathing air today: 100% ✓
- Chance of growing wings: -20% ✗ (This breaks the rule!)
Rule 2: Something Must Happen (Normalization)
The Rule: If you list every single possible thing that could happen, the chances all add up to 100% (or 1).
What it means: Imagine you're rolling a normal die. It HAS to land on 1, 2, 3, 4, 5, or 6 - there's no other option! So if you add up all the chances: chance of 1 + chance of 2 + chance of 3 + chance of 4 + chance of 5 + chance of 6 = 100%. Something on that list MUST happen!
Real example with a coin flip:
- Chance of heads: 50%
- Chance of tails: 50%
- Total: 50% + 50% = 100% ✓
The coin has to land on something - it can't just disappear!
Fun way to think about it: If your mom says "We're either going to the park, the movies, or staying home" - one of those three things HAS to happen. If the chances are 25% park, 35% movies, then staying home must be 40% (because 25% + 35% + 40% = 100%).
Rule 3: Add Separate Chances (Additivity)
The Rule: If two things can't happen at the same time, you add their chances to find the probability of either happening.
What it means: Let's say you have a bag of marbles:
- 3 red marbles
- 2 blue marbles
- 5 green marbles
What's the chance of pulling out either red OR blue? Since a marble can't be both red AND blue at the same time (they're "mutually exclusive"), you just add:
- Chance of red: 3 out of 10 = 30%
- Chance of blue: 2 out of 10 = 20%
- Chance of red OR blue: 30% + 20% = 50%
But be careful! This only works when things CAN'T happen together. If you're asking "What's the chance it's sunny OR I'm happy?" - you can't just add because both could be true at the same time!
Why These Rules Matter - The Spinner Game
Let's imagine you have a spinner divided into colored sections for a game:
- Red takes up 1/2 the spinner
- Blue takes up 1/3 the spinner
- Yellow takes up 1/6 the spinner
Using Rule 1: Each color has a positive chance (1/2, 1/3, and 1/6 are all greater than 0) ✓
Using Rule 2: Let's check if they add to 1:
- 1/2 + 1/3 + 1/6 = 3/6 + 2/6 + 1/6 = 6/6 = 1 = 100% ✓
- Perfect! The spinner must land on some color!
Using Rule 3: What's the chance of spinning red OR yellow?
- Since the spinner can't land on both at once:
- P(red or yellow) = 1/2 + 1/6 = 3/6 + 1/6 = 4/6 = 2/3
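Here is a minimal sketch that checks the spinner against the three rules; the fractions are exactly the ones used above:

```python
from fractions import Fraction

# Spinner sections and the share of the spinner each one covers
spinner = {
    'red':    Fraction(1, 2),
    'blue':   Fraction(1, 3),
    'yellow': Fraction(1, 6),
}

# Rule 1 (non-negativity): every probability is >= 0
assert all(p >= 0 for p in spinner.values())

# Rule 2 (normalization): the sections cover the whole spinner
assert sum(spinner.values()) == 1

# Rule 3 (additivity): red and yellow can't happen on the same spin,
# so P(red or yellow) is just the sum of the two probabilities
p_red_or_yellow = spinner['red'] + spinner['yellow']
print(p_red_or_yellow)  # 2/3
```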
Breaking the Rules - What Goes Wrong
If we break Rule 1 (negative probability):
"There's a -50% chance I'll win the race!" This is like saying you'll win negative half a race - it's nonsense!
If we break Rule 2 (doesn't add to 100%):
Imagine a weather forecast that says:
- 30% chance of sun
- 20% chance of rain
- 10% chance of snow
- Total = 60%
Wait, what happens the other 40% of the time? Does the weather just not exist? This breaks the rule!
If we break Rule 3 (wrong adding):
Your teacher puts everyone's name in a hat once. If there are 20 students and you count:
- Chance Sarah is picked: 1/20
- Chance Michael is picked: 1/20
- Chance Sarah OR Michael is picked: 1/20 + 1/20 = 2/20 = 1/10 ✓
But if someone said the chance is 1/15, they'd be breaking Rule 3!
The Cool Part - Everything Else Comes From These Three Rules!
Just like how you can build an enormous LEGO castle from simple blocks, mathematicians can figure out super complicated probability problems using just these three simple rules. Every single probability fact follows from these three rules:
- Why the chance of something NOT happening equals 100% minus the chance it does happen
- How to figure out the chance of multiple things happening
- Why flipping two coins gives you a 25% chance of two heads
- Everything about card games, dice games, and even weather predictions!
A Story to Remember: The Magic Probability Kingdom
In the Kingdom of Chance, there are three unbreakable laws:
- The Law of Positivity: No one can have negative gold coins (no negative probabilities)
- The Law of Completeness: All the land in the kingdom must belong to someone - every piece must be accounted for (everything adds to 100%)
- The Law of Fair Addition: If two people can't own the same piece of land, you add their territories to find the total (add probabilities that can't happen together)
Every citizen (every probability problem) must follow these three laws, or they're banished from the kingdom (they're not valid probabilities)!
Try It Yourself!
You have a jar with 10 candies:
- 4 chocolate
- 3 strawberry
- 2 lemon
- 1 mint
Check Rule 1: Are all chances positive or zero?
- Each candy type has a positive chance ✓
Check Rule 2: Do all chances add to 100%?
- 4/10 + 3/10 + 2/10 + 1/10 = 10/10 = 100% ✓
Check Rule 3: What's the chance of getting chocolate OR mint?
- Can't be both, so we add: 4/10 + 1/10 = 5/10 = 50% ✓
Remember
Axiomatic probability gives us three simple rules that every probability must follow:
- No negative chances (you can't have less than 0%)
- Something must happen (all possibilities add to 100%)
- Add separate chances (if things can't happen together, add them)
These are like the rules of the probability game - break them, and you're not playing probability anymore! Just like you can't have negative cookies, can't have a day where nothing happens, and if you can't be in two places at once, you add the chances of being in either place.
That's all axiomatic probability is - three simple, sensible rules that make sure our math about chances actually makes sense!
Another way to explain:
The Three Big Rules:
No negative probabilities: You can't have a -50% chance of something. The lowest is 0% (impossible), like the chance of rolling a 7 on a normal die.
Something must happen: If you list every possible thing that could happen, the total probability is always 100%. When you flip a coin, it's either heads or tails - one of them MUST happen.
Adding up separate events: If two things can't happen at the same time, add their probabilities. The chance of rolling a 1 OR a 2 on a die is 1/6 + 1/6 = 2/6.
Why this matters: These rules are like the grammar of probability. Just like sentences need to follow grammar rules to make sense, any probability calculation needs to follow these rules to be valid.
For example, if someone said "There's a 60% chance of rain and an 80% chance of no rain," you'd know something's wrong because 60% + 80% = 140%, which breaks rule #2 (the total should be 100%).
The Core Axioms:
The Kolmogorov axioms define probability as a function P that assigns numbers to events in a sample space Ω, satisfying three conditions:
Non-negativity: For any event A, P(A) ≥ 0 (probabilities cannot be negative)
Normalization: P(Ω) = 1 (the probability that something happens is 1)
Countable additivity: For mutually exclusive events A₁, A₂, A₃, ..., the probability of their union equals the sum of their individual probabilities: P(A₁ ∪ A₂ ∪ ...) = P(A₁) + P(A₂) + ...
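As an illustration (not part of Kolmogorov's formal development), a tiny checker for a finite distribution can verify the first two axioms directly; for a finite sample space, additivity is automatic once event probabilities are defined as sums over disjoint outcomes:

```python
from math import isclose

def satisfies_kolmogorov_axioms(pmf):
    """Check a finite distribution {outcome: probability} against the axioms."""
    non_negative = all(p >= 0 for p in pmf.values())  # Axiom 1: non-negativity
    normalized = isclose(sum(pmf.values()), 1.0)      # Axiom 2: normalization
    # Axiom 3 (additivity) holds automatically here: the probability of any
    # event is defined as the sum over its disjoint outcomes.
    return non_negative and normalized

fair_die = {face: 1/6 for face in range(1, 7)}
broken = {'sun': 0.3, 'rain': 0.2, 'snow': 0.1}  # only sums to 0.6

print(satisfies_kolmogorov_axioms(fair_die))  # True
print(satisfies_kolmogorov_axioms(broken))    # False
```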
Key Features:
From these axioms, all other probability rules can be derived mathematically. For instance, we can prove that P(∅) = 0 (the empty set has probability zero), that P(A') = 1 - P(A) (complement rule), and derive formulas for conditional probability and independence.
The axiomatic approach is agnostic about interpretation—it doesn't tell us what probability "means" in the real world, only how probabilities must behave mathematically. This makes it compatible with both frequentist and Bayesian interpretations. It's like defining the rules of arithmetic without specifying whether numbers represent apples, dollars, or abstract quantities.
This framework enables rigorous mathematical proofs and is essential for advanced topics like measure theory, stochastic processes, and mathematical statistics.
Putting It All Together
Think of these three approaches like different ways to figure out if you'll like a new ice cream flavor:
Classical: If the shop has 20 flavors and you like half the ones you can see ingredients for, you might guess you'll like 10 out of 20 flavors.
Frequentist: Try a small sample of different flavors over several visits. If you like 7 out of 10 you try, you'd say you have a 70% chance of liking a random flavor.
Axiomatic: These are the rules everyone has to follow - you can't like a flavor negative amounts, you must have some opinion about each flavor, and if you list all flavors as "like" or "don't like," those percentages must add up to 100%.
Each method has its place. Classical is great for simple games, frequentist is perfect for real-world experiments you can repeat, and axiomatic gives us the mathematical rules that keep everything making sense!
These three approaches—axiomatic, frequentist, and classical—represent different levels of abstraction and different philosophical commitments about what probability means. Modern probability theory typically uses the axiomatic framework as its mathematical foundation while allowing for various interpretations (frequentist, Bayesian, or others) depending on the application context.
What is Probability?
Probability is a measure of how likely something is to happen. It's a number between 0 and 1 (or 0% and 100%).
0 means it's an impossible event.
1 means it's a certain event.
The basic formula to calculate it is:
P(Event) = Number of favorable outcomes / Total number of possible outcomes
An event is just the specific outcome or group of outcomes you're interested in (like rolling a 6 on a die). The sample space is every single possible outcome (all the numbers from 1 to 6).
The Certain Event:
Ω (Omega) is the Greek letter used in probability to represent the sample space, the set of all possible outcomes of an experiment. For our die roll, the sample space is Ω = {1, 2, 3, 4, 5, 6}.
P(Ω) therefore means "the probability of any outcome from the sample space occurring."
This probability is 1 (or 100%) because it is an absolute certainty that when you roll the die, one of the possible outcomes will occur. You are guaranteed to get a result that is either a 1, 2, 3, 4, 5, or 6. This is known as a certain event.
The Impossible Event:
∅ (the empty-set symbol) is used to represent an impossible event, an event that has no possible outcomes.
P(∅) therefore means "the probability of an impossible event occurring."
This probability is always 0 because it simply cannot happen.
Example: Using our six-sided die again, what is the probability of rolling a 7?
The number of favorable outcomes is 0 (there is no "7" on the die).
The total number of possible outcomes is still 6.
So, the probability is 0/6 = 0.
Summary
In essence, these are the foundational rules of probability:
Probability is calculated by dividing the outcomes you want by the total outcomes possible.
The probability of a certain event (something that must happen) is 1.
The probability of an impossible event (something that cannot happen) is 0.
All other probabilities will fall somewhere between these two extremes.
What is an Event
An event is a specific outcome or a set of outcomes from an experiment, like rolling a 5 on a die or the group of students in a class who like pizza.
How can the word 'event' describe both a single outcome, like rolling a 5, and a collection of multiple outcomes, like the group of students who like pizza?
The key is to understand the difference between a simple event and a compound event.
A simple event is a single, specific outcome.
Example: Rolling a die and getting a 5 is a simple event. There's only one way for this to happen.
A compound event is a set of one or more outcomes that share a specific trait.
Example: The group of students in your class who like pizza is a compound event. This isn't a single outcome; it's a collection of many individual outcomes (each student who says "yes" or raises their hand).
Suppose A, B, and C are events, each with some probability of occurring. In a Venn diagram, the overlapping regions show where two or more of these events happen together.
Let's say our three events are different pizza toppings:
Event A: Friends who like Pepperoni
Event B: Friends who like Mushrooms
Event C: Friends who like Olives
Understanding the Parts
Here’s what each section of the diagram means, based on our pizza toppings:
Only Pepperoni (Just the A circle): These are the friends who only want pepperoni and nothing else.
Only Mushrooms (Just the B circle): These friends will only eat pizza with mushrooms.
Only Olives (Just the C circle): These friends are sticking with only olives.
The Overlaps (Where Circles Cross)
This is where things get interesting! People may like more than one topping, so the circles overlap:
A and B Overlap: Friends in this section like both Pepperoni and Mushrooms, but not olives.
B and C Overlap: These friends like both Mushrooms and Olives, but not pepperoni.
A and C Overlap: These friends like both Pepperoni and Olives, but not mushrooms.
The Center (Where All Circles Cross)
A, B, and C All Overlap: This is for the friends who are the most adventurous! They love pizza with Pepperoni, Mushrooms, AND Olives all on the same slice. This is the ultimate combo section.
So, each circle (A, B, and C) is just a group, and the diagram shows us how those groups are the same and how they are different in a simple, visual way.
An event is a specific outcome or a set of outcomes of an experiment or random process.
Visualizing Events with Venn Diagrams
Venn diagrams help us see the relationships between different events.
The box (U) represents the universe or the entire sample space.
Each circle represents a specific event (like Event A, Event B, etc.).
When circles overlap, it means that it's possible for those events to happen at the same time.
Combining Events: Key Operations
We can describe the different parts of a Venn diagram using special terms and symbols. Let's use the example of a school where:
Event A = Students in the Art Club 🎨
Event B = Students on the Basketball Team 🏀
The three-circle diagram shows more complex relationships, like with our pizza topping example: Event A (Pepperoni), Event B (Mushrooms), and Event C (Olives).
Region 1 (Only one event): These sections show outcomes that belong to only one event. (e.g., Liking only Pepperoni).
Region 2 (Intersection of two events): These sections show outcomes that belong to two events at the same time (e.g., liking Pepperoni and Mushrooms, but not Olives).
Region 3 (Intersection of all three events): The center shows the outcome that belongs to all three events at once (liking Pepperoni, Mushrooms, and Olives).
In Probability and Statistics 🎲
In the context of probability, an event is a result you are interested in. It can be a single outcome or a group of outcomes.
Simple Event: A single possible outcome.
Example: When flipping a coin, getting "Heads" is a simple event.
Example: When rolling a six-sided die, "rolling a 4" is a simple event.
Compound Event: A combination of two or more simple events.
Example: When rolling a die, "rolling an odd number" is a compound event because it includes the outcomes {1, 3, 5}.
Example: When drawing a card from a deck, "drawing a red king" is a compound event (the card must be both red and a king).
The probability of an event is calculated by dividing the number of ways the event can occur by the total number of possible outcomes.
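To tie this back to counting, here is a short sketch that computes the two compound-event probabilities mentioned above by enumerating outcomes (the rank and suit labels are just one way to model a deck):

```python
from fractions import Fraction
from itertools import product

# Compound event on a die: "rolling an odd number" = {1, 3, 5}
die = range(1, 7)
odd = [face for face in die if face % 2 == 1]
print(Fraction(len(odd), 6))  # 1/2

# Compound event in a deck: "drawing a red king" (red AND king)
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))
red_kings = [(r, s) for r, s in deck if r == 'K' and s in ('hearts', 'diamonds')]
print(Fraction(len(red_kings), len(deck)))  # 1/26
```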
In General Usage 🗓️
More broadly, an event is simply something that happens, particularly something of significance. It can refer to a planned public or social occasion, like a concert or a festival, or a noteworthy occurrence, like a historical event or a personal milestone.
Three axioms of probability
The three axioms of probability, formulated by Andrey Kolmogorov in 1933, form the mathematical foundation of probability theory. These axioms provide a rigorous framework for understanding and calculating probabilities.
Axiom 1: Non-negativity
The probability of any event is a non-negative real number.
Mathematically: For any event A, P(A) ≥ 0
This axiom establishes that probabilities cannot be negative. It makes intuitive sense because probability represents the likelihood of something occurring - you can't have less than zero chance of something happening. A probability of 0 means the event is impossible, while positive values indicate varying degrees of possibility up to certainty.
Axiom 2: Normalization (Unitarity)
The probability of the entire sample space is 1.
Mathematically: P(S) = 1, where S is the sample space
The sample space represents all possible outcomes of an experiment or situation. This axiom states that something must happen - when you account for all possibilities, their total probability equals 1 (or 100%). For example, when rolling a standard die, the sample space is {1, 2, 3, 4, 5, 6}, and the probability that you'll roll one of these numbers is certain, hence P(S) = 1.
Axiom 3: Additivity (Countable Additivity)
For any countable sequence of mutually exclusive (disjoint) events, the probability of their union equals the sum of their individual probabilities.
Mathematically: If A₁, A₂, A₃, ... are mutually exclusive events (meaning Aᵢ ∩ Aⱼ = ∅ for i ≠ j), then: P(A₁ ∪ A₂ ∪ A₃ ∪ ...) = P(A₁) + P(A₂) + P(A₃) + ...
This axiom tells us how to calculate the probability of compound events. When events cannot occur simultaneously (they're mutually exclusive), we can find the probability that at least one occurs by adding their individual probabilities. For instance, the probability of rolling either a 2 or a 5 on a fair die is P(2) + P(5) = 1/6 + 1/6 = 1/3.
Implications and Derived Properties
From these three fundamental axioms, we can derive all other probability rules and theorems, including:
- The probability of the empty set (impossible event) is 0: P(∅) = 0
- The complement rule: P(A') = 1 - P(A), where A' is the complement of event A
- For any event A: 0 ≤ P(A) ≤ 1
- The inclusion-exclusion principle for non-disjoint events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
- Conditional probability formulas and Bayes' theorem
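As a quick numeric check of the inclusion-exclusion property listed above, this sketch verifies P(A ∪ B) = P(A) + P(B) - P(A ∩ B) for two overlapping events on a single die roll:

```python
from fractions import Fraction

# Sample space: one roll of a fair die
omega = set(range(1, 7))
A = {2, 4, 6}   # "even number"
B = {4, 5, 6}   # "at least 4"

def P(event):
    """Classical probability of an event inside the sample space omega."""
    return Fraction(len(event & omega), len(omega))

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs, lhs == rhs)  # 2/3 2/3 True
```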
These axioms are deliberately minimal yet complete - they provide just enough structure to build the entire edifice of probability theory while avoiding redundancy. They bridge the gap between intuitive notions of chance and rigorous mathematical analysis, enabling precise calculations in fields ranging from statistics and physics to finance and machine learning.
The beauty of Kolmogorov's axiomatization lies in its simplicity and generality. It applies equally well to discrete probability (like coin flips) and continuous probability (like measuring rainfall), providing a unified framework for understanding uncertainty across all domains.
Bayesian probability
Bayesian probability interprets probability as a degree of belief or confidence in a proposition, which gets updated as new evidence becomes available using Bayes' theorem.
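Since this section only states the idea, here is a minimal sketch of a single Bayesian update; the prior, true-positive rate, and false-positive rate are assumed numbers chosen purely for illustration:

```python
def bayes_update(prior, likelihood, evidence):
    """Posterior = likelihood * prior / evidence (Bayes' theorem)."""
    return likelihood * prior / evidence

# Illustrative numbers (assumed, not from the text): a condition affecting 1%
# of people, a test with a 95% true-positive rate and a 5% false-positive rate.
prior = 0.01
p_pos_given_condition = 0.95
p_pos = p_pos_given_condition * prior + 0.05 * (1 - prior)  # total probability

posterior = bayes_update(prior, p_pos_given_condition, p_pos)
print(round(posterior, 3))  # ~0.161: belief rises from 1% to about 16%
```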
Discrete Probability Distributions
These are three fundamental discrete probability distributions, each used to model different kinds of scenarios where you count outcomes.
1. Bernoulli Distribution
This is the simplest of the three. A Bernoulli distribution models the outcome of a single trial that has only two possible results: "success" or "failure."
What it models: A single event, like a yes/no question.
Key Parameter: It's defined by one parameter, p, which is the probability of success. The probability of failure is then 1-p.
Example: One flip of a coin. 🪙 If we call "Heads" a success, the probability of success is p = 0.5. The outcome is either 1 (Heads) or 0 (Tails).
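A minimal sketch of the Bernoulli probability mass function, writing success as 1 and failure as 0:

```python
def bernoulli_pmf(k, p):
    """P(X = k) for a single trial with success probability p (k is 0 or 1)."""
    if k not in (0, 1):
        raise ValueError("A Bernoulli outcome is either 0 (failure) or 1 (success)")
    return p if k == 1 else 1 - p

print(bernoulli_pmf(1, 0.5))  # 0.5, e.g. heads on one fair coin flip
print(bernoulli_pmf(0, 0.5))  # 0.5, tails
```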
2. Binomial Distribution
A Binomial distribution is what you get when you repeat a Bernoulli trial a fixed number of times and count the total number of successes.
What it models: The number of successes in a fixed number, n, of independent trials.
Key Parameters: It's defined by two parameters: n (the number of trials) and p (the probability of success on any single trial).
Example: You flip a coin 10 times (n = 10) and want to know the probability of getting exactly 7 heads. The number of heads you might get (from 0 to 10) follows a binomial distribution.
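Using the binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), a short sketch for the 10-flip example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 7 heads in 10 fair coin flips
print(binomial_pmf(7, n=10, p=0.5))  # ~0.117
```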
3. Poisson Distribution
A Poisson distribution models the number of times an event occurs over a fixed interval of time or space, given that these events happen at a known constant average rate and are independent of the time since the last event.
What it models: The count of events in an interval (e.g., time, area, distance).
Key Parameter: It's defined by one parameter, λ (lambda), which is the average number of events per interval.
Example: A call center receives an average of 10 calls per hour (λ = 10). The Poisson distribution can tell you the probability of receiving exactly 15 calls in the next hour. 📞
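A minimal sketch using the Poisson formula P(X = k) = λ^k * e^(-λ) / k! for the call-center example:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with average rate lam)."""
    return (lam**k) * exp(-lam) / factorial(k)

# Call center averaging 10 calls per hour: chance of exactly 15 calls
print(poisson_pmf(15, lam=10))  # ~0.035
```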