Probability Distribution Function Interview Questions [Normal, Bernoulli, Binomial, Poisson, etc.]

A distribution function describes how probability is spread across the possible values of a random variable. For a continuous variable, the probability density function (PDF) gives the relative likelihood near each value; the probability of any single exact value is zero, and probabilities come from integrating the PDF over an interval. For a discrete variable, the probability mass function (PMF) gives the probability of each specific value. The cumulative distribution function (CDF) gives the probability that a variable takes a value less than or equal to a given point.
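The difference between a density, a mass, and a cumulative probability is easier to see with numbers. The following is a minimal sketch using only the Python standard library; the helper name `binomial_pmf` and the choice of three fair coin flips are illustrative assumptions, not part of the original post.

```python
from math import comb
from statistics import NormalDist

# Continuous case: standard normal N(0, 1).
z = NormalDist(mu=0.0, sigma=1.0)
density_at_zero = z.pdf(0.0)   # a density value, not a probability
prob_below_zero = z.cdf(0.0)   # P(Z <= 0)

# Discrete case: number of heads in 3 fair coin flips, Binomial(3, 0.5).
def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Hypothetical helper for this sketch."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = [binomial_pmf(k, 3, 0.5) for k in range(4)]  # P(X = 0), ..., P(X = 3)
cdf_at_1 = sum(pmf[:2])                            # P(X <= 1)

print(density_at_zero, prob_below_zero)
print(pmf, cdf_at_1)
```

The PMF values are true probabilities and sum to 1, while the PDF value at 0 only becomes a probability once it is integrated over an interval.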

Most common distributions:

  • Normal (Gaussian): Bell curve, describes many natural phenomena
  • Bernoulli: Single binary outcome (success/failure, yes/no, 1/0)
  • Binomial: Number of successes in n trials
  • Poisson: Count of events in fixed time
  • Exponential: Time between events
  • Uniform: Equal probability across range
  • Beta: Probabilities and proportions

Here are 35 multiple-choice interview questions about distribution functions:

Questions

1. What are the two parameters that completely define a normal distribution?

A) Mode and range

B) Mean and standard deviation

C) Minimum and maximum

D) Skewness and kurtosis

E) Median and mode

2. In a standard normal distribution N(0,1), what percentage of data falls within ±1 standard deviation?

A) 50%

B) 68.27%

C) 95.45%

D) 99.73%

E) 100%

3. For a binomial distribution B(n,p), what is the expected value?

A) n + p

B) n - p

C) n × p

D) n / p

E) p^n

4. A call center receives an average of 5 calls per minute. What distribution models the number of calls in a given minute?

A) Normal

B) Binomial

C) Poisson

D) Exponential

E) Uniform

5. If X follows an exponential distribution with rate parameter λ = 2, what is P(X > 0.5)?

A) e^(-1)

B) e^(-2)

C) 1 - e^(-1)

D) 1 - e^(-2)

E) 0.5

6. For a uniform distribution U(a,b), what is the variance?

A) (b - a)²

B) (b - a)²/12

C) (b - a)/2

D) (b + a)/2

E) (b - a)²/6

7. The Beta distribution is commonly used as a prior for which parameter?

A) Mean of normal distribution

B) Variance of normal distribution

C) Probability in binomial distribution

D) Rate in exponential distribution

E) Number of trials

8. Which distribution is memoryless?

A) Normal

B) Binomial

C) Poisson

D) Exponential

E) Beta

9. If X ~ Binomial(10, 0.3), what is the variance?

A) 3

B) 2.1

C) 0.3

D) 0.09

E) 7

10. The Central Limit Theorem states that the sum of many independent random variables approaches which distribution?

A) Uniform

B) Exponential

C) Normal

D) Poisson

E) Beta

11. For a Poisson distribution with λ = 4, what is both the mean and variance?

A) 2

B) 4

C) 8

D) 16

E) Mean is 4, variance is 2

12. Which distribution would best model the time until a radioactive atom decays?

A) Normal

B) Binomial

C) Poisson

D) Exponential

E) Uniform

13. A Beta(1,1) distribution is equivalent to which distribution?

A) Standard normal

B) Exponential(1)

C) Uniform(0,1)

D) Binomial(1,0.5)

E) Poisson(1)

14. In a normal distribution, approximately what percentage of data falls within ±2 standard deviations?

A) 68%

B) 90%

C) 95%

D) 99%

E) 99.7%

15. If events occur at a rate of 3 per hour (Poisson process), what's the probability of exactly 2 events in one hour?

A) 9e^(-3)/2

B) 6e^(-3)

C) 3e^(-2)

D) 2e^(-3)/3

E) e^(-3)

16. For X ~ Normal(100, 15²), what is P(X > 130)?

A) 0.0228

B) 0.05

C) 0.1587

D) 0.5

E) 0.9772

17. Which distribution has the highest entropy for a given mean and variance?

A) Uniform

B) Exponential

C) Normal

D) Poisson

E) Beta

18. If you flip a fair coin 100 times, the number of heads follows which distribution?

A) Poisson(50)

B) Normal(50, 25)

C) Binomial(100, 0.5)

D) Uniform(0, 100)

E) Exponential(0.5)

19. The sum of n independent exponential(λ) random variables follows which distribution?

A) Exponential(nλ)

B) Gamma(n, λ)

C) Normal(n/λ, n/λ²)

D) Poisson(nλ)

E) Uniform(0, n/λ)

20. For a uniform distribution U(2, 8), what is the mean?

A) 3

B) 4

C) 5

D) 6

E) 10

21. Which relationship is correct for a Poisson distribution?

A) Variance = Mean²

B) Variance = Mean

C) Variance = 2×Mean

D) Variance = √Mean

E) Variance = Mean/2

22. A Beta(2, 5) distribution has its mode at approximately:

A) 0.14

B) 0.29

C) 0.40

D) 0.50

E) 0.71

23. The exponential distribution is a special case of which distribution?

A) Normal

B) Beta

C) Gamma

D) Binomial

E) Uniform

24. For large n and small p, Binomial(n,p) approximates which distribution?

A) Normal(np, np)

B) Exponential(np)

C) Poisson(np)

D) Uniform(0, n)

E) Beta(np, n(1-p))

25. If bus arrivals follow a Poisson process with rate 4 per hour, what's the expected waiting time for the next bus (in minutes)?

A) 4

B) 10

C) 15

D) 20

E) 30

26. What is the variance of a Bernoulli distribution with parameter p?

A) p

B) p²

C) p(1-p)

D) 1-p

E) p/(1-p)

27. The sum of n independent Bernoulli(p) random variables follows which distribution?

A) Bernoulli(np)

B) Binomial(n, p)

C) Poisson(np)

D) Normal(np, p)

E) Exponential(p)

28. For a Bernoulli trial with success probability 0.7, what is the expected value?

A) 0.3

B) 0.5

C) 0.7

D) 0.21

E) 1

29. Which distribution has the maximum variance among all distributions supported on [a,b]?

A) Normal (truncated)

B) Beta

C) Uniform

D) Triangular

E) U-shaped

30. The negative binomial distribution models:

A) Number of failures before the first success

B) Number of trials needed to get r successes

C) Time between events

D) Number of events in fixed time

E) Probability of success

31. If X ~ Exponential(λ=3), what is the mean?

A) 1/3

B) 3

C) 9

D) 1/9

E) √3

32. The standard deviation of a Poisson distribution with λ = 9 is:

A) 3

B) 9

C) 81

D) 1/3

E) 27

33. Which statement about the normal distribution is FALSE?

A) It's symmetric about its mean

B) Mean = median = mode

C) It has finite support

D) About 99.7% of data falls within ±3σ

E) It's completely determined by μ and σ

34. A coin with P(heads) = 0.6 is flipped. This single flip follows which distribution?

A) Binomial(1, 0.6)

B) Bernoulli(0.6)

C) Uniform(0, 1)

D) Beta(0.6, 0.4)

E) Both A and B

35. For Beta(α, β), what is the mean?

A) α/β

B) α/(α+β)

C) (α-1)/(α+β-2)

D) αβ/(α+β)

E) (α+β)/2


Step-by-Step Answers

1. Answer: B A normal distribution is completely characterized by two parameters: the mean (μ) which determines the center location, and the standard deviation (σ) which determines the spread. The notation N(μ, σ²) specifies any normal distribution uniquely.

2. Answer: B For any normal distribution, approximately 68.27% of values fall within ±1 standard deviation of the mean. This is part of the 68-95-99.7 rule (also called the empirical rule).
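The 68-95-99.7 figures can be reproduced from the standard normal CDF. This is a small sketch with the standard library's `statistics.NormalDist`; the helper name `within` is an assumption made for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, mean 0 and standard deviation 1

def within(k):
    """P(|Z| <= k) for a standard normal Z. Hypothetical helper."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    # Percentage of probability mass within k standard deviations of the mean.
    print(k, round(100 * within(k), 2))
```

Because any normal variable can be standardized, the same percentages apply to every N(μ, σ²).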

3. Answer: C For a binomial distribution with n trials and probability p of success, the expected value E[X] = n × p. This makes intuitive sense: if you flip a fair coin (p=0.5) 100 times (n=100), you expect 50 heads.

4. Answer: C The Poisson distribution models the count of events occurring in a fixed time interval when events occur independently at a constant average rate. With λ = 5 calls/minute, the number of calls follows Poisson(5).

5. Answer: A For exponential distribution with rate λ, P(X > x) = e^(-λx). Here, P(X > 0.5) = e^(-2×0.5) = e^(-1).

6. Answer: B For uniform distribution U(a,b), the variance = (b-a)²/12. This formula comes from integrating (x - mean)² over the distribution.
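The integration mentioned above can be approximated numerically. The sketch below uses a midpoint Riemann sum for U(2, 8); the function name `uniform_moments` and the step count are assumptions chosen for illustration.

```python
def uniform_moments(a, b, steps=100_000):
    """Approximate the mean and variance of U(a, b) with a midpoint Riemann sum."""
    width = (b - a) / steps
    density = 1.0 / (b - a)  # constant PDF of U(a, b)
    midpoints = [a + (i + 0.5) * width for i in range(steps)]
    mean = sum(x * density * width for x in midpoints)
    variance = sum((x - mean) ** 2 * density * width for x in midpoints)
    return mean, variance

mean, variance = uniform_moments(2.0, 8.0)
print(mean, variance)          # numerical approximation
print((8.0 - 2.0) ** 2 / 12)   # closed form (b - a)^2 / 12
```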

7. Answer: C The Beta distribution, with support on [0,1], is the conjugate prior for the probability parameter p in binomial distributions. Beta(α,β) represents prior beliefs about an unknown probability.
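Conjugacy means the posterior is again a Beta distribution: observing s successes and f failures turns a Beta(α, β) prior into Beta(α + s, β + f). A minimal sketch follows; the function name `update_beta` and the 7-successes, 3-failures data are made up for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after binomial data (conjugate update)."""
    return alpha + successes, beta + failures

# Start from a flat Beta(1, 1) prior, then observe hypothetical data.
prior = (1.0, 1.0)
posterior = update_beta(*prior, successes=7, failures=3)

# Posterior mean of the success probability: alpha / (alpha + beta).
posterior_mean = posterior[0] / (posterior[0] + posterior[1])
print(posterior, posterior_mean)
```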

8. Answer: D The exponential distribution is memoryless: P(X > s+t | X > s) = P(X > t). The probability of waiting additional time t doesn't depend on how long you've already waited.
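The memoryless identity follows directly from the survival function P(X > x) = e^(-λx). Here is a quick numerical check; the values of λ, s, and t are arbitrary choices for illustration.

```python
from math import exp

def survival(x, lam):
    """P(X > x) for X ~ Exponential(rate=lam)."""
    return exp(-lam * x)

lam, s, t = 2.0, 1.5, 0.5

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t, lam) / survival(s, lam)
unconditional = survival(t, lam)  # P(X > t)

print(conditional, unconditional)
```

Both values agree up to floating-point rounding, because e^(-λ(s+t)) / e^(-λs) = e^(-λt).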

9. Answer: B For Binomial(n,p), variance = n × p × (1-p) = 10 × 0.3 × 0.7 = 2.1.
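The mean and variance formulas can be checked by summing over the PMF directly. This is a short sketch for Binomial(10, 0.3) using only the standard library.

```python
from math import comb

n, p = 10, 0.3

# P(X = k) for k = 0, ..., n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(mean, n * p)                 # both close to 3
print(variance, n * p * (1 - p))   # both close to 2.1
```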

10. Answer: C The Central Limit Theorem states that the suitably standardized sum (or average) of many independent, identically distributed random variables with finite variance approaches a normal distribution, regardless of the shape of the original distribution.
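A small simulation makes the theorem concrete. The sketch below sums draws from a skewed Exponential(1) distribution and checks how often the standardized sum lands within one standard deviation, which for a normal distribution is about 68%. The seed, the 50 terms per sum, and the 10,000 repetitions are arbitrary assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_TERMS = 50      # exponentials added per sum (illustrative choice)
N_SUMS = 10_000   # number of sums simulated (illustrative choice)

# Exponential(1) has mean 1 and variance 1, so a sum of N_TERMS draws
# has mean N_TERMS and standard deviation sqrt(N_TERMS).
mean_sum = N_TERMS
sd_sum = N_TERMS ** 0.5

sums = [
    sum(random.expovariate(1.0) for _ in range(N_TERMS))
    for _ in range(N_SUMS)
]
z_scores = [(s - mean_sum) / sd_sum for s in sums]

# Fraction of standardized sums within one standard deviation of the mean.
share_within_1sd = sum(1 for z in z_scores if abs(z) <= 1) / N_SUMS
print(share_within_1sd)
```

A single Exponential(1) draw is strongly right-skewed, yet the standardized sums already behave much like a standard normal.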

11. Answer: B A characteristic property of the Poisson distribution is that its mean equals its variance: both equal λ. Here, mean = variance = 4.

12. Answer: D Radioactive decay is a memoryless process: the atom doesn't "age." The exponential distribution models time until the next event in such memoryless processes.

13. Answer: C Beta(1,1) has PDF f(x) = 1 for x ∈ [0,1], which is exactly the uniform distribution on [0,1].

14. Answer: C Approximately 95.45% of data falls within ±2 standard deviations. The closest common approximation is 95%.

15. Answer: A For Poisson with λ=3, P(X=2) = (λ²e^(-λ))/2! = (9e^(-3))/2.
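The same calculation in code, as a minimal sketch; the helper name `poisson_pmf` is an assumption for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

p_two_events = poisson_pmf(2, 3.0)
print(p_two_events)        # numeric value of P(X = 2)
print(9 * exp(-3) / 2)     # option A, 9e^(-3)/2, for comparison
```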

16. Answer: A First, standardize: Z = (130-100)/15 = 2. Then P(X > 130) = P(Z > 2) ≈ 0.0228 (from standard normal table).
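Instead of a printed table, the tail probability can be computed from the normal CDF. The sketch below does it both ways, through the z-score and directly on N(100, 15²), using `statistics.NormalDist`.

```python
from statistics import NormalDist

mu, sigma, x = 100.0, 15.0, 130.0

# Route 1: standardize, then use the standard normal CDF.
z = (x - mu) / sigma
tail_from_z = 1.0 - NormalDist().cdf(z)

# Route 2: work with N(mu, sigma^2) directly.
tail_direct = 1.0 - NormalDist(mu, sigma).cdf(x)

print(z, tail_from_z, tail_direct)
```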

17. Answer: C Among all continuous distributions on the real line with a specified mean and variance, the normal distribution has maximum entropy. This is one reason it is a natural default model when only the mean and variance are known (maximum entropy principle).

18. Answer: C The number of successes in n independent trials with probability p follows Binomial(n,p). Here, it's Binomial(100, 0.5).

19. Answer: B The sum of n independent exponential(λ) random variables follows a Gamma(n, λ) distribution, with shape n and rate λ (also called the Erlang distribution when n is an integer). The exponential itself is Gamma(1, λ).
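One way to check this is to compare simulated sums against the Gamma CDF, which has a closed form for integer shape. In the sketch below, the helper name `erlang_cdf`, the seed, and the values n = 3, λ = 2, x = 1.5 are all assumptions picked for illustration.

```python
import random
from math import exp, factorial

def erlang_cdf(x, n, lam):
    """P(S <= x) for S ~ Gamma(shape=n, rate=lam) with integer n."""
    tail = sum(exp(-lam * x) * (lam * x) ** k / factorial(k) for k in range(n))
    return 1.0 - tail

random.seed(1)  # fixed seed so the run is repeatable
n, lam, x = 3, 2.0, 1.5
trials = 50_000

# Count how often a sum of n Exponential(lam) draws is at most x.
hits = sum(
    1
    for _ in range(trials)
    if sum(random.expovariate(lam) for _ in range(n)) <= x
)
empirical = hits / trials
exact = erlang_cdf(x, n, lam)
print(empirical, exact)
```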

20. Answer: C For uniform U(a,b), mean = (a+b)/2 = (2+8)/2 = 5.

21. Answer: B For Poisson distribution, a defining characteristic is that variance = mean = λ.

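22. Answer: A For Beta(α,β), the mode = (α-1)/(α+β-2) when α,β > 1. Here, mode = (2-1)/(2+5-2) = 1/5 = 0.2. None of the options equals 0.2 exactly; the closest is 0.14 (option A). Option B, 0.29, is the mean 2/7 ≈ 0.286, not the mode.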
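The mode formula can be cross-checked by locating the peak of the Beta(2, 5) density on a grid. This is a rough sketch; the helper name `beta_kernel` and the grid resolution are assumptions for illustration.

```python
def beta_kernel(x, alpha, beta):
    """Unnormalized Beta density; the missing constant does not move the mode."""
    return x ** (alpha - 1) * (1 - x) ** (beta - 1)

alpha, beta = 2.0, 5.0

# Search a grid on (0, 1) for the point with the highest density.
grid = [i / 1000 for i in range(1, 1000)]
mode_grid = max(grid, key=lambda x: beta_kernel(x, alpha, beta))

mode_formula = (alpha - 1) / (alpha + beta - 2)
mean_formula = alpha / (alpha + beta)

print(mode_grid, mode_formula, mean_formula)
```

The grid peak matches the formula value of 0.2, and the mean sits to the right of the mode, as expected for a right-skewed density.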

23. Answer: C The exponential distribution is the Gamma distribution with shape parameter 1: Exponential(λ) = Gamma(1, λ), where λ is the rate.

24. Answer: C The Poisson approximation to the binomial: when n is large and p is small, so that np stays moderate (rules of thumb vary; np < 10 is a common one), Binomial(n,p) ≈ Poisson(np). This is useful for rare events.
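To see how close the approximation is, the two PMFs can be compared term by term. The sketch below uses n = 1000 and p = 0.003 (so np = 3); these values and the helper names are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 1000, 0.003
lam = n * p  # matching Poisson rate, np

# Largest pointwise gap between the two PMFs over k = 0, ..., 10.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11)
)
print(max_gap)
```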

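25. Answer: C If buses arrive at rate 4 per hour, the time between buses follows Exponential(4 per hour). Expected waiting time = 1/λ = 1/4 hour = 15 minutes. Because the exponential distribution is memoryless, this holds no matter when you arrive at the stop.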

26. Answer: C For Bernoulli(p), Variance = E[X²] - (E[X])² = p - p² = p(1-p). This variance is maximized when p = 0.5.

27. Answer: B The sum of n independent Bernoulli(p) trials gives the number of successes in n trials, which is exactly the definition of Binomial(n,p). This is why Binomial is sometimes called "repeated Bernoulli."

28. Answer: C For Bernoulli(p), E[X] = 1×p + 0×(1-p) = p = 0.7. The expected value equals the probability of success.

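29. Answer: E Among all distributions on [a,b], variance is maximized by the extreme U-shaped case that puts half of the probability mass at each endpoint, giving variance (b-a)²/4, three times the uniform's (b-a)²/12. The uniform distribution maximizes entropy on [a,b], not variance.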
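A two-line comparison shows the gap between the extreme case and the uniform. The sketch assumes the mass is split equally between the two endpoints and uses the interval [2, 8] purely as an example.

```python
from statistics import pvariance

a, b = 2.0, 8.0

# Half the mass at a and half at b: population variance of the two points.
two_point_variance = pvariance([a, b])   # equals (b - a)^2 / 4

# Variance of the uniform distribution on the same interval.
uniform_variance = (b - a) ** 2 / 12

print(two_point_variance, uniform_variance)
```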

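30. Answer: B The negative binomial distribution models the number of trials needed to achieve r successes. It generalizes the geometric distribution (which is r=1). Some texts and libraries instead count the number of failures before the r-th success, which shifts the variable by r.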
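Under the "number of trials" convention, the PMF is P(N = n) = C(n-1, r-1) p^r (1-p)^(n-r) for n ≥ r, with mean r/p. The sketch below checks this numerically; the function name and the values r = 3, p = 0.5 are assumptions for illustration, and the support is truncated at 200 trials, where the remaining mass is negligible.

```python
from math import comb

def trials_until_r_successes_pmf(n, r, p):
    """P(N = n): the r-th success lands exactly on trial n."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 3, 0.5
support = range(r, 200)  # truncated support; the tail beyond is negligible

total = sum(trials_until_r_successes_pmf(n, r, p) for n in support)
mean = sum(n * trials_until_r_successes_pmf(n, r, p) for n in support)

# With r = 1 the formula reduces to the geometric PMF (1 - p)^(n - 1) * p.
geometric_at_4 = trials_until_r_successes_pmf(4, 1, 0.5)

print(total, mean, geometric_at_4)
```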

31. Answer: A For Exponential(λ), the mean = 1/λ = 1/3. Note that λ is the rate parameter, not the mean itself.

32. Answer: A For Poisson(λ), variance = λ = 9, so standard deviation = √9 = 3.

33. Answer: C The normal distribution has infinite support (-∞, +∞). All other statements are true properties of the normal distribution.

34. Answer: E A single coin flip is Bernoulli(0.6). However, Bernoulli(p) is also equivalent to Binomial(1,p), so both A and B are correct.

35. Answer: B For Beta(α,β), the mean = α/(α+β). This represents the expected value of a probability modeled by the Beta distribution.
