Quiz:
1. Can mathematics solve complex, high-dimensional problems (those involving hundreds or thousands of parameters)?
- A) Yes, mathematics can easily handle all such problems
- B) No, not without significant computational time and resources, and some problems may not have solutions
- C) Only problems with three or fewer dimensions
2. When a perfect solution with zero error is unattainable, what approach should be taken?
- A) Abandon the problem
- B) Seek an approximate solution that minimizes error
- C) Keep trying until zero error is achieved
3. What is the conventional method for measuring the difference between actual and calculated values?
- A) Simple subtraction
- B) Mean Squared Error (MSE) or Sum of Squared Errors
- C) Just guessing the difference
4. If an error function is visualized as a 3D landscape with mountains and valleys, which region represents the optimal solution?
- A) The highest peak
- B) The lowest valley (global minimum)
- C) The flattest area
5. How can we avoid getting trapped in local minima?
- A) Once trapped, there's no escape
- B) Initialize from multiple random starting points
- C) Always start from zero
6. What should be done with problems that have no perfect solutions?
- A) Declare them unsolvable
- B) Find the best approximate solution that minimizes error
- C) Wait for better mathematics to be invented
7. What is the primary purpose of the learning rate in gradient descent?
- A) To determine the size of steps taken toward the minimum
- B) To count the number of iterations
- C) To measure the final accuracy
8. What happens if the learning rate is set too high?
- A) The algorithm converges faster
- B) The algorithm may overshoot the minimum and diverge
- C) Nothing significant changes
9. When the gradient (slope) equals zero, what does this indicate about our current position?
- A) We're at the starting point
- B) We're at either a minimum, maximum, or saddle point
- C) We need to increase the learning rate
10. Why do we use the negative gradient direction in gradient descent?
- A) Because positive directions don't work
- B) Because the negative gradient points toward the steepest decrease
- C) It's just a convention
11. What is a "batch" in batch gradient descent?
- A) A single data point
- B) The entire dataset used in each iteration
- C) A random subset of data
12. How does stochastic gradient descent differ from batch gradient descent?
- A) It uses one random data point at a time instead of the entire dataset
- B) It's always slower
- C) It guarantees finding the global minimum
13. What is the main advantage of mini-batch gradient descent?
- A) It balances between batch and stochastic methods
- B) It requires no learning rate
- C) It always converges in fewer iterations
14. What does "convergence" mean in the context of gradient descent?
- A) When the algorithm crashes
- B) When parameter updates become negligibly small
- C) When we run out of data
15. How can we tell if gradient descent is working properly?
- A) The cost/error should generally decrease with each iteration
- B) The parameters should always increase
- C) The gradient should increase
16. What is the "vanishing gradient" problem?
- A) When gradients become too large
- B) When gradients become so small that learning effectively stops
- C) When we lose the gradient calculation
17. What role does the derivative (or partial derivative) play in gradient descent?
- A) It tells us the direction and steepness of the slope
- B) It counts the iterations
- C) It measures the error
18. Why might gradient descent move slowly when approaching a minimum?
- A) Because the gradient becomes smaller near flat regions
- B) Because the learning rate automatically decreases
- C) Because it gets tired
19. What is momentum in gradient descent?
- A) The speed of computation
- B) A technique that helps accelerate convergence by considering previous updates
- C) The initial starting point
20. What is an adaptive learning rate?
- A) A fixed rate that never changes
- B) A learning rate that adjusts based on the optimization progress
- C) The maximum possible learning rate
21. In a 2D error surface visualization, what do contour lines represent?
- A) Points with the same error value
- B) The path taken by gradient descent
- C) Random patterns
22. What is the "exploding gradient" problem?
- A) When gradients become extremely large, causing unstable updates
- B) When the computer explodes
- C) When gradients become zero
23. How many iterations does gradient descent typically need?
- A) Always exactly 100
- B) It depends on the problem, data, and parameters
- C) Just one
24. What happens if we initialize all parameters to zero?
- A) It's always the best approach
- B) It may cause problems in neural networks due to symmetry
- C) The algorithm won't start
25. What is a saddle point?
- A) The global minimum
- B) A point where gradients are zero but it's neither minimum nor maximum in all directions
- C) The starting point
26. Why is gradient descent called an iterative optimization algorithm?
- A) Because it repeats the update process multiple times
- B) Because it only works once
- C) Because it's slow
Answers:
1. Can mathematics solve complex, high-dimensional problems? Answer: B - No, not without significant computational time and resources, and some problems may not have solutions. Explanation: High-dimensional problems face the "curse of dimensionality" and may be computationally intractable or have no closed-form solutions.
2. When a perfect solution with zero error is unattainable, what approach should be taken? Answer: B - Seek an approximate solution that minimizes error. Explanation: In real-world problems, we aim for the best possible solution within constraints rather than perfection.
3. What is the conventional method for measuring the difference between actual and calculated values? Answer: B - Mean Squared Error (MSE) or Sum of Squared Errors. Explanation: Squaring penalizes larger errors more and ensures all errors are positive.
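To make question 3 concrete, here is a minimal sketch in plain Python (with made-up actual and predicted values) showing how the sum of squared errors and MSE are computed.

```python
# Minimal sketch: Sum of Squared Errors and Mean Squared Error
# for a small set of hypothetical actual vs. predicted values.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
sse = sum(squared_errors)            # Sum of Squared Errors
mse = sse / len(squared_errors)      # Mean Squared Error

print(f"SSE = {sse:.3f}, MSE = {mse:.3f}")
```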
4. Which region of the error function graph represents the optimal solution? Answer: B - The lowest valley (global minimum). Explanation: The minimum point has the lowest error/cost value, representing the best solution.
5. How can we avoid getting trapped in local minima? Answer: B - Initialize from multiple random starting points. Explanation: Different starting points may lead to different minima; some may reach the global minimum.
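For question 5, a sketch of the random-restart idea, assuming a toy one-dimensional cost function with several local minima; the function, learning rate, and search range are illustrative choices, not part of the quiz.

```python
import numpy as np

# Toy cost function with several local minima (illustrative only).
def cost(x):
    return np.sin(3 * x) + 0.1 * x ** 2

def grad(x):
    return 3 * np.cos(3 * x) + 0.2 * x

def gradient_descent(x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Run gradient descent from several random starting points
# and keep the best result found; some starts settle in shallow
# local minima, others reach deeper ones.
rng = np.random.default_rng(0)
starts = rng.uniform(-5, 5, size=10)
results = [gradient_descent(x0) for x0 in starts]
best = min(results, key=cost)
print(f"best x = {best:.3f}, cost = {cost(best):.3f}")
```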
6. What should be done with problems that have no perfect solutions? Answer: B - Find the best approximate solution that minimizes error. Explanation: Optimization aims to get as close as possible to the ideal when a perfect solution is unattainable.
7. What is the primary purpose of the learning rate? Answer: A - To determine the size of steps taken toward the minimum. Explanation: Learning rate controls how much we adjust parameters in response to the gradient.
8. What happens if the learning rate is set too high? Answer: B - The algorithm may overshoot the minimum and diverge. Explanation: Large steps can jump over the minimum, causing oscillation or divergence.
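Questions 7 and 8 can be illustrated together. The sketch below applies the standard update x := x - learning_rate * gradient to the simple quadratic f(x) = x²; the two learning rates are arbitrary values chosen to show convergence versus overshoot.

```python
# Gradient descent on f(x) = x^2, whose gradient is 2x and minimum is at x = 0.
def run(lr, x=5.0, steps=10):
    for _ in range(steps):
        x = x - lr * (2 * x)   # step size is controlled by the learning rate
    return x

print("lr = 0.1 ->", run(0.1))   # shrinks toward 0: converging
print("lr = 1.1 ->", run(1.1))   # |x| grows each step: overshoot and divergence
```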
9. When the gradient equals zero, what does this indicate? Answer: B - We're at either a minimum, maximum, or saddle point. Explanation: Zero gradient means no slope in any direction; could be any critical point.
10. Why do we use the negative gradient direction? Answer: B - Because the negative gradient points toward the steepest decrease. Explanation: Gradient points uphill; negative gradient points downhill toward lower error.
11. What is a "batch" in batch gradient descent? Answer: B - The entire dataset used in each iteration. Explanation: Batch gradient descent computes gradients using all training examples.
12. How does stochastic gradient descent differ? Answer: A - It uses one random data point at a time instead of the entire dataset. Explanation: SGD updates parameters after each single example, making it faster but noisier.
13. What is the main advantage of mini-batch gradient descent? Answer: A - It balances between batch and stochastic methods. Explanation: Mini-batch offers a compromise: faster than batch, less noisy than stochastic.
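For questions 11-13, a rough sketch of mini-batch updates on a synthetic linear-regression problem. Setting batch_size to len(X) would give batch gradient descent, and setting it to 1 would give stochastic gradient descent; the data, learning rate, and batch size here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # true slope is about 3

def gradient(w, Xb, yb):
    # Gradient of MSE for a one-weight linear model y_hat = w * x.
    pred = w * Xb[:, 0]
    return 2 * np.mean((pred - yb) * Xb[:, 0])

w, lr = 0.0, 0.1
batch_size = 16   # mini-batch: between 1 (stochastic) and len(X) (full batch)
for epoch in range(20):
    idx = rng.permutation(len(X))          # shuffle, then walk through batches
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        w -= lr * gradient(w, X[b], y[b])  # update after each mini-batch
print(f"estimated slope: {w:.3f}")
```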
14. What does "convergence" mean? Answer: B - When parameter updates become negligibly small. Explanation: The algorithm has essentially found a minimum and changes are minimal.
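A common practical reading of question 14 is to stop once the update falls below a tolerance. A minimal sketch, reusing the f(x) = x² toy function with an arbitrary tolerance:

```python
# Stop when the parameter update becomes negligibly small (convergence check).
x, lr, tol = 5.0, 0.1, 1e-6
for i in range(10_000):
    step = lr * (2 * x)        # gradient of x^2 is 2x
    x -= step
    if abs(step) < tol:        # update is negligible: declare convergence
        print(f"converged after {i + 1} iterations, x = {x:.6f}")
        break
```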
15. How can we tell if gradient descent is working? Answer: A - The cost/error should generally decrease with each iteration. Explanation: Successful optimization shows declining error over time.
16. What is the "vanishing gradient" problem? Answer: B - When gradients become so small that learning effectively stops. Explanation: Tiny gradients mean tiny updates, causing training to stall.
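A rough numeric illustration of question 16, assuming a deep chain of sigmoid activations: the sigmoid's derivative is at most 0.25, so backpropagation through many such layers multiplies many small factors and the gradient shrinks toward zero.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)       # maximum value is 0.25, at z = 0

# Backpropagation through a deep chain multiplies one such factor per layer.
grad = 1.0
for layer in range(30):
    grad *= sigmoid_derivative(0.0)   # best case: 0.25 per layer
print(f"gradient factor after 30 layers: {grad:.3e}")  # about 8.7e-19
```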
17. What role does the derivative play? Answer: A - It tells us the direction and steepness of the slope. Explanation: Derivatives indicate how much and in which direction to adjust parameters.
18. Why might gradient descent move slowly near a minimum? Answer: A - Because the gradient becomes smaller near flat regions. Explanation: Flatter surfaces have smaller gradients, resulting in smaller steps.
19. What is momentum in gradient descent? Answer: B - A technique that helps accelerate convergence by considering previous updates. Explanation: Momentum adds a fraction of the previous update to the current one, helping navigate past small local variations.
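A minimal sketch of question 19, assuming the common momentum formulation v := beta * v - learning_rate * gradient followed by x := x + v; beta = 0.9 is a typical but arbitrary choice.

```python
# Momentum: each update keeps a fraction of the previous step (the "velocity").
def grad(x):
    return 2 * x               # gradient of f(x) = x^2

x, v = 5.0, 0.0
lr, beta = 0.1, 0.9
for _ in range(300):
    v = beta * v - lr * grad(x)   # blend previous update with current gradient
    x = x + v                     # x approaches the minimum at 0
print(f"x after 300 momentum steps: {x:.6f}")
```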
20. What is an adaptive learning rate? Answer: B - A learning rate that adjusts based on the optimization progress. Explanation: Methods like AdaGrad or Adam adjust learning rates during training for better convergence.
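For question 20, a rough AdaGrad-style sketch (one of the adaptive schemes the explanation mentions): each parameter's effective step shrinks as its squared gradients accumulate. The constants and the toy objective are illustrative.

```python
import numpy as np

def grad(w):
    return 2 * w                     # gradient of f(w) = sum(w_i^2)

w = np.array([5.0, -3.0])
lr, eps = 0.5, 1e-8
accum = np.zeros_like(w)             # running sum of squared gradients
for _ in range(500):
    g = grad(w)
    accum += g ** 2
    w -= lr * g / (np.sqrt(accum) + eps)   # per-parameter adaptive step
print("w after AdaGrad-style updates:", w)  # both entries shrink toward 0
```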
21. What do contour lines represent? Answer: A - Points with the same error value. Explanation: Like elevation lines on a map, contours connect points of equal cost/error.
22. What is the "exploding gradient" problem? Answer: A - When gradients become extremely large, causing unstable updates. Explanation: Huge gradients cause massive parameter jumps, destabilizing training.
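Question 22 is the mirror image of the vanishing-gradient sketch above: if each layer contributes a factor larger than 1 (the factor of 3 below is purely hypothetical), the backpropagated gradient grows explosively.

```python
# If every layer multiplies the gradient by a factor greater than 1,
# the backpropagated gradient blows up over many layers.
grad = 1.0
for layer in range(30):
    grad *= 3.0                # hypothetical per-layer factor > 1
print(f"gradient factor after 30 layers: {grad:.3e}")   # about 2.1e+14
```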
23. How many iterations does gradient descent need? Answer: B - It depends on the problem, data, and parameters. Explanation: Convergence time varies with problem complexity, data size, and hyperparameters.
24. What happens if we initialize all parameters to zero? Answer: B - It may cause problems in neural networks due to symmetry. Explanation: With all-zero weights, every neuron receives the same gradient and learns the same features; random initialization is needed to break this symmetry.
25. What is a saddle point? Answer: B - A point where gradients are zero but it's neither minimum nor maximum in all directions. Explanation: Like a horse saddle - minimum in one direction, maximum in another.
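Question 25's textbook example is f(x, y) = x² - y²: both partial derivatives are zero at the origin, yet moving along x increases f while moving along y decreases it. A quick numeric check:

```python
# f(x, y) = x^2 - y^2 has a saddle point at the origin: the gradient (2x, -2y)
# is zero there, but the origin is a minimum along x and a maximum along y.
def f(x, y):
    return x ** 2 - y ** 2

print("f(0, 0)      =", f(0.0, 0.0))   #  0.0
print("f(+/-0.1, 0) =", f(0.1, 0.0))   # +0.01  (higher: minimum along x)
print("f(0, +/-0.1) =", f(0.0, 0.1))   # -0.01  (lower: maximum along y)
```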
26. Why is it called an iterative optimization algorithm? Answer: A - Because it repeats the update process multiple times. Explanation: Each iteration improves the solution gradually through repeated parameter updates.