Decision Trees: Bias and Variance Explained
Quick Recap: What are Decision Trees?
Decision trees are rule-based algorithms that make predictions by asking a series of yes/no questions, splitting data at each node until reaching a decision.
Note: this is not neural networks yet. We are still talking about rule-based systems, and decision trees belong to that family.
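To make the recap concrete, here is a minimal sketch using scikit-learn (the library, the iris dataset, and the depth limit are illustrative assumptions, not part of the original post):

```python
# Minimal sketch of a rule-based decision tree, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Each internal node asks a yes/no question about one feature;
# the leaves hold the final decision.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=load_iris().feature_names))
```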
Bias in Decision Trees
Single Decision Trees: Generally LOW BIAS
Decision trees have low bias because they're very flexible and can create complex decision boundaries. They can:
- Fit almost any pattern in the data
- Create very detailed rules
- Capture non-linear relationships easily
- Keep splitting until they perfectly classify training data
Why Low Bias?
- They don't make strong assumptions about data structure
- Can model complex relationships with enough depth
- Flexible enough to fit training data very closely
Example: If predicting house prices, a deep tree can capture intricate rules like: "IF location=downtown AND bedrooms>3 AND year>2010 AND garage=yes AND school_rating>8 THEN price=high"
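As a rough illustration of that flexibility, an unconstrained regression tree can reproduce its training targets almost exactly. This sketch assumes scikit-learn and synthetic data standing in for the house-price example:

```python
# Sketch: an unconstrained tree fits the training data almost perfectly (low bias).
# Synthetic regression data stands in for "house prices".
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

deep_tree = DecisionTreeRegressor(random_state=0)  # no depth limit
deep_tree.fit(X, y)

# R^2 on the data it was trained on is essentially 1.0: the tree has carved out
# rules detailed enough to reproduce the training targets.
print("Training R^2:", deep_tree.score(X, y))
```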
Variance in Decision Trees
Single Decision Trees: Generally HIGH VARIANCE
Decision trees have high variance because small changes in training data can create completely different trees.
Why High Variance?
- Different split points can lead to entirely different tree structures
- Sensitive to which data points are in training set
- A few different samples can change the top split, and that change cascades through the entire tree
- Prone to overfitting on training data
Example: Train on two slightly different datasets of the same problem:
- Dataset 1: Tree splits first on "Age > 30"
- Dataset 2: Tree splits first on "Income > 50000"
Result: completely different tree structures and rules!
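A small sketch of the same idea, assuming scikit-learn and synthetic data: refitting an unconstrained tree on two bootstrap resamples of one dataset will often pick different root splits.

```python
# Sketch: slightly different samples of the same data can yield different trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
rng = np.random.default_rng(0)

for run in range(2):
    idx = rng.integers(0, len(X), size=len(X))      # a slightly different resample
    t = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    root_feature = t.tree_.feature[0]                # feature chosen at the root split
    root_threshold = t.tree_.threshold[0]
    print(f"Run {run}: root splits on feature {root_feature} at {root_threshold:.2f}")
```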
The Bias-Variance Profile
| Tree Type | Bias | Variance | Problem |
|---|---|---|---|
| Shallow Tree (depth=2-3) | HIGH | LOW | Underfitting |
| Deep Tree (depth=20+) | LOW | HIGH | Overfitting |
| Unpruned Tree | VERY LOW | VERY HIGH | Severe Overfitting |
| Pruned Tree | MODERATE | MODERATE | Balanced |
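A quick sketch of how the table above plays out in practice, assuming scikit-learn and a synthetic dataset: training accuracy keeps rising with depth, while test accuracy stalls or drops once the tree starts overfitting.

```python
# Sketch: depth moves a tree along the bias/variance profile in the table above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 5, 20, None):   # None = grow until leaves are pure (unpruned)
    t = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth}: train={t.score(X_tr, y_tr):.2f}, test={t.score(X_te, y_te):.2f}")
```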
Visual Analogy
Think of it like giving directions:
High Bias (Shallow Tree): "Just go north" - Too simple, misses important turns
High Variance (Deep Tree): "Turn left at the red house with the broken mailbox that the Johnson family painted last Tuesday" - Too specific, won't work if anything changes
Controlling Bias and Variance
To Reduce Variance (usual problem):
- Pruning - Cut back overly specific branches
- Set minimum samples per leaf
- Limit maximum depth
- Require minimum samples to split
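These knobs correspond directly to constructor arguments on scikit-learn's tree estimators; a minimal sketch with illustrative (not recommended) values:

```python
# Sketch: variance-reducing settings on a scikit-learn decision tree
# (library assumed; the specific values are illustrative, not tuned).
from sklearn.tree import DecisionTreeClassifier

regularized_tree = DecisionTreeClassifier(
    max_depth=4,           # limit maximum depth
    min_samples_leaf=20,   # minimum samples per leaf
    min_samples_split=50,  # minimum samples required to attempt a split
    ccp_alpha=0.01,        # cost-complexity (post-)pruning strength
    random_state=0,
)
```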
To Reduce Bias (if tree is too simple):
- Increase depth
- Reduce minimum samples per leaf
- Use more features
- Allow more complex splits
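And the opposite direction: loosening the same knobs lowers bias by letting the tree grow more complex (again scikit-learn, values purely illustrative):

```python
# Sketch: bias-reducing settings, letting the tree grow and split freely.
from sklearn.tree import DecisionTreeClassifier

flexible_tree = DecisionTreeClassifier(
    max_depth=None,        # grow until leaves are pure
    min_samples_leaf=1,    # allow very small leaves
    min_samples_split=2,   # split whenever possible
    max_features=None,     # consider every feature at every split
    random_state=0,
)
```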
Ensemble Methods Solution
Since single trees have high variance, we use ensembles:
Random Forests:
- Many trees, each trained on a random bootstrap sample of the data and random subsets of the features
- Reduces variance while keeping low bias
- Averages out the individual tree variations
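A small sketch of that variance reduction, assuming scikit-learn and synthetic data: cross-validation scores for a single unpruned tree versus a forest of 200 randomized trees.

```python
# Sketch: averaging many randomized trees damps the variance of any single tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5)

# The forest typically scores higher, with less spread across folds.
print("single tree  :", single.mean().round(3), "+/-", single.std().round(3))
print("random forest:", forest.mean().round(3), "+/-", forest.std().round(3))
```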
Gradient Boosting:
- Sequential trees that correct errors
- Mainly reduces bias: each shallow tree is a weak, high-bias learner, and a small learning rate helps keep variance in check
- Each tree learns from previous mistakes
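A sketch of the sequential-correction idea, again assuming scikit-learn and synthetic data: staged predictions show test accuracy improving as more corrective trees are added.

```python
# Sketch: boosting fits shallow trees one after another, each trained on the
# errors left by the trees before it.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                                random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after each added tree.
for n, y_pred in enumerate(gb.staged_predict(X_te), start=1):
    if n in (1, 10, 50, 200):
        print(f"{n:>3} trees: test accuracy = {(y_pred == y_te).mean():.3f}")
```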
Real-World Example
Predicting Customer Churn:
Shallow Tree (High Bias):
IF contract_length < 12 months
THEN predict: CHURN
ELSE predict: STAY
Too simple - misses many patterns
Deep Tree (High Variance):
IF contract=12 AND age=33 AND city="Houston"
AND last_call_duration=5.2 minutes
AND payment_method="credit"
AND joined_on_Tuesday
THEN predict: CHURN
Too specific - won't generalize
Balanced Tree (Pruned):
IF contract < 12 months AND satisfaction < 3
THEN predict: CHURN
ELSE IF contract >= 12 AND usage_drop > 50%
THEN predict: CHURN
ELSE predict: STAY
Just right - captures patterns without overfitting
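The churn rules above are hand-written illustrations, but you can print the rules a real pruned tree learns in the same IF/THEN spirit. This sketch uses synthetic stand-in data with hypothetical feature names, not an actual churn dataset, and assumes scikit-learn:

```python
# Sketch: printing the rules a pruned tree actually learned.
# The data and feature names are synthetic placeholders, not real churn data.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
feature_names = ["contract_months", "satisfaction", "usage_drop", "tenure"]

pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=25,
                                random_state=0).fit(X, y)
print(export_text(pruned, feature_names=feature_names))
```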
Summary
- Bias: Decision trees naturally have LOW BIAS (very flexible)
- Variance: Decision trees naturally have HIGH VARIANCE (unstable)
- Challenge: Controlling the high variance without increasing bias too much
- Solution: Pruning, regularization, or using ensembles like Random Forests
The key insight: A single decision tree is like a very detailed but unreliable witness - it remembers everything about what it saw (low bias) but might tell a completely different story next time (high variance)!