Understanding R-squared (The Coefficient of Determination)
What Does R² Measure?
R² tells you the proportion of the variance in the target variable that your model can explain (it belongs to the family of metrics that compare predicted values with expected values). It typically falls between 0 and 1, though it can be negative for very poor models.
R² = 1: A perfect model. It explains 100% of the variability in the data.
R² = 0: A useless model. It performs no better than a baseline model that simply predicts the average of the target variable.
R² < 0: A very poor model. It performs worse than just predicting the average. This can happen when evaluating the model on new, unseen data.
R-squared (R²) and the coefficient of determination are two names for the exact same statistical measure. It's one of the most common metrics used to evaluate how well a regression model fits the data.
Why Two Names?
"Coefficient of Determination" is the formal statistical term. It accurately describes what the metric does: it determines the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
"R-squared" or "R²" is the common name and mathematical notation. The "R" comes from its relationship with Pearson's correlation coefficient (r). In a simple linear regression with one variable, R² is literally the square of Pearson's r (R² = r²).
In practice, the terms are used interchangeably. "R-squared" is common among practitioners for its brevity, while "coefficient of determination" is often used in formal academic papers.
The Math Behind R-squared
The formula for R² is a ratio of how much variance the model explains versus the total variance in the data.
The Formula
R² = 1 - (SSres / SStot) = 1 - (Sum of Squared Residuals / Total Sum of Squares from Mean)
SSres (Sum of Squared Residuals): This is the error of your model. A residual is the error at one point: the actual value minus the predicted value. SSres is the sum of the squared differences between the actual values (yi) and your model's predicted values (ŷi).
SStot (Total Sum of Squares from the mean): This represents the total variation in the data. It's the sum of the squared differences between the actual values (yi) and the mean of all actual values (ȳ).
ȳ (y with a bar on top) is pronounced "y-bar" and represents the mean (average) of the observed values.
ŷ (y with a hat on top) is pronounced "y-hat" and represents the predicted (fitted) values.
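The definitions above translate almost line for line into code. The following is a minimal sketch in plain Python; the function name `r_squared` and the sample numbers are illustrative assumptions, not taken from any particular library or dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)  # y-bar: mean of the observed values
    # SS_res: squared distances between observed and predicted values
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # SS_tot: squared distances between observed values and their mean
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


observed = [3.0, 5.0, 7.0, 9.0]    # hypothetical actual values
predicted = [2.5, 5.5, 6.5, 9.5]   # hypothetical model outputs

score = r_squared(observed, predicted)
print(round(score, 3))  # SS_res = 1, SS_tot = 20, so this prints 0.95
```

Note that the function divides by SS_tot, so it is undefined when every observed value is identical.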
Why "Sum of Squares"?
The term "sum of squares" is literal. To measure variation, we can't just sum the differences from the mean (yi - ȳ), because positive and negative differences would cancel each other out.
Solution: We square each difference to make it positive. Squaring also has the benefit of heavily penalizing larger errors. The "Total Sum of Squares" is the sum of the areas of these squares.
Regular R² vs. Adjusted R²
While standard R² is useful, it has a critical flaw: on the training data it never decreases as you add more variables to the model, even if those new variables are completely useless. This can be misleading and encourage overfitting.
Adjusted R² addresses this problem by adding a penalty for each new variable included in the model.
The Problem Illustrated
Imagine predicting a house price with progressively more variables:
| Model | Variables Added | Regular R² | Adjusted R² |
|---|---|---|---|
| Model 1 | Square footage | 0.70 | 0.69 |
| Model 2 | + Number of bedrooms | 0.75 | 0.73 |
| Model 3 | + Zip code | 0.80 | 0.77 |
| Model 4 | + Owner's birthday | 0.81 | 0.76 |
| Model 5 | + Favorite color | 0.82 | 0.74 |
Notice how Regular R² keeps rising, while Adjusted R² starts to drop when we add irrelevant "nonsense" variables, correctly signaling that the model is becoming unnecessarily complex.
The Formulas Compared
| Component | Regular R² | Adjusted R² |
|---|---|---|
| Formula | R² = 1 - (SSres / SStot) | Adjusted R² = 1 - [(1 - R²) × (n - 1) / (n - p - 1)] |
| Meaning | Measures raw explanatory power. | Balances explanatory power against model complexity. |
Here, n is the number of data points (samples), and p is the number of predictors (features) in the model. The term (n-1)/(n-p-1) acts as the penalty factor.
When to Use Each
Use Regular R² for simple linear regression (one variable) or when comparing models with the same number of variables.
Use Adjusted R² for multiple regression or whenever you're comparing models with a different number of variables. It's essential for model selection and protecting against overfitting.
Quick Decision Rule: Think of Regular R² as the raw test score and Adjusted R² as the score after a "curve" that accounts for the difficulty (complexity). For choosing the best model, you almost always want to use Adjusted R².
Appendix: Pearson's Correlation Coefficient (r)
Pearson's r is a measure that quantifies the strength and direction of a linear relationship between two continuous variables. Its value is always between -1 and +1.
r = +1: Perfect positive linear relationship.
r = 0: No linear relationship.
r = -1: Perfect negative linear relationship.
In simple linear regression, the connection is direct: R² = r². For example, if the correlation (r) between study hours and exam scores is +0.8, the R-squared (R²) would be 0.64, meaning that study hours explain 64% of the variance in exam scores.
More information (Same concept different words.)
R² (R-squared) and the coefficient of determination are exactly the same thing. They are just two different names for the identical statistical measure.
Why Two Names?
"Coefficient of determination" is the formal statistical term that describes what the measure actually does - it determines how much of the variance is explained
"R-squared" or "R²" is the mathematical notation, where the "R" comes from the correlation coefficient (Pearson's r), and squaring it gives us this measure
The Connection
The relationship becomes clearer when you consider:
In simple linear regression, R² literally equals the square of the Pearson correlation coefficient (r) between predicted and actual values
Hence: R² = r²
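The identity R² = r² can be checked numerically. The sketch below fits a least-squares line by hand and compares R² with the squared Pearson correlation; the study-hours data is made up for illustration, and all variable names are assumptions.

```python
import math

# Hypothetical data: hours studied vs. exam score
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 60.0, 61.0, 70.0, 77.0]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Building blocks shared by the regression fit and by Pearson's r
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)  # this is also SS_tot

# Ordinary least-squares line with an intercept
slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in hours]

ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))
r_squared = 1 - ss_res / syy

r = sxy / math.sqrt(sxx * syy)  # Pearson's r between hours and scores

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")  # both print 0.9626
```

The match relies on the line being fitted by least squares with an intercept and evaluated on the same data. For other models, or on held-out data, the two numbers can differ.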
Common Usage
In practice, you'll see both terms used interchangeably:
Academic papers might use "coefficient of determination" for formal precision
Data scientists and practitioners often just say "R-squared" for brevity
Documentation might write "R² (coefficient of determination)" to be clear
So when you see either term in the context of neural networks or any regression analysis, they're referring to the same metric that measures the proportion of variance explained by the model.
R² (R-squared) in the context of neural networks is a statistical measure that indicates how well the model's predictions match the actual data. It's borrowed from traditional statistics and represents the coefficient of determination.
What R² Measures
R² tells you the proportion of variance in the target variable that your neural network can explain. It ranges from 0 to 1 (though can be negative for very poor models):
R² = 1: Perfect prediction - the model explains all variability in the target
R² = 0: The model performs no better than simply predicting the mean
R² < 0: The model performs worse than predicting the mean (possible with test data)
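The three cases above can be reproduced with a few lines of plain Python. This is a minimal sketch with made-up numbers; `r_squared` is a hypothetical helper written for this example.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, computed from scratch."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


observed = [10.0, 20.0, 30.0]  # hypothetical targets, mean = 20

perfect = r_squared(observed, [10.0, 20.0, 30.0])    # predictions match exactly
baseline = r_squared(observed, [20.0, 20.0, 20.0])   # always predict the mean
reversed_ = r_squared(observed, [30.0, 20.0, 10.0])  # trend is backwards

print(perfect)    # 1.0
print(baseline)   # 0.0
print(reversed_)  # -3.0, because SS_res (800) exceeds SS_tot (200)
```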
Mathematical Definition
R² is calculated as:
R² = 1 - (SS_res / SS_tot)
Where:
SS_res (residual sum of squares) = Σ(y_actual - y_predicted)²
SS_tot (total sum of squares) = Σ(y_actual - y_mean)²
Use in Neural Networks
In neural networks, R² is primarily used for:
Regression tasks - It's most appropriate when your network outputs continuous values
Model evaluation - Comparing how well different architectures or hyperparameters perform
Interpretability - Providing a more intuitive metric than raw loss values (MSE, MAE)
Important Considerations
While R² is useful, it has limitations in neural network contexts:
It's not suitable for classification tasks (use accuracy, F1-score, etc. instead)
High R² doesn't necessarily mean your model generalizes well (overfitting can inflate R²)
For complex, non-linear relationships that neural networks often model, R² might not capture all aspects of model performance
Unlike simpler models, adding parameters to neural networks doesn't automatically increase R²
In practice, R² serves as one metric among several for evaluating regression neural networks, particularly useful when you need to communicate model performance to stakeholders familiar with traditional statistical measures.
The regular R² is the simpler, original formula:
R² = 1 - (SS_res / SS_tot)
Or equivalently:
R² = (SS_explained / SS_tot)
Where:
- SS_res = Σ(y - ŷ)² = Sum of squared residuals (errors)
- SS_tot = Σ(y - ȳ)² = Total sum of squares from mean
- SS_explained = SS_tot - SS_res = Variance explained by model
What We're Actually Squaring
We're squaring the distance of each data point from the mean:
Total Sum of Squares = Σ(y - ȳ)²
↑
This gets squared!
Step-by-Step Example
Let's say you have test scores: 70, 80, 90
Step 1: Find the mean
ȳ = (70 + 80 + 90) / 3 = 80
Step 2: Find each difference from mean
Student 1: 70 - 80 = -10
Student 2: 80 - 80 = 0
Student 3: 90 - 80 = +10
Step 3: Square each difference
Student 1: (-10)² = 100 ← This is a "square"
Student 2: (0)² = 0 ← This is a "square"
Student 3: (+10)² = 100 ← This is a "square"
Step 4: Sum all the squares
Total Sum of Squares = 100 + 0 + 100 = 200
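The four steps can be mirrored directly in code. This is a small sketch using the same three test scores; the variable names are chosen for this example only.

```python
scores = [70, 80, 90]

# Step 1: find the mean
mean_score = sum(scores) / len(scores)

# Step 2: find each difference from the mean
deviations = [s - mean_score for s in scores]

# Step 3: square each difference
squared_deviations = [d ** 2 for d in deviations]

# Step 4: sum all the squares
total_sum_of_squares = sum(squared_deviations)

print(mean_score)            # 80.0
print(deviations)            # [-10.0, 0.0, 10.0]
print(squared_deviations)    # [100.0, 0.0, 100.0]
print(total_sum_of_squares)  # 200.0
```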
Why Do We Square?
Problem Without Squaring:
Differences: -10, 0, +10
Sum: -10 + 0 + 10 = 0 ← Cancels out!
The negative and positive differences cancel each other, suggesting no variance when there clearly is!
Solution With Squaring:
Squared differences: 100, 0, 100
Sum: 100 + 0 + 100 = 200 ← Shows actual spread!
Visual Representation
Imagine each squared difference as an actual square:
Student 1 (70):          Student 3 (90):
┌──────────┐             ┌──────────┐
│          │             │          │
│   100    │ 10×10       │   100    │ 10×10
│          │             │          │
└──────────┘             └──────────┘
     ↑                        ↑
Area = (-10)²            Area = 10²
Student 2 (80):
• (no square, difference = 0)
The "Total Sum of Squares" is literally the sum of all these square areas!
Why "Squares" Instead of Absolute Values?
We could use absolute values: |y - ȳ|
But squaring has advantages:
- Mathematical: Derivatives are easier (important for optimization)
- Statistical: Links to variance and standard deviation
- Penalizes outliers: Large errors get extra weight
- Difference of 2: squared = 4
- Difference of 10: squared = 100 (25× more penalty!)
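To see the extra weight on large errors, one can compare how much a single big miss contributes to the total under each penalty. The error values below are invented for illustration.

```python
# Hypothetical absolute errors: three small misses and one large one
errors = [2, 2, 2, 10]

absolute_total = sum(abs(e) for e in errors)  # 2 + 2 + 2 + 10 = 16
squared_total = sum(e ** 2 for e in errors)   # 4 + 4 + 4 + 100 = 112

# Share of the total penalty that comes from the single large error
outlier_share_absolute = abs(errors[-1]) / absolute_total
outlier_share_squared = errors[-1] ** 2 / squared_total

print(f"absolute: {outlier_share_absolute:.1%}")  # 62.5%
print(f"squared:  {outlier_share_squared:.1%}")   # 89.3%
```

Under squaring, the one large error dominates the total, which is why sum-of-squares measures are sensitive to outliers.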
In Context of R²
SS_tot = Σ(y - ȳ)² = Total squared distances from mean
SS_res = Σ(y - ŷ)² = Total squared distances from predictions
SS_explained = SS_tot - SS_res = Variance explained by model
R² = SS_explained/SS_tot = Proportion of "squares" explained
Real-World Analogy
Think of it like measuring how "wrong" each guess is:
- Small miss (off by 2): Penalty = 4
- Medium miss (off by 5): Penalty = 25
- Big miss (off by 10): Penalty = 100
The "sum of squares" is your total penalty score. The model's job is to minimize this penalty!
The Name's Origin
The term comes from early statistics (early 1900s) when calculations were done by hand. Statisticians would literally:
- Calculate differences
- Square them (multiply by themselves)
- Sum up all these squared values
Hence: "Sum of Squares" = Adding up all the squared differences
So when you hear "Total Sum of Squares," think: "Total amount of squared variation in the data" - it's measuring how spread out your data is from its average!
Regular R² vs Adjusted R²
R² and Adjusted R² - Side-by-Side Comparison
R² (R-squared)
R² = 1 - (SSres/SStot)
R² = 1 - [Σ(yi - ŷi)²] / [Σ(yi - ȳ)²]
Adjusted R²
R²adj = 1 - [(1 - R²) × (n - 1) / (n - p - 1)]
R²adj = 1 - [(SSres/(n - p - 1)) / (SStot/(n - 1))]
Breaking Down the Components
| Component | Symbol | Meaning |
|---|---|---|
| SSres | Σ(yi - ŷi)² | Sum of squared residuals (errors) |
| SStot | Σ(yi - ȳ)² | Total sum of squares from "mean" (total variance) |
| n | n | Number of observations/samples |
| p | p | Number of predictors/features (excluding intercept) |
| yi | yi | Actual value |
| ŷi | ŷi | Predicted value |
| ȳ | ȳ | Mean of actual values |
Alternative Form - Showing the Relationship
Starting from R²:
R² = 1 - (SSres/SStot)
Adjusted R² modifies this by adding degrees of freedom:
R²adj = 1 - [(SSres/SStot) × (n - 1)/(n - p - 1)]
Which can be rewritten as:
R²adj = 1 - [(1 - R²) × (n - 1)/(n - p - 1)]
Key Mathematical Differences
Penalty Term
The adjustment factor is: (n - 1)/(n - p - 1)
- When p = 0 (no predictors): R²adj = R²
- As p increases: The denominator (n - p - 1) decreases, making the penalty larger
- The penalty becomes more severe with smaller sample sizes
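A quick way to get a feel for the penalty term is to tabulate it for a few sample sizes and predictor counts. The helper `penalty_factor` and the chosen values of n and p are assumptions made for this sketch.

```python
def penalty_factor(n, p):
    """Adjustment factor (n - 1) / (n - p - 1) used by Adjusted R²."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return (n - 1) / (n - p - 1)


for n in (20, 100):
    for p in (0, 1, 5, 10):
        print(f"n={n:3d}  p={p:2d}  factor={penalty_factor(n, p):.4f}")
```

With p = 0 the factor is exactly 1, and for the same p it is noticeably larger at n = 20 than at n = 100.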
Numerical Example
Let's say:
- R² = 0.80
- n = 100 samples
- p = 5 predictors
Calculating R²:
R² = 0.80 (given)
Calculating Adjusted R²:
R²adj = 1 - [(1 - 0.80) × (100 - 1)/(100 - 5 - 1)]
R²adj = 1 - [0.20 × 99/94]
R²adj = 1 - [0.20 × 1.0532]
R²adj = 1 - 0.2106
R²adj = 0.7894
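The same result can be reached from either form of the Adjusted R² formula. The sketch below checks that they agree; the sums of squares (SS_res = 200, SS_tot = 1000) are hypothetical values chosen only so that R² comes out to 0.80.

```python
def adjusted_r2_from_r2(r2, n, p):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def adjusted_r2_from_sums(ss_res, ss_tot, n, p):
    """Adjusted R² = 1 - (SS_res / (n - p - 1)) / (SS_tot / (n - 1))."""
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


n, p = 100, 5
ss_res, ss_tot = 200.0, 1000.0  # hypothetical sums of squares
r2 = 1 - ss_res / ss_tot        # 0.80

form_a = adjusted_r2_from_r2(r2, n, p)
form_b = adjusted_r2_from_sums(ss_res, ss_tot, n, p)

print(round(form_a, 4))  # 0.7894
print(round(form_b, 4))  # 0.7894
```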
Why Adjusted R² < R²
From the formulas, we can see:
- The factor (n - 1)/(n - p - 1) is always > 1 when p > 0
- This multiplies the error term (1 - R²)
- Therefore, adjusted R² always penalizes for additional predictors
When to Use Each
Use R² when:
- Comparing models with the same number of predictors
- Working with simple models
- You want the raw explanatory power
Use Adjusted R² when:
- Comparing models with different numbers of predictors
- Concerned about overfitting
- Need to account for model complexity
- Feature selection decisions
The fundamental difference is that Adjusted R² includes a penalty for model complexity, making it more suitable for model selection in neural networks and machine learning where we often have many parameters.
In other words,
Regular R² (Coefficient of Determination)
The regular R² is the simpler, original formula:
R² = 1 - (SS_res / SS_tot)
Or equivalently:
R² = (SS_explained / SS_tot)
Where:
- SS_res = Σ(y - ŷ)² = Sum of squared residuals (errors)
- SS_tot = Σ(y - ȳ)² = Total sum of squares
- SS_explained = SS_tot - SS_res = Variance explained by model
The Key Difference
Regular R² has a fundamental flaw: on the training data it NEVER decreases when you add more variables, even if they're completely useless! In practice it almost always creeps upward.
Example Showing the Problem:
Let's say you're predicting house prices:
| Model | Variables | R² | Adjusted R² |
|---|---|---|---|
| Model 1 | Square footage only | 0.70 | 0.69 |
| Model 2 | + Number of bedrooms | 0.75 | 0.73 |
| Model 3 | + Zip code | 0.80 | 0.77 |
| Model 4 | + Owner's birthday | 0.81 | 0.76 ⬇️ |
| Model 5 | + Favorite color | 0.82 | 0.74 ⬇️ |
Notice:
- Regular R² keeps going up (0.70 → 0.82) even with nonsense variables
- Adjusted R² starts decreasing (0.77 → 0.74) when we add useless variables!
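Why Regular R² Never Decreases
Mathematically, adding any variable (even random noise) gives a least-squares model more "flexibility" to fit the training data:
- More parameters = more ways to reduce SS_res (the fit can always ignore the new variable by giving it a zero coefficient, so SS_res cannot go up)
- SS_tot stays the same
- Therefore, R² = 1 - (SS_res/SS_tot) cannot decrease, and in practice it almost always increases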
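This behavior can be observed with a small experiment: fit a least-squares model with one real predictor, then add a column of pure noise and refit. The sketch below uses only the standard library and solves the normal equations by Gaussian elimination; the data is synthetic and every function name is an assumption of this example, not a library API.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def training_r2(rows, y):
    """Fit OLS with an intercept and return R² on the same data."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    preds = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


random.seed(0)
n = 50
x1 = [random.uniform(0, 10) for _ in range(n)]           # real predictor
y = [3.0 + 2.0 * v + random.gauss(0, 2) for v in x1]     # synthetic target
noise = [random.gauss(0, 1) for _ in range(n)]           # unrelated to y

r2_one = training_r2([[v] for v in x1], y)
r2_two = training_r2([[v, z] for v, z in zip(x1, noise)], y)

print(f"one predictor:        R^2 = {r2_one:.6f}")
print(f"plus a noise column:  R^2 = {r2_two:.6f}")
```

The second R² should come out at least as large as the first, even though the added column carries no information about the target.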
The Adjusted R² Solution
Adjusted R² adds a penalty for each additional variable:
Adjusted R² = 1 - [(1-R²)(n-1)/(n-k-1)]
Breaking this down:
- (1-R²) = Unexplained variance proportion
- (n-1)/(n-k-1) = Penalty factor that increases with more predictors (k)
- As k increases, the denominator (n-k-1) gets smaller, making the fraction larger
- This increases the subtracted term, lowering Adjusted R²
Numerical Example
Dataset: 100 observations (n=100)
Model with 1 predictor (k=1):
- R² = 0.60
- Adjusted R² = 1 - [(1-0.60)(99)/(98)]
- Adjusted R² = 1 - [0.40 × 1.0102]
- Adjusted R² = 0.596 (barely different)
Model with 10 predictors (k=10):
- R² = 0.65 (higher!)
- Adjusted R² = 1 - [(1-0.65)(99)/(89)]
- Adjusted R² = 1 - [0.35 × 1.112]
- Adjusted R² = 0.611 (less impressive gain)
Model with 50 predictors (k=50):
- R² = 0.80 (much higher!)
- Adjusted R² = 1 - [(1-0.80)(99)/(49)]
- Adjusted R² = 1 - [0.20 × 2.02]
- Adjusted R² = 0.596 (no better than the 1-predictor model, and WORSE than the 10-predictor model!)
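The three calculations can be reproduced in a short loop. This sketch reuses the R² values and n = 100 from the example; the function name `adjusted_r2` is an assumption.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 100
models = [(1, 0.60), (10, 0.65), (50, 0.80)]  # (number of predictors, R²)

results = {k: adjusted_r2(r2, n, k) for k, r2 in models}

for k, r2 in models:
    print(f"k={k:2d}  R^2={r2:.2f}  adjusted R^2={results[k]:.4f}")
# k= 1  R^2=0.60  adjusted R^2=0.5959
# k=10  R^2=0.65  adjusted R^2=0.6107
# k=50  R^2=0.80  adjusted R^2=0.5959
```

By this measure the 10-predictor model comes out ahead, even though the 50-predictor model has the highest raw R².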
When to Use Each
Use Regular R²:
- Comparing models with the same number of predictors
- Simple linear regression (one predictor)
- When you want to know raw explanatory power
- Academic requirement for basic reporting
Use Adjusted R²:
- Comparing models with different numbers of predictors
- Multiple regression (several predictors)
- Model selection (choosing best model)
- Preventing overfitting
Quick Decision Rule
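def choose_r2_metric(number_of_predictors, comparing_models_with_different_predictors):
    """Return which R² variant(s) to report."""
    if number_of_predictors == 1:
        return "regular"   # With one predictor the two are nearly identical
    elif comparing_models_with_different_predictors:
        return "adjusted"  # Must account for complexity
    else:
        return "both"      # Full transparency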
The Bottom Line
- Regular R²: Pure measure of variance explained, but naive about model complexity
- Adjusted R²: Smarter measure that balances fit quality against model simplicity
Think of Regular R² as raw score and Adjusted R² as the grade after curve adjustment. Adjusted R² essentially asks: "Is this variable improving the model enough to justify making it more complex?" If not, Adjusted R² will decrease even though Regular R² increases!
Extra Reading:
Pearson's r (Pearson correlation coefficient) is a statistical measure that quantifies the linear relationship between two continuous variables. It tells you both the strength and direction of a linear association.
What It Measures
Pearson's r captures:
- Direction: Whether variables move together (positive) or in opposite directions (negative)
- Strength: How closely the points follow a straight line
- Range: Always between -1 and +1
Interpretation
- r = +1: Perfect positive linear relationship (as X increases, Y increases perfectly)
- r = 0: No linear relationship (the variables are uncorrelated, though they may still be related non-linearly)
- r = -1: Perfect negative linear relationship (as X increases, Y decreases perfectly)
- |r| > 0.7: Generally considered strong correlation
- 0.3 < |r| < 0.7: Moderate correlation
- |r| < 0.3: Weak correlation
Mathematical Formula
r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
Or in a more intuitive form:
r = covariance(X,Y) / (std_dev(X) × std_dev(Y))
This is essentially the standardized covariance - it measures how variables vary together, normalized by their individual variations.
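Both forms of the formula can be written out and compared. The sketch below uses plain Python and made-up temperature and sales figures; the function names are assumptions, and the covariance and standard deviations use the population (divide by n) convention so the factors of n cancel.

```python
import math


def pearson_r(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_from_covariance(xs, ys):
    """r = covariance(X, Y) / (std_dev(X) * std_dev(Y))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Hypothetical data: daily temperature (°C) and ice cream sales
temperature = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0]
sales = [110.0, 135.0, 150.0, 190.0, 200.0, 240.0]

r_direct = pearson_r(temperature, sales)
r_standardized = pearson_r_from_covariance(temperature, sales)

print(f"{r_direct:.4f} {r_standardized:.4f}")
```

The two functions should print the same value, since the second is just the first with numerator and denominator each divided by n.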
Visual Examples
r ≈ +0.9: points cluster tightly around an upward-sloping line
r ≈ 0: points form a shapeless cloud with no visible trend
r ≈ -0.9: points cluster tightly around a downward-sloping line
Key Limitations
- Only measures LINEAR relationships: Pearson's r can be near zero even with a strong non-linear relationship (e.g., a symmetric parabola), and it can understate other curved relationships (e.g., exponential)
- Sensitive to outliers: A single extreme point can dramatically change r
- Correlation ≠ Causation: Even r = 0.99 doesn't mean one variable causes the other
- Normality matters for inference: r can be computed for any paired numeric data, but significance tests and confidence intervals for r are most reliable when both variables are roughly normally distributed
Relationship to R²
- In simple linear regression: R² = r²
- R² tells you the proportion of variance explained
- r tells you the direction and strength of the linear relationship
- Example: r = -0.8 means strong negative correlation; R² = 0.64 means 64% of variance explained
Practical Example
Temperature vs Ice Cream Sales:
- r ≈ +0.85: Strong positive correlation
- As temperature rises, ice cream sales tend to increase
- R² ≈ 0.72: Temperature explains about 72% of the variation in ice cream sales
Pearson's r is fundamental in statistics and machine learning, particularly for feature selection, understanding variable relationships, and as a building block for more complex analyses.
Observed Values (y)
- The actual, true values from your dataset
- The ground truth labels/targets you're trying to predict
- What actually happened in reality
- What you're hoping your model will produce or output
- Example: The actual house price of $500,000
Predicted Values (ŷ)
- The actual output from your neural network
- What the model thinks the value should be based on the input features
- The result after forward propagation through all layers
- Example: The neural network's prediction of $485,000 for that house
The Relationship
The whole point of training is to minimize the difference between these two:
- Loss function measures the difference (e.g., MSE = mean of (y - ŷ)²)
- Backpropagation adjusts weights to make predicted values closer to observed values
- Perfect model would have predicted values = observed values (never happens in practice)
During training:
- You feed inputs → neural network produces predicted values
- You compare these to observed values from your training set
- The difference drives the learning process
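Since MSE is a common training loss, it helps to see how it connects to R²: on the same data, R² = 1 - MSE / Var(y), where Var(y) is the population variance of the observed values. The sketch below checks this with invented house prices; no neural network is involved, and the predictions are simply assumed.

```python
# Hypothetical observed house prices and model predictions
observed = [500_000.0, 350_000.0, 420_000.0, 610_000.0]
predicted = [485_000.0, 360_000.0, 430_000.0, 600_000.0]

n = len(observed)
y_mean = sum(observed) / n

# Mean squared error: the average squared gap between y and y-hat
mse = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n

# Population variance of the observed values (divide by n, not n - 1)
variance = sum((y - y_mean) ** 2 for y in observed) / n

r2_from_mse = 1 - mse / variance

# Same quantity from the sums of squares, for comparison
ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
ss_tot = sum((y - y_mean) ** 2 for y in observed)
r2_direct = 1 - ss_res / ss_tot

print(f"MSE = {mse:,.0f}")
print(f"R^2 from MSE = {r2_from_mse:.4f}, R^2 direct = {r2_direct:.4f}")
```

Because the variance of the targets is fixed for a given dataset, lowering MSE on that dataset raises R² on it.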
Observed values are what you're trying to match (what you hope to see); predicted values are what your model produces.
Here are 5 interview questions about R² (R-squared) in neural networks and general machine learning:
Questions
1. Fundamental Understanding: What is R² (coefficient of determination) and how is it calculated? What does an R² value of 0.7 mean in practical terms?
2. Interpretation Challenges: Why can R² sometimes be negative when evaluating a model on test data? What does this indicate about model performance?
3. Neural Networks vs Linear Models: When using R² as a metric for neural network regression tasks, what are some key differences or considerations compared to using it with linear regression models?
4. Limitations and Alternatives: What are the main limitations of using R² as the sole evaluation metric for regression problems? What complementary metrics would you recommend using alongside R²?
5. Practical Scenario: You've trained a neural network for a regression task and obtained an R² of 0.95 on training data but 0.3 on validation data. What might be happening and how would you address it?
Answers
1. Fundamental Understanding: R² measures the proportion of variance in the dependent variable that's predictable from the independent variables. It's calculated as: R² = 1 - (SS_res / SS_tot), where SS_res is the sum of squared residuals (errors) and SS_tot is the total sum of squares (from the mean). An R² of 0.7 means the model explains 70% of the variance in the target variable, with the remaining 30% unexplained.
2. Interpretation Challenges: R² can be negative on test data when the model performs worse than a horizontal line at the mean of the test set. This happens when SS_res > SS_tot, typically indicating the model is making predictions that are systematically far from the actual values, often due to overfitting on training data or distribution shift between train and test sets.
3. Neural Networks vs Linear Models: Key considerations include: (a) Neural networks can capture non-linear relationships, potentially achieving higher R² than linear models on complex data; (b) R² in neural networks is more prone to overfitting due to high model capacity; (c) The interpretation is less straightforward since neural networks don't provide simple coefficient interpretations; (d) R² should be monitored on validation sets during training to detect overfitting early.
4. Limitations and Alternatives: Limitations: R² doesn't indicate whether predictions are biased, can be artificially inflated by adding parameters, doesn't show if the model violates assumptions, and can be misleading for non-linear relationships. Complementary metrics: MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), MAPE (Mean Absolute Percentage Error), residual plots, and prediction interval coverage.
5. Practical Scenario: This is classic overfitting. The model has memorized training data but fails to generalize. Solutions include: (a) Add regularization (L1/L2, dropout); (b) Reduce model complexity (fewer layers/neurons); (c) Increase training data or use data augmentation; (d) Implement early stopping based on validation R²; (e) Use cross-validation to better assess generalization; (f) Check for data leakage or distribution differences between train/validation sets.