
Regular R² vs Adjusted R²

Understanding R-squared (The Coefficient of Determination) 


What Does R² Measure?

R² tells you the proportion of the variance in the target variable that your model can explain; it is a relative measure, unlike error metrics, which report the raw difference between predicted and actual values. It provides a score between 0 and 1, though it can be negative for very poor models.

  • R² = 1: A perfect model. It explains 100% of the variability in the data.

  • R² = 0: A useless model. It performs no better than a baseline model that simply predicts the average of the target variable.

  • R² < 0: A very poor model. It performs worse than just predicting the average. This can happen when evaluating the model on new, unseen data.
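
As a quick illustration, here is a minimal sketch using scikit-learn's r2_score (assuming NumPy and scikit-learn are installed); the arrays are made-up numbers chosen only to reproduce the three cases above.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Near-perfect predictions: R² close to 1
print(r2_score(y_true, [3.1, 4.9, 7.0, 9.1]))   # ~0.998

# Always predicting the mean of y_true (6.0): R² = 0
print(r2_score(y_true, [6.0, 6.0, 6.0, 6.0]))   # 0.0

# Predictions worse than the mean baseline: R² < 0
print(r2_score(y_true, [9.0, 3.0, 10.0, 2.0]))  # negative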

R-squared (R²) and the coefficient of determination are two names for the exact same statistical measure. It's one of the most common metrics used to evaluate how well a regression model fits the data.

Why Two Names?

  • "Coefficient of Determination" is the formal statistical term. It accurately describes what the metric does: it determines the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

  • "R-squared" or "" is the common name and mathematical notation. The "R" comes from its relationship with Pearson's correlation coefficient (r). In a simple linear regression with one variable, R2 is literally the square of Pearson's r ().

In practice, the terms are used interchangeably. "R-squared" is common among practitioners for its brevity, while "coefficient of determination" is often used in formal academic papers.


The Math Behind R-squared 

The formula for R² compares how much variance the model leaves unexplained with the total variance in the data.

The Formula

R² = 1 - (Sum of Squared Residuals / Total Sum of Squares from the Mean)

  • SSres (Sum of Squared Residuals; a residual, or error, is the actual value minus the predicted value at each point): This is the error of your model. It's the sum of the squared differences between the actual values (yi) and your model's predicted values (ŷi).

  • SStot (Total Sum of Squares from the Mean): This represents the total variance in the data. It's the sum of the squared differences between the actual values (yi) and the mean of all actual values (ȳ).


ȳ (y with a bar on top) is read as "y-bar" and represents the MEAN, or average, of the observed values
ŷ (y with a hat/caret on top) is read as "y-hat" and represents the predicted/fitted values

So in conversation, you'd say:

"y-bar" for the mean
"y-hat" for the predictions

Why "Sum of Squares"?

The term "sum of squares" is literal. To measure variation, we can't just sum the differences from the mean (e.g., ȳ), because positive and negative differences would cancel each other out.

Solution: We square each difference to make it positive. Squaring also has the benefit of heavily penalizing larger errors. The "Total Sum of Squares" is the sum of the areas of these squares.


Regular R² vs. Adjusted R² 

While standard R² is useful, it has a critical flaw: it always increases as you add more variables to the model, even if those new variables are completely useless. This can be misleading and encourage overfitting.

Adjusted R² solves this problem by adding a penalty for each new variable included in the model.

The Problem Illustrated

Imagine predicting a house price with progressively more variables:

Model | Variables Added | Regular R² | Adjusted R²
Model 1 | Square footage | 0.70 | 0.69
Model 2 | + Number of bedrooms | 0.75 | 0.73
Model 3 | + Zip code | 0.80 | 0.77
Model 4 | + Owner's birthday | 0.81 | 0.76
Model 5 | + Favorite color | 0.82 | 0.74

Notice how Regular R² keeps rising, while Adjusted R² starts to drop when we add irrelevant "nonsense" variables, correctly signaling that the model is becoming unnecessarily complex.
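
The same pattern can be reproduced in code. The sketch below is a rough illustration (synthetic data, not the house-price example above): it fits a linear model, then keeps appending pure-noise columns; regular R² keeps creeping up on the training data while adjusted R² stalls or falls. It assumes NumPy and scikit-learn are available.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))                        # one genuinely useful feature
y = 3.0 * x[:, 0] + rng.normal(scale=2.0, size=n)  # target driven only by x

X = x
for _ in range(5):
    p = X.shape[1]
    r2 = LinearRegression().fit(X, y).score(X, y)  # regular R² on the training data
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)     # adjusted R²
    print(f"p={p}: R²={r2:.3f}  adjusted R²={adj:.3f}")
    X = np.column_stack([X, rng.normal(size=n)])   # append a useless noise column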

The Formulas Compared

Component | Regular R² | Adjusted R²
Formula | R² = 1 - (SS_res / SS_tot) | R²adj = 1 - [(1 - R²) × (n - 1)/(n - p - 1)]
Meaning | Measures raw explanatory power. | Balances explanatory power against model complexity.

Here, n is the number of data points (samples), and p is the number of predictors (features) in the model. The term (n-1)/(n-p-1) acts as the penalty factor.
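
As a small sketch, the penalty can be wrapped in a helper function (the name adjusted_r2 is just for illustration); it reproduces the worked example given later in this post (R² = 0.80, n = 100, p = 5 → about 0.789).

def adjusted_r2(r2, n, p):
    """Adjusted R²: 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.80, 100, 5))   # ≈ 0.789, matching the worked example below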

When to Use Each

  • Use Regular R² for simple linear regression (one variable) or when comparing models with the same number of variables.

  • Use Adjusted R² for multiple regression or whenever you're comparing models with a different number of variables. It's essential for model selection and protecting against overfitting.

Quick Decision Rule: Think of Regular R² as the raw test score and Adjusted R² as the score after a "curve" that accounts for the difficulty (complexity). For choosing the best model, you almost always want to use Adjusted R².


Appendix: Pearson's Correlation Coefficient (r)

Pearson's r is a measure that quantifies the strength and direction of a linear relationship between two continuous variables. Its value is always between -1 and +1.

  • r = +1: Perfect positive linear relationship.

  • r = 0: No linear relationship.

  • r = -1: Perfect negative linear relationship.

In simple linear regression, the connection is direct: R² = r². For example, if the correlation (r) between study hours and exam scores is +0.8, the R-squared (R²) would be 0.64, meaning that study hours explain 64% of the variance in exam scores.
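
A quick way to see this relationship is to fit a one-variable regression in NumPy and compare r² with R² computed from the sums of squares (the study-hours numbers below are made up for illustration):

import numpy as np

hours  = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # made-up study hours
scores = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 80.0])  # made-up exam scores

r = np.corrcoef(hours, scores)[0, 1]             # Pearson's r

slope, intercept = np.polyfit(hours, scores, 1)  # simple linear regression
pred   = slope * hours + intercept
ss_res = np.sum((scores - pred) ** 2)
ss_tot = np.sum((scores - scores.mean()) ** 2)
r2     = 1 - ss_res / ss_tot

print(r ** 2, r2)   # the two values agree (up to floating-point rounding)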

More information (same concept, different words)

R² (R-squared) and the coefficient of determination are exactly the same thing. They are just two different names for the identical statistical measure.

Why Two Names?

"Coefficient of determination" is the formal statistical term that describes what the measure actually does - it determines how much of the variance is explained

"R-squared" or "R²" is the mathematical notation, where the "R" comes from the correlation coefficient (Pearson's r), and squaring it gives us this measure

The Connection

The relationship becomes clearer when you consider:

In simple linear regression, R² literally equals the square of the Pearson correlation coefficient (r) between predicted and actual values

Hence: R² = r²

Common Usage

In practice, you'll see both terms used interchangeably:

Academic papers might use "coefficient of determination" for formal precision

Data scientists and practitioners often just say "R-squared" for brevity

Documentation might write "R² (coefficient of determination)" to be clear

So when you see either term in the context of neural networks or any regression analysis, they're referring to the same metric that measures the proportion of variance explained by the model.

R² (R-squared) in the context of neural networks is a statistical measure that indicates how well the model's predictions match the actual data. It's borrowed from traditional statistics and represents the coefficient of determination.

What R² Measures

R² tells you the proportion of variance in the target variable that your neural network can explain. It ranges from 0 to 1 (though it can be negative for very poor models):

R² = 1: Perfect prediction - the model explains all variability in the target

R² = 0: The model performs no better than simply predicting the mean

R² < 0: The model performs worse than predicting the mean (possible with test data)

Mathematical Definition

R² is calculated as:

R² = 1 - (SS_res / SS_tot)

Where:

SS_res (residual sum of squares) = Σ(y_actual - y_predicted)²

SS_tot (total sum of squares) = Σ(y_actual - y_mean)²

Use in Neural Networks

In neural networks, R² is primarily used for:

Regression tasks - It's most appropriate when your network outputs continuous values

Model evaluation - Comparing how well different architectures or hyperparameters perform

Interpretability - Providing a more intuitive metric than raw loss values (MSE, MAE)


Important Considerations

While R² is useful, it has limitations in neural network contexts:

It's not suitable for classification tasks (use accuracy, F1-score, etc. instead)

High R² doesn't necessarily mean your model generalizes well (overfitting can inflate R²)

For complex, non-linear relationships that neural networks often model, R² might not capture all aspects of model performance

Unlike simpler models, adding parameters to neural networks doesn't automatically increase R²


In practice, R² serves as one metric among several for evaluating regression neural networks, particularly useful when you need to communicate model performance to stakeholders familiar with traditional statistical measures.

The regular R² is the simpler, original formula:

R² = 1 - (SS_res / SS_tot)

Or equivalently:

R² = (SS_explained / SS_tot)

Where:

  • SS_res = Σ(y - ŷ)² = Sum of squared residuals (errors)
  • SS_tot = Σ(y - ȳ)² = Total sum of squares from mean
  • SS_explained = SS_tot - SS_res = Variance explained by model

What We're Actually Squaring

We're squaring the distance of each data point from the mean:

Total Sum of Squares = Σ(y - ȳ)²
                          ↑
                    This gets squared!

Step-by-Step Example

Let's say you have test scores: 70, 80, 90

Step 1: Find the mean

ȳ = (70 + 80 + 90) / 3 = 80

Step 2: Find each difference from mean

Student 1: 70 - 80 = -10
Student 2: 80 - 80 = 0
Student 3: 90 - 80 = +10

Step 3: Square each difference

Student 1: (-10)² = 100  ← This is a "square"
Student 2: (0)² = 0       ← This is a "square"
Student 3: (+10)² = 100   ← This is a "square"

Step 4: Sum all the squares

Total Sum of Squares = 100 + 0 + 100 = 200
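
The same arithmetic in a couple of lines of NumPy, just to confirm the hand calculation:

import numpy as np

y = np.array([70.0, 80.0, 90.0])
print(y.mean())                        # 80.0
print(np.sum((y - y.mean()) ** 2))     # 200.0, as computed by hand above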

Why Do We Square?

Problem Without Squaring:

Differences: -10, 0, +10
Sum: -10 + 0 + 10 = 0  ← Cancels out!

The negative and positive differences cancel each other, suggesting no variance when there clearly is!

Solution With Squaring:

Squared differences: 100, 0, 100
Sum: 100 + 0 + 100 = 200  ← Shows actual spread!

Visual Representation

Imagine each squared difference as an actual square:

Student 1 (70):          Student 3 (90):
┌──────────┐             ┌──────────┐
│          │             │          │
│   100    │ 10×10       │   100    │ 10×10
│          │             │          │
└──────────┘             └──────────┘
     ↑                          ↑
  Area = (-10)²              Area = 10²

Student 2 (80):
• (no square, difference = 0)

The "Total Sum of Squares" is literally the sum of all these square areas!

Why "Squares" Instead of Absolute Values?

We could use absolute values: |y - ȳ|

But squaring has advantages:

  1. Mathematical: Derivatives are easier (important for optimization)
  2. Statistical: Links to variance and standard deviation
  3. Penalizes outliers: Large errors get extra weight
    • Difference of 2: squared = 4
    • Difference of 10: squared = 100 (25× more penalty!)

In Context of R²

SS_tot = Σ(y - ȳ)²     = Total squared distances from mean
SS_res = Σ(y - ŷ)²     = Total squared distances from predictions
SS_explained = SS_tot - SS_res = Variance explained by model

R² = SS_explained/SS_tot = Proportion of "squares" explained
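
A short check with made-up numbers that the two expressions for R² agree (using this post's definition SS_explained = SS_tot - SS_res):

import numpy as np

y     = np.array([10.0, 12.0, 15.0, 18.0])   # made-up observed values
y_hat = np.array([11.0, 12.5, 14.0, 18.5])   # made-up predictions

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
ss_explained = ss_tot - ss_res

print(1 - ss_res / ss_tot)      # first form
print(ss_explained / ss_tot)    # second form - the same number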

Real-World Analogy

Think of it like measuring how "wrong" each guess is:

  • Small miss (off by 2): Penalty = 4
  • Medium miss (off by 5): Penalty = 25
  • Big miss (off by 10): Penalty = 100

The "sum of squares" is your total penalty score. The model's job is to minimize this penalty!

The Name's Origin

The term comes from early statistics (early 1900s) when calculations were done by hand. Statisticians would literally:

  1. Calculate differences
  2. Square them (multiply by themselves)
  3. Sum up all these squared values

Hence: "Sum of Squares" = Adding up all the squared differences

So when you hear "Total Sum of Squares," think: "Total amount of squared variation in the data" - it's measuring how spread out your data is from its average! 


Regular R² vs Adjusted R²

R² and Adjusted R² - Side-by-Side Comparison

R² (R-squared)

R² = 1 - (SSres/SStot)

R² = 1 - [Σ(yi - ŷi)²] / [Σ(yi - ȳ)²]

Adjusted R²

R²adj = 1 - [(1 - R²) × (n - 1) / (n - p - 1)]

R²adj = 1 - [(SSres/(n - p - 1)) / (SStot/(n - 1))]

Breaking Down the Components

Component | Symbol | Meaning
SSres | Σ(yi - ŷi)² | Sum of squared residuals (errors)
SStot | Σ(yi - ȳ)² | Total sum of squares from the mean (total variance)
n | n | Number of observations/samples
p | p | Number of predictors/features (excluding the intercept)
yi | yi | Actual value
ŷi | ŷi | Predicted value
ȳ | ȳ | Mean of actual values

Alternative Form - Showing the Relationship

Starting from R²:

R² = 1 - (SSres/SStot)

Adjusted R² modifies this by accounting for degrees of freedom:

R²adj = 1 - [(SSres/SStot) × (n - 1)/(n - p - 1)]

Which can be rewritten as:

R²adj = 1 - [(1 - R²) × (n - 1)/(n - p - 1)]
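
A quick numerical check, with arbitrary values, that the two adjusted-R² forms give the same answer:

ss_res, ss_tot = 40.0, 200.0    # arbitrary sums of squares
n, p = 50, 3                    # arbitrary sample size and predictor count

r2 = 1 - ss_res / ss_tot

adj_from_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_from_ss = 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

print(adj_from_r2, adj_from_ss)   # identical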

Key Mathematical Differences

Penalty Term

The adjustment factor is: (n - 1)/(n - p - 1)

  • When p = 0 (no predictors): R²adj = R²
  • As p increases: The denominator (n - p - 1) decreases, making the penalty larger
  • The penalty becomes more severe with smaller sample sizes

Numerical Example

Let's say:

  • R² = 0.80
  • n = 100 samples
  • p = 5 predictors

Calculating R²:

R² = 0.80 (given)

Calculating Adjusted R²:

R²adj = 1 - [(1 - 0.80) × (100 - 1)/(100 - 5 - 1)]
R²adj = 1 - [0.20 × 99/94]
R²adj = 1 - [0.20 × 1.0532]
R²adj = 1 - 0.2106
R²adj = 0.7894

Why Adjusted R² < R²

From the formulas, we can see:

  • The factor (n - 1)/(n - p - 1) is always > 1 when p > 0
  • This multiplies the error term (1 - R²)
  • Therefore, adjusted R² always penalizes for additional predictors

When to Use Each

Use R² when:

  • Comparing models with the same number of predictors
  • Working with simple models
  • You want the raw explanatory power

Use Adjusted R² when:

  • Comparing models with different numbers of predictors
  • Concerned about overfitting
  • Need to account for model complexity
  • Feature selection decisions

The fundamental difference is that Adjusted R² includes a penalty for model complexity, making it more suitable for model selection in neural networks and machine learning where we often have many parameters.

In other words,

Regular R² (Coefficient of Determination)

The regular R² is the simpler, original formula:

R² = 1 - (SS_res / SS_tot)

Or equivalently:

R² = (SS_explained / SS_tot)

Where:

  • SS_res = Σ(y - ŷ)² = Sum of squared residuals (errors)
  • SS_tot = Σ(y - ȳ)² = Total sum of squares
  • SS_explained = SS_tot - SS_res = Variance explained by model

The Key Difference

Regular R² has a fundamental flaw: it ALWAYS increases when you add more variables, even if they're completely useless!

Example Showing the Problem:

Let's say you're predicting house prices:

Model | Variables | Regular R² | Adjusted R²
Model 1 | Square footage only | 0.70 | 0.69
Model 2 | + Number of bedrooms | 0.75 | 0.73
Model 3 | + Zip code | 0.80 | 0.77
Model 4 | + Owner's birthday | 0.81 | 0.76 ⬇️
Model 5 | + Favorite color | 0.82 | 0.74 ⬇️

Notice:

  • Regular R² keeps going up (0.70 → 0.82) even with nonsense variables
  • Adjusted R² starts decreasing (0.77 → 0.74) when we add useless variables!

Why Regular R² Always Increases

Mathematically, adding any variable (even random noise) gives the model more "flexibility" to fit the training data:

  • More parameters = more ways to reduce SS_res
  • SS_tot stays the same
  • Therefore, R² = 1 - (SS_res/SS_tot) can only stay the same or increase (and in practice it almost always increases)

The Adjusted R² Solution

Adjusted R² adds a penalty for each additional variable:

Adjusted R² = 1 - [(1-R²)(n-1)/(n-k-1)]

Breaking this down:

  • (1-R²) = Unexplained variance proportion
  • (n-1)/(n-k-1) = Penalty factor that increases with more predictors (k)
  • As k increases, the denominator (n-k-1) gets smaller, making the fraction larger
  • This increases the subtracted term, lowering Adjusted R²

Numerical Example

Dataset: 100 observations (n=100)

Model with 1 predictor (k=1):

  • R² = 0.60
  • Adjusted R² = 1 - [(1-0.60)(99)/(98)]
  • Adjusted R² = 1 - [0.40 × 1.0102]
  • Adjusted R² = 0.596 (barely different)

Model with 10 predictors (k=10):

  • R² = 0.65 (higher!)
  • Adjusted R² = 1 - [(1-0.65)(99)/(89)]
  • Adjusted R² = 1 - [0.35 × 1.112]
  • Adjusted R² = 0.611 (less impressive gain)

Model with 50 predictors (k=50):

  • R² = 0.80 (much higher!)
  • Adjusted R² = 1 - [(1-0.80)(99)/(49)]
  • Adjusted R² = 1 - [0.20 × 2.02]
  • Adjusted R² = 0.596 (actually WORSE than simpler model!)

When to Use Each

Use Regular R²:

  • Comparing models with the same number of predictors
  • Simple linear regression (one predictor)
  • When you want to know raw explanatory power
  • Academic requirement for basic reporting

Use Adjusted R²:

  • Comparing models with different numbers of predictors
  • Multiple regression (several predictors)
  • Model selection (choosing best model)
  • Preventing overfitting

Quick Decision Rule

def choose_r2_metric(num_predictors, comparing_models_with_different_predictor_counts):
    """Quick decision rule for which R² to report."""
    if num_predictors == 1:
        return "regular R²"     # with one predictor the two are nearly identical
    elif comparing_models_with_different_predictor_counts:
        return "adjusted R²"    # must account for model complexity
    else:
        return "report both"    # full transparency

The Bottom Line

  • Regular R²: Pure measure of variance explained, but naive about model complexity
  • Adjusted R²: Smarter measure that balances fit quality against model simplicity

Think of Regular R² as the raw score and Adjusted R² as the grade after a curve adjustment. Adjusted R² essentially asks: "Is this variable improving the model enough to justify making it more complex?" If not, Adjusted R² will decrease even though Regular R² increases!

Extra Reading:

Pearson's r (Pearson correlation coefficient) is a statistical measure that quantifies the linear relationship between two continuous variables. It tells you both the strength and direction of a linear association.

What It Measures

Pearson's r captures:

  • Direction: Whether variables move together (positive) or in opposite directions (negative)
  • Strength: How closely the points follow a straight line
  • Range: Always between -1 and +1

Interpretation

  • r = +1: Perfect positive linear relationship (as X increases, Y increases perfectly)
  • r = 0: No linear relationship (variables are linearly independent)
  • r = -1: Perfect negative linear relationship (as X increases, Y decreases perfectly)
  • |r| > 0.7: Generally considered strong correlation
  • 0.3 < |r| < 0.7: Moderate correlation
  • |r| < 0.3: Weak correlation

Mathematical Formula

r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]

Or in a more intuitive form:

r = covariance(X,Y) / (std_dev(X) × std_dev(Y))

This is essentially the standardized covariance - it measures how variables vary together, normalized by their individual variations.
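
Both forms of the formula can be written in a few lines of NumPy (made-up data; the manual version should match np.corrcoef):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Direct formula: standardized covariance
num = np.sum((x - x.mean()) * (y - y.mean()))
den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
r_manual = num / den

r_numpy = np.corrcoef(x, y)[0, 1]   # NumPy's built-in correlation

print(r_manual, r_numpy)            # same value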

Visual Examples

r ≈ +0.9:  •     • •
          • • • •
        • • •
      • •

r ≈ 0:    • • •
        •   •   •
          • • •
        •   •   •

r ≈ -0.9:  • •
            • • •
              • • • •
                • •

Key Limitations

  1. Only measures LINEAR relationships: Pearson's r can be near zero even with strong non-linear relationships (e.g., parabolic, exponential)

  2. Sensitive to outliers: A single extreme point can dramatically change r

  3. Correlation ≠ Causation: Even r = 0.99 doesn't mean one variable causes the other

  4. Assumes normal distribution: Most reliable when both variables are roughly normally distributed

Relationship to R²

  • In simple linear regression: R² = r²
  • R² tells you the proportion of variance explained
  • r tells you the direction and strength of the linear relationship
  • Example: r = -0.8 means strong negative correlation; R² = 0.64 means 64% of variance explained

Practical Example

Temperature vs Ice Cream Sales:

  • r ≈ +0.85: Strong positive correlation
  • As temperature rises, ice cream sales tend to increase
  • R² ≈ 0.72: Temperature explains about 72% of the variation in ice cream sales

Pearson's r is fundamental in statistics and machine learning, particularly for feature selection, understanding variable relationships, and as a building block for more complex analyses.

Observed Values (y)

  • The actual, true values from your dataset
  • The ground truth labels/targets you're trying to predict
  • What actually happened in reality
  • What you hope your model's output will match
  • Example: The actual house price of $500,000

Predicted Values (ŷ)

  • The actual output from your neural network
  • What the model thinks the value should be based on the input features
  • The result after forward propagation through all layers
  • Example: The neural network's prediction of $485,000 for that house

The Relationship

The whole point of training is to minimize the difference between these two:

  • Loss function measures the difference (e.g., MSE = mean of (y - ŷ)²)
  • Backpropagation adjusts weights to make predicted values closer to observed values
  • Perfect model would have predicted values = observed values (never happens in practice)

During training:

  • You feed inputs → neural network produces predicted values
  • You compare these to observed values from your training set
  • The difference drives the learning process

Observed values are what you're trying to match (hoping to see); predicted values are what your model produces.
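
For example, here is a minimal sketch (made-up arrays standing in for a batch of network outputs and their targets) showing the quantities the loss compares:

import numpy as np

y_observed  = np.array([500_000.0, 320_000.0, 410_000.0])   # ground-truth house prices
y_predicted = np.array([485_000.0, 335_000.0, 400_000.0])   # network outputs

residuals = y_observed - y_predicted
mse    = np.mean(residuals ** 2)   # the loss typically minimized during training
ss_res = np.sum(residuals ** 2)    # the same errors, summed (SS_res in R²)

print(mse, ss_res)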

Here are 5 interview questions about R² (R-squared) in neural networks and general machine learning:

Questions

1. Fundamental Understanding: What is R² (coefficient of determination) and how is it calculated? What does an R² value of 0.7 mean in practical terms?

2. Interpretation Challenges: Why can R² sometimes be negative when evaluating a model on test data? What does this indicate about model performance?

3. Neural Networks vs Linear Models: When using R² as a metric for neural network regression tasks, what are some key differences or considerations compared to using it with linear regression models?

4. Limitations and Alternatives: What are the main limitations of using R² as the sole evaluation metric for regression problems? What complementary metrics would you recommend using alongside R²?

5. Practical Scenario: You've trained a neural network for a regression task and obtained an R² of 0.95 on training data but 0.3 on validation data. What might be happening and how would you address it?


Answers

1. Fundamental Understanding: R² measures the proportion of variance in the dependent variable that's predictable from the independent variables. It's calculated as R² = 1 - (SS_res / SS_tot), where SS_res is the sum of squared residuals (errors) and SS_tot is the total sum of squares from the mean. An R² of 0.7 means the model explains 70% of the variance in the target variable, with the remaining 30% unexplained.

2. Interpretation Challenges: R² can be negative on test data when the model performs worse than a horizontal line at the mean of the test set. This happens when SS_res > SS_tot, typically indicating the model is making predictions that are systematically far from the actual values - often due to overfitting on training data or distribution shift between train and test sets.

3. Neural Networks vs Linear Models: Key considerations include: (a) Neural networks can capture non-linear relationships, potentially achieving higher R² than linear models on complex data; (b) R² in neural networks is more prone to overfitting due to high model capacity; (c) The interpretation is less straightforward since neural networks don't provide simple coefficient interpretations; (d) R² should be monitored on validation sets during training to detect overfitting early.

4. Limitations and Alternatives: Limitations: R² doesn't indicate whether predictions are biased, can be artificially inflated by adding parameters, doesn't show if the model violates assumptions, and can be misleading for non-linear relationships. Complementary metrics: MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), MAPE (Mean Absolute Percentage Error), residual plots, and prediction interval coverage.

5. Practical Scenario: This is classic overfitting. The model has memorized the training data but fails to generalize. Solutions include: (a) Add regularization (L1/L2, dropout); (b) Reduce model complexity (fewer layers/neurons); (c) Increase training data or use data augmentation; (d) Implement early stopping based on validation R²; (e) Use cross-validation to better assess generalization; (f) Check for data leakage or distribution differences between train/validation sets.

