
How to Calculate SST, SSR, and SSE in R


We often use three different sum of squares values to measure how well a regression line actually fits a dataset:

1. Sum of Squares Total (SST) – The sum of squared differences between individual data points (yi) and the mean of the response variable (ȳ).

  • SST = Σ(yi – ȳ)²

2. Sum of Squares Regression (SSR) – The sum of squared differences between predicted data points (ŷi) and the mean of the response variable (ȳ).

  • SSR = Σ(ŷi – ȳ)²

3. Sum of Squares Error (SSE) – The sum of squared differences between predicted data points (ŷi) and observed data points (yi).

  • SSE = Σ(ŷi – yi)²
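
To make the formulas concrete, here is a minimal helper (a sketch for illustration only, not part of the walkthrough below) that computes all three quantities from a vector of observed values and a vector of predicted values; the names y, y_hat, and sum_of_squares are just placeholders:

#illustrative helper: compute SST, SSR, and SSE from observed (y) and predicted (y_hat) values
sum_of_squares <- function(y, y_hat) {
  y_bar <- mean(y)                    #mean of the response variable
  c(SST = sum((y - y_bar)^2),         #total sum of squares
    SSR = sum((y_hat - y_bar)^2),     #regression sum of squares
    SSE = sum((y_hat - y)^2))         #error sum of squares
}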

The following step-by-step example shows how to calculate each of these metrics for a given regression model in R.

Step 1: Create the Data

First, let’s create a dataset that contains the number of hours studied and exam score received for 20 different students at a certain college:

#create data frame
df <- data.frame(hours=c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                         3, 4, 4, 4, 5, 5, 6, 7, 7, 8),
                 score=c(68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                         88, 85, 89, 94, 93, 94, 96, 89, 92, 97))

#view first six rows of data frame
head(df)

  hours score
1     1    68
2     1    76
3     1    74
4     2    80
5     2    76
6     2    78

Step 2: Fit a Regression Model

Next, we’ll use the lm() function to fit a simple linear regression model using score as the response variable and hours as the predictor variable:

#fit regression model
model <- lm(score ~ hours, data = df)

#view model summary
summary(model)

Call:
lm(formula = score ~ hours, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.6970 -2.5156 -0.0737  3.1100  7.5495 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  73.4459     1.9147  38.360  < 2e-16 ***
hours         3.2512     0.4603   7.063 1.38e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.289 on 18 degrees of freedom
Multiple R-squared:  0.7348,    Adjusted R-squared:  0.7201 
F-statistic: 49.88 on 1 and 18 DF,  p-value: 1.378e-06
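
A quick aside before the calculations: the predicted values (ŷi) used in the formulas come from fitted(model), and the residuals (yi – ŷi) come from resid(model). The snippet below is an optional sanity check (not part of the original walkthrough) confirming that the residuals equal the observed scores minus the fitted scores:

#optional sanity check: fitted values and residuals from the model
y_hat <- fitted(model)          #predicted exam scores (ŷ)
head(df$score - y_hat)          #differences between observed and predicted scores
head(resid(model))              #same values reported by resid()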

Step 3: Calculate SST, SSR, and SSE

We can use the following syntax to calculate SST, SSR, and SSE:

#find sse
sse <- sum((fitted(model) - df$score)^2)
sse

[1] 331.0749

#find ssr
ssr <- sum((fitted(model) - mean(df$score))^2)
ssr

[1] 917.4751

#find sst
sst <- sum((df$score - mean(df$score))^2)
sst

[1] 1248.55

The metrics turn out to be:

  • Sum of Squares Total (SST): 1248.55
  • Sum of Squares Regression (SSR): 917.4751
  • Sum of Squares Error (SSE): 331.0749

We can verify that SST = SSR + SSE:

  • SST = SSR + SSE
  • 1248.55 = 917.4751 + 331.0749
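
As an optional cross-check (not part of the original calculation), R's built-in anova() function reports the same two components for this simple regression: the Sum Sq entry in the hours row corresponds to SSR and the Sum Sq entry in the Residuals row corresponds to SSE:

#optional cross-check: the ANOVA table contains SSR (hours row) and SSE (Residuals row)
anova(model)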

We can also manually calculate the R-squared of the regression model:

  • R-squared = SSR / SST
  • R-squared = 917.4751 / 1248.55
  • R-squared = 0.7348

This tells us that 73.48% of the variation in exam scores can be explained by the number of hours studied.
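
If you want to verify this value in code, one quick sketch is to compute the ratio directly and compare it to the R-squared that R itself reports, which should also be about 0.7348:

#manual R-squared vs. the value reported by summary()
rsq_manual <- ssr / sst
rsq_manual

summary(model)$r.squared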

Additional Resources

You can use the following calculators to automatically calculate SST, SSR, and SSE for any simple linear regression line:

SST Calculator
SSR Calculator
SSE Calculator
