Multivariate Adaptive Regression Splines in Python

Multivariate adaptive regression splines (MARS) can be used to model nonlinear relationships between a set of predictor variables and a response variable.

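This method works as follows:

1. Divide the range of each predictor variable into pieces at cut points called knots.

2. Fit a linear regression model to each piece, using hinge functions so that the pieces join at the knots.

3. Add pieces in a forward pass, then prune them in a backward pass, using generalized cross-validation (GCV) to choose how many pieces to keep.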
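To make the idea of fitting a separate linear piece on each side of a cut point concrete, here is a minimal sketch using only NumPy. It is a toy illustration, not the algorithm that py-earth runs: it assumes a single predictor, a single knot, and picks the knot by the lowest sum of squared errors. The helper names `hinge` and `fit_with_knot` are hypothetical.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """max(0, x - knot) when direction is +1, max(0, knot - x) when it is -1."""
    return np.maximum(0.0, direction * (x - knot))

# Toy data (an assumption for illustration): flat until x = 4, then slope 2
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * np.maximum(0.0, x - 4.0) + rng.normal(scale=0.1, size=x.size)

def fit_with_knot(x, y, knot):
    """Least-squares fit on an intercept plus a mirrored pair of hinge terms."""
    basis = np.column_stack([np.ones_like(x), hinge(x, knot, 1), hinge(x, knot, -1)])
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    resid = y - basis @ coef
    return coef, float(resid @ resid)

# Try a grid of candidate knots and keep the one with the smallest error
candidate_knots = np.arange(1.0, 10.0, 1.0)
sse = [fit_with_knot(x, y, k)[1] for k in candidate_knots]
best_knot = float(candidate_knots[int(np.argmin(sse))])
print(best_knot)
```

Because both hinge terms are zero at the knot, the two fitted pieces always meet there, which is what keeps the fitted curve continuous.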

This tutorial provides a step-by-step example of how to fit a MARS model to a dataset in Python.

Step 1: Import Necessary Packages

To fit a MARS model in Python, we’ll use the Earth class from the sklearn-contrib-py-earth package. We’ll start by installing this package:

pip install sklearn-contrib-py-earth

Next, we’ll import the functions and classes we need:

from numpy import mean
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.datasets import make_regression
from pyearth import Earth

Step 2: Create a Dataset

For this example we’ll use the make_regression() function to create a fake dataset with 5,000 observations and 15 predictor variables, 10 of which are informative:

#create fake regression data
X, y = make_regression(n_samples=5000, n_features=15, n_informative=10,
                       noise=0.5, random_state=5)
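As a quick sanity check, it can help to confirm the dimensions of the generated arrays before modeling. The sketch below repeats the same make_regression() call so that it runs on its own:

```python
from sklearn.datasets import make_regression

# Same call as in the tutorial, repeated so this snippet is self-contained
X, y = make_regression(n_samples=5000, n_features=15, n_informative=10,
                       noise=0.5, random_state=5)

# X holds one row per observation and one column per predictor
print(X.shape)
# y holds one response value per observation
print(y.shape)
```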

Step 3: Build & Optimize the MARS Model

Next, we’ll use the Earth class to build a MARS model and the RepeatedKFold class to perform repeated k-fold cross-validation to evaluate the model performance.

For this example we’ll perform 10-fold cross-validation, repeated 3 times.

#define the model
model = Earth()

#specify cross-validation method to use to evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

#evaluate model performance
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error',
                         cv=cv, n_jobs=-1)

#print mean score across all folds and repeats
print(mean(scores))

-1.745345918289

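From the output we can see that the mean absolute error for this type of model is 1.7453. The sign is negative only because scikit-learn negates error metrics so that a higher score is always better; it can be ignored here.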
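The mean above averages one score per fold per repeat. The sketch below shows how many train/test splits 10-fold cross-validation repeated 3 times generates. The 100-row array `X_demo` is a hypothetical stand-in, used only so the fold sizes are easy to read:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Same cross-validation settings as in the tutorial
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

# Hypothetical 100-row dataset, used only to make fold sizes obvious
X_demo = np.arange(100).reshape(-1, 1)

# Each element is a (train_indices, test_indices) pair
splits = list(cv.split(X_demo))
n_splits = len(splits)
test_sizes = {len(test_idx) for _, test_idx in splits}

print(n_splits)    # 10 folds x 3 repeats
print(test_sizes)  # every test fold holds one tenth of the rows
```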

In practice we can fit a variety of different models to a given dataset (like Ridge, Lasso, Multiple Linear Regression, Partial Least Squares, Polynomial Regression, etc.) and compare the mean absolute error among all models to determine the one that produces the lowest MAE.
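A comparison like this might look as follows. This is a minimal sketch that assumes three scikit-learn linear models with their default settings as candidates; the `candidates` dictionary and the choice of models are illustrative, and an Earth() model could be added to it in the same way.

```python
from numpy import mean
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Same data and cross-validation scheme as in the tutorial
X, y = make_regression(n_samples=5000, n_features=15, n_informative=10,
                       noise=0.5, random_state=5)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

# Candidate models (illustrative choices with default hyperparameters)
candidates = {
    'linear': LinearRegression(),
    'ridge': Ridge(),
    'lasso': Lasso(),
}

# Cross-validated MAE per model; flip the sign so lower is better
mae = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y,
                             scoring='neg_mean_absolute_error', cv=cv)
    mae[name] = -mean(scores)

# Model with the lowest mean absolute error
best = min(mae, key=mae.get)
print(mae)
print(best)
```

Using the same cv object for every model means each candidate is scored on identical splits, so the differences in MAE reflect the models rather than the partitioning.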

Note that we could also use other metrics to evaluate the model, such as mean squared error or R-squared, by changing the scoring argument (for example, 'neg_mean_squared_error' or 'r2').
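Several metrics can be collected in one pass with scikit-learn's cross_validate() function. In the sketch below, LinearRegression is an assumed stand-in for the MARS model so that the snippet runs without py-earth installed, and the short metric labels in the `scoring` dictionary are arbitrary names chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_validate

# Same data and cross-validation scheme as in the tutorial
X, y = make_regression(n_samples=5000, n_features=15, n_informative=10,
                       noise=0.5, random_state=5)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

# Map short labels (arbitrary) to scikit-learn scorer names
scoring = {
    'mae': 'neg_mean_absolute_error',
    'mse': 'neg_mean_squared_error',
    'r2': 'r2',
}

# Stand-in model (assumption); replace with Earth() if py-earth is available
results = cross_validate(LinearRegression(), X, y, scoring=scoring, cv=cv)

# Results are stored under 'test_<label>'; error metrics are negated
mean_mae = -results['test_mae'].mean()
mean_mse = -results['test_mse'].mean()
mean_r2 = results['test_r2'].mean()
print(mean_mae, mean_mse, mean_r2)
```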

