How to Fix: No module named ‘sklearn.cross_validation’

One error you may encounter when using Python is:

ModuleNotFoundError: No module named 'sklearn.cross_validation'

This error usually occurs when you attempt to import the train_test_split function from sklearn using the following line:

from sklearn.cross_validation import train_test_split

However, the cross_validation sub-module was deprecated in scikit-learn 0.18 and removed in 0.20. Its contents moved to the model_selection sub-module, so you need to use the following line instead:

from sklearn.model_selection import train_test_split
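If the same script has to run on both old and new installations of sklearn, one option is to try the new location first and fall back to the old one. The sketch below is a minimal illustration of that idea; the helper name import_first_available is hypothetical and not part of sklearn.

```python
import importlib


def import_first_available(module_names, attribute):
    """Return `attribute` from the first module in `module_names` that provides it.

    Hypothetical helper: modules that cannot be imported are skipped.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # sub-module such as sklearn.cross_validation is skipped here.
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(
        f"{attribute!r} not found in any of: {', '.join(module_names)}"
    )


# Example usage (requires sklearn to be installed):
# train_test_split = import_first_available(
#     ["sklearn.model_selection", "sklearn.cross_validation"],
#     "train_test_split",
# )
```

For code that only targets current versions of sklearn, the plain model_selection import is simpler and is usually all that is needed.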

The following example shows how to resolve this error in practice.

How to Reproduce the Error

Suppose we would like to use the train_test_split function from sklearn to split a pandas DataFrame into training and testing sets, and we attempt to import it with the following code:

from sklearn.cross_validation import train_test_split

ModuleNotFoundError: No module named 'sklearn.cross_validation' 

We receive an error because the cross_validation sub-module no longer exists in recent versions of sklearn, so the import fails before train_test_split can be loaded.
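To confirm that the installed version is the cause, it can help to check the version number. The sketch below assumes the usual timeline (model_selection introduced in 0.18, cross_validation removed in 0.20); the function names are hypothetical and only the leading major.minor part of the version string is inspected.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.4.2' or '0.18rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def cross_validation_status(version):
    """Describe the state of sklearn.cross_validation for a given sklearn version."""
    major_minor = parse_major_minor(version)
    if major_minor < (0, 18):
        return "available"   # model_selection does not exist yet
    if major_minor < (0, 20):
        return "deprecated"  # both sub-modules exist; the old one warns
    return "removed"         # importing the old sub-module raises an error


# Example usage (requires sklearn to be installed):
# import sklearn
# print(cross_validation_status(sklearn.__version__))
```

A result of "removed" means the old import line cannot work on that installation and has to be changed.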

How to Fix the Error

To fix this error, we simply need to use the model_selection sub-module instead:

from sklearn.model_selection import train_test_split

This time we don’t receive any error.
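When the old import appears in many files, the rename can be applied to the source text automatically. The following is a rough sketch under the assumption that only the module path needs to change; modernize_imports is a hypothetical name, and the function works on a string rather than touching files.

```python
import re

# Match the old module path only as a complete dotted name, so that
# longer names such as "sklearn.cross_validation_extra" are left alone.
_OLD_MODULE = re.compile(r"\bsklearn\.cross_validation\b")


def modernize_imports(source):
    """Return `source` with sklearn.cross_validation renamed to sklearn.model_selection."""
    return _OLD_MODULE.sub("sklearn.model_selection", source)


old_line = "from sklearn.cross_validation import train_test_split"
print(modernize_imports(old_line))
# from sklearn.model_selection import train_test_split
```

A rename is enough for train_test_split, but some classes changed their signatures when they moved. KFold, for example, takes n_splits in model_selection instead of the old n_folds argument, so code using such classes still needs a manual review.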

We could then proceed to use the train_test_split function to split a pandas DataFrame into a training and testing set:

from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(1)

#create DataFrame with 1000 rows and 3 columns
df = pd.DataFrame({'x1': np.random.randint(30, size=1000),
                   'x2': np.random.randint(12, size=1000),
                   'y': np.random.randint(2, size=1000)})

#split original DataFrame into training and testing sets
train, test = train_test_split(df, test_size=0.2, random_state=0)

#view first few rows of each set
print(train.head())

     x1  x2  y
687  16   2  0
500  18   2  1
332   4  10  1
979   2   8  1
817  11   1  0

print(test.head())

     x1  x2  y
993  22   1  1
859  27   6  0
298  27   8  1
553  20   6  0
672   9   2  1

The train_test_split function now runs without any error.
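In a modeling workflow the predictors and the response are often split separately. As a sketch, assuming the same DataFrame layout as above with x1 and x2 as predictors and y as the response, train_test_split can take both objects in one call and return them split along the same rows. The stratify argument used here is optional and keeps the class proportions of y similar in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# make this example reproducible
np.random.seed(1)

# same layout as the DataFrame above: two predictors and a binary response
df = pd.DataFrame({'x1': np.random.randint(30, size=1000),
                   'x2': np.random.randint(12, size=1000),
                   'y': np.random.randint(2, size=1000)})

X = df[['x1', 'x2']]  # predictor columns
y = df['y']           # response column

# split predictors and response together so their rows stay aligned
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)
# (800, 2) (200, 2)
```

Because both objects are split with the same shuffled row order, X_train and y_train share the same index and can be passed directly to a model's fit method.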

Additional Resources

The following tutorials explain how to fix other common errors in Python:

How to Fix: columns overlap but no suffix specified
How to Fix: ‘numpy.ndarray’ object has no attribute ‘append’
How to Fix: if using all scalar values, you must pass an index
How to Fix: ValueError: cannot convert float NaN to integer
