Sunday, July 7, 2024

Pandas: How to Use factorize() to Encode Strings as Numbers

The pandas factorize() function encodes strings as integers: each distinct value in a column is assigned its own integer code, numbered in order of first appearance.

You can use the following methods to apply the factorize() function to columns in a pandas DataFrame:

Method 1: Factorize One Column

df['col1'] = pd.factorize(df['col1'])[0]

Method 2: Factorize Specific Columns

df[['col1', 'col3']] = df[['col1', 'col3']].apply(lambda x: pd.factorize(x)[0])

Method 3: Factorize All Columns

df = df.apply(lambda x: pd.factorize(x)[0])

The examples below show how to use each method with this pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'conf': ['West', 'West', 'East', 'East'],
                   'team': ['A', 'B', 'C', 'D'],
                   'position': ['Guard', 'Forward', 'Guard', 'Center'] })

#view DataFrame
df

   conf team position
0  West    A    Guard
1  West    B  Forward
2  East    C    Guard
3  East    D   Center

Example 1: Factorize One Column

The following code shows how to factorize one column in the DataFrame:

#factorize the conf column only
df['conf'] = pd.factorize(df['conf'])[0]

#view updated DataFrame
df

   conf team position
0     0    A    Guard
1     0    B  Forward
2     1    C    Guard
3     1    D   Center

Notice that only the ‘conf’ column has been factorized.

Every value that used to be ‘West’ is now 0 and every value that used to be ‘East’ is now 1.
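
The examples in this tutorial keep only element [0] of what factorize() returns. The function actually returns a pair: the integer codes and the unique values they stand for. The sketch below shows one way to keep both so the original labels can be recovered later. The variable names (conf, codes, uniques, decoded) are chosen for illustration only.

```python
import pandas as pd

conf = pd.Series(['West', 'West', 'East', 'East'])

# factorize() returns a tuple: (integer codes, unique values)
codes, uniques = pd.factorize(conf)

print(codes)    # one integer code per row: [0 0 1 1]
print(uniques)  # unique labels in order of first appearance: West, East

# Rebuild the original labels by looking each code up in uniques
decoded = uniques.take(codes)
print(list(decoded))  # ['West', 'West', 'East', 'East']
```

Position i in uniques holds the label that was encoded as i, which is why 'West' maps to 0 and 'East' maps to 1 here.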

Example 2: Factorize Specific Columns

The following code shows how to factorize specific columns in the DataFrame:

#factorize conf and team columns only
df[['conf', 'team']] = df[['conf', 'team']].apply(lambda x: pd.factorize(x)[0])

#view updated DataFrame
df

   conf  team position
0     0     0    Guard
1     0     1  Forward
2     1     2    Guard
3     1     3   Center

Notice that the ‘conf’ and ‘team’ columns have both been factorized.
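
When several columns are factorized at once with apply(), the unique labels for each column are discarded. If the labels are needed later, a loop that stores them per column is one possible alternative. This is a minimal sketch, and the mappings dictionary is a hypothetical helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'conf': ['West', 'West', 'East', 'East'],
                   'team': ['A', 'B', 'C', 'D'],
                   'position': ['Guard', 'Forward', 'Guard', 'Center']})

# hypothetical helper: column name -> unique labels for that column
mappings = {}

for col in ['conf', 'team']:
    codes, uniques = pd.factorize(df[col])
    df[col] = codes          # replace the strings with integer codes
    mappings[col] = uniques  # remember which label each code stands for

print(df)
print(list(mappings['conf']))  # ['West', 'East']
```

Each column gets its own independent set of codes, so 0 in 'conf' and 0 in 'team' refer to different labels.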

Example 3: Factorize All Columns

The following code shows how to factorize all columns in the DataFrame:

#factorize all columns
df = df.apply(lambda x: pd.factorize(x)[0])

#view updated DataFrame
df

   conf  team  position
0     0     0         0
1     0     1         1
2     1     2         0
3     1     3         2

Notice that all of the columns have been factorized.
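
Two behaviors of factorize() are worth knowing when encoding real data. The sketch below illustrates them on small made-up Series: the sort argument changes the order in which codes are assigned, and missing values receive the code -1 by default instead of a code of their own.

```python
import pandas as pd

position = pd.Series(['Guard', 'Forward', 'Guard', 'Center'])

# Default: codes follow the order of first appearance
codes, uniques = pd.factorize(position)
print(codes)  # [0 1 0 2]

# sort=True: unique values are sorted first, so codes follow sorted order
sorted_codes, sorted_uniques = pd.factorize(position, sort=True)
print(sorted_codes)          # [2 1 2 0]
print(list(sorted_uniques))  # ['Center', 'Forward', 'Guard']

# Missing values are not added to the unique values; they are coded as -1
with_missing = pd.Series(['Guard', None, 'Center'])
missing_codes, missing_uniques = pd.factorize(with_missing)
print(missing_codes)  # [ 0 -1  1]
```

Checking for -1 after factorizing is a simple way to spot rows that had missing values before encoding.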

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Convert Pandas DataFrame Columns to Strings
How to Convert Categorical Variable to Numeric in Pandas
How to Convert Pandas DataFrame Columns to Integer
