How to Read HTML Tables with Pandas (Including Example)

Example: Read HTML Table with Pandas

Before using the read_html() function, you’ll likely have to install lxml:

pip install lxml

Note: If you’re using a Jupyter notebook, you need to restart the kernel after performing this installation.

Next, we can use the read_html() function to read every HTML table on this Wikipedia page:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize

#read all HTML tables from specific URL
tabs = pd.read_html('https://en.wikipedia.org/wiki/National_Basketball_Association')

#display total number of tables read
len(tabs)

44

We can see that a total of 44 HTML tables were found on this page.

I know that the table I’m interested in has the word “Division” in it, so I can use the match argument to only retrieve HTML tables that contain this word:

#read HTML tables from specific URL with the word "Division" in them
tabs = pd.read_html('https://en.wikipedia.org/wiki/National_Basketball_Association',
                    match='Division')

#display total number of tables read
len(tabs)

1

I can then list the names of the columns of the table:

#define table
df = tabs[0]

#list all column names of table
list(df)

[('Division', 'Eastern Conference'),
 ('Team', 'Eastern Conference'),
 ('Location', 'Eastern Conference'),
 ('Arena', 'Eastern Conference'),
 ('Capacity', 'Eastern Conference'),
 ('Coordinates', 'Eastern Conference'),
 ('Founded', 'Eastern Conference'),
 ('Joined', 'Eastern Conference'),
 ('Unnamed: 8_level_0', 'Eastern Conference')]

I’m only interested in the first two columns, so I can filter the DataFrame to only contain these columns:

#filter DataFrame to only contain first two columns
df_final = df.iloc[:, 0:2]

#rename columns
df_final.columns = ['Division', 'Team']

#view first few rows of final DataFrame
print(df_final.head())

   Division                Team
0  Atlantic      Boston Celtics
1  Atlantic       Brooklyn Nets
2  Atlantic     New York Knicks
3  Atlantic  Philadelphia 76ers
4  Atlantic     Toronto Raptors

The final table contains only the ‘Division’ and ‘Team’ columns.

Additional Resources

The following tutorials explain how to read other types of files in pandas:

How to Read a Text File with Pandas
How to Read Excel Files with Pandas
How to Read CSV Files with Pandas

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Highlights of the 2023 Union Budget: Announcements for 15 Key Sectors

Gold Prices May Rise as Import Duty on Gold raised by 5%

Relief to MSMEs as Mandatory GST Registration waived for online sellers

GST Council Meet Highlights, Full List of Items to get Costlier

Highlights of the 2023 Union Budget: Announcements for 15 Key Sectors

Gold Prices May Rise as Import Duty on Gold raised by 5%

Relief to MSMEs as Mandatory GST Registration waived for online sellers

GST Council Meet Highlights, Full List of Items to get Costlier

Learn About Opening an Automobile Repair Shop in India

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

Income Splitting in Canada for 2023

Can I Deduct Home Office Expenses on my Tax Return 2023?

Canadian Tax – Personal Tax Deadline 2022

Example: Read HTML Table with Pandas

Additional Resources

Learn About Opening an Automobile Repair Shop in India

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

Income Splitting in Canada for 2023

Can I Deduct Home Office Expenses on my Tax Return 2023?

ABOUT US

Latest

Learn About Opening an Automobile Repair Shop in India

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

Income Splitting in Canada for 2023

Popular

How to Create a Stem-and-Leaf Plot in SPSS

How to Create a Correlation Matrix in SPSS

Excel: How to Use IF Function with Multiple Conditions

Sitemap