Friday, June 28, 2024

Matplotlib vs. ggplot2: Which Should You Use?

Two of the most popular data visualization libraries in all of data science are ggplot2 and Matplotlib.

The ggplot2 library is used in the R statistical programming language while Matplotlib is used in Python.

Although both libraries allow you to create highly customized data visualizations, ggplot2 generally allows you to do so in fewer lines of code compared to Matplotlib.

To illustrate this point, we’ll show how to create the same types of charts using both libraries.

Line Charts: ggplot2 vs. Matplotlib

The following code shows how to create a line chart using ggplot2:

library(ggplot2)

#create data frame
df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 sales=c(2, 4, 5, 8, 6, 12, 15, 19, 15, 22))

#create line chart
ggplot(df, aes(x=day, y=sales)) +
  geom_line(size=1.2, col='purple') +
  ggtitle('Sales by Day') +
  xlab('Day') +
  ylab('Sales')

And the following code shows how to create the same line chart using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt 

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [2, 4, 5, 8, 6, 12, 15, 19, 15, 22]})

#create line chart
plt.plot(df.day, df.sales, color='purple')
plt.title('Sales by Day', loc='left')
plt.ylabel('Sales')
plt.xlabel('Day')

For this example, the number of lines of code needed to generate each plot is roughly the same between ggplot2 and Matplotlib.
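
As a side note, the same line chart can also be written with Matplotlib's object-oriented interface, which lets several labels be set in a single call. The following is only a minimal sketch: the names fig and ax are arbitrary, the Agg backend line is an assumption so the code runs without a display, and linewidth=1.2 is not claimed to match the ggplot2 size exactly.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [2, 4, 5, 8, 6, 12, 15, 19, 15, 22]})

#create figure and axes, then draw the line chart on the axes
fig, ax = plt.subplots()
ax.plot(df['day'], df['sales'], color='purple', linewidth=1.2)

#left-aligned title, then both axis labels in one call
ax.set_title('Sales by Day', loc='left')
ax.set(xlabel='Day', ylabel='Sales')
```

Call plt.show() or fig.savefig() afterwards to display or save the chart.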

Scatter Plots: ggplot2 vs. Matplotlib

The following code shows how to create a scatter plot in ggplot2 in which the points are colored by category:

library(ggplot2)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 assists=c(1, 2, 2, 4, 5, 7, 8, 10),
                 points=c(4, 6, 10, 8, 12, 15, 22, 28))

#create scatter plot
ggplot(df, aes(x=assists, y=points)) +
  geom_point(aes(col=team), size=3)

And the following code shows how to create the same scatter plot using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt 

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'assists': [1, 2, 2, 4, 5, 7, 8, 10],
                   'points': [4, 6, 10, 8, 12, 15, 22, 28]})

#define colors to use
color_list = [] 
for x in df['team']: 
    if x == 'A': color_list.append('#F8766D') 
    else: color_list.append('#00BFC4') 

#create scatter plot
plt.scatter(df.assists, df.points, c=color_list)
plt.ylabel('points')
plt.xlabel('assists')

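Notice that we had to use many more lines of code in Matplotlib than in ggplot2. In ggplot2, aes(col=team) maps each team to a color and adds a legend automatically, whereas in Matplotlib we had to build the list of colors ourselves, and the resulting plot still has no legend.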
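
The color loop can be shortened somewhat. The sketch below shows one possible approach, not the only one: a dictionary (here called palette, a name chosen for illustration) maps each team to a color, and plotting one group at a time lets Matplotlib build a legend from the labels. The Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'assists': [1, 2, 2, 4, 5, 7, 8, 10],
                   'points': [4, 6, 10, 8, 12, 15, 22, 28]})

#map each team to a color (same hex codes as the loop above)
palette = {'A': '#F8766D', 'B': '#00BFC4'}

#one-line replacement for the explicit loop
color_list = df['team'].map(palette).tolist()

#alternative: plot each team separately so a legend can be drawn
fig, ax = plt.subplots()
for team, group in df.groupby('team'):
    ax.scatter(group['assists'], group['points'],
               color=palette[team], label=team)
ax.set(xlabel='assists', ylabel='points')
ax.legend(title='team')
```

Even in this form the Matplotlib version is longer than the two-line ggplot2 call, since the color mapping and the legend are both requested explicitly.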

Histograms: ggplot2 vs. Matplotlib

The following code shows how to create a histogram in ggplot2:

library(ggplot2)

#create data frame
df <- data.frame(x=c(2, 2, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8,
                     10, 11, 11, 11, 12, 13, 14, 14))

#create histogram
ggplot(df, aes(x=x)) +
  geom_histogram(bins=6, fill='red', color='black') +
  ggtitle('My Histogram')

And the following code shows how to create a similar histogram using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt 

#create DataFrame
df = pd.DataFrame({'x': [2, 2, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8,
                         10, 11, 11, 11, 12, 13, 14, 14]})

#create histogram
plt.hist(df['x'], bins=6, color='red', ec='black')
plt.title('My Histogram', loc='left') 
plt.xlabel('x') 
plt.ylabel('Count')

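Once again the Matplotlib version requires more lines of code than ggplot2, though only slightly: the axis labels that ggplot2 adds automatically must be set explicitly in Matplotlib.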
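
To see what bins=6 means on the Matplotlib side, the bin edges and counts can be inspected with numpy.histogram, which plt.hist uses internally. This is a small sketch for illustration; ggplot2 may place its bin edges differently, which is one reason the two histograms are similar rather than identical.

```python
import numpy as np

#same values as the 'x' column above
x = [2, 2, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8,
     10, 11, 11, 11, 12, 13, 14, 14]

#six equal-width bins spanning the minimum to the maximum of the data
counts, edges = np.histogram(x, bins=6)

print(edges)   # bin edges: 2, 4, 6, 8, 10, 12, 14
print(counts)  # counts per bin: 2, 5, 3, 2, 4, 4
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.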

Conclusion

Both ggplot2 and Matplotlib allow you to create highly customizable data visualizations, but ggplot2 tends to use less code.

Often the preference between ggplot2 and Matplotlib simply comes down to which programming language you use for data analysis.

People who use Python tend to use Matplotlib since they can perform their data analysis and create data visualizations using one programming language.

Likewise, people who use R tend to use ggplot2 because it lets them perform all of their data analysis and visualization in R.
