Plotnine: A Python Port of R’s ggplot2

People who love Python but rely on R’s ggplot2 for visualization might want to explore plotnine. Plotnine is a Python implementation of the grammar of graphics behind ggplot2, and its API closely mirrors ggplot2’s: you build a plot by adding layers, scales, and themes with +, using largely the same function names. I also like that it works directly with pandas DataFrames, so it fits naturally into a pandas-based analysis workflow. Below are a few examples that demonstrate its power and basic usage.

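from sklearn import datasets
from plotnine import *
import pandas as pd

# load_boston() was removed in scikit-learn 1.2, so fetch the same
# Boston housing data from OpenML instead (needs network access)
boston = datasets.fetch_openml(name="boston", version=1, as_frame=True)
df = boston.frame.rename(columns={"MEDV": "target"})

# Show the first four features plus the target for a few random rows
df[list(df.columns)[0:4] + ["target"]].sample(4)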
         CRIM    ZN  INDUS  CHAS  target
256   0.01538  90.0   3.75   0.0    44.0
23    0.98843   0.0   8.14   0.0    14.5
50    0.08873  21.0   5.64   0.0    19.7
464   7.83932   0.0  18.10   0.0    21.4

Example 1: A simple density plot for the target variable

(
ggplot(df)
+ geom_density(aes("target"))
)

[Figure: density plot of the target variable]

Example 2: Density Plot with bells and whistles

  • theme_*(): Themes restyle the whole chart, and the available ones range from professional-looking (such as theme_seaborn, used below) to informal. I particularly enjoy theme_xkcd.
  • theme(figure_size=(width, height)): Sets the size of the figure, in inches.
  • np.log(...): The x aesthetic is a string expression that plotnine evaluates against the DataFrame’s columns, so here the target variable is plotted on a log scale. You can use any other expression to transform values, as long as the names it uses (such as np) are defined where ggplot() is called.
import numpy as np

(
ggplot(df, aes("np.log(target)"))
+ geom_density()
+ theme_seaborn()
+ xlab("House Price (log scale)") + ylab("Density")
+ theme(figure_size=(10, 5))
)

[Figure: density of log(target), styled with theme_seaborn]


Example 3: Using facet_wrap

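One of the best features of ggplot2 is facet_wrap, which splits the data by a variable and draws one panel per level of that variable. Luckily, facet_wrap is also available in plotnine. To facet over several columns at once, first reshape the DataFrame to long format with pd.melt, so that the column names become the levels of a single variable column.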
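To see what pd.melt does before it feeds facet_wrap, here is a tiny sketch on three rows of illustrative values for the same three columns (a hand-built DataFrame, not loaded from the dataset):

```python
import pandas as pd

# Three rows of illustrative values for the columns used below
wide = pd.DataFrame({
    "AGE": [65.2, 78.9, 61.1],
    "target": [24.0, 21.6, 34.7],
    "TAX": [296.0, 242.0, 242.0],
})

# melt stacks the chosen columns into two columns: 'variable' holds the
# old column name and 'value' holds the number, one row per original cell
long_df = pd.melt(wide, value_vars=["AGE", "target", "TAX"])

print(long_df.shape)                          # (9, 2)
print(long_df["variable"].unique().tolist())  # ['AGE', 'target', 'TAX']
```

facet_wrap("~variable") then draws one panel for each distinct value in the variable column.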

meltedDF = pd.melt(df, value_vars=['AGE', 'target', 'TAX'])
(
ggplot(meltedDF, aes("np.log(value)", fill="variable"))
+ geom_density(alpha=0.5)
+ theme_xkcd()
+ facet_wrap("~variable")
+ theme(figure_size=(20, 5))
+ xlab("Value (Log Scale)") + ylab("Density")
)

[Figure: xkcd-style density plots of log(AGE), log(target) and log(TAX), one panel per variable]

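A few other tips:
1. Hiding a legend: scale_color_discrete(guide=False) hides the color legend; for a fill legend such as the one in Example 3, use scale_fill_discrete(guide=False)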
2. Change Legend Title: scale_fill_discrete(name="New Title")
3. Rotate labels: theme(axis_text_x=element_text(rotation=90, hjust=1))

Note:
An alternative Python port of ggplot2 is yhat’s ggpy (the ggplot package). I tried it a while ago and it was missing a lot of critical features. I haven’t tried it in recent months, so let me know if you have any recent experience with it.
