Using R’s ggplot within IPython Notebook

As a Data Scientist, I often use Python to write quick scripts to transform and massage data. But for data visualization I love using R’s ggplot. Although there is a version of ggplot written in Python, I found it to be lacking a lot of features compared to its R counterpart.

Luckily, with IPython Notebook you can have the best of both worlds. IPython Notebook (specifically the rpy2 package) lets you seamlessly transfer objects between the Python and R environments. Below is a brief explanation, with code snippets, of how data generated or processed in Python can be visualized using R’s ggplot. [In a hurry? A sample notebook is over here.]

Step 1: Load the R Kernel within IPython using the rpy2 package
Previously, communication between R and IPython Notebook was handled by the rmagic extension. Most of this logic now lives in a separate Python package, rpy2. You can install rpy2 using the following command: pip install rpy2 --upgrade (R itself must already be installed). Once rpy2 is installed, you can initialize R within IPython Notebook by loading the rpy2.ipython extension as shown below.

%load_ext rpy2.ipython
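
If the extension fails to load, the first thing worth checking is whether rpy2 is installed in the same environment the notebook is running in. The snippet below is a minimal sketch of such a check using only the standard library; the helper name rpy2_available is made up for illustration, and a positive result only means the package was found, not that a working R installation is present.

```python
import importlib.util


def rpy2_available() -> bool:
    """Return True if the rpy2 package can be found in this environment.

    Hypothetical helper for illustration; it does not verify that R itself
    is installed or that rpy2 can actually start an R session.
    """
    return importlib.util.find_spec("rpy2") is not None


if rpy2_available():
    print("rpy2 found; '%load_ext rpy2.ipython' should be able to load")
else:
    print("rpy2 not found; try: pip install rpy2 --upgrade")
```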

Step 2: Convert Data To Pandas Dataframe
If you already have some data available as a pandas dataframe, feel free to skip this step and use that data in the next one. If not, let’s randomly draw 5000 points from a standard normal distribution using numpy and convert them to a pandas dataframe. In the next step we will pass this dataframe to R’s ggplot library and plot the density curve.

import pandas as pd
import numpy as np
data = np.random.randn(5000, 1)
df = pd.DataFrame(data, columns=["value"])
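
Before handing the dataframe to R, it can help to confirm on the Python side that it has the shape and column name the plot expects. The sketch below is a variation on the snippet above, not the original code: it swaps np.random.randn for numpy's default_rng generator with a fixed seed (42 is an arbitrary choice) so that the same sample is drawn on every run.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed is used only to make the sample reproducible.
rng = np.random.default_rng(seed=42)

# Draw 5000 points from a standard normal distribution as a single column.
data = rng.standard_normal(size=(5000, 1))
df = pd.DataFrame(data, columns=["value"])

# Quick sanity checks: one column named "value", 5000 rows,
# and summary statistics close to mean 0 and standard deviation 1.
print(df.shape)
print(df["value"].describe())
```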

Step 3: Using the %%R cell magic function
Finally, use the %%R cell magic function and pass df (the Python name pointing to the pandas dataframe) with the -i parameter. The rpy2 package applies the necessary conversions and makes it available under the same name within R’s environment. Now we can do anything with this data in R, including visualizing it with the ggplot library.

%%R -i df -w 800 -h 480 -u px
library(ggplot2)
ggplot(df) + geom_density(aes(x=value))

Below is the list of some of the important parameters that can be passed to %%R magic function:

  • -i : Input python objects to be passed to R. Multiple names can be passed separated only by commas (and no whitespace)
  • -o : Output R objects to be passed back to python. Multiple names can be passed
  • -w : width of the chart
  • -h : height of the chart
  • -u : unit in which width/height is specified

Full documentation on parameters can be found over here.
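
To make the flag syntax concrete, here is a small sketch that assembles a %%R magic line from the parameters listed above. The function build_r_magic_line is a hypothetical helper written for this post, not part of rpy2 or IPython; it only builds the text of the first line of a cell and joins multiple names with commas and no whitespace.

```python
from typing import Optional, Sequence


def build_r_magic_line(
    inputs: Sequence[str] = (),
    outputs: Sequence[str] = (),
    width: Optional[int] = None,
    height: Optional[int] = None,
    units: Optional[str] = None,
) -> str:
    """Build the first line of a %%R cell (hypothetical helper, not an rpy2 API)."""
    parts = ["%%R"]
    if inputs:
        # Multiple input names are joined by commas with no whitespace.
        parts += ["-i", ",".join(inputs)]
    if outputs:
        parts += ["-o", ",".join(outputs)]
    if width is not None:
        parts += ["-w", str(width)]
    if height is not None:
        parts += ["-h", str(height)]
    if units is not None:
        parts += ["-u", units]
    return " ".join(parts)


# Reproduces the magic line used in Step 3.
print(build_r_magic_line(inputs=["df"], width=800, height=480, units="px"))
```

The printed line can be pasted as the first line of a notebook cell, followed by the R code to run.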

