Visualization has long been an Achilles’ heel for Python. For me, that changed when I discovered Plotnine, an excellent Python port of R’s ggplot2. Plotnine is great for visualizing data when you already know which subset you want to plot. Often, however, you want to explore a dataset and need to subset it on the fly. The screencast below shows exactly what I was looking for. Notice how moving the slider updates both the bullet list and the graph.
To be clear, I was looking for a solution that met the following conditions:
- The solution should be agnostic to the libraries I am using. In other words, I shouldn’t be forced to learn a new visualization library or toolkit such as Plotly or Bokeh.
- It should give me full control over the logic and let me express that logic in pure Python.
- I wanted interactive visualization mainly for my own exploration, not for sharing a dashboard with someone else. Hence, agility and integration with Jupyter Notebook were critical to me.
Luckily, this is now possible with IPython Widgets (ipywidgets). The code below demonstrates a quick way to wrap all your plots and other information into an interactive visualization with a dynamic filter.
    %matplotlib inline
    import math

    import numpy as np
    import pandas as pd
    from sklearn import datasets
    from plotnine import *
    from ipywidgets import interact
    import ipywidgets as widgets
    from IPython.display import HTML, display

    # Download the California Housing dataset.
    data = datasets.fetch_california_housing()
    housingDF = pd.DataFrame(data.data, columns=data.feature_names)

    # We will use a range slider to filter on median income.
    # Changing the median income bracket should update all my analytics.
    income_slider = widgets.IntRangeSlider(
        value=[math.floor(np.min(housingDF.MedInc)),
               math.ceil(np.max(housingDF.MedInc))],
        min=math.floor(np.min(housingDF.MedInc)),
        max=math.ceil(np.max(housingDF.MedInc)),
        step=1,
        description='Median Income:',
        disabled=False,
        continuous_update=False,
        orientation='horizontal',
        readout=True,
        readout_format='d',
    )

    # We use a global variable to keep track of the various filter options,
    # such as the currently selected range of median income.
    vizFilter = {
        'MedInc': None
    }

    # This function defines the main logic. It contains all the operations
    # on the data and generates all the necessary visualizations.
    # Whenever the filters change, we call this function.
    def update():
        # Use vizFilter to create a new subset of the data.
        tmpDF = housingDF[
            (housingDF.MedInc >= vizFilter['MedInc'][0]) &
            (housingDF.MedInc <= vizFilter['MedInc'][1])
        ]

        # Use the display function to write HTML and show graphs and
        # other information.
        display(HTML("<h1>Filters</h1>"))
        display(HTML("""
        <ul>
            <li>Median Income: {}</li>
            <li>Filtered Dataset Size: {}</li>
        </ul>
        """.format(
            vizFilter['MedInc'],
            tmpDF.shape[0]
        )))

        # Use plotnine (ggplot) to generate a new plot. I love
        # the fact that I don't have to learn a new library
        # and can continue to use what I am comfortable with.
        display(
            ggplot(tmpDF, aes(x='Population', y='AveOccup')) +
            geom_point() +
            theme_linedraw()
        )

    # This function is called when the Median Income range filter is updated.
    def updateMedInc(x):
        vizFilter['MedInc'] = x
        update()

    # Associate the income slider with the function. Below,
    # x corresponds to the argument that updateMedInc requires.
    interact(updateMedInc, x=income_slider)

    # You can have additional interact commands, such as
    # interact(updateXXX, x=second_slider or multiselection, etc.)
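The closing comment in the listing hints that more filters can be attached with further `interact` calls. Below is a minimal sketch of one way to keep that manageable, assuming every filter is stored in `vizFilter` as an inclusive `(low, high)` tuple keyed by column name. The helper `apply_filters` and the toy DataFrame are hypothetical and not part of the original notebook; they only illustrate the idea without needing the widgets.

```python
import pandas as pd


def apply_filters(df, filters):
    """Return the rows of df that satisfy every active range filter.

    filters maps a column name to an inclusive (low, high) tuple,
    or to None when that filter has not been set yet.
    """
    mask = pd.Series(True, index=df.index)
    for column, bounds in filters.items():
        if bounds is None:
            continue  # filter not set yet, so it excludes nothing
        low, high = bounds
        mask &= df[column].between(low, high)  # inclusive on both ends
    return df[mask]


# Toy stand-in for housingDF (made-up values, for illustration only).
toyDF = pd.DataFrame({
    "MedInc": [1.5, 3.2, 4.8, 8.1],
    "Population": [300, 1200, 850, 2400],
})

# Only the income filter is active at first.
vizFilter = {"MedInc": (2, 9), "Population": None}
print(len(apply_filters(toyDF, vizFilter)))

# A second slider's callback would simply store its range and re-run.
vizFilter["Population"] = (500, 1500)
print(len(apply_filters(toyDF, vizFilter)))
```

With a helper along these lines, `update()` could build `tmpDF` in a single call, and each additional slider's callback would only need to store its value in `vizFilter` and call `update()`. Initializing every key to `None` also means `update()` does not fail if it runs before a given slider has reported a value.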