Creating an Interactive Data Exploration Dashboard in Python in 2 Minutes

Visualization has long been an Achilles heel for Python. Personally, this changed for me when I discovered Plotnine, an excellent Python port of R’s ggplot2. Plotnine is great for visualizing data when you already know which subset of the data you want to look at. Often, however, you want to explore the dataset and need the ability to subset it on the fly. The screencast below shows exactly what I was looking for. Notice how moving the slider updates both the bullet list and the graph.

widgets.gif

To be clear, I was looking for a solution that can meet the following conditions:

  1. The solution should be agnostic to the libraries I am using. In other words, I shouldn’t be forced to learn a new visualization library or toolkit such as plotly, bokeh, etc.
  2. It should give me full control over the logic, which I should be able to express in pure Python.
  3. I wanted interactive visualization mainly for my own exploration, not for sharing a dashboard with someone else. Hence, agility and integration with Jupyter Notebook were critical to me.

Luckily, this is now possible using IPython Widgets (ipywidgets). The code below demonstrates a quick way to wrap all your plots and other information into an interactive visualization with a dynamic filter.

%matplotlib inline
import math
import numpy as np
from sklearn import datasets
import pandas as pd

from plotnine import *

from ipywidgets import interact
import ipywidgets as widgets
from IPython.display import HTML, display

# Download California Housing Dataset 
data = datasets.fetch_california_housing()
housingDF = pd.DataFrame(data.data, columns = data.feature_names)

## We will use a range slider to filter on median income. 
## Changing the median income bracket should update all my analytics. 
income_slider = widgets.IntRangeSlider(
    value=[math.floor(np.min(housingDF.MedInc)), math.ceil(np.max(housingDF.MedInc))],
    min=math.floor(np.min(housingDF.MedInc)),
    max=math.ceil(np.max(housingDF.MedInc)),
    step=1,
    description='Median Income:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d',
)

## We use a global variable to keep track of the various filter
## options, such as the currently selected range of median income.
vizFilter = {
    'MedInc': None
}


# This function defines the main logic. It contains all the operations on the data and 
# generates all the necessary visualizations. Whenever the filters change, 
# we call this method. 
def update():

    # use the vizFilter to create new subset of data.
    tmpDF = housingDF[
        (housingDF.MedInc >= vizFilter['MedInc'][0])
        & (housingDF.MedInc <= vizFilter['MedInc'][1])
    ]

    # use display function to write HTML and display graphs and
    # other information
    display(HTML("<h1>Filters</h1>"))
    display(HTML("""
    <ul>
        <li>Median Income: {}</li>
        <li>Filtered Dataset Size: {}</li>
    </ul>
    """.format(
        vizFilter['MedInc'],
        tmpDF.shape[0]
    )))

    # using plotnine (ggplot) to generate new plot. I love
    # the fact that now I don't have to learn a new library 
    # and continue to use what I am comfortable with. 
    display(
        ggplot(tmpDF, aes(x='Population', y='AveOccup'))
        + geom_point()
        + theme_linedraw()
    )

## This method is called when the Median Income range filter is updated. 
def updateMedInc(x):
    vizFilter['MedInc'] = x
    update()

# Associate the income slider with the callback. Below,
# x corresponds to the argument that updateMedInc expects. 
interact(updateMedInc, x=income_slider)

# you can have additional interact commands as 
# interact(updateXXX, x=second_slider or multiselection, etc)
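
The closing comment above hints at adding more filters. As a rough sketch of how the shared filter dictionary could scale to several controls, here is the same pattern with the widgets and plotting stripped away. The names `apply_filters`, `update_med_inc`, and `update_ave_occup`, as well as the tiny sample rows, are hypothetical and only stand in for the DataFrame and sliders used earlier.

```python
# Hypothetical sketch: shared filter state with one small callback per control.
# Plain dicts stand in for DataFrame rows so the logic runs outside a notebook.

rows = [
    {"MedInc": 2.5, "AveOccup": 3.1},
    {"MedInc": 4.0, "AveOccup": 2.4},
    {"MedInc": 8.3, "AveOccup": 2.9},
    {"MedInc": 5.6, "AveOccup": 4.2},
]

# One entry per filter; None means "no restriction set yet".
viz_filter = {"MedInc": None, "AveOccup": None}


def apply_filters(data, filters):
    """Return the rows that fall inside every active (low, high) range."""
    selected = []
    for row in data:
        keep = True
        for column, bounds in filters.items():
            if bounds is None:
                continue  # this filter has not been touched yet
            low, high = bounds
            if not (low <= row[column] <= high):
                keep = False
                break
        if keep:
            selected.append(row)
    return selected


def update():
    """Recompute the subset; in a notebook, plots would be drawn here."""
    return apply_filters(rows, viz_filter)


def update_med_inc(x):
    """Callback for the (assumed) median income range control."""
    viz_filter["MedInc"] = x
    return update()


def update_ave_occup(x):
    """Callback for an (assumed) second range control on average occupancy."""
    viz_filter["AveOccup"] = x
    return update()
```

Because every callback writes to the same dictionary before calling `update()`, the filters combine: setting the income range and then the occupancy range narrows the same subset twice. In a notebook, each callback would be passed to its own `interact(...)` call, as in the listing above.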