Google Web History: A Gold Mine of Personal Information (Part II)

At the end of Part I, I showed some word clouds derived from Google Web History data. A word cloud is useful for identifying important terms, but its simple one-dimensional representation cannot answer many questions that are important to us. For instance, it shows neither the relationships between different terms nor how topics have evolved over time.

To find answers to some of these questions, I started playing with semantic networks and explored graph-based visualization. Below are some results of my quest.

Plot 1: JQuery Plot

To build the above graph, I randomly selected a term (here, “jquery”) and extracted all the terms within two degrees of separation of it. Nodes represent search terms and edges represent relations between these terms. Node size encodes the frequency of the search term, and color encodes temporal information: the darker the color (yellow, then orange, then red), the more recently I searched for that term. Edge thickness indicates the frequency of the bi-gram.

For instance, about two years ago I worked with PHP, but for the last year I have been using Ruby instead. You can see this in the graph: the PHP node is yellow, whereas the Ruby node is red.

Below is another example. This graph is built around the term “clustering”.

Plot 2: Cluster Plot
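
The neighbourhood extraction behind both plots can be sketched as a breadth-first search that stops two hops from the chosen term. This is only a minimal illustration in Python, not the R code used for the plots; the function name `neighborhood` and the sample bi-gram counts are made up for the example, and treating each bi-gram as an undirected edge is an assumption.

```python
from collections import deque

def neighborhood(edges, seed, max_hops=2):
    """Return {term: hop distance} for every term within max_hops of seed."""
    # Assumption: each bi-gram (a, b) is treated as an undirected edge.
    adjacency = {}
    for (a, b) in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        if distance[term] == max_hops:
            continue  # do not expand past the hop limit
        for other in adjacency.get(term, ()):
            if other not in distance:
                distance[other] = distance[term] + 1
                queue.append(other)
    return distance

# Hypothetical bi-gram counts, invented for illustration only.
bigram_counts = {
    ("jquery", "plugin"): 5,
    ("jquery", "ajax"): 3,
    ("ajax", "php"): 2,
    ("php", "mysql"): 4,
    ("ruby", "rails"): 6,
}

nearby = neighborhood(bigram_counts, "jquery")

# Keep only the edges whose two endpoints both fall inside the neighbourhood.
subgraph = {pair: count for pair, count in bigram_counts.items()
            if pair[0] in nearby and pair[1] in nearby}
```

In this toy data, “mysql” sits three hops from “jquery” and is left out, as are the unconnected “ruby” and “rails”. The counts kept in `subgraph` are what would drive edge thickness in a plot.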

Note on the process

  1. Extract Google Web History data
  2. Clean queries: lowercasing, tokenization, stemming, etc.
  3. Use Hadoop/MapReduce to compute uni-grams and bi-grams
  4. Use R to explore and build graph visualizations
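
As a rough illustration of steps 2 and 3, the sketch below cleans a few queries and counts uni-grams and bi-grams in memory. It stands in for the Hadoop/MapReduce job rather than reproducing it: the sample queries are invented, the helper names `tokenize` and `count_ngrams` are hypothetical, stemming is left out, and counting a bi-gram as two adjacent tokens within one query is an assumption.

```python
import re
from collections import Counter

def tokenize(query):
    """Lowercase a query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())

def count_ngrams(queries):
    """Count single tokens and adjacent token pairs across all queries."""
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        tokens = tokenize(query)
        unigrams.update(tokens)
        # Assumption: a bi-gram is a pair of neighbouring tokens in one query.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

# Invented sample queries, for illustration only.
sample_queries = ["jQuery ajax example", "jquery AJAX post", "ruby on rails"]
unigrams, bigrams = count_ngrams(sample_queries)
```

The uni-gram counts correspond to node sizes in the plots and the bi-gram counts to edge thickness. On real data the same counting would be spread across mappers and reducers.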
