Leveraging Google Forms for Labeled-Data Collection

Often the hardest part of starting a machine learning project is the lack of labeled data, especially for classification problems. I recently ran into this while building a model to route bugs to the appropriate teams, which meant classifying each bug into one of a few predefined categories. However, there was no existing labeled data I could use for supervised classification. So I decided to run a survey and ask several engineers from different teams to examine a few bugs each and assign them to one of our predefined categories.

For a long time I had wondered about repurposing Google Forms to collect this kind of data. My idea was to put sample bugs in a Google Sheet, automatically generate Google Forms that each contain a few bugs, and send different forms to different people. Below is an example of one of the generated forms, which uses tweets in place of bugs. The sample form shows a tweet id and the actual tweet message, and asks the respondent to indicate whether the tweet is related to weather. Each survey contains 3 tweets.

[Image: one of the generated survey forms]

Below I document how I generated this kind of survey using Google Sheets and Google Apps Script.

Step 1: Create a Google Sheet where each row corresponds to a data point. For instance, this Google Sheet contains a few tweet messages; let's assume the task is to identify which of these tweets are related to weather. For each tweet, the sheet holds the tweet id, the tweet text, and the date of the tweet, in that column order, with column headers in the first row.
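To make the expected layout concrete: SpreadsheetApp's getValues() returns the sheet as a 2D array, header row first. The sketch below uses a few made-up rows in that shape and a hypothetical helper, rowsToTweets, that skips the header and names the columns, assuming the column order tweet id, tweet text, date. It is plain JavaScript, so it runs in Apps Script or Node.js alike.

```javascript
// Made-up sample data in the shape getValues() returns: a header row, then one row per tweet.
// Assumed column order: tweet id, tweet text, date.
const sampleValues = [
  ["id", "tweet", "date"],
  ["t1", "Heavy rain expected this afternoon", "2019-05-01"],
  ["t2", "Just finished a great book", "2019-05-01"],
  ["t3", "Snow is piling up outside", "2019-05-02"],
];

// Hypothetical helper: drop the header row and give each column a name.
function rowsToTweets(values) {
  return values.slice(1).map(function (row) {
    return { id: row[0], tweet: row[1], date: row[2] };
  });
}

const tweets = rowsToTweets(sampleValues);
console.log(tweets.length); // 3
console.log(tweets[0].id + ": " + tweets[0].tweet); // t1: Heavy rain expected this afternoon
```

Dates are written as strings here for simplicity; for date-formatted cells, getValues() returns Date objects instead.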

Step 2: Create a custom Apps Script that iterates over the data in the sheet and dynamically generates Google Forms. Below is a sample script that generates 2 forms, each containing 3 tweets for which we want to collect labels. In the Google Sheet, click Tools > Script Editor (in newer versions of Sheets, Extensions > Apps Script) and paste in the script below.

// Number of Tweets In A Form
var NUM_TWEETS_PER_FORM = 3;

// Number of Forms to Generate
var NUM_FORMS = 2;


var generateForm = function(points_per_form, num_forms){

  // collect [form id, published url] pairs to return to the caller
  var output = [];

  // Description shown at the top of every survey form
  var surveyDesc = 'Please indicate which of the tweet messages below are related to weather. For more information about this survey, please check out this webpage.';

  // Fetch Data points from the sheet
  var data = SpreadsheetApp.getActiveSheet().getDataRange().getValues();

  // cursor tracks the next row to read; it starts at 1 to skip the header row
  var cursor = 1;

  // iterate over forms
  for(var fidx = 0; fidx < num_forms; fidx++){

    var form = FormApp.create("Weather Tweet Survey Form " + fidx.toString()) // Form title
                      .setCollectEmail(true)  // collect email address
                      .setLimitOneResponsePerUser(true) // limit one user to one response
                      .setAllowResponseEdits(true) // allow user to edit responses
                      .setProgressBar(true) // show progress bar
                      .setDescription(surveyDesc) // show help message
                      .setRequireLogin(false); // do not restrict respondents to accounts in the same domain


    // iterate over tweets
    for(var tidx = 0; tidx < points_per_form; tidx++){

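      // stop with a clear error instead of a TypeError if the sheet runs out of rows
      if (cursor >= data.length) {
        throw new Error("Not enough rows: need " + (num_forms * points_per_form) + " tweets, sheet has " + (data.length - 1));
      }

      // fetch tweet related data
      var row = data[cursor++];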
      var id = row[0];
      var tweet = row[1];
      var pubdate = row[2]; 

      // add a section header that shows the tweet; the page break added after the question
      // puts each tweet and its associated questions on a separate page of a multipage form
      form.addSectionHeaderItem()
          .setTitle("Tweet ID: " + id)  // set section title as the id of the tweet
          .setHelpText("Tweet: " + tweet);       // set description as the tweet itself.

      // add label gathering questions over here. Note that the question title contains tweet id. This will 
      // be useful for associating responses back to the original tweet messages. 
      form.addMultipleChoiceItem()
          .setRequired(true)
          .setTitle(id + ": Is this tweet related to weather?")
          .setChoiceValues([
            "Yes", 
            "No",
            "Not Sure"
           ]);

      // you can add more questions if required

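      // add a page break between tweets (but not after the last one, which would leave an empty page)
      if (tidx < points_per_form - 1) {
        form.addPageBreakItem();
      }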

    }

    // capture form id and published url; the form id is needed later to fetch responses with Apps Script
    output.push([form.getId(), form.getPublishedUrl()]);

  }

  return output;

};

function main(){
  var output = generateForm(NUM_TWEETS_PER_FORM, NUM_FORMS);
  for(var idx=0; idx < output.length; idx++){
    Logger.log(output[idx][0] + " --> " + output[idx][1]);
  }
}
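Before running the script, it can help to dry-run the row assignment. generateForm hands out consecutive rows, so with the default settings the first form gets the first three tweets after the header and the second form gets the next three. The sketch below reproduces that cursor logic in a hypothetical planForms function, which creates no forms and fails early if the sheet has fewer than NUM_TWEETS_PER_FORM * NUM_FORMS data rows. The sample sheet in it is made up.

```javascript
// Hypothetical dry run of generateForm's row assignment: no forms are created.
// values: 2D array as returned by getValues(), with a header row at index 0.
function planForms(values, pointsPerForm, numForms) {
  const needed = pointsPerForm * numForms;
  const available = values.length - 1; // exclude the header row
  if (available < needed) {
    throw new Error("Need " + needed + " data rows but the sheet has " + available);
  }
  const plan = [];
  let cursor = 1; // same starting point as the script: skip the header
  for (let f = 0; f < numForms; f++) {
    const ids = [];
    for (let t = 0; t < pointsPerForm; t++) {
      ids.push(values[cursor++][0]); // column 0 holds the tweet id
    }
    plan.push(ids);
  }
  return plan;
}

// Made-up sheet: a header row followed by six tweets.
const values = [["id", "tweet", "date"]];
for (let i = 1; i <= 6; i++) {
  values.push(["t" + i, "tweet text " + i, "2019-05-0" + i]);
}

console.log(JSON.stringify(planForms(values, 3, 2))); // [["t1","t2","t3"],["t4","t5","t6"]]
```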

In the script editor, select the main function from the drop-down menu and click the Run button. Running the script generates two forms and logs each form's id and published URL; you can verify that they look like the image above by going to forms.google.com.
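The logged id and URL pairs then need to go to different people. One simple option, sketched below under the assumption that you have a list of reviewer email addresses, is a round-robin assignment; with more reviewers than forms, each form is answered by several people, which yields more than one label per tweet. The assignForms helper, the form ids, the URLs, and the addresses are all placeholders. The sketch only builds the assignment; actually sending the links (for example with MailApp.sendEmail) is left out.

```javascript
// Hypothetical helper: hand out forms to reviewers round-robin.
// forms: [formId, publishedUrl] pairs, like the array generateForm returns.
// reviewers: email addresses (placeholders here).
function assignForms(forms, reviewers) {
  return reviewers.map(function (email, i) {
    const form = forms[i % forms.length];
    return { reviewer: email, formId: form[0], url: form[1] };
  });
}

const forms = [
  ["formA", "https://example.com/formA"],
  ["formB", "https://example.com/formB"],
];
const reviewers = ["ana@example.com", "ben@example.com", "chen@example.com", "dee@example.com"];

assignForms(forms, reviewers).forEach(function (a) {
  console.log(a.reviewer + " --> " + a.url);
});
// ana and chen get formA; ben and dee get formB
```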

A Few Gotchas:

  1. In my experience, FormApp.create required a paid version of G Suite (now Google Workspace). I validated that the script works using our corporate G Suite account, and used a version of it to generate several forms and collect responses.

  2. Google Forms has no concept of a hidden field. I had hoped to use a hidden field to pass the tweet id, so that I wouldn't have to put the tweet id in the question itself.

  3. Respondents have to go through the whole form before they can click the Submit button, and all their work is lost if they stop partway through.

  4. Since responses are spread across different forms, managing the forms and their responses can be tricky. Be careful, and test the complete workflow end to end, from creating the forms to collating all the responses.
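Because there are no hidden fields (gotcha 2) and responses are spread across forms (gotcha 4), collation has to work from the question titles. The sketch below assumes the answers from every form have already been flattened into { title, answer } pairs; inside Apps Script they could be built from FormApp.openById(formId).getResponses(), then getItemResponses(), getItem().getTitle() and getResponse() on each item response. The hypothetical tweetIdFromTitle recovers the id from the "id: question" title convention used in the script, and the hypothetical collateLabels takes a simple majority vote per tweet. The responses are made up.

```javascript
// Pull the tweet id back out of a question title such as "t1: Is this tweet related to weather?".
// Assumes the id itself never contains the ": " sequence.
function tweetIdFromTitle(title) {
  const sep = title.indexOf(": ");
  return sep === -1 ? null : title.slice(0, sep);
}

// Collate answers from any number of forms and pick the most common label per tweet.
// answers: flat list of { title, answer } objects, one per answered question (assumed shape).
function collateLabels(answers) {
  const counts = {}; // tweet id -> { label: count }
  answers.forEach(function (a) {
    const id = tweetIdFromTitle(a.title);
    if (id === null) return; // skip questions that don't follow the "id: question" convention
    counts[id] = counts[id] || {};
    counts[id][a.answer] = (counts[id][a.answer] || 0) + 1;
  });
  const labels = {};
  Object.keys(counts).forEach(function (id) {
    // majority vote; on a tie, the label seen first wins
    let best = null;
    Object.keys(counts[id]).forEach(function (label) {
      if (best === null || counts[id][label] > counts[id][best]) best = label;
    });
    labels[id] = best;
  });
  return labels;
}

const answers = [
  { title: "t1: Is this tweet related to weather?", answer: "Yes" },
  { title: "t1: Is this tweet related to weather?", answer: "Yes" },
  { title: "t1: Is this tweet related to weather?", answer: "Not Sure" },
  { title: "t2: Is this tweet related to weather?", answer: "No" },
  { title: "t2: Is this tweet related to weather?", answer: "Yes" },
  { title: "t2: Is this tweet related to weather?", answer: "No" },
];
console.log(JSON.stringify(collateLabels(answers))); // {"t1":"Yes","t2":"No"}
```

Majority vote is only one choice; with few respondents per tweet, it may be better to flag disagreements or "Not Sure" answers for manual review.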
