Machine Learning With AdWords Scripts And Google Prediction API

google-adwords-bigA9-1920

For many of us, analyzing our AdWords data starts with downloading a massive .CSV file into Excel and running various calculations and building charts. After that, we turn into fortune tellers, trying to read the analytical tea leaves of our data and predict what changes to make.

That analysis is time consuming, difficult, and biased by our personal experiences and emotions. Machine learning can help us fix that. Today, we are going to use the Google Prediction API and AdWords Scripts to predict the future.

Asking For A Prediction

With the Google Prediction API, you no longer need a dedicated team of PhDs to build and maintain an analytical model for your pay-per-click (PPC) data. All you need to do is format and push your data in, then ask it for a prediction. The more data you can provide, the more accurate that prediction should be.

The world of machine learning seems a little daunting at first, but I’m going to give you a crash course to help you get started quickly. I’ll start by saying that I’ve never taken any advanced statistics courses or programed anything in R, but I am able to use the Prediction API without a problem — and you can, too.

We need to start with a question, or something that we want our model to be able to predict. I’m going to build a model that is able to predict what the average CPC will be for a given temperature, wind speed, and weather condition in my account. Of course, we all know that weather impacts our bids, but this will tell me exactly how much of an impact I should expect.

Collect Historical Data

In order for my model to make a prediction, just like a student, I need to teach (or train) it with examples. That means I will need to collect historical weather data for my account. This will allow my model to understand the relationships between the data. It will use those training examples to return a prediction for a new query it has never seen before.

In this post, we are going to be writing two scripts at the account level. The first one is simply to gather and store training data for the account. The second one will use that training data to build and update a model.

The training data is made up to two parts: the example value, which will be the answer returned, and a set of features. In this example, my value is going to be the average CPC for a specific location and the features are going to include all the information I know at the time. (This is just an example, so I don’t have all the data, but this should get you started.)

On to script one. We’re looking at weather is based on location, so a good place to start would be a function that pulls the Geo Performance Report. We can use that data to get an idea of where our traffic is coming from.

Of course, if you have very specific campaign targets in your account, you could simply supply a list of locations you are interested in, but where’s the fun in that? Here is a function to help us grab the performance data by geo.

/*******************************
 * Grab the GEO_PERFORMANCE_REPORT for the given
 * date range. dateRange can be things like LAST_30_DAYS
 * or more specific like 20150101,20150201 
 *******************************/
function getGeoReport(dateRange) {
  // The columns are in a function so I can call them
  // later when I build the spreadsheet.
  var cols = getReportColumns();
  var report = 'GEO_PERFORMANCE_REPORT';
  var query = ['select',cols.join(','),'from',report,
               'where CampaignStatus = ENABLED',
               'and AdGroupStatus = ENABLED',
               'and Clicks > 0', // You can increase this to 
                                 // reduce the number of results
               'during',dateRange].join(' ');
  var reportIter = AdWordsApp.report(query).rows();
  var retVal = [];
  while(reportIter.hasNext()) {
    var row = reportIter.next();
    // If we don't have city level data, let's ignore it
    if(row['CityCriteriaId'] == 'Unspecified') { continue; }
    retVal.push(row);
  }
  return retVal;
}

// Helper function to return the report columns
function getReportColumns() {
  return ['Date','DayOfWeek',
          'CampaignId','CampaignName',
          'AdGroupId','AdGroupName',
          'CountryCriteriaId','RegionCriteriaId','MetroCriteriaId','CityCriteriaId',
          'Impressions','Clicks','Cost',
          'ConvertedClicks','ConversionValue',
          'AverageCpc'];
}

 

You can use that function to grab data for any date range you want. For the initial data pull, you may want to look back 30 days or more. After that, you can schedule the script to run daily to continue collecting new information.

For pulling historical weather data, I am going to use Weather Underground. It is free to sign up and get started with an API key, but you will hit the limits pretty quickly. Another option is the Open Weather Map API, but I found it a little more confusing to use. We are just trying to get some training data, so the limits aren’t as important right now. I have added variables and caching to the final version of the script to help deal with any limits you might run in to.

We will need to translate the locations in the AdWords report into locations that Weather Underground can understand. For that, we can use their AutoComplete API. The following code uses the CityCriteriaId, RegionCriteriaId, and CountryCriteriaId information from the geo report and looks up the weather location URL to use with Weather Underground.

/*********************************
 * Given a city, region, and country, return
 * the location information from WeatherUnderground.
 * Uses CACHEs to improve performance and reduce
 * calls to API.
 *********************************/
var CITY_LOOKUP_CACHE = {};
var COUNTRY_TO_CODE_MAP = getCountryCodesMap();
var WU_AUTOCOMPLETE_BASE_URL = 'http://autocomplete.wunderground.com/aq';
function getWeatherUndergroundLocation(city,region,country) {
  // Create a key for looking up data in our cache
  var cityCacheKey = [city,region,country].join('-');
  if(CITY_LOOKUP_CACHE[cityCacheKey]) {
    return CITY_LOOKUP_CACHE[cityCacheKey];
  }
  var urlParams = { 'cities': 1, 'h': 0 };
  if(country) {
    urlParams['c'] = COUNTRY_TO_CODE_MAP[country];
  }
  var urlsToCheck = [];
  // We check the more specific location first
  if(region && region != 'Unspecified') {
    urlParams['query'] = city+', '+region;
    urlsToCheck.push(WU_AUTOCOMPLETE_BASE_URL+'?'+ toQueryString(urlParams));
  }
  // But also try just the city
  urlParams['query'] = city;
  urlsToCheck.push(WU_AUTOCOMPLETE_BASE_URL+'?'+ toQueryString(urlParams));
  
  for(var i in urlsToCheck) {
    var urlToCheck = urlsToCheck[i];
    (ENABLE_LOGGING && Logger.log('Checking Url: '+urlToCheck));
    var resp = UrlFetchApp.fetch(urlToCheck,{muteHttpExceptions:true});
    if(resp.getResponseCode() == 200) {
      var jsonResults = JSON.parse(resp.getContentText());
      if(jsonResults.RESULTS.length > 0) {
        CITY_LOOKUP_CACHE[cityCacheKey] = jsonResults.RESULTS[0];
        return jsonResults.RESULTS[0];
      }
    }
    // Otherwise sleep a bit and try the 
    // other url.
    Utilities.sleep(500);
  }
  // If we can't find the city, just ignore it
  (ENABLE_LOGGING && Logger.log('Could not find data for: '+cityCacheKey));
  CITY_LOOKUP_CACHE[cityCacheKey] = false;
  return {};
}

// Converts {a:b,c:d} to a=b&c=d
// Taken from: http://goo.gl/5pG5oY
function toQueryString(hash) {
  var parts = [];
  for (var key in hash) {
    parts.push(key + "=" + encodeURIComponent(hash[key]));
  }
  return parts.join("&");
}

 

One thing you will notice is that the AdWords Geo report returns the full country names, while Weather Underground uses only the two-digit ISO country code. Here is a quick function that will build a mapping of full country names to two-digit codes based on the data from Open Knowledge.

/**********************************
 * This function returns a mapping of country codes
 * to two digit country codes.
 * { "United States" : "US", ... }
 **********************************/
function getCountryCodesMap() {
  var url = 'http://data.okfn.org/data/core/country-codes/r/country-codes.json';
  var resp = UrlFetchApp.fetch(url,{muteHttpExceptions:true});
  if(resp.getResponseCode() == 200) {
    var jsonData = JSON.parse(resp.getContentText());
    var retVal = {};
    for(var i in jsonData) {
      var country = jsonData[i];
      if(!country){ continue; }
      retVal[country.name] = country['ISO3166-1-Alpha-2'];
    }
    // Fixing some values. There may be more but
    // Weather Undergrounds' country mapping is 
    // pretty arbitrary. This page might help
    // http://goo.gl/J17Ve6 but it always accurate.
    retVal['South Korea'] = 'KR';
    retVal['Japan'] = 'JA';
    retVal['Isreal'] = 'IS';
    retVal['Spain'] = 'SP';
    retVal['United Kingdom'] = 'UK';
    retVal['Switzerland'] = 'SW';
    return retVal;
  } else {
    throw 'ERROR: Could not fetch country mapping. Response Code: '+
          resp.getResponseCode();
  }
}

 

This code utilizes caches to speed up the lookups for city and reduce API calls. Once we find the city, we store it in the CITY_LOOKUP_CACHE variable so that we don’t need to request it again.

Now that we have the geo data from AdWords and the location information from Weather Underground, we can look up the historical weather data for the location. The following function looks up the historical weather information for the given date and location. Again, we are using a cache to limit the number of calls to the API.

/************************************
 * Calls the Weather Underground history api
 * for a given location, date, and timezone
 * and returns the weather information. It 
 * utilizes a cache to conserve api calls
 ************************************/
var WU_API_KEY = 'YOUR WU API KEY HERE';
var WEATHER_LOOKUP_CACHE = {};
function getHistoricalTemp(wuUrl,date,timeZone) {
  if(!wuUrl) { return {}; }
  if(WU_TOTAL_CALLS_PER_DAY <= 0) {
    throw 'Out of WU API calls for today.';
  }
  var weatherCacheKey = [wuUrl,date,timeZone].join('-');
  if(WEATHER_LOOKUP_CACHE[weatherCacheKey]) {
    return WEATHER_LOOKUP_CACHE[weatherCacheKey];
  }
  var formattedDate = date.replace(/-/g,'');
  var url = ['http://api.wunderground.com/api/',
             WU_API_KEY,
             '/history_',
             formattedDate,
             wuUrl,'.json'].join('');
  (ENABLE_LOGGING && Logger.log('Checking Url: '+url));
  var resp = UrlFetchApp.fetch(url,{muteHttpExceptions:true});
  
  // This keeps you within the WU API guidelines
  WU_TOTAL_CALLS_PER_DAY--;
  Utilities.sleep(1000*(60/WU_CALLS_PER_MIN));
  
  if(resp.getResponseCode() == 200) {
    var jsonResults = JSON.parse(resp.getContentText());
    WEATHER_LOOKUP_CACHE[weatherCacheKey] = jsonResults.history.dailysummary[0];
    return jsonResults.history.dailysummary[0];
  }
  (ENABLE_LOGGING && Logger.log('Could not find historical weather for: '+weatherCacheKey));
  WEATHER_LOOKUP_CACHE[weatherCacheKey] = false;
  return {};
}

 

When we put all these pieces together, we have a script that we can run daily which will store our training data in a spreadsheet, which we can then access from our modeling script. Here is the main function and some more helpers to tie things together.

// Set this to false to disable all logging
var ENABLE_LOGGING = true;
// The name of the spreadsheet you want to store your training
// data in. Should be unique.
var TRAINING_DATA_SPREADSHEET_NAME = 'Super Cool Training Data';
// The date range for looking up data. On the first run, you can 
// set this value to be longer. When scheduling for daily runs, this
// should be set for YESTERDAY
var DATE_RANGE = 'YESTERDAY';
// These values help you stay within the limits of
// the Weather Underground API. More details can be found
// in your Weather Underground account.
var WU_CALLS_PER_MIN = 10; // or 100 or 1000 for paid plans
var WU_TOTAL_CALLS_PER_DAY = 500; // or 5000, or 100,000 for paid plans

function main() {
  var sheet = getSheet(TRAINING_DATA_SPREADSHEET_NAME);
  // If the sheet is blank, let's add the headers
  if(sheet.getDataRange().getValues()[0][0] == '') {
    sheet.appendRow(getReportColumns().concat(['Mean Temp','Mean Windspeed','Conditions']));
  }
  var results = getGeoReport(DATE_RANGE);
  for(var key in results) {
    var row = results[key];
    var loc = getWeatherUndergroundLocation(row.CityCriteriaId,
                                            row.RegionCriteriaId,
                                            row.CountryCriteriaId);
    var historicalWeather = getHistoricalTemp(loc.l,results[key].Date,loc.tz);
    // See below. This pulls the info out of the weather results
    // and translates the average conditions.
    var translatedConditions = translateConditions(historicalWeather);
    sheet.appendRow(translateRowToArray(row).concat(translatedConditions));
    // Break before you run out of quota
    if(AdWordsApp.getExecutionInfo().getRemainingTime() < 10/*seconds*/) { break; }
  }
}

// Helper function to get or create a spreadsheet 
function getSheet(spreadsheetName) {
  var fileIter = DriveApp.getFilesByName(spreadsheetName);
  if(fileIter.hasNext()) {
    return SpreadsheetApp.openByUrl(fileIter.next().getUrl()).getActiveSheet();
  } else {
    return SpreadsheetApp.create(spreadsheetName).getActiveSheet();
  }
}

// Helper function to convert a report row to an array
function translateRowToArray(row) {
  var cols = getReportColumns();
  var ssRow = [];
  for(var i in cols) {
    ssRow.push(row[cols[i]]);
  }
  return ssRow;
}

/**********************************
 * Given a result from the Weather Underground
 * history API, it pulls out the average temp,
 * windspeed, and translates the conditons for 
 * rain, snow, etc. It returns an array of values.
 **********************************/
function translateConditions(historicalWeather) {
  var retVal = [];
  // in meantempi, the i is for Imperial. use c for Celcius.
  if(historicalWeather && historicalWeather['meantempi']) {
    retVal.push(historicalWeather['meantempi']);
    retVal.push(historicalWeather['meanwindspdi']);
    if(historicalWeather['rain'] == 1) {
      retVal.push('rain');
    } else if(historicalWeather['snow'] == 1) {
      retVal.push('snow');
    } else if(historicalWeather['hail'] == 1) {
      retVal.push('hail');
    } else if(historicalWeather['thunder'] == 1) {
      retVal.push('thunder');
    } else if(historicalWeather['tornado'] == 1) {
      retVal.push('tornado');
    } else if(historicalWeather['fog'] == 1) {
      retVal.push('fog');
    } else {
      retVal.push('clear');
    }
    return retVal;
  }
  return [];
}

 

Now that we have a script to build and continuously add to our training data, we can create the second script that will actually build our model. To do this, we need to enable the Prediction API for our script under the Advanced APIs button and follow the link to the Developers Console to enable it there as well.

Once that’s done, we can create our model. The following code will pull the data from the same spreadsheet we built in part one and create a model.

One thing to note is the field for ignoring columns from our training data. When a field is unique for every row, such as a date, it doesn’t really help us with a prediction. Also, items that have the same level of uniqueness, such as Campaign Name and Campaign Id, don’t add much either. Many of these actually make your model a little less flexible because you will need to pass those values in with your query. So, I have ignored any column which does not impact the average cost-per-click (CPC).

I have also excluded values such as impressions, clicks and cost, only because those are items I won’t know when I query the model. Of course, you could pass desired values for these fields in your query to see how the output reacts. There are plenty of things to play around with here, and you can create and train as many model variations as you like if you want to compare performance. Just change the names.

/***********************************
 * This function accepts a sheet full of training
 * data and creates a trained model for you to query.
 ***********************************/
// Unique name for your model. Maybe add a date here
// if you are doing iterations on your model.
var MODEL_NAME = 'Weather Training Model';
// This Id should be listed in your Developers Console
// when you authorize the script
var PROJECT_ID = 'PROJECT ID FROM DEVELOPER CONSOLE';
// These are the names of your columns from the training
// data to ignore. Change these to create variations of your
// model for testing
var COLS_TO_IGNORE = [
  'Date','CampaignId','CampaignName','AdGroupId','AdGroupName',
  'MetroCriteriaId','Impressions','Clicks','Cost'
];
// This is the output column for your training data, or
// what value the model is supposed to predict
var OUTPUT_COLUMN = 'AverageCpc';

function createTrainingModel(sheet) {
  var trainingInstances = [];
  // get the spreadsheet values
  var trainingData = sheet.getDataRange().getValues();
  var headers = trainingData.shift();
  for(var r in trainingData) {
    var inputs = [];
    var row = trainingData[r];
    for(var i in headers) {
      if(COLS_TO_IGNORE.indexOf(headers[i]) == -1 && headers[i] != OUTPUT_COLUMN) {
        inputs.push(row[i])
      }
    }
    var output = row[headers.indexOf(OUTPUT_COLUMN)];
    trainingInstances.push(createTrainingInstance(inputs,output));
  }

  var insert = Prediction.newInsert();
  insert.id = MODEL_NAME;
  insert.trainingInstances = trainingInstances;

  var insertReply = Prediction.Trainedmodels.insert(insert, PROJECT_ID);
  Logger.log('Trained model with data.');
}

// Helper function to create the training instance.
function createTrainingInstance(inputs,output) {
  var trainingInstances = Prediction.newInsertTrainingInstances();
  trainingInstances.csvInstance = inputs;
  trainingInstances.output = output;
  return trainingInstances;
}

 

The code to create the model only needs to run once, then it should be disabled. You can call it from the main function like this.

// The name of the spreadsheet containing your training data
var TRAINING_DATA_SPREADSHEET_NAME = 'Weather Model Training Data';
function main() {
  var sheet = getSheet(TRAINING_DATA_SPREADSHEET_NAME);
  createTrainingModel(sheet);
  //makePrediction();
}

// Helper function to get an existing sheet
// Throws an error if the sheet doesn't exist
function getSheet(spreadsheetName) {
  var fileIter = DriveApp.getFilesByName(spreadsheetName);
  if(fileIter.hasNext()) {
    return SpreadsheetApp.openByUrl(fileIter.next().getUrl()).getActiveSheet();
  }
  throw 'Sheet not found: '+spreadsheetName;
}

 

Now that your model has been created, you can continue to update it as more data is added to your training spreadsheet. You can use the following code, which is very similar to the training function, to add new training data to your model on a daily basis.

function updateTrainedModelData(sheet) {
  var updateData = sheet.getDataRange().getValues();
  var headers = updateData.shift();
  for(var r in updateData) {
    var inputs = [];
    var row = updateData[r];
    for(var i in headers) {
      if(COLS_TO_IGNORE.indexOf(headers[i]) == -1 && headers[i] != OUTPUT_COLUMN) {
        inputs.push(row[i])
      }
    }
    var output = row[headers.indexOf(OUTPUT_COLUMN)];
    var update = createUpdateInstance(inputs,output)
    var updateResponse = Prediction.Trainedmodels.update(update, PROJECT_ID, MODEL_NAME);
    Logger.log('Trained model updated with new data.');
  }
}

// Helper function to create the update instance.
function createUpdateInstance(inputs,output) {
  var updateInstance = Prediction.newUpdate();
  updateInstance.csvInstance = inputs;
  updateInstance.output = output;
  return updateInstance;
}

 

Just make sure that you don’t continue to update your model with previous training data. Once the data is in the model, you should move it to a different spreadsheet or just clear it.

Now we can start to query the model pretty easily. The following code will accept an array of queries (which are just arrays of values, the same as the features in the training data) and return the prediction results for each row. One way to test your model is to grab a chunk of your training data, remove the output column, and pass it to this function.

/***************************
 * Accepts a 2d array of query data and returns the
 * predicted output in an array.
 ***************************/
function makePrediction(data) {
  var retVal = [];
  for(var r in data) {
    var request = Prediction.newInput();
    request.input = Prediction.newInputInput();
    request.input.csvInstance = data[r];
    var predictionResult = Prediction.Trainedmodels.predict(
      request, PROJECT_ID, MODEL_NAME);
    Logger.log("Prediction for data: %s is %s",
               JSON.stringify(data[r]), predictionResult.outputValue);
    retVal.push(predictionResult.outputValue);
  }
  return retVal;
}

 

And that’s exactly what I do here as an example. Our main now looks like this:

function main() {
  var sheet = getSheet(TRAINING_DATA_SPREADSHEET_NAME);
  //createTrainingModel(sheet);
  //updateTrainedModelData(sheet);
  var queries = [];
  // We are going to test it by querying with training data
  var testData = sheet.getDataRange().getValues();
  var headers = testData.shift();
  for(var r in testData) {
    var query = [];
    var row = testData[r];
    for(var i in headers) {
      if(COLS_TO_IGNORE.indexOf(headers[i]) == -1 && headers[i] != OUTPUT_COLUMN) {
        query.push(row[i])
      }
    }
    queries.push(query);
  }
  Logger.log(makePrediction(queries));
}

 

And that’s all there is to it. If you made it this far, then congratulations! You will still need to tweak and test the model to make sure the data being returned makes sense. If not, take a look at the training data and see if there are things in there that don’t make any sense.

It’s pretty amazing to think that what used to take a team of PhD data scientists months to produce can now be built using AdWords Scripts and a few lines of code. And this is only the beginning. You can use any external data you want to build out your model. You can now go from “if greater than x, do y” to using the output from your own machine learning algorithm to determine your actions.

Of course, with great power comes great responsibility. It will take time and loads of data before your model is a good predictor of behavior, so you better start training now!

The post Machine Learning With AdWords Scripts And Google Prediction API appeared first on Search Engine Land.