Get list of custom segments from Google Analytics API

This is a post on how to create custom Google Analytics Profile Segments for the purpose of removing referral spam (and there is increasingly so much of it!) from GA reporting.

However, if you want to use these new Custom Segments to filter results using the Google Analytics API with a Service Account, there are some challenges.

If you are retrieving GA results for many websites, you need to use the GA API to loop through each site’s View / Profile in your GA Account and retrieve the data for each.

The challenge is that each Profile has its own Custom Segment. In order to filter out referral spam completely, two types of filters are required: the ‘exclude’ filter, which is the same for all Profiles, and the ‘include’ filter, which is specific to each Profile as it refers to the Profile’s domain.

So that makes looping through each Profile a bit more challenging. You need a dictionary of each Profile’s Custom Segment Id so it can be applied for each Profile’s data.

These Custom Segment Ids look something like “gaid::BXxFLXZfSAeXbm4RZuFd9w”.

The Custom Segment Id needs to be passed as the segment parameter of the service.data().ga().get(...).execute() call.


data = service.data().ga().get(
    ids=ids,
    start_date="2015-07-01",
    end_date="2015-07-19",
    segment="gaid::BXxFLXZfSAeXbm4RZuFd9w",
    metrics=metrics
).execute()
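
Applying a different segment to each Profile in a loop could look roughly like the sketch below. It is only an illustration under assumptions: the profile ids, the second segment id and the helper name build_query are placeholders that I made up, and the actual API call is left as a comment because it needs an authenticated service object.

```python
# Hypothetical mapping of GA Profile (View) id -> that Profile's Custom Segment Id.
# The profile ids and the second segment id are placeholders, not real values.
SEGMENT_BY_PROFILE = {
    "11111111": "gaid::BXxFLXZfSAeXbm4RZuFd9w",
    "22222222": "gaid::XXXXXXXXXXXXXXXXXXXXXX",
}


def build_query(profile_id, start_date, end_date, metrics):
    """Return the keyword arguments for service.data().ga().get() for one Profile."""
    params = {
        "ids": "ga:" + profile_id,
        "start_date": start_date,
        "end_date": end_date,
        "metrics": metrics,
    }
    # Only add the segment when one is known for this Profile.
    segment_id = SEGMENT_BY_PROFILE.get(profile_id)
    if segment_id:
        params["segment"] = segment_id
    return params


for profile_id in SEGMENT_BY_PROFILE:
    query = build_query(profile_id, "2015-07-01", "2015-07-19", "ga:sessions")
    print(query)
    # With an authenticated service object the call would then be:
    # data = service.data().ga().get(**query).execute()
```

Keeping the mapping in one dictionary means a Profile without a known Custom Segment is simply queried unsegmented instead of breaking the loop.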

It wasn’t easy to find these Custom Segment Ids. First I tried looping through the segments() as follows:


    # Authenticate and construct service.
    service = get_service('analytics', 'v3', scope, key_file_location,
                          service_account_email)

    # List every segment visible to the authenticated account.
    segments = service.management().segments().list().execute()

    for segment in segments.get('items', []):
        print('Segment ID ' + segment.get('id') + ' - ' + segment.get('name'))

But that only retrieved the Standard Google Segments, not the Custom Segments; apparently retrieving those is not possible with a Service Account.

I then found that you are able to see the Custom Segment Ids in the Query Explorer at https://ga-dev-tools.appspot.com/query-explorer.

But while you can see the Custom Segments there, it wasn’t very helpful, as you have to go through them one by one in the Segments criteria field. If you have many sites, that is time consuming.

Then I finally found the “stand alone explorer” at the bottom of the GA API Segments documentation page.

https://developers.google.com/analytics/devguides/config/mgmt/v3/mgmtReference/management/segments/list#try-it

This outputs a JSON file containing all of the Segment details. Unfortunately this isn’t useful as a ready-made dictionary, as it only has the segment details, not the account id. But it does have the Custom Segment Ids, which can be used to manually create a dictionary of Account Id and Segment Id for use in the loop.

Perhaps it might also be possible to do a reverse lookup and find the Custom Segment Id by looping through the Segments and picking out those whose name matches.
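
A reverse lookup like that might be sketched as follows. This is an untested idea rather than something I have run against the live API: the sample JSON is made up, the segment name "example.com no spam" is a hypothetical naming convention, and I am assuming the output carries the "segmentId", "name" and "type" fields of the Management API segment resource.

```python
import json

# Made-up sample shaped like the segments list JSON output; real output has more fields.
SAMPLE_JSON = """
{
  "items": [
    {"segmentId": "gaid::-1", "name": "All Sessions", "type": "BUILT_IN"},
    {"segmentId": "gaid::BXxFLXZfSAeXbm4RZuFd9w", "name": "example.com no spam", "type": "CUSTOM"}
  ]
}
"""


def custom_segment_ids_by_name(segments_response):
    """Map segment name -> segment id, keeping only the custom segments."""
    lookup = {}
    for segment in segments_response.get("items", []):
        if segment.get("type") == "CUSTOM":
            lookup[segment["name"]] = segment["segmentId"]
    return lookup


lookup = custom_segment_ids_by_name(json.loads(SAMPLE_JSON))
print(lookup.get("example.com no spam"))
```

If each Custom Segment is named after its site's domain, the name can then tie a segment back to the right Profile.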

Hope that helps someone!

How to filter referral spam from Google Analytics using API and Python

Google Analytics data has become incredibly polluted by “spam referrals” which inflate site visits with what are essentially spam advertisements delivered to you via Google Analytics.

The spammers are entirely bypassing your site and directly hitting Google’s servers, pretending to be a visitor to your site. It’s a bit odd that a technological superpower like Google has fallen prey to spammers. Apparently a fix is in the works, but it feels like it’s taking way too long.

In the meantime the fix is to filter out any “visit” that doesn’t have a legitimate referrer hostname. You determine which hostnames you consider legitimate. At a minimum you want to include your own domain. You can also filter out spam visits based on their source. The source name is where the spammers advertise to you: they put their spam domains there, hoping you will visit their sites. These filters can be set up with Google Analytics’ built-in filters, which takes some manual effort and some ongoing updating, as spammers keep changing source names.

The screenshot below shows the Google Analytics filter screen where you build filters for hostname and source using rules based filtering.

[Screenshot: Google Analytics filter screen]

However, this same rules based filtering can be done using the Google Analytics API. There is a lot of code around for you to work with, and the Google documentation is pretty good. I have implemented a hostname and source filter using Python in the code below. This enables me to run the code in a scheduled job and always have up-to-date analytics data for analysis.

The “hostMatch” and “sourceExp” are the two things that filter out fake hostnames and fake visit sources respectively.

You will need to get yourself Google API access and set up the OAuth (which I am not describing here). You will need the OAuth key and a secret file to authorize access to the API; then you can use the code below.
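
As a rough sketch of how a source filter string like “sourceExp” can be put together from a list: in Core Reporting API filter expressions "!@" means "does not contain substring" and ";" combines conditions with AND. The helper name build_source_filter and the two domains are placeholders I made up for illustration.

```python
def build_source_filter(spam_sources):
    """Build a filter expression that excludes every listed source substring."""
    # "!@" = does not contain substring; ";" joins the conditions with AND.
    return ";".join("ga:source!@" + source for source in spam_sources)


# Placeholder spam sources; replace them with the ones polluting your own reports.
spam_sources = ["spam-one.example", "spam-two.example"]
sourceExp = build_source_filter(spam_sources)
print(sourceExp)
```

Keeping the spam sources in a plain list makes the ongoing updating a bit less painful than editing one long string.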

'''Access the Google Analytics API.'''
# https://developers.google.com/analytics/devguides/reporting/core/v3/reference#maxResults

import csv
import re
from datetime import datetime

import httplib2
from apiclient.discovery import build
# SignedJwtAssertionCredentials is only available in oauth2client releases before 2.0.
from oauth2client.client import SignedJwtAssertionCredentials

# Today's date in the YYYY-MM-DD format the API expects.
todaydate = datetime.today().strftime('%Y-%m-%d')

def get_service(api_name, api_version, scope, key_file_location,
                service_account_email):
    '''Get a service that communicates to a Google API.

    Args:
        api_name: The name of the api to connect to.
        api_version: The api version to connect to.
        scope: A list of auth scopes to authorize for the application.
        key_file_location: The path to a valid service account p12 key file.
        service_account_email: The service account email address.

    Returns:
        A service that is connected to the specified API.
    '''
    # client_secrets.p12 is the secrets file for analytics
    with open(key_file_location, 'rb') as f:
        key = f.read()

    credentials = SignedJwtAssertionCredentials(service_account_email, key,
                                                scope=scope)
    http = credentials.authorize(httplib2.Http())
    # Build the service object.
    service = build(api_name, api_version, http=http)

    return service


def get_accounts(service):
	# Get a list of all Google Analytics accounts for this user
	accounts = service.management().accounts().list().execute()

	return accounts


def hostMatch(host):
    # Used to filter analytics results to only those that came from your
    # hostnames, e.g. not from a spam referral host.
    # host is one result row (account, date, source, hostname, ...),
    # so host[3] is the hostname.
    hostnames = "domainname1", "domainname2", "domainname3"

    hostExp = "(" + ")|(".join(hostnames) + ")"
    return re.search(hostExp, host[3].lower()) is not None


def main():

    #this is where you build your filter expression, note it is similar to what you would build in the Google Analytics filter feature, you can be as specific or generalized using regex as you want/need
    #ga:source filter; the domains below are placeholders, list the spam sources you want to exclude
    sourceExp=('ga:source!@spam-domain-1.example;ga:source!@spam-domain-2.example;ga:source!@spam-domain-3.example')

    # Define the auth scopes to request.
    scope = ['https://www.googleapis.com/auth/analytics.readonly']

    # Provide the service account email and the relative location of your key file.
    service_account_email = 'xxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com'
    key_file_location = 'client_secrets.p12'

    # Authenticate and construct service.
    service = get_service('analytics', 'v3', scope, key_file_location, service_account_email)

    # Get accounts.
    accounts = service.management().accounts().list().execute()
    # Create list for results.
    output = []

    # Loop through accounts.
    for account in accounts.get('items', []):
        account_id = account.get('id')
        account_name = account.get('name')

        # Get the account's properties.
        properties = service.management().webproperties().list(accountId=account_id).execute()

        # For each property, get metrics only from its default profile/view
        # (set in GA admin) instead of looping through all profiles/views.
        for prop in properties.get('items', []):
            data = service.data().ga().get(
                ids='ga:' + prop.get('defaultProfileId'),
                start_date='2012-01-01',
                end_date=todaydate,  # e.g. '2015-08-05'
                metrics='ga:sessions,ga:users,ga:newUsers,ga:sessionsPerUser,ga:bounceRate,ga:sessionDuration,ga:adsenseRevenue',
                dimensions='ga:date,ga:source,ga:hostname',
                max_results=10000,
                filters=sourceExp  # the filters from above
            ).execute()

            # 'rows' is missing when the query returns nothing, so default to an empty list.
            for row in data.get('rows', []):
                # Each row holds 3 dimensions followed by 7 metrics.
                output.append([account_name] + row)

    # Here is the hostname filter call to the function above.
    hostFilter = [host for host in output if hostMatch(host)]

    # Python 3: open in text mode with newline='' as the csv module expects.
    with open('output_analytics.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        writer.writerow(['account', 'date', 'source', 'hostname', 'sessions', 'users', 'newUsers', 'sessionsPerUser', 'bounceRate', 'sessionDuration', 'adsenseRevenue'])
        writer.writerows(hostFilter)

if __name__ == '__main__':
	main()

How a UK tent rental company used Google Analytics and Tableau to improve sales

Google Analytics data from a UK wedding marquee rental website was analysed using Tableau Public. The client’s target geographic area is the UK Northwest, centered around Manchester.

It was interesting to see a significant number of site visitors from Pakistan, India and the Philippines.

A bit of customer research revealed that these site visitors are friends and family helping with UK weddings.

The client did have first-hand knowledge that his customers have family members offshore who might help with wedding planning. But getting hard data from website analytics, and seeing it clearly highlighted in the Tableau analysis, prompted the client to act: to direct sales and marketing efforts at Pakistani, Indian and Philippine audiences offshore, and also to target advertising specifically at these demographic groups inside the UK.

The result was increased bookings and a lift in word of mouth advertising within these demographic groups.

Well done, analytical entrepreneur. Yes, analytics can be that easy and effective. Just use the tools, do the work, and listen to the analysis!