How to serve multiple domain name websites with one CakePHP application

Say you have a series of websites that are the same with the exception of city, topic, etc. It could be a real estate listing of different countries and you want a separate website for each or maybe different sites for different brands, product categories, client, etc. You could create subdomains or parameterized filtering in a single website on single domain, but what if you wanted to keep sites separate?

You could just use one CakePHP application and one MySQL database to serve websites for multiple domains.

All of the domains have to be hosted on a shared hosting account that allows pointing multiple domains to one directory. I’ve done it with Webfaction and Justhost shared web hosting.

Put the CakePHP application files into that shared directory and connect the application to the single MySQL database, which should hold the data for every website.

Create your CakePHP application as normal but add the following things so that you can filter database records specific to each website/domain being served/requested:

  • A table called domain (or whatever you want, with as many columns as you need) that holds the domain-specific values the application retrieves. At a minimum it needs each site’s domain name so you can do a lookup to match it to the domain being requested.
  • A domain table foreign key (domain_id) on the other tables so they can be filtered by domain when retrieving data. This likely means additional steps to ensure the foreign key is written to those tables. How you do that is up to you, but for example if you allow users to create accounts or post comments, you can simply record the domain_id (for example from your CakePHP Configure::read value) when the new user or comment is added.
  • A getDomainSettings function in AppController, called in the AppController beforeFilter. This getDomainSettings function reads the domain name the current visitor requested and uses it as a parameter to filter database records down to that domain’s values. You can get these values into the application many different ways, but it is nice and easy to write them to CakePHP Configure variables that can be read elsewhere in the application.
  • As required, conditions added when retrieving model data in other controllers, using the CakePHP Configure::read values, to retrieve only records for the current domain and modify the view presentation.

The alternative to this is maintaining six separate CakePHP applications and MySQL databases, so this is a nice, clean and easy solution.

There is lots of potential here; you could also swap out CSS styles or CakePHP themes based on domain and make the sites look totally different.

Code is on Github:

https://github.com/sitrucp/single_cakephp_mysql_multiple_domain_app

Always remember to start Python Virtualenv

When learning to develop Django and Flask Python applications using a virtual environment, I did not know that I should activate the virtual environment before running the application. Well, more precisely, I did know the virtual env should be activated, but I assumed it would be activated somehow auto-magically.

I was using MS Visual Studio 2015 RC and its new Python Tools which work fine by the way. One can install new Python/Flask/Django/etc modules via the IDE.

This was also the first time I was using a Python virtual environment. I hadn’t previously done any Python web development. I had used Python for analytical purposes in scripts, the command line and IPython, but all of these used the ‘native’ operating system Python installation.

In MS VS 2015 RC, after Python Tools are installed, there are templates to create new Django or Flask projects, and one of the steps asks if I want to install virtual env or not.

So after a bit of searching and reading I realized a virtual env is the way to go. It’s better for many reasons to have a standalone Python environment for each new Python/Flask/Django project.

I just assumed that since I had created my new Python application with a virtual env that when I opened Visual Studio and started working on it, it would be in virtual environment by default somehow auto-magically.

But no, the virtual environment has to be activated manually each time the project is opened or before being able to interact with the project via web browser. So remember to activate your virtual env before running your Python/Flask/Django application.

What misled me was that running the application without first activating the virtual environment can often be OK, because the native operating system Python installation has the required modules, so the application runs just fine.

But I ran into problems after installing new Python modules, only to see the application complain that they weren’t available, e.g. I got an error message in the browser and from the server in the command line saying ‘no module named xxx’. This was confusing because I was thinking, hey, I just installed that.

So remember activating the virtual env before running the Python application is required. Now it is second nature to do this.

To activate the virtual env in Windows simply navigate to the ‘Scripts’ folder in your virtual env folder and run the ‘activate.bat’ file.

In the command line, enter ‘activate’ while in that folder, or enter the full path to activate.bat from anywhere in the command line.

You can easily see if the virtual env has been activated because, when it is, you will see ‘(env)’ at the start of the command line prompt.
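If you want to double-check from inside Python itself, a small sketch like the following works for both the classic virtualenv and the newer built-in venv module:

import sys

# Classic virtualenv sets sys.real_prefix; the newer venv module makes
# sys.base_prefix differ from sys.prefix when a virtual env is active.
in_venv = hasattr(sys, 'real_prefix') or (
    getattr(sys, 'base_prefix', sys.prefix) != sys.prefix)
print('virtual env active' if in_venv else 'using system Python')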

Then you can go back to your application folder with the ‘runserver.py’ (or whatever you call it) and then start the application!

Flask SQLAlchemy relationship associations and backref

Flask-SQLAlchemy relationship associations are useful to easily retrieve related records from a related Model.

In a Python Flask application I wanted to use a many-to-many relationship between Photos and Tags (e.g. many tags/categories assigned to one or more photos).

To do this I had the following things set up in my application:

  • 3 database tables
    • Photos
    • Tags
    • Photo_Tags
  • 3 model.py sections
    • Photos
    • Tags
    • photo_tags
  • 1 views.py section ‘photo’
  • 1 template ‘photo.html’

The models looked like this:


# association table; defined first so the Photos model below can reference it
photo_tags = db.Table('photo_tags',
    db.Column('tag_id', db.Integer, db.ForeignKey('tags.id')),
    db.Column('photo_id', db.Integer, db.ForeignKey('photos.id'))
)

class Photos(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    filename = db.Column(db.String(100))
    tags = db.relationship('Tags', secondary=photo_tags, backref='photos')

class Tags(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tagname = db.Column(db.String(100))

The tags = db.relationship(…) line above shows how the ‘tags’ relationship between the Photos and Tags models is added to the Photos model. SQLAlchemy wants us to add this ‘tags’ association to only one of the models involved (more on that below).

The ‘tags’ relationship is physically recorded in the database table ‘photo_tags’, and it doesn’t get a model class like Photos and Tags have. Instead, it is set up as shown above as a plain table with foreign keys to the Photos and Tags tables.

Key things to note about the ‘tags’ reference in the Photos model:

  • ‘Tags’ is the related model.
  • ‘photo_tags’ is referenced as the ‘secondary’, or relationship, table.
  • ‘photos’ is referenced as the ‘backref’. The significance of this is that you don’t have to add a matching relationship to the Tags model. The backref lets you traverse the relationship in both directions, e.g. you can use it to get
    • Tags related to a specific Photo
    • Photos related to a specific Tag

An example of a view that can use the relationship established above follows. This view retrieves all tags for a specific photo:


@app.route('/photo/<int:requested_photo_id>')
def photo(requested_photo_id=None):
    requested_photo = Photos.query.get(requested_photo_id)
    return render_template('photo.html', requested_photo=requested_photo)

Here is what happens in this view:

  • The requested_photo_id is retrieved from the url.
  • The SQLAlchemy query Photos.query.get(requested_photo_id) uses that requested_photo_id to retrieve the photo from the Photos model/MySQL database table, and its related tags.
  • The retrieved requested_photo object is passed to template ‘photo.html’ along with the related tag info.
  • Finally, everything comes together in the photo.html template. The template below is a simple page that presents the requested photo and its related tags.

Photo Details

Photo ID: {{ requested_photo.id }}

Photo file name: {{ requested_photo.filename }}

{% for tag in requested_photo.tags %}
tag.id: {{ tag.id }}, tag.tagname: {{ tag.tagname }}
{% endfor %}

The requested_photo id and filename come from the Photos Model/MySQL table.

The photo’s related tags come from the requested_photo ‘tags’ association, referenced as ‘requested_photo.tags’ (this is the awesome part), and then looping through the tags retrieves each tag so they can be printed on the web page individually.
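The backref works in the other direction too. Here is a minimal sketch, assuming the models above and a hypothetical tag name ‘sunset’ that exists in the database:

# Reverse lookup via the backref: find all photos that have a given tag.
tag = Tags.query.filter_by(tagname='sunset').first()
if tag:
    for photo in tag.photos:  # 'photos' exists because of backref='photos'
        print(photo.filename)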

Using ATMs as wifi hotspot

Wirebase was a concept created in 2001, before broadband, wired or wireless, was widely available.

wirebaselogo_top
wirebaselogo_bottom

The concept was shopped around to a bunch of Vancouver-area VCs and potential investors but ultimately got lost in the dot-com bubble crash.

The idea was to create ‘wirebases’ that were a modular component integrated into an automated banking machine or deployed as a stand-alone unit in a public area.

At the time I was contracting with a western Canadian credit union banking services bureau that was in the process of building its first national automated banking machine switching system, so we were privy to the details of ATMs and banking machine switching. Automated banking machines have computers inside. At the time they were regular PCs and could have peripherals connected just like any ordinary PC, e.g. via internal slots, etc.

The Wirebase set-up is illustrated in diagram below.

overview diagram

The Wirebase module is connected to the internet via a proprietary network solution that is similar to that used by the global ABM network.

A mobile device (Palm Pilot [LOL, remember this was 2001!!], digital camera, portable MP3 player, portable computer, etc) can connect to the Wirebase.

A copy of the website (long since gone) is maintained at the Internet Archive:

https://web.archive.org/web/20010227065841/http://wirebase.com/

Our business plan and presentation can be downloaded below:

WireBase Biz Plan Slideshow

WireBase Short Biz Plan

Get list of custom segments from Google Analytics API

This is a post about creating custom Google Analytics Profile Segments for the purpose of removing referral spam (and there is increasingly so much of it!) from GA reporting.

However if you want to use these new Custom Segments to filter results using Google Analytics API with a Service Account there are some challenges.

If you are retrieving GA results for many web sites you need to get the GA API to loop through each site’s Views / Profiles in your GA Account to retrieve the data for each.

The challenge is that each Profile has its own Custom Segment. In order to filter out referral spam completely, two types of filters are required: the ‘exclude’ filter, which is the same for all Profiles, and the ‘include’ filter, which is specific to each Profile as it refers to the Profile’s domain.

So that makes looping through each Profile a bit more challenging. You need a dictionary of each Profile’s Custom Segment Id so it can be applied to each Profile’s data.

These Custom Segment Id’s look something like “gaid::BXxFLXZfSAeXbm4RZuFd9w”

The Custom Segment Id needs to be used in the service.data.ga().get().execute() criteria.


data = service.data().ga().get(
    ids=ids,
    start_date='2015-07-01',
    end_date='2015-07-19',
    segment='gaid::BXxFLXZfSAeXbm4RZuFd9w',
    metrics=metrics
).execute()

It wasn’t easy to find these Custom Segment Id’s. First I tried looping through the segments() as follows:


    # Authenticate and construct service.
    service = get_service('analytics', 'v3', scope, key_file_location,
        service_account_email)

    segments = service.management().segments().list().execute()

    for segment in segments.get('items', []):
        print 'Segment ID ' + segment.get('id') + ' - ' + segment.get('name')

But that only retrieved the Standard Google Segments, not the Custom Segments, and apparently retrieving Custom Segments this way is not possible with a Service Account.

So I found that you are able to see the Custom Segment Ids in the Query Explorer at https://ga-dev-tools.appspot.com/query-explorer/.

But while you can see the Custom Segments there, it wasn’t very helpful, as you have to go one by one in the Segments criteria field. If you have many sites it will be time consuming.

Then I finally found the “stand alone explorer” at the bottom of the GA API Segments documentation page.

https://developers.google.com/analytics/devguides/config/mgmt/v3/mgmtReference/management/segments/list#try-it

This outputs a JSON file containing all of the Segment details. Unfortunately it isn’t usable as a ready-made dictionary because it only has the segment details, not the account id. But it does have the Custom Segment Ids, which can be used to create a manual dictionary of Account Id and Segment Id that can be used in the loop.

Perhaps it might also be possible to do a reverse lookup and find the Custom Segment Id by looping through the Segments and matching on name.
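If you do build the dictionary by hand, using it in the loop is straightforward. Here is a rough sketch; the profile and segment ids are made-up placeholders, and ‘service’ and ‘metrics’ are assumed to be set up as in the earlier snippets:

# Hypothetical mapping of GA view/profile id -> custom segment id,
# built manually from the stand-alone explorer output.
profile_segments = {
    '12345678': 'gaid::BXxFLXZfSAeXbm4RZuFd9w',
    '87654321': 'gaid::AnotherMadeUpSegmentId',
}

for profile_id, segment_id in profile_segments.items():
    data = service.data().ga().get(
        ids='ga:' + profile_id,
        start_date='2015-07-01',
        end_date='2015-07-19',
        segment=segment_id,
        metrics=metrics
    ).execute()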

Hope that helps someone!

Display Django queryset results on Google Map

I wanted to be able to show locations by latitude and longitude with data from Django 1.8 website on a Google Map using the Google Maps API.

I used the accepted answer from a stackoverflow question to create a Google Map with multiple markers that had the store name with a link that would open that store’s details when clicked.

I did the following:

  • copied the stackoverflow solution javascript and html code into a new Django template called fountain_map.html
  • created a new Django view called fountain_map for that template
  • created a new urls.py entry to route the fountain_map url to the new view/template

The stackoverflow answer used Google Maps javascript that had a javascript array like this:

    var locations = [
      ['Bondi Beach', -33.890542, 151.274856, 4],
      ['Coogee Beach', -33.923036, 151.259052, 5],
      ['Cronulla Beach', -34.028249, 151.157507, 3],
      ['Manly Beach', -33.80010128657071, 151.28747820854187, 2],
      ['Maroubra Beach', -33.950198, 151.259302, 1]
    ];

However while the example has this hard coded list of locations I wanted a dynamic list populated by queryset records from the new view.

So I created a queryset in the view that retrieved the location records:

   

map_points = Fountains.objects.filter(lat__isnull=False)

Note that I filtered to only retrieve records that had a lat value so I wasn’t sending records that couldn’t be mapped.

Since the queryset object is not immediately readable by the javascript as the location variable, it needed to be transformed into a format acceptable for the javascript.

There are a couple of options:

  • Use Django’s serialization to turn it into JSON
  • Loop through queryset object and manually build the array in correct format, this could be done in the view or in the template

I chose to do this transformation in the template. Django’s serialization has lots of documentation and plenty of Stack Overflow questions and answers, but it seemed easier to do it in the template for now.
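For reference, the serialization route might look roughly like this in the view; this is a sketch only, assuming the Fountains model above, and the field names are illustrative:

import json
from django.shortcuts import render
from .models import Fountains  # assumes the model lives in this app's models.py

def fountain_map(request):
    map_points = Fountains.objects.filter(lat__isnull=False)
    # Build a plain list of lists, then serialize to JSON for the template's javascript
    locations = [[p.name, float(p.lat), float(p.lon)] for p in map_points]
    return render(request, 'fountain_map.html', {'locations_json': json.dumps(locations)})

In the template you would then drop it in with something like var locations = {{ locations_json|safe }};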

So, taking the template route, I simply looped through the map_points queryset object to create the array that the var locations required.

The javascript required the square brackets as shown in example above along with quotes around the location name.

Note that the Stack Overflow answer also has a digit as the fourth item in each record, but I excluded that in mine. Not sure what it was, but the user obviously wanted to show it in the marker label or something like that.

Anyways my template loop looked like this:

      var locations = [
        {% for point in map_points %}
            {% if point.lat == None %}
            {% else %}
              ['{{ point.name }}', {{ point.lat }}, {{ point.lon }}],
            {% endif %}
        {% endfor %}
        ];

You can see that I retrieved the following values for the locations array from the queryset results:

  • name (fountain name to show on marker label popup)
  • id (so that it could be used to create marker link to the Django view for that store)
  • lat
  • lon

That was all I needed to do, and it gave me a Google Map showing each fountain’s location with a red pin marker. When the user clicked a marker, the fountain name would show with a link to that fountain’s detail page.

google map

How to setup and and use a Twitter API application

The Twitter API allows you to read and write tweets on your own Twitter account.

But you can also create an ‘app’ that can be granted permission by other Twitter accounts to access their information and even tweet on their behalf.

If you have ever granted this permission you will see the app in your Twitter account’s Settings, Apps options where you can see what apps have been granted permissions as well as revoke permissions.

This comes in handy if you have multiple Twitter accounts and you want to use the Twitter API to tweet to all of them but do not want to create multiple Twitter apps. Instead you can create one Twitter app and grant it permission to use all of the other Twitter accounts.

This is also very handy because a Twitter app must be associated with a mobile phone number. It is rare that you will have multiple mobile phone numbers.

Note there is nothing special about the ‘dev’ account. It is a normal Twitter account; it is just the one you choose to create the Twitter app in, and that app will tweet on all of your accounts’ behalves.

Step 1. Make sure your ‘dev’ account has a mobile phone on the account otherwise you cannot proceed.

Step 2. While logged into your dev account, open your web browser in another tab to https://apps.twitter.com/ which will open the Twitter API app management page.

Step 3. If you haven’t already set up an API app there will be nothing to manage and it will simply show you a ‘create an app’ button. Click that button and follow instructions to setup the API app.

Step 4. Save your new API app’s consumer_key, consumer_secret, access_token, access_token_secret credentials for use in following steps.

Step 5. Log out of your dev account once the dev API app is setup and running properly.

Step 6. Log into another Twitter account that you want to authorize the dev account’s API app for as a 3rd party app. You will successively log into each additional Twitter account that you want to use with the dev API app. Note you do not have to log in in advance, as the code you run in the next steps will pop up a web page prompting you to enter the username and password for the Twitter account you want your new API app to work with.

Step 7. Go get this code, which does the magic of granting permission for your new Twitter API app to tweet on behalf of your other Twitter accounts. Save the code on your computer and keep it open in your code/text editor so you can make some changes.

Step 8. Get your app’s consumer_key, consumer_secret, access_token, access_token_secret credentials from the Twitter API management page and paste them into the appropriate place in the code you just downloaded and saved.

Step 9. Now run the code from the command line. It will generate a URL in the command line window that contains tokens created specifically for your app and the other Twitter account. You just have to copy the URL from the command line window and paste it into a web browser.

command-line-run-python-code1

 

Step 10. Paste it into a browser that is also logged into your other Twitter account. This is a crucial part of the process. The page that opens will ask you to authorize the dev app to use your other Twitter account. Click the Authorize App button.

paste-url-into-browser1-authorize-app

 

Step 11. After clicking the Authorize App button you will see a web page with a special PIN code generated just for this one-time use. I have hidden the PIN in the screenshot, but you will have a number where it says “PIN number is here”. Copy this PIN.

paste-url-into-browser2-get-code

Step 12. Now go back to the command line window. The Python code prompts you to enter ‘y’ once you have copied the PIN code and then prompts you to paste/enter the PIN. Enter ‘y’, hit Enter, and paste the PIN at the next prompt.

Entering the PIN code automatically registers/adds your dev API app as a 3rd party app in the Twitter account you are currently logged into and generates two strings of characters in the command line window. These are tokens you need to copy and save so you can use them later; you will enter them as credentials for the dev API app to read and write tweets for the currently logged-in Twitter account.

command-line-run-python-code2-paste-pin

 

If you go to your Twitter account and select Settings – Apps, you will see that your dev app is now registered as an authorized app that can tweet and read tweets on behalf of your Twitter account.

twitter-settings-apps-new-app-just-added

 

Now it’s up to you to create Twitter API code to use these new authorizations. Try Tweepy or any of the many other Twitter Python modules that are available. Happy tweeting.
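For example, with Tweepy installed (pip install tweepy), posting a tweet on behalf of one of the authorized accounts might look roughly like this; the credential values are placeholders for your dev app’s keys and the per-account tokens you saved in Step 12:

import tweepy

# dev app credentials (from the Twitter API app management page)
consumer_key = 'xxxx'
consumer_secret = 'xxxx'
# tokens generated when the other account authorized your app
access_token = 'xxxx'
access_token_secret = 'xxxx'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# tweets as the account that granted the authorization
api.update_status('Hello from my Twitter API app!')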

Your dev app can be added as an authorized app to multiple other Twitter accounts. You just have to repeat the process for each of the other Twitter accounts one by one: simply log into each in succession (log out of the previous account and then log into the next).

You will end up with a bunch of tokens that you can use to read and tweet for all of your Twitter accounts.

So just to summarize what you have done is authorized one of your Twitter accounts, which has a verified mobile phone number, to read and write Tweets for any number of other Twitter accounts you have, using the Twitter API.

For more information on Twitter API check out their developer site at https://dev.twitter.com/

How to move Google Chart x-axis to top of chart

The newest Google Charts API allows you to more easily change which axis the ticks and labels are on.

In the older version you had to create a secondary axis with dummy data and hide it, showing only the axis with the real data where you want it (e.g. right side instead of left, or top instead of bottom).

You still do something similar in the new version, but it is easier because you can use the ‘axes’ option, which is part of the ‘Dual-axis’ feature.

Note that you continue to use the hAxis and vAxis options as normal, but you also use a new option called axes in which you specify the ‘side’ you want the axis on.

Also note that the axes option is used to name the axis, so the hAxis or vAxis title option is ignored.

Here are example options showing the axes option being used to place the x-axis at the top of a horizontal bar chart:

//set chart options
var options = {
     title: '',
     width: '80%',
     height: 700,
     bar: { groupWidth: '95%' },
     legend: { position: 'none' },
     bars: 'horizontal',
     hAxis: {
          title: 'my x axis',
          minValue: 0,
          format: 'decimal',
          textPosition: 'out',
     },
     vAxis: {
          title: 'my y axis',
          textPosition: 'out'
     },
     axes: {
          x: {
               0: { side: 'top', label: 'my x axis'}
          },
          y: {
               0: { side: 'left', label: 'my y axis'}
          }
     }
};

BC Hydro’s amazing #BCStorm social media turnaround

BC Hydro made an amazing social media turnaround to communicate with customers in crisis! Go BC Hydro!

On August 29, 2015 high winds caused tree falls that took out BC Hydro power lines throughout BC’s Lower Mainland, Vancouver Island and Sunshine Coast, leaving up to 500,000 customers without electricity. Many, including me, were without power for 48 hours.

BC Hydro’s web site was their primary channel for communicating with customers about the extent of damage and expected repair times, but the site also went down during this time.

They had used Twitter in the past to communicate with customers, but they weren’t using it much on the first day of the crisis.

However on the second day of the crisis with 100,000+ customers still without power BC Hydro dramatically increased Twitter communication by responding directly to customer tweets about extent of damage and expected time for repairs.

To visualize this dramatic increase in BC Hydro Twitter usage I downloaded all @BCHydro tweets for August 29 and 30 from Twitter’s API using Python Tweepy and used Microsoft Power BI to create a visualization of BC Hydro tweet counts which is shown below.

Some notes on chart:

  • x axis shows date and hour of day
  • y-axis shows count of tweets
  • ‘response’ tweets are light green part of bar
  • ‘status’  tweets are dark green part of bar

You can see that on August 30 at 11 AM, about 28 hours after the storm hit, BC Hydro suddenly starts tweeting responses to customers’ questions. They continued for the next few days until their website was back up in addition to their regular ‘status’ tweets.

The chart clearly shows an amazing social media turnaround! BC Hydro dramatically increased their use of Twitter to get customers answers and information. Go BC Hydro!

Note: the Twitter data was last updated Aug 31, 2015 at 16:45 PST.

bchtweets

 

Historically BC Hydro did not use Twitter this way. The chart below shows BC Hydro’s tweeting before the storm: once or twice per day with occasional spikes to 7 tweets per day.

 

bchtweets_before

 

The ‘response‘ category includes tweets by BC Hydro responding to a customer tweet. Example ‘response’ tweet:

2015-08-30 14:26:01, @Adamhillz Crews will be on site as soon as possible. Stay back from any downed lines and call 911 if it is on fire.

The ‘status‘ category includes tweets by BC Hydro about ongoing status of repair work etc. Example ‘status’ tweet:

2015-08-30 14:28:32, Crews have been brought in from across province, including Prince George, Terrace, Kamloops, Smithers, Vernon & Vancouver Island #bcstorm
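For what it’s worth, the response/status split can be approximated in Python by checking whether a tweet starts with an @mention; that is an assumption I’m sketching here, not necessarily the exact rule used for the chart above:

def categorize(tweet_text):
    # tweets that open with an @mention are treated as replies to customers
    return 'response' if tweet_text.startswith('@') else 'status'

print(categorize('@Adamhillz Crews will be on site as soon as possible.'))  # response
print(categorize('Crews have been brought in from across province #bcstorm'))  # status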

FYI if anyone is interested, here is the text file with tweet text – bchtweets

Here is the Python code used to get the BC Hydro Tweets using the Twitter API.

import tweepy
#import MySQLdb
from datetime import datetime
import random
import sys, os
import json
from dateutil import tz
from tweet_auth import * #this is another .py file with the twitter api credentials

#get twitter auth
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

def main():
    with open('bchtweets.csv', 'wb') as file:
        for status in tweepy.Cursor(api.user_timeline, id='bchydro').items(1800):
            # convert created_at from UTC to Pacific time, then write "date, text" per line
            file.write(str(status.created_at.replace(tzinfo=tz.gettz('UTC')).astimezone(tz.gettz('America/Los_Angeles')).replace(tzinfo=None)) + ', ' + status.text.encode('utf8') + '\n')

if __name__ == '__main__':
    main()

Note that there is a separate file with my Twitter OAuth credentials that looks like the following. You just have to replace the ‘xxx…’ with your credentials.

#twitter api oauth credentials - replace with yours
consumer_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

How to create OData data source Gateway for Power BI Report

Microsoft has a free Microsoft Data Management Gateway that you can install on your desktop or server to act as a broker to create an OData data source from a local data source. The OData source can be consumed by any application that can connect to OData url endpoint such as Power BI in Office 365 or in Microsoft’s new standalone online version of Power BI, or Excel etc.

For example, set up Microsoft Data Management Gateway on your desktop; it creates a local service that connects a local SQL Server database to the Data Management Gateway cloud service in Power BI.

Remote users can then connect to Office 365 or Power BI via the internet and connect to the endpoint to use the data, for example with the Excel Power Query add-on.

 

Introducing Speedvisit – a quick and easy way to record visits from people

Here is a demo web application / SaaS called SpeedVisit that individuals and companies can use to quickly and easily record visits from people and capture basic customer demographic information.

Features include:

  • Groups – create a group so you can add other users, locations and referrers to it.
  • Users – add additional users to your group as needed.
  • Locations – add locations where visits take place.
  • Referrers – add referrers to record how a visitor was referred.

The definition of ‘visit’ is pretty broad. It could be customers, co-workers, classmates or attendees in many situations, including:

  • Retail Stores
  • Staff Meetings
  • Convenience Stores
  • Restaurants
  • Street Vendors
  • Art Studios
  • Non-Profits
  • Community Clubs
  • Sports Events
  • Social Events
  • Trade Shows

The demographic information that you can record currently includes:

  • Gender
  • Age
  • Zipcode or postal code
  • Income
  • Prev Visit
  • Referrer
  • Location
  • Purchase
  • Email
  • Comments

The site is mobile friendly responsive design so it works on phones, tablets, laptops and desktops.

The data is totally portable, with easy export of all visits to a csv file. The site is set up with SSL, so your data is securely transferred between your browser and the server.

This is a screenshot of the top of the visit capture page, which is the central feature of the application and workflow. Simply use the radio buttons and choosers to categorize the visitor and add the record. The page will then be ready for the next visitor.

speedvisit1

 

The data is available for others to view, along with aggregated statistics, etc. This is a screenshot of the recorded visit data.

speedvisit2

You can sign into the demo account to see what sample account and visits look like too.

How to filter referral spam from Google Analytics using API and Python

Google Analytics data has become incredibly polluted by “spam referrals” which inflate site visits with what are essentially spam advertisements delivered to you via Google Analytics.

The spammers are entirely bypassing your site and directly hitting Google’s servers pretending to be a visitor to your site. It’s a bit odd that a technological superpower like Google has fallen prey to spammers. Apparently a fix is in the works, but it feels like it’s taking way too long.

In the meantime the fix is to filter out any “visit” that doesn’t have a legitimate referrer hostname. You determine what hostnames you consider legitimate; at a minimum you want to include your own domain. You can also filter out spam visits based on their source. The source name is where the spammers advertise to you: they use their spam domains as the referral source, hoping you will visit their sites. Setting up these filters can be done with Google Analytics’ built-in filters, but it takes some manual effort and some ongoing updating as spammers keep changing source names.

The screenshot below shows the Google Analytics filter screen where you build filters for hostname and source using rules based filtering.

google filter

However this same rules-based filtering can be done using the Google Analytics API. There is a lot of code around for you to work with, and Google’s documentation is pretty good. I have implemented a hostname and source filter using Python with the code below. This lets me run the code as a scheduled job and always have clean analytics data for analysis.

The “hostMatch” and “sourceExp” are the two things that filter out fake hostnames and fake visit source respectively.

You will need to get yourself Google API access and setup the OAuth (which I am not describing here). You will need the OAuth key and a secret file to authorize access to the API then you can use the code below.

'''access the Google Analytics API.'''
# https://developers.google.com/analytics/devguides/reporting/core/v3/reference#maxResults

import argparse
import csv
import re
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials
import httplib2
from oauth2client import client
from oauth2client import file
from oauth2client import tools
from datetime import datetime, timedelta

todaydate = datetime.today().strftime('%Y-%m-%d')

def get_service(api_name, api_version, scope, key_file_location,
				service_account_email):
	'''Get a service that communicates to a Google API.
	Args:
	api_name: The name of the api to connect to.
	api_version: The api version to connect to.
	scope: A list auth scopes to authorize for the application.
	key_file_location: The path to a valid service account p12 key file.
	service_account_email: The service account email address.
	Returns:
	A service that is connected to the specified API.
	'''
	# client_secrets.p12 is secrets file for analytics
	f = open(key_file_location, 'rb')
	key = f.read()
	f.close()
	credentials = SignedJwtAssertionCredentials(service_account_email, key,
	scope=scope)
	http = credentials.authorize(httplib2.Http())
	# Build the service object.
	service = build(api_name, api_version, http=http)

	return service


def get_accounts(service):
	# Get a list of all Google Analytics accounts for this user
	accounts = service.management().accounts().list().execute()

	return accounts


def hostMatch(host):
        #this is used to filter analytics results to only those that came from your hostnames eg not from a spam referral host
	hostnames="domainname1","domainname2","domainname3"

	hostExp = "(" + ")|(".join(hostnames) + ")"
	hostMatch = re.search(hostExp, host[3].lower())

	if hostMatch:
		return True
	else:
		return False


def main():

    #this is where you build your filter expression, note it similar to what you would build in Google Analytics filter feature, you can be as specific of generalized using regex as you want/need
    #ga:source filter
    sourceExp=('ga:source!@top1-seo-service.com;ga:source!@justprofit;ga:source!@rankings-analytics;ga:source!@semalt;ga:source!@kambasoft.com;ga:source!@anticrawler;ga:source!@best-seo-offer;ga:source!@best-seo-solution;ga:source!@buttons-for-website;ga:source!@buttons-for-your-website;ga:source!@7makemoneyonline;ga:source!@-musicas*-gratis;ga:source!@kambasoft;ga:source!@savetubevideo;ga:source!@ranksonic;ga:source!@medispainstitute;ga:source!@offers.bycontext;ga:source!@100dollars-seo;ga:source!@sitevaluation;ga:source!@dailyrank;ga:source!@videos-for-your-business;ga:source!@videos-for-your-business;ga:source!@success-seo;ga:source!@4webmasters;ga:source!@get-free-traffic-now;ga:source!@free-social-buttons;ga:source!@trafficmonetizer;ga:source!@traffic2money;ga:source!@floating-share-buttons;ga:source!@video--production')

    # Define the auth scopes to request.
    scope = ['https://www.googleapis.com/auth/analytics.readonly']

    #Provide service account email and relative location of your key file.
    service_account_email = 'xxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com'
    key_file_location = 'client_secrets.p12'
    #scope = 'http://www.googleapis.com/auth/analytics'

    # Authenticate and construct service.
    service = get_service('analytics', 'v3', scope, key_file_location, service_account_email)

    #get accounts
    accounts = service.management().accounts().list().execute()
    #create list for results
    output = list()

    # loop through accounts
    for account in accounts.get('items', []):
    	account_id = account.get('id')
    	account_name = account.get('name')

    #get properties
    	properties = service.management().webproperties().list(accountId=account_id).execute()

    #loop through each account property default profileid (set in GA admin)
    #get metrics from profile/view level
    #instead of looping through all profiles/views
    	for property in properties.get('items', []):
    		data = service.data().ga().get(
    			ids='ga:' + property.get('defaultProfileId'),
    			start_date='2012-01-01',
    			end_date= todaydate, #'2015-08-05',
    			metrics = 'ga:sessions, ga:users, ga:newUsers, ga:sessionsPerUser, ga:bounceRate, ga:sessionDuration, ga:adsenseRevenue',
    			dimensions = 'ga:date, ga:source, ga:hostname',
                max_results = '10000',
    			filters = sourceExp # the filters from above 
    		).execute()


    		for row in data.get('rows', '1'):
    			results = account_name, row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[8], row[9]
    			output.append(results)
	#print output
		#count of response rows
        #print account_name, data['itemsPerPage'], len(data['rows'])

    #here is the hostname filter call to function above
    hostFilter = [host for host in output if hostMatch(host)==True]

    with open('output_analytics.csv', 'wb') as file:
        writer = csv.DictWriter(file, fieldnames = ['account', 'date', 'source', 'hostname', 'sessions', 'users', 'newUsers', 'sessionsPerUser', 'bounceRate', 'sessionDuration',  'adsenseRevenue'], delimiter = ',')
        writer.writeheader()
        for line in hostFilter:
			file.write(','.join(line) + '\n')
            #print>>file, ','.join(line)

if __name__ == '__main__':
	main()

Tableau vizualization of Toronto Dine Safe data

The City of Toronto’s open data site includes the results of the city’s regular food and restaurant inspections. This data was as of August 2014.

The interactive chart below allows filtering by review criteria and can be zoomed in to view more detail and specific locations.

 

The data file for Dine Safe contained about 73,000 rows and was in XML format, so in order to work with it I transformed it to CSV format.

from xml.dom.minidom import parse
from csv import DictWriter
 
fields = [
	'row_id',
	'establishment_id',
	'inspection_id',
	'establishment_name',
	'establishmenttype',
	'establishment_address',
	'establishment_status',
	'minimum_inspections_peryear',
	'infraction_details',
	'inspection_date',
	'severity',
	'action',
	'court_outcome',
	'amount_fined'
]
 
doc = parse(file('dinesafe.xml'))
 
writer = DictWriter(file('dinesafe.csv', 'w'), fields)
writer.writeheader()
 
row_data = doc.getElementsByTagName('ROWDATA')[0]
 
for row in row_data.getElementsByTagName('ROW'):
	row_values = dict()
	for field in fields:
		text_element = row.getElementsByTagName(field.upper())[0].firstChild
		value = ''
		if text_element:
			value = text_element.wholeText.strip()
			row_values[field] = value
	writer.writerow(row_values)

This data was not geocoded, so I had to do that before it could be mapped. I wanted to use a free geocoding service, but they have limits on the number of records that can be geocoded per day. I used MapQuest’s geocoding API with a Python library that automated the geocoding in daily batch jobs so that I could maximize the free daily geocoding.

The Dine Safe data file addresses needed some cleaning up so that the geocoding service could read them. For example, street name variations needed to be conformed to something that MapQuest would accept. This was a manual batch find-and-replace effort. If I were automating this I would use a lookup table of street name variations and replace them with the accepted spelling/format, as sketched below.
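A rough sketch of that idea, with a hypothetical lookup table of street name variations:

# Hypothetical lookup table of street name variations -> accepted form
street_fixes = {
    ' AVE ': ' AVENUE ',
    ' ST ': ' STREET ',
    ' RD ': ' ROAD ',
}

def clean_address(address):
    # pad with spaces so only whole tokens match, then apply each fix
    cleaned = ' ' + address.upper() + ' '
    for variant, accepted in street_fixes.items():
        cleaned = cleaned.replace(variant, accepted)
    return cleaned.strip()

print(clean_address('123 Main St'))  # 123 MAIN STREET

The geocoding script itself is below.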

#Uses MapQuest's Nominatim mirror.

import anydbm
import urllib2
import csv
import json
import time

# set up the cache. 'c' means create if necessary
cache = anydbm.open('geocode_cache', 'c')

# Use MapQuest's open Nominatim server.
# http://developer.mapquest.com/web/products/open/nominatim
API_ENDPOINT = 'http://open.mapquestapi.com/nominatim/v1/search.php?format=json&q={}'

def geocode_location(location):
    '''
    Fetch the geodata associated with the given address and return
    the entire response object (loaded from json).
    '''
    if location not in cache:
        # construct the URL
        url = API_ENDPOINT.format(urllib2.quote(location))
        
        # load the content at the URL
        print 'fetching %s' % url
        result_json = urllib2.urlopen(url).read()
        
        # put the content into the cache
        cache[location] = result_json
        
        # pause to throttle requests
        time.sleep(1)
    
    # the response is (now) in the cache, so load it
    return json.loads(cache[location])

if __name__ == '__main__':
    # open the input and output file objects
    with open('dinesafe.csv') as infile, open('dinesafe_geocoded.csv', 'w') as outfile:
      
        # wrap the files with CSV reader objects.
        # the output file has two additional fields, lat and lon
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames + ['lat', 'lon'])
        
        # write the header row to the output file
        writer.writeheader()
        
        # iterate over the file by record 
        for record in reader:
            # construct the full address
            address = record['establishment_address']
            address += ', Toronto, ON, Canada'
            
            # log the address to the console
            print address
            
            try:
                # Nominatim returns a list of matches; take the first
                geo_data = geocode_location(address)[0]
                record['lat'] = geo_data['lat']
                record['lon'] = geo_data['lon']
            except IndexError:
                # if there are no matches, don't raise an error
                pass
            writer.writerow(record)

After the Dine Safe data was geocoded so that it had two new columns, one for latitude and another for longitude, all that was left to do was bring the data into Tableau and create the Tableau map visualization which is shown below.

Single CakePHP application & MySQL database to serve multiple domains

I have a bunch of websites that have the same structure and content and vary only by the category for each site. So the content for all sites can reside in a single database schema and be filtered to specific categories for each site.

In order to do this I created a custom CakePHP application with one MySQL database to serve multiple public facing websites each with their own unique domain.

In order to make this work you need a hosting account that will allow you to map website serving directories to domain names. Different hosting providers may have different methods of doing this.

The secret to making this work is to have the application parse the domain name that is calling the application and then use that to filter the database records for that domain’s category.

In CakePHP, do this by creating a function in the AppController that checks which domain the visitor is browsing and then retrieves that domain’s specific values from the special table to filter the CakePHP application. You will need to call this function in the AppController beforeFilter function. I have included an example function named ‘getDomainSettings’ below.

It will also be very helpful to make a table that contains meta data for each domain such as keywords, analytics tracking code, etc. I didn’t include a different CSS file for each domain, but you could include a CSS file variable referring to a separate CSS file for each domain.

 

Check Github repository for more information:

https://github.com/sitrucp/single_cakephp_mysql_multiple_domain_app

<?php

App::uses('Controller', 'Controller');

class AppController extends Controller {
    
    public function beforeFilter() {
        $this-> getDomainSettings(); 	
    }
    
    //function which reads settings from domain table  
    function getDomainSettings(){
        //get current domain from server name as variable
        $domain = preg_replace('/^www\./', '', $_SERVER['SERVER_NAME']);
        //Load domain model data
        $this->loadModel('Domain');
        //retrieve only current domain table record
        $domain_settings = $this->Domain->find('all', array('conditions' => 
        array('Domain.domain' => $domain)));  
        foreach($domain_settings as $value){
            //create Configure::Write variables from domain record to use elsewhere in application
            Configure::write('domain', $value['Domain']['domain']);
            Configure::write('domain_id', $value['Domain']['id']);
            Configure::write('ga_code', $value['Domain']['ga_code']);
            Configure::write('meta_title', $value['Domain']['meta_title']);
            Configure::write('meta_keywords', $value['Domain']['meta_keywords']);
            Configure::write('meta_description', $value['Domain']['meta_description']); 
            
            //etc retrieve as many domain specific values as required from database
        }
    }
}

?>

How a new salmon stream was carved from the land

Back in the days when I was working in environmental science, the consulting firm I was working for was doing storm water drainage planning in Surrey, BC, Canada.

The City of Surrey was planning a new sub-division. The new sub-division would have lots of new hardened surfaces (roads, roofs and lawns instead of natural scrub and tree cover), which would require more frequent and higher-volume water drainage to be channeled somewhere.

Our company proposed diverting this new water flow under a road and into a city right of way. The right of way was a huge section of land covered in natural grassland, dense shrubs, trees and a marshy area. The proposed new watercourse would flow about 1000 metres through the property and drain into the Serpentine River. Normally stormwater drainage engineers would simply channel drainage into ditches and/or covered storm water drains that would eventually reach the Serpentine River anyway.

The new channel was designed to mimic a natural watercourse that may have been created naturally in this field given the field’s topography, soil type and a water flow rate and volume that our engineers estimated from new sub-division.

Creating a new watercourse is a mix of science, art, physics and engineering. Water has a predictable pattern of flowing over a surface given the surface topography, soil type and water flow rate and volume. Try putting some drops of water on one end of a cookie baking sheet and then slowly lifting that side up. The water drops will flow down the cookie sheet and form an s-shaped path. The higher you lift it, the straighter the path. The same thing happens in nature.

There are other features of natural watercourses that need to be considered. The watercourse will have specific curve measurements in its s-shape. The size of curve, the depth of the outside bank, the gradient of the stream bed from the inside bank to the outside bank varies with flow.

One of the goals for this new watercourse was to provide natural habitat for fish and wildlife. So we wanted to include features that would mimic what could happen naturally, such as providing stream cover in the form of undercut banks, overhanging trees, grass and shrubs.

Also, because we didn’t want nature to dramatically alter the stream’s course (which is a natural occurrence), we wanted to ensure the shape would resist erosion. We provided bank protection in the form of large tree root wads and other bank-hardening placements.

Our company had a specialist who had created scores of similar water courses. The advice I got was to use a large engineering plan drawing of the field and start sketching out the water course.

So I literally took pencil to the drawings and started sketching what I thought the stream would look like. Then we worked backwards, accounting for soil type, expected water flow, curve of bend and depth of flow.

I left the company before the stream was constructed, but the eventual shape was pretty close to what we had drawn. As construction progressed, modifications were required to accommodate soil composition variances, and some of the bank hardening was not used due to concerns about its long-term integrity.

However within 2 years there was fish and wildlife habitat where there was none before. That is pretty cool!

Here is a Google Maps aerial photo of the new stream. Below is a screenshot with a dotted line showing where the stream is located.

new stream

Tableau vizualization of Vancouver Car2Go vehicle locations

I happened to have a dataset of Car2Go vehicle locations in Vancouver and used Tableau to plot them by latitude and longitude.

 

 

There are a lot of observations over the time the data was collected. There is a limited number of physical parking spots, so cars are parked in locations that are very close to other car locations. This makes for a very dense plot.

But if you zoom into the view, the detail will resolve. For example, below is a screenshot zoomed into downtown Vancouver. You can very clearly see that cars are parked along city streets. Fixed parking spots are identified by a darker color.

 

car2go vehicle locations

 

How to vizualize a work schedule using a Qlikview Gantt chart

This is a Qlikview solution I posted on Qlikview’s forum in response to a user request for displaying employee work schedules in a Gantt chart style presentation.

https://community.qlik.com/thread/51554

The solution included setting up the data for the Qlikview chart in a table format as shown below.

Original:
Load WorkDate,
    Employee
    From [C:\BI\schedule demo - data.xls]
    (biff, embedded labels);       
 
Set IntervalStart = 700;
Set IntervalEnd = 800;
 
//Loop through each employee record to get hour intervals
For i = 1 to 11
Original2:
    Load $(i) as rowNum,
        WorkDate,
        Employee,
        DayStart,
        DayEnd,
        $(IntervalStart) & '-' & $(IntervalEnd) as Interval,
          //Check if hour interval is within emp start <> end and give it "1" (so it can be conditionally formatted later)
          //I didn't do the extra check to see if the interval is a break interval but you could just add an add'l If to IntervalValue
        If($(IntervalStart) >= DayStart and $(IntervalEnd) <= DayEnd, 1, 0) as IntervalValue
        From [C:\BI\schedule demo - data.xls]
        (biff, embedded labels);
    Let IntervalStart  = IntervalStart + 100;
    Let IntervalEnd = IntervalEnd + 100;                
Next i

The data for this table could come from any data source eg Excel, csv file, database query results, etc. In this example it is an Excel .xls file.

qlik solution data table

Use a Qlikview pivot table with WorkDate, Employee & Interval as dimensions and IntervalValue as the expression. Then conditionally format the fill with an expression so that a value of 1 shows green. Note that the “0” values are still showing in my example, but you could conditionally format the text on 0 to turn it white, e.g. to hide the 0’s.

 The pivot table can be changed to show daily employee coverage as well.

 

Many additional things could be added to this solution. For example, if you are tracking employee schedules by another category such as event or department, these can be added to the data and then used to slice and dice by these other dimensions in the pivot tables.

How a UK tent rental company used Google analytics and Tableau to improve sales

A UK wedding marquee rental website’s Google Analytics data was analysed using Tableau Public. The client’s target geographic area is northwest UK, centered around Manchester.

It was interesting to see a significant number of site visitors from Pakistan, India and the Philippines.

A bit of customer research revealed that these site visitors are friends and family helping with a UK wedding.

The client did have first-hand information that his clients have offshore family members who might have helped with the wedding planning. But getting hard data from website analytics, and seeing it clearly highlighted in the Tableau analysis, prompted the client to direct sales and marketing efforts not only offshore to Pakistan, India and the Philippines, but also specifically to these demographic groups inside the UK.

The result was increased bookings and a lift in word of mouth advertising within these demographic groups.

Well done analytical entrepreneur. Yes, analytics can be that easy and effective. Just use the tools, do the work, and listen to the analysis!

Time based SLA KPI created using MS Access VBA

A client had a time-based activity with a start and end datetime. The client wanted to be able to calculate a service level agreement (SLA) metric based on the duration of time to complete the activity.

This SLA included a requirement that time only be counted in the metric if it was during work hours, defined as 8am to 4pm, excluding weekends and holidays. Holidays were identified by listing them manually in a database table (below).

The SLA metric was calculated for each activity as follows:

SLA Actual = End Datetime – Start Datetime, as an hours decimal value, excluding weekends, holidays and non-work hours.

This SLA metric really changed people’s view of the policy. Without specific data people’s perceptions of the time elapsed from start to end were wildly variant.

For example, if an activity started at 3:55 PM on the Friday of a long weekend and was finished the following Tuesday at 10 AM, it would only accrue 5 minutes on Friday and 2 hours on Tuesday, for a total of 2 hours and 5 minutes, or 2.083 hours. However, before the SLA metric, some perceived the elapsed SLA time to be as much as 5 days of processing time. Using the SLA to analyse activity by the people and department involved, activity category, etc. was invaluable in identifying business process, training and communication issues and improvements.

The system used was an MS Access database.

The metric was to be calculated and presented in an Access form so agents could see their real-time performance based on the SLA metric. The form had an auto-refresh so the metric (and others) would be current.

The SLA metric also had to be calculated in MS Access queries to generate SLA metrics values for real-time and historical reporting and analysis purposes.

The function was included in the required queries, and a current SLA metric value was created when the query was run, normally when a report was refreshed.

To enable both of these requirements a VBA function was created that could be called by both an Access form and Access query. The function VBA code is shown below.

The function output is a decimal value representing the number of hours. You can get hour and minute values from the hours decimal by using a modulo calculation.
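Just to illustrate the arithmetic (in Python rather than VBA, purely for illustration), converting the 2.083-hour example from earlier into hours and minutes looks like this:

sla_hours = 2.083
hours = int(sla_hours)                          # 2
minutes = int(round((sla_hours % 1) * 60))      # 5
print('%d h %02d min' % (hours, minutes))       # 2 h 05 min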

A challenge with using the function in queries was that reports often covered long time spans, so potentially 100,000+ records could be included in a query; since the function would run on each record, it would be very slow. To address this, the function was only applied to more recent records, with a business rule assuming that older records would not have a changing SLA, e.g. they would be complete.

This VBA code and process can be reused for other similar purposes as long as you provide the function with the required start and end datetime values (as parameters) and you use a holiday table, which is a simple two-column table (see below). You could modify the hard-coded workday start and end hours to suit your business needs.

Option Compare Database
Option Explicit
'*** This is the code that is used to help calculate SLA actual hours.  The code calculates
'*** the net hours (total hours excluding weekends, holidays & hours outside of workday (8am-4pm)
'*** for the following:
'*** - net decision time (hours from IC Date/Time aka Application Received Date to Decision Date/Time)
'*** - net event time (for each event, hours from Start Date/Time to End Date/Time)
'*** The SLA actual hours = Net decision time - Sum(net event time) is calculated later
'*** in a set of queries.

Function NetHours(dteStart As Date, dteEnd As Date) As Single

    Dim GrossDays As Integer
    Dim WorkingDays As Integer
    Dim OneDayHours As Single

    Dim fixedEnd As Date
    Dim fixedStart As Date
    Dim dteStartEnd As Date
    Dim dteEndStart As Date

    Dim StartDayHours As Single
    Dim EndDayHours As Single

    Dim intCount As Integer
    Dim dteCurrDate As Date

    dteStartEnd = DateValue(dteStart) + TimeValue("4:00pm")
    dteEndStart = DateValue(dteEnd) + TimeValue("8:00am")

    'If dteStart or dteEnd before 8am, use 8 am, if after 4 pm use 4 pm otherwise use dteStart or dteEnd
    If dteStart > DateValue(dteStart) + TimeValue("8:00am") And dteStart < DateValue(dteStart) + TimeValue("4:00pm") Then
        fixedStart = dteStart
    Else
        Select Case dteStart
        Case Is <= DateValue(dteStart) + TimeValue("8:00am")
            fixedStart = DateValue(dteStart) + TimeValue("8:00am")
        Case Is >= DateValue(dteStart) + TimeValue("4:00pm")
            fixedStart = DateValue(dteStart) + TimeValue("4:00pm")
        End Select
    End If

    If dteEnd > DateValue(dteEnd) + TimeValue("8:00am") And dteEnd < DateValue(dteEnd) + TimeValue("4:00pm") Then
        fixedEnd = dteEnd
    Else
        Select Case dteEnd
        Case Is <= DateValue(dteEnd) + TimeValue("8:00am")
            fixedEnd = DateValue(dteEnd) + TimeValue("8:00am")
        Case Is >= DateValue(dteEnd) + TimeValue("4:00pm")
            fixedEnd = DateValue(dteEnd) + TimeValue("4:00pm")
        End Select
    End If

    'Calculate hours on 1st day but check if they are weekends or holidays first
    'and if hours aren't within workday then assign hours
    StartDayHours = 0
    If Weekday(dteStart, vbMonday) <= 5 And IsNull(DLookup("[Holiday_Date]", "A_Holidays", _
    "[Holiday_Date] = " & Format(dteStart, "\#mm\/dd\/yyyy\#;;;\N\u\l\l"))) Then
        StartDayHours = (DateDiff("n", fixedStart, dteStartEnd)) / 60
    Else
        StartDayHours = 0
    End If

    'Calculate hours on last day but check if they are weekends or holidays first
    'and if hours aren't within workday then assign hours
    EndDayHours = 0
    If Weekday(dteEnd, vbMonday) <= 5 And IsNull(DLookup("[Holiday_Date]", "A_Holidays", _
    "[Holiday_Date] = " & Format(dteEnd, "\#mm\/dd\/yyyy\#;;;\N\u\l\l"))) Then
        EndDayHours = (DateDiff("n", dteEndStart, fixedEnd)) / 60
    Else
        EndDayHours = 0
    End If

    'Count workdays excluding weekends, holidays and first and last date
    WorkingDays = 0
    If DateDiff("d", dteStart, dteEnd) > 1 Then
        intCount = 0
        dteCurrDate = DateValue(dteStart)
        Do While dteCurrDate < DateValue(dteEnd)
            If Weekday(dteCurrDate, vbMonday) <= 5 And dteCurrDate <> DateValue(dteStart) And dteCurrDate <> DateValue(dteEnd) _
            And IsNull(DLookup("[Holiday_Date]", "A_Holidays", _
            "[Holiday_Date] = " & Format(dteCurrDate, "\#mm\/dd\/yyyy\#;;;\N\u\l\l"))) Then
                intCount = intCount + 1
            Else
                intCount = intCount
            End If
            dteCurrDate = dteCurrDate + 1
        Loop
        WorkingDays = intCount
    Else
        WorkingDays = 0
    End If

    'Calculate gross # days between start and end
    GrossDays = 0
    GrossDays = DateDiff("d", dteStart, dteEnd)

    'Calculate hours between start and end times on same day
    OneDayHours = 0
    'If fixedStart < fixedEnd Then
    If Weekday(dteStart, vbMonday) <= 5 And IsNull(DLookup("[Holiday_Date]", "A_Holidays", _
    "[Holiday_Date] = " & Format(dteStart, "\#mm\/dd\/yyyy\#;;;\N\u\l\l"))) Then
        OneDayHours = (DateDiff("n", fixedStart, fixedEnd)) / 60
    Else
        OneDayHours = 0
    End If
    'Else
    '    OneDayHours = 0
    'End If

    'Depending on # GrossDays then give final NetHours result
    NetHours = 0
    Select Case GrossDays
        Case 0
        'Start and end time on same day
            NetHours = OneDayHours
        Case 1
        'Start and end time on consecutive days
            NetHours = NetHours + StartDayHours
            NetHours = NetHours + EndDayHours
        Case Is > 1
        'Start and end time on non consecutive days
            NetHours = WorkingDays * 8
            NetHours = NetHours + EndDayHours
            NetHours = NetHours + StartDayHours
    End Select

End Function
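
To sanity-check the function you can call it from the VBA Immediate window. The dates below are just an example chosen to match the worked example earlier: a 3:55 PM Friday start before the Victoria Day long weekend and a 10:00 AM Tuesday finish, assuming 19/05/2008 is present in the A_Holidays table (as it is in the holiday table shown below).

' Immediate window check: 5 minutes on the Friday + 2 hours on the Tuesday
? NetHours(#5/16/2008 3:55:00 PM#, #5/20/2008 10:00:00 AM#)
' returns approximately 2.083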


The holidays table had two columns: one for the holiday date and another for the holiday name (the name is not used by the function; it just identifies the date).

Holiday_Date Holiday
01/01/2008 New Year’s Day
01/02/2008 Family Day (Markham)
21/03/2008 Good Friday
19/05/2008 Victoria Day
01/07/2008 Canada Day
04/08/2008 Civic Day
01/09/2008 Labour Day
13/10/2008 Thanksgiving Day
11/11/2008 Remembrance Day
25/12/2008 Christmas
26/12/2008 Boxing Day
01/01/2009 New Year’s Day
10/04/2009 Good Friday
18/05/2009 Victoria Day
01/07/2009 Canada Day
03/08/2009 Civic Holiday
07/09/2009 Labour Day
12/10/2009 Thanksgiving Day
11/11/2009 Remembrance Day
25/12/2009 Christmas
28/12/2009 Boxing Day lieu
01/01/2010 New Year’s Day
02/04/2010 Good Friday
24/05/2010 Victoria Day
01/07/2010 Canada Day
02/08/2010 Civic Holiday
06/09/2010 Labour Day
11/10/2010 Thanksgiving Day
11/11/2010 Remembrance Day
27/12/2010 Christmas Day lieu
28/12/2010 Boxing Day lieu
03/01/2011 New Year’s Day lieu
22/04/2011 Good Friday
23/05/2011 Victoria Day
01/07/2011 Canada Day
01/08/2011 Civic Holiday
05/09/2011 Labour Day
10/10/2011 Thanksgiving Day
11/11/2011 Remembrance Day
26/12/2011 Christmas Day lieu
27/12/2011 Boxing Day lieu

Visualization of Toronto’s 311 contact centre open data

This analysis is based on the City of Toronto’s Open Data website 311 call performance data for the 4 years 2010, 2011, 2012 and 2013.

The data is provided by the City in a Google Spreadsheet and has been downloaded and saved here in Excel 2013 format: 311_daily_contact_centre_data.

I used Excel pivot tables and charts for analysis and visualization.

The call volume has remained relatively consistent year over year from 2010 to 2013. The chart below shows all daily calls for those 4 years. On average there are about 5,000 calls per day. Seasonal variations are evident, with peaks in summer and a few big spikes, notably one at the end of December 2013 when volume exceeded 20,000 calls per day. I’m not sure what caused that.

[Chart: daily 311 call volume, 2010 to 2013]

Weekend call volume is dramatically lower than weekday volume, which suggests that 311 calls are largely business related.
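
That split was done with Excel pivot tables, but as a purely hypothetical illustration of the idea, a tiny VBA helper like this could tag each date before grouping:

' Hypothetical helper (not part of the original workbook): tags a date as
' "Weekday" or "Weekend" so daily call volumes can be grouped by day type.
Function DayType(d As Date) As String
    If Weekday(d, vbMonday) <= 5 Then
        DayType = "Weekday"
    Else
        DayType = "Weekend"
    End If
End Function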

[Chart: weekday vs weekend call volume]

The 311 line keeps a fairly consistent average call talk time of 230 seconds, or about 4 minutes, as represented by the black line in the chart below.

The average speed of answer metric varies quite a bit (red line in the chart below). We can see that answer time follows call volume.

When the call centre gets more calls, it takes longer to answer them. This indicates the call centre has the same number of agents available regardless of season, day of week or special events. It is probably too expensive and/or challenging to hire staff to work part time or on call for these surge events. There are also some anomalously high answer times that might be due to understaffing or equipment failures.

Grouping calls, talk times and answer times by month in the chart below may obscure daily variations, and daily outliers may skew the monthly totals, but the month-over-month view does a good job of showing trends.

[Chart: call volume, average talk time and average answer speed by month]

The call centre’s metrics around call abandonment are important measures. Near the end of the post we will see how abandonment is used to create a ‘service level’ metric. The chart below shows a breakdown of how many calls are actually answered and connected to a 311 call centre agent.

  • Only about 75% of the total calls that come into the 311 call centre are actually answered (blue part of bar).
  • The remaining 25% of the calls are abandoned by the caller.
    • On average, 15% are abandoned within 30 seconds (green part of bar). These people are busy, they won’t wait, and leave the hold queue relatively quickly.
    • On average, 10% wait longer than 30 seconds before hanging up (red part of bar). These people invested their time in waiting on hold.

[Chart: relative call volume, answered vs abandoned calls]

As mentioned above, the call centre creates a ‘service level’ metric that is a percentage value based on abandoned calls and call answer time. When there are no abandoned calls, the service level approaches 100%. However, the Toronto 311 call centre has not hit 100% very often, as shown by the orange line in the chart below, which is the average service level percent by month. In fact, it has never reached 100% for any month over the 4 years.
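
The dataset doesn’t spell out the City’s exact service level formula, so as a hedged sketch only, a common call centre definition is the share of calls answered within a target threshold:

' Hedged sketch only - the City's actual service level formula is not
' published with this dataset. A common industry definition is:
'   service level % = calls answered within threshold / calls offered * 100
Function ServiceLevelPct(answeredWithinThreshold As Long, callsOffered As Long) As Single
    If callsOffered = 0 Then
        ServiceLevelPct = 0
    Else
        ServiceLevelPct = answeredWithinThreshold / callsOffered * 100
    End If
End Function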

[Chart: average service level percent by month]

Another way to look at service level is to see how many days fell into each service level category. The chart below groups daily service levels into the following categories: 0-25%, 25-50%, 50-75% and 75-100%. Only roughly 70% of the total days in 2012 and 2013 were in the 75-100% category.

[Chart: days by service level category and year]

Yet another way to look at service level is the daily attainment frequency over the 4 years. The chart below shows how often each daily service level value occurred; for example, a service level of 100% was attained on only 4 days from 2010 to 2013. This view provides more granularity on the call centre’s service levels.

[Chart: service level daily attainment frequency]

This analysis suggests that the call centre needs to increase its service level. Since service level largely depends on call abandonment, something needs to be done to decrease the number of calls sent to hold, which currently results in 25% of callers waiting and then hanging up. What can a call centre do about this?

  • Increase the number of agents answering calls; this is the most obvious action. The challenge is that they would need to be part-time or shift workers to handle spikes in call volume.
  • Reduce call talk time so agents can answer more calls, which reduces the number of calls sent to hold and abandoned. This can be done by improving training or making information more accessible to agents so they can answer questions faster and more efficiently.
  • Instead of putting callers on hold, offer them alternative information channels, perhaps an interactive automated voice system or a website.
  • Analyse the reasons people call 311 and endeavour to provide that information proactively through other channels, e.g. City websites, billboards, or other public communications.

The 311 call centre data is available on the City of Toronto Open Data website under the catalogue name “311 Contact Centre Performance Metrics”.