Visualization of Toronto’s 311 contact centre open data

This analysis is based on 311 call performance data from the City of Toronto's Open Data website, covering the four years 2010 through 2013.

The City provides the data in a Google Spreadsheet, which has been downloaded and saved here in Excel 2013 format as 311_daily_contact_centre_data.

I used Excel pivot tables and charts for analysis and visualization.

Call volume remained relatively consistent year over year from 2010 to 2013. The chart below shows all daily calls over that period. On average there are about 5,000 calls per day. Seasonal variations are evident, with peaks in summer, and there are a few big spikes, notably one at the end of December 2013 when volume jumped to over 20,000 calls per day. I'm not sure what caused that.

311-chart-by day

Weekend call volume is dramatically lower than weekday volume, which suggests that 311 calls are mostly business related.

311-chart-wkday vs wkend
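The weekday/weekend comparison above was done with Excel pivot tables, but the same daily aggregation can be sketched in pandas. The column names and numbers below are illustrative stand-ins, not the actual spreadsheet headers or data:

```python
import pandas as pd

# Illustrative stand-in for the 311 daily data; real column names may differ.
df = pd.DataFrame({
    "date": pd.date_range("2013-12-23", periods=14, freq="D"),
    "calls_offered": [5200, 5100, 5300, 5250, 21000, 900, 800,
                      5150, 5000, 5400, 5100, 5200, 950, 850],
})

# Label each day and compare average call volumes.
df["day_type"] = df["date"].dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "weekday")
summary = df.groupby("day_type")["calls_offered"].mean()
print(summary)
```

The weekend mean comes out far below the weekday mean even with the synthetic spike included, which is the pattern the chart above shows.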

The 311 line keeps a fairly consistent average talk time of about 230 seconds (just under 4 minutes), as represented by the black line in the chart below.

The average speed of answer metric varies quite a bit (red line in the chart below). Answer time clearly follows call volume.

When the call centre gets more calls, it takes longer to answer them. This suggests the call centre has the same number of agents available regardless of season, day of week or special events. It is probably too expensive and/or challenging to hire part-time or on-call staff for these surge events. There are also some anomalously high answer times that might be due to understaffing or equipment failures.

Grouping calls, talk times and answer times by month, as in the chart below, may hide daily variations, and daily outliers may skew the monthly totals. Viewed month over month, though, it does a good job of showing trends.

311-chart-by day bar

Call abandonment metrics are important measures for a call centre. We will see near the end of the post how they are used to create a 'service level' metric. The chart below shows a breakdown of how many calls are actually answered and connected to a 311 Call Centre agent.

  • Only about 75% of the total calls that come into the 311 call centre are actually answered (blue part of bar).
  • The remaining 25% of the calls are abandoned by the caller.
    • On average, 15% are abandoned within 30 seconds (green part of bar). These people are busy, they won’t wait, and leave the hold queue relatively quickly.
    • On average, 10% wait longer than 30 seconds before hanging up (red part of bar). These people invested their time in waiting on hold.

311-call relative volume

As mentioned above, the call centre creates a 'service level' metric, a percentage based on abandoned calls and answer time. When there are no abandoned calls, the service level approaches 100%. However, the Toronto 311 call centre has not come close to 100% very often, as shown by the orange line in the chart below, which is the average service level percent by month. In fact, it has never been 100% for any month over the 4 years.

service level
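The post does not give the exact formula Toronto 311 uses, but a common industry definition of service level, sketched below, behaves the same way: it approaches 100% as abandonment drops. The function and its inputs are assumptions for illustration:

```python
# Hypothetical service level: the share of offered calls answered within a
# 30-second threshold, excluding calls abandoned before the threshold.
# This is a common industry definition, not necessarily the one 311 uses.
def service_level(offered, answered_within_30s, abandoned_within_30s):
    eligible = offered - abandoned_within_30s
    if eligible <= 0:
        return 100.0
    return 100.0 * answered_within_30s / eligible

# A day roughly matching the post's averages: ~5,000 calls, 15% quickly abandoned.
print(service_level(5000, 3400, 750))  # → 80.0
```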

Another way to look at service level is to count how many days fell into each service level category. The chart below groups daily service levels into four categories: 0-25%, 25-50%, 50-75% and 75-100%. Only roughly 70% of the days in 2012 and 2013 reached the 75-100% service level.

service level category year
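The daily-category grouping can be sketched with pandas' `pd.cut`; the daily service-level values below are made up for illustration:

```python
import pandas as pd

# Illustrative daily service-level percentages; the real values come from
# the 311 spreadsheet.
daily_sl = pd.Series([10, 30, 55, 60, 72, 80, 85, 90, 95, 99])

# Bin each day into the same four categories used in the chart.
bins = [0, 25, 50, 75, 100]
labels = ["0-25%", "25-50%", "50-75%", "75-100%"]
categories = pd.cut(daily_sl, bins=bins, labels=labels, include_lowest=True)
counts = categories.value_counts().sort_index()
print(counts)
```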

Yet another way to look at service level is the daily attainment frequency over the 4 years, shown in the chart below. For example, a service level of 100% was attained on only 4 days from 2010 to 2013. This view provides more granularity on call centre service levels.

service level daily freq

This analysis suggests that the call centre needs to improve its service level metric. Since service level is largely driven by call abandonment, something needs to be done to reduce the number of calls sent to hold, which results in 25% of callers waiting and then hanging up. What can a call centre do about this?

  • Increase the number of agents answering calls. This is the most obvious action; the challenge is that they should be part-time or shift workers to handle spikes in call volume.
  • Reduce call talk time so agents can answer more calls, reducing calls sent to hold and abandoned calls. This can be done by improving training or making information more accessible to agents so they can answer questions faster and more efficiently.
  • Instead of putting callers on hold, offer them alternative information channels, perhaps an automated interactive voice system or an interactive website.
  • Analyze the reasons people are calling 311 and endeavour to provide that information proactively in other channels, e.g. City websites, billboards, or other public communication channels.

The 311 call centre data is available at the City of Toronto Open Data website under the catalogue name "311 Contact Centre Performance Metrics".

Introducing Cardivvy – a website showing Car2Go real time car locations, parking and service areas

Car2Go provides developer access to real-time vehicle locations, parking availability, and service area boundaries for all of their city locations around the world.

I used the Car2Go API to power a website I created for my own use after struggling with the official Car2Go map on my mobile phone. I wanted a way to find cars by street location.

The site includes links to view vehicle locations, parking lot locations and service area boundaries on a Google Map. You can click into each city, view available cars alphabetically by street location or parking lot location, and then click through to a car's current location on a Google Map. Each city's Car2Go service area can also be viewed on a Google Map.
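As an illustration of the street-location grouping the site does, here is a sketch that groups a sample vehicle payload by street. The field names (`name`, `address`) mimic the shape of a vehicles response but are assumptions, not the documented Car2Go API schema:

```python
from collections import defaultdict

# Sample records shaped like a vehicles response; the field names are
# assumptions for illustration, not the documented Car2Go schema.
vehicles = [
    {"name": "123ABC", "address": "Main St 100, Vancouver"},
    {"name": "456DEF", "address": "Main St 250, Vancouver"},
    {"name": "789GHI", "address": "Oak St 15, Vancouver"},
]

# Group cars by street so they can be listed alphabetically by location.
by_street = defaultdict(list)
for v in vehicles:
    street = v["address"].split(",")[0].rstrip("0123456789 ")
    by_street[street].append(v["name"])

for street in sorted(by_street):
    print(street, by_street[street])
```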

I am waiting for ZipCar to get their API up and running, which should be available Fall 2015; then I will integrate it into the cardivvy site too so I can see where cars from both services are available.

Here is an example of the Car2Go service area boundary for the Vancouver area.


LinkedIn ‘People You May Know’ web scraping and analysis

A while back LinkedIn sneakily vacuumed up all of my contacts from my phone via the Android Cardmunch app. Turns out Cardmunch is owned by LinkedIn. There must be fine print somewhere that indicates they do this but I sure didn’t see it.

Many of the imported contacts were out of date and in most cases not people I wanted to be connected with on LinkedIn. Some had actually passed away.

It took a mini-campaign of customer service interaction to get LinkedIn to delete the imported contacts.

Anyways, I discovered this had happened when large numbers of my contacts suddenly started showing up on my LinkedIn "People You May Know" (PYMK) page.

The PYMK page is a LinkedIn feature that lists 1,000 people LinkedIn thinks you may know. LinkedIn identifies these people by matching the contacts it vacuums up from everyone's address books. They probably also match people on company name, profession, city, LinkedIn groups, etc.

When LinkedIn finally agreed to delete the contacts I monitored the PYMK page to make sure they were doing it and that it was permanent.

My monitoring was a mix of manual work and automation. I regularly downloaded and saved the PYMK web page and extracted the names on it to see whether my stolen contacts were still there. The contacts were removed very quickly (thank you LinkedIn : )), but I continued downloading the PYMK page because I was curious to see how the names would change over time. I ended up downloading the page 29 times over a 3 month period.

I used Python and BeautifulSoup to process the downloaded PYMK html pages and scrape them for the data I wanted.

I used Excel add-in Power Query to shape the data and Excel Pivot tables and charts for the visualizations.

After downloading a new page I would run the code on a folder containing the PYMK web page files to produce a data file for analysis. I mainly wanted to confirm that my 2,000 imported contacts were deleted. After a few weeks they were finally gone.

Here are some of the results.

Over the 3 month period about 6,300 unique people were on my PYMK page at least once.

The data is incomplete because it is not a daily sample of the PYMK page; I downloaded the pages only 29 times over the 3 month period.

Even so, it gives some relative information about people's appearances on my PYMK page.

People's appearances were not contiguous series of days; there were gaps. LinkedIn appears to swap people in and out over a period of days.

A Gantt chart style visualization makes the pattern of people's appearances obvious. The screenshot below shows an overview of a huge 6,300-row Gantt chart with one unique person per row. The columns are the 29 downloads. The records are sorted descending by first date of appearance, so the most recent are on top.

The pink cells indicate that a person appeared on that downloaded PYMK page. Blue cells are where they did not appear. At a glance you can see the regular patterns of appearances on the PYMK web page.

Moving from the bottom of the chart to the top, you can see that new people are continually introduced to the PYMK page. Some people continue to appear after they are added, while others appear a few times and never reappear.

The varying patterns indicate that the methodology LinkedIn uses to select people to show on my PYMK page changed over time.

The two big columns of pink at the very bottom are the 2,000 people imported from my contact book. Most appeared for only the first few downloads before LinkedIn deleted them, and they never appear again.

Gantt chart style presentation of all 6,300 people (one unique person per row). Records are sorted descending by first date of appearance, so the most recent are on top.
Click on image to open in new tab to view full size.
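The person-by-download matrix behind a chart like this can be sketched in pandas. The names and download dates below are illustrative:

```python
import pandas as pd

# Names scraped from each downloaded PYMK page (illustrative data).
downloads = {
    "2015-01-05": ["Alice", "Bob", "Carol"],
    "2015-01-12": ["Bob", "Carol", "Dave"],
    "2015-01-19": ["Carol", "Dave", "Erin"],
}

# Build a person-by-download True/False presence matrix.
all_names = sorted({n for names in downloads.values() for n in names})
matrix = pd.DataFrame(
    {date: [n in names for n in all_names] for date, names in downloads.items()},
    index=all_names,
)

# Sort so the most recently added people are on top, as in the chart.
first_seen = matrix.idxmax(axis=1)  # first download each person appears in
matrix = matrix.loc[first_seen.sort_values(ascending=False).index]
print(matrix)
```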


Using the first and last day people appeared across the 29 downloads, I could calculate a 'duration' for each person. These durations are shown in the frequency distribution chart below.

duration freq
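The duration calculation can be sketched in pandas, using illustrative names and download dates:

```python
import pandas as pd

# One row per (name, download date) appearance; illustrative data.
appearances = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Carol"],
    "download": pd.to_datetime(
        ["2015-01-05", "2015-03-02", "2015-01-05", "2015-02-02"]),
})

# Duration = days between each person's first and last appearance.
span = appearances.groupby("name")["download"].agg(["min", "max"])
span["duration_days"] = (span["max"] - span["min"]).dt.days
print(span["duration_days"])
```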

Many people appeared over most of the 3 months. About 50% remained on the page for more than 2 months. This would likely have changed had the sampling continued, e.g. more people may have remained on the page just as long.

However, the relative distribution does indicate that there is a split between people that stay on page and those that are just passing through.

The bulk of the people who appear only once on the PYMK page are the 2,000 contacts that were imported and then deleted. Some appeared in the first PYMK page download and never again; some appeared in one or two subsequent downloads until they were all finally deleted.

What is interesting is that LinkedIn was not able to match these contacts to me using its other methods. That hints that the basic mechanism behind PYMK matching is simple contact name matching between LinkedIn accounts.

Across the 29 downloaded PYMK pages, most people appeared fewer than 9 times, as shown in the frequency distribution below. Daily sampling would likely increase these counts, though I expect the relative distribution would be similar.

days freq

I created a 'presence' metric: the number of downloads a person appeared in, relative to the total span of downloads from their first appearance to their last. This is shown in the frequency distribution below. The big spike at 100% is the imported contacts, which showed up in only one download (and then were deleted from LinkedIn forever).

presence freq
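The presence metric itself is a small calculation; here is a sketch using 0-based download positions (illustrative inputs):

```python
# Presence = share of downloads a person appeared in, out of the span of
# downloads from their first appearance to their last (inclusive).
def presence(appeared):
    """appeared: sorted 0-based download numbers where a person was seen."""
    window = appeared[-1] - appeared[0] + 1
    return 100.0 * len(appeared) / window

print(presence([0]))        # seen in one download only → 100.0
print(presence([0, 3, 9]))  # seen in 3 of a 10-download span → 30.0
```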

Daily sampling would have shifted the distribution to the right, towards 100%. My guess is that the peak of the distribution would move up to around 30%, i.e. most people appear about 30% of the time between their first and last appearance.

The process for scraping the downloaded PYMK web pages was as follows:

  1. Go to the LinkedIn PYMK page and scroll down until all 1,000 contacts are shown. The page has 'infinite scrolling' that pages through the entire 1,000 contacts incrementally. I couldn't find an easy way to get Python to do this automatically, which would have been nice.
  2. Save the resulting web page as an html file on my computer.
  3. Run the Python script with BeautifulSoup (below) to get the list of names.
  4. Compare the lists of names from download to download to see what changed.

Python code to get names from a single html file:

from bs4 import BeautifulSoup

# Parse a single saved PYMK page and print each person's name and headline.
with open("pymk.htm", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

for card in soup.find_all('a', 'title'):
    for title in card.find_all('span', 'a11y-headline'):
        print(card['title'] + ' | ' + title.string)


Python code to get names from a folder of html files:

import os
from bs4 import BeautifulSoup

# Parse every saved PYMK page in the folder and print one line per person.
path = 'C:/PYMK Page Downloads/'
for file in os.listdir(path):
    file_path = os.path.join(path, file)
    with open(file_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    for card in soup.find_all('div', 'entityblock profile'):
        name = card.find("a", {"class": "title"})['title']
        title = card.find("span", {"class": "a11y-headline"}).text
        connections2 = card.find("span", {"class": "glyph-text"})
        # Not everyone shows a shared-connections count; default to 0.
        connections = '0' if connections2 is None else connections2.text
        print(file + ' | ' + name + ' | ' + title + ' | ' + connections)


Visualizing human arterial blood flow with a D3.js Sankey chart

This Sankey chart represents human arterial blood flow from the heart down into the smallest named arteries.

A Sankey chart visualizes directional connections between nodes. In this case the nodes are artery names and the connections are the flow of blood through the arteries.
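The D3 Sankey plugin consumes a simple nodes/links structure, where links reference nodes by index. Building that structure from an artery table can be sketched in Python; the artery names and flow values below are illustrative:

```python
import json

# d3-sankey expects {"nodes": [...], "links": [...]}, where each link
# references nodes by index. Artery names and values here are illustrative.
nodes = [{"name": n} for n in
         ["Left Ventricle", "Aorta", "Brachiocephalic Trunk"]]
index = {node["name"]: i for i, node in enumerate(nodes)}
links = [
    {"source": index["Left Ventricle"], "target": index["Aorta"], "value": 10},
    {"source": index["Aorta"], "target": index["Brachiocephalic Trunk"], "value": 3},
]
sankey_data = {"nodes": nodes, "links": links}
print(json.dumps(sankey_data, indent=2))
```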

The full sized chart is huge, so first take a look at the small size view below to get oriented.


The data used to create the visualization can also be viewed in tabular format here.

Two screenshots below, taken from the visualization's top-level branches, highlight where blood flows from the heart to the body and lungs. The grey connections represent blood flow. The coloured nodes represent the arteries.

You can click on the nodes to highlight their connections and blood flow. This is a great way to see how blood flows from artery to artery.

The data was sourced from this UAMS web page.

Quite a bit of cleaning was required to get the data ready for D3 to create the Sankey chart. Some of the transformations required are described below:

    • Edited artery names to remove commas, e.g. "alveolar, anterior superior" became "anterior superior alveolar". This was done using Excel and VBA.
    • Created new table rows to link arteries with their sources, e.g. Left Ventricle is the source for the Aorta.
    • Created new table rows for right and left arterial pairs. This was done by duplicating the entire table to create two sets (one left, one right), then manually removing rows where there is no pair, e.g. the Aorta.
    • Occasionally modified an artery name by prefixing its source artery name to differentiate it from others, e.g. Sigmoid Ascending Br. instead of Ascending Br. This was done because sankey.js requires unique node names and some arteries share the same name in the data source, e.g. Ascending Br.
    • Created new table rows for arterial branches, using Excel and VBA to separate comma-delimited branches into new rows.
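The branch-splitting step (done above with Excel and VBA) can equally be sketched in Python; the sample rows are illustrative:

```python
# Each input row: (source artery, comma-separated branch list); illustrative.
rows = [
    ("Aorta", "Ascending Aorta, Aortic Arch"),
    ("Aortic Arch",
     "Brachiocephalic Trunk, Left Common Carotid, Left Subclavian"),
]

# Expand into one (source, branch) link per row, trimming whitespace.
links = [(source, branch.strip())
         for source, branches in rows
         for branch in branches.split(",")]
print(links)
```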

Some interesting observations about the artery data used to create the chart, and possibilities for extending the data and chart:

    • There are missing arteries and errors in naming and linkages, so please don't study for your anatomy exam using this : )
    • The Sankey chart's termination nodes are major arteries. However, the UAMS arterial tables indicate that 'unnamed branches' are actually the terminating nodes. Including these would have meant creating unique names for each 'unnamed' arterial reference, and I was too lazy to do that 🙂
    • The termination nodes are assumed to be capillaries, i.e. other than the 'unnamed' arteries, the blood flows into capillaries and then returns to the heart via the venous system.
    • I was surprised to learn that in reality there are a large number of 'arterial anastomoses', where hierarchically unrelated arteries join together. I didn't include any of these.
    • The 'other half' of the body's blood flow, the venous system, is not included in this chart. One day I might revisit the chart and include it.
    • It would be cool to add accurate blood flow values to each artery-source connection, which would be an additional column in the data table for blood flow volume per artery.
      • The Sankey chart would represent these volumes by varying the artery-source connection thickness.
      • These would be proportionate values. For example, the blood flow from the Left Ventricle to the Aorta (Ascending Aorta) would be 100%, or whatever relative numerical value we assign (I used 10). From that point, say 30% goes to the brachiocephalic trunk and 70% continues into the Aortic Arch and down into the body, and so on with each further branching into other arteries.
    • It would be even more interesting to set up a dynamic system that changes these blood flow volumes based on heart rate.
      • It could also represent blood flow changes: for example, if a femoral artery were cut, what would be the effect on flow in the rest of the system?
      • The Sankey chart values could change to reflect this. We could change the artery-source link colour, e.g. red (increase), green (normal) or yellow (decrease), to indicate changes in blood flow resulting from these system perturbations.
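The proportional-flow idea can be sketched as a recursive split down the artery tree. The split fractions below are made up for illustration, not physiological measurements:

```python
# Fraction of each parent artery's flow going to each branch; the numbers
# are made up for illustration, not physiological measurements.
splits = {
    "Left Ventricle": [("Aorta", 1.0)],
    "Aorta": [("Brachiocephalic Trunk", 0.3), ("Aortic Arch", 0.7)],
}

def propagate(artery, flow, out):
    """Assign a flow value to every artery reachable from the starting node."""
    out[artery] = flow
    for branch, fraction in splits.get(artery, []):
        propagate(branch, flow * fraction, out)
    return out

flows = propagate("Left Ventricle", 10.0, {})
print(flows)
```

These computed flow values could then be written into the "value" column of the Sankey links to vary the connection thickness.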

The Sankey chart was created using the Sankey plugin for the D3 JavaScript library.

This arterial Sankey chart includes some additional D3 JavaScript that modifies the chart formatting, highlights node links, and allows drag-and-drop node re-positioning. You can view the JavaScript for my Sankey chart simply by viewing the Sankey chart page's source.

The code used to create this is available in this Github repository.