Canadian Government First Nations long term water advisory data

The Government of Canada is working with Canadian First Nations communities to end long-term drinking water advisories which have been in effect for more than 12 months since November 2015. This includes projects to build or renew drinking water infrastructure in these communities.

The First Nation drinking water advisories were a big part of the 2021 Canadian federal election campaigns. The current government had made big promises to fix all of the advisories in 2015. The opposition criticized them for not having achieved that goal to-date. However there was poor communication about the actual number and status of the advisory projects. So I wanted to find some data to learn the actual status for myself.

TLDR the results can be seen on a Github.io hosted page.  A screenshot is provided below. The code to retrieve and transform the data to create that web page are in my Github https://github.com/sitrucp/first_nations_water repository.

Finding the data I needed only took a  few minutes of Google search from an Indigenous Services Canada  website that mapped the water advisories by counts of advisories by FN community, project status, advisory dates, and project type.

While the web page also included a link to download the map data, after a bit of web page scraping and inspection, I learned that the downloadable map data was created by web page Javascript from another larger data source referenced in the map page code https://www.sac-isc.gc.ca//DAM/DAM-ISC-SAC/DAM-WTR/STAGING/texte-text/lTDWA_map_data_1572010201618_eng.txt.

This larger data source is a text file containing JSON format records of all Canadian First Nations communities and is many thousands of records. As a side note this larger file appears to be an official Canadian government dataset used in other government websites. The map data is limited to only about 160 First Nations communities with drinking water issues.

So rather than download the 160 record map page data, I retrieved the larger JSON format text file and used similar logic as that for the map web page code to get the 160 records. This was done in two steps.

Step 1: retrieve JSON format text file using a Python script water_map_data.py to retrieve and save the data file locally. I may yet automate this using a scheduled task so the map gets regularly updated data as advisory status changes over time.

Step 2: process the saved data file and present it in an HTML web page as Plotly.js charts and an HTML tabular format using Javascript in this file  first_nations_water.js

Finally, I also separately created a Excel file with Pivot Table and Chart that you can download and use to do your own analysis. Download this Excel file from the Github repository. The file contains an Excel Power Query link to the larger text JSON file described above. You can simply refresh the query to get the latest data from the Indigenous Services Canada  website.

COVID-19 Data Analysis and Visualization Summary

This is a list of Canadian COVID-19 related data analysis and visualization that I created during 2020/21 pandemic.

Canada COVID-19 Case Map – COVID-19 new and total cases and mortalities by Canadian provincial health regions.
https://sitrucp.github.io/canada_covid_health_regions

Canada COVID-19 Vaccination Schedule – Canada’s current and historical vaccine doses administered and doses distributed. Also includes two distribution forecasts: 1) based on Canadian government vaccine planning and 2) based on Sep 30 2021 goal to vaccinate all 16+ Canadians. Uses COVID-19 Canada Open Data Working Group data.
https://sitrucp.github.io/covid_canada_vaccinations

Canada COVID-19 Vaccination vs World – Canada’s current and historical ranking vs all countries in the Our World in Data coronavirus data  by total doses, daily doses and total people vaccines adminstered.
https://sitrucp.github.io/covid_global_vaccinations

Global COVID-19 Vaccination Ranking – Ranking of all countries in the Our World in Data coronavirus data by daily vaccine dose administration. Includes small visualization of all time, population, vaccines used and trend. Can sort by these measures.
https://sitrucp.github.io/covid_world_vaccinations

COVID-19 New Case Rate by Canadian Health Regions Animation – SVG animation of new cases visualized as daily rate for each Canadian provincial health regions. Like a horse race, faster moving dot means higher daily rate.
https://sitrucp.github.io/covid_rate_canada

COVID-19 New Case Rate by Country Animation – SVG animation of new cases visualized as daily rate for each country in the Our World in Data dataset. Like a horse race, faster moving dot means higher daily rate.
https://sitrucp.github.io/covid_rate_world

COVID-19 New Case Rate by US State Animation – SVG animation of new cases visualized as daily rate for each US state. Like a horse race, faster moving dot means higher daily rate.
https://sitrucp.github.io/covid_rate_us

Apple Mobility Trends Reports – Canadian Regions Data – Apple cell phone mobility data tracking data used to create heat map visualizations of activity over time.
https://sitrucp.github.io/covid_canada_mobility_apple

WHO Draft landscape of COVID-19 Candidate Vaccines – AWS Textract used to extract tabular data from WHO pdf file. Python and Javascript code then used to create webpages from extracted data.
https://sitrucp.github.io/covid_who_vaccine_landscape

Montreal Confirmed COVID-19 Cases By City Neighbourhoods – Code and process used to scrape Quebec Health Montreal website to get COVID-19 case data for Montreal city boroughs.
https://github.com/sitrucp/covid_montreal_scrape_data

Use Excel Power Query to get data from Our World In Data – How to use Excel’s Power Query to get Our World in Data Github csv files automatically and update with simple refresh.
https://009co.com/?p=1491

 

Use Excel Power Query to get data from Our World In Data

Our World In Data (OWID) has been doing a hero’s job of collating the world’s covid vaccine distribution and administration data. They generate additional metrics and allow site visitors to analyse them in awesome interactive visualizations.

However, I wanted to do some analysis that went beyond what was possible on the site. No problem, because the OWID team also provides the data as csv files in their Github repository.

These csv files are easily manually downloaded from Github to be used. But as the OWID vaccine data is updated regularly, often twice per day, you will probably want to automate the retrieval of updated csv files from Github.

There are many tools and methods to automate this but in this blog post I want to quickly highlight how Excel users can use Excel’s Power Query Get Data->From Other Sources->From Web feature to link to the csv files in Github. (Read my other blog post to learn more about Power Query.)

Once implemented you only need to click refresh to get latest csv data from OWID’s Github repository.

First, you need to get a url for the csv file’s in the OWID Github repository.

The easiest way to get correct url is to open the csv file in Github in Raw format by clicking on the file name and then clicking on the Raw button on top right corner of page. This will open the csv file in native “raw” format. Then you can simply copy the url from browser location bar.

You can also manually create a url to any Github repository file as follows below. Let’s use the vaccinations.csv file in OWID’s Github repository as an example.

https://raw.githubusercontent.com/owid/COVID-19-data/master/public/data/vaccinations/vaccinations.csv

The vaccinations.csv Github repository file url has the following parts:

    • Base url: https://raw.githubusercontent.com
    • Repository owner: owid
    • Repository name: COVID-19-data
    • Repository branch: master
    • Repository base element: public
    • Repository folders: data/vaccinations
    • Repository file name: vaccinations.csv

Based on this example you should be able to manually create a url for any file in any Github repository.

Once you have your url, you can test if it is correct by simply copy pasting it into a browser location bar and verifying the url is good, and retrieves the data you are looking for.

After verifying the url is correct, go to your Excel file, find the Excel Power Query Get Data feature and select From Other SourcesFrom Web which will show input field for a url. Paste your url into the input field and simply click Load, which will load the csv file data into an Excel worksheet.

Now instead of manually downloading data from Github, you can simply click refresh and automatically retrieve the most recent csv file from the OWID Github repository.

Here is full M-code for a Query connected to the OWID Github “vaccinations.csv”. You can copy the code below and paste it into a blank Query Advanced Editor and it will return OWID data to your Excel file.

 let
Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/owid/COVID-19-data/master/public/data/vaccinations/vaccinations.csv"),[Delimiter=",", Columns=12, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"location", type text}, {"iso_code", type text}, {"date", type date}, {"total_vaccinations", Int64.Type}, {"people_vaccinated", Int64.Type}, {"people_fully_vaccinated", Int64.Type}, {"daily_vaccinations_raw", Int64.Type}, {"daily_vaccinations", Int64.Type}, {"total_vaccinations_per_hundred", type number}, {"people_vaccinated_per_hundred", type number}, {"people_fully_vaccinated_per_hundred", type number}, {"daily_vaccinations_per_million", Int64.Type}})
in
#"Changed Type"

 

Heat maps of Canadian activity changes due to COVID-19 using Google Community Mobility Reports

During the 2020 COVID-19 pandemic in Canada I wanted to get better understanding of the geographical distribution of COVID-19 related activity changes across Canada.

Google has helpfully provided freely available global “Community Mobility Reporting” which shows Google location history change compared to baseline by country, and country sub-regions. These provide changes in activity by location categories: Workplace, Retail & Recreation, Transit Stations, Grocery & Pharmacy and Parks locations, and Residential locations. For Canada it is available by province. As of April 19, data contained daily values from Feb 15 to Apr 11.

The Community Mobility Reporting data is available as a single csv file for all countries at Google Community Mobility Report site. In addition, Google provides feature to filter for specific country or country sub regions eg state or provinces, etc and download resulting PDF format.

As the COVID-19 lockdowns occurred across Canada you would expect that people were less likely to be in public spaces and more likely to be at home. The Community Mobility Reporting location history allows us to get some insight into whether or not this happened, and if it did, to what degree and how this changed over time.

I used the Community Mobility Report data to create a D3.js heat map visualization which is described in more detail below and in this Github repository.

I also created an Excel version of this heat map visualization using Pivot Table & Chart plus conditional formatting. This Excel file, described in more detail below, is available in the Github repository.

More detail and screenshots of visualizations is provided below:

Heatmaps
Heatmaps are grids where columns represent date and rows province/territory. Each heatmap is a grid representing a single mobility report category. The grid cell colors represent value of percent change which could be positive or negative. Changes can be observed as lockdowns occurred where locations in public areas decreased relative to baseline. Inversely, residential location increased relative to baseline as people sheltered in place at their homes.

1) Heatmap created using Excel / Power Query: For this heatmap visualization the global csv data was transformed using Excel Power Query. The Excel file has two Pivot Table and Chart combos. The Excel files and Power Query M Code are in the repository. Excel files are available in Github repository.

2) Heatmap created using D3.js: For this heatmap visualization the global csv data was transformed using Excel Power Query. The heatmap visualization was created using slightly modified code from ONSvisual.

Bar charts
These were created using Excel to visualize percent change by Province/Territory and location category using Excel / Power Query. These allow comparison between provinces by date and category. This Excel / Power Query file can be used for analytical purposes to slice and dice global data by date, country, sub region 1 & 2 and category. Excel files are available in Github repository.

Introduction to Excel Power Query aka Power BI Query

What is Power Query?

Power Query is an Microsoft tool for Excel that is used to do the following:

    • Connect to a data source
    • Extract data from source
    • Transform the extracted data
    • Load the transformed data into Excel

This functionality is called ETL (Extract, Transform and Load).

Why use Power Query?

Excel already has another tab of features called Get External Data, why not use that?

    • Excel’s other methods for connecting to and retrieving data may still be useful, but they are not able to transform the data.
    • Power Query connections are non-techie, user friendly, and provide much more versatile and robust functionality.

 

Where is Power Query in Excel?

You can find Power Query in the Data tab Get & Transform icon group in Excel 2016

Note that in previous versions of Excel, Power Query is an add-in that has its own tab. However, in Excel 2016 Power Query is built into Excel.

Get & Transform has 4 icons:

    • Show Queries – this hides/shows the Query pane docked on right side of Excel.
    • New Query – choose data source & modify Power Query settings and options.
    • From Table – choose data source as currently selected Excel table.
    • Recent Sources – choose from recently selected data sources.

Step 1 – Connect to a data source and retrieve data

Select the New Query icon to select which type of data you want to connect to

    • FromFile (csv, Excel)
    • FromDatabase (SQL Server db, Access db)
    • FromOnline Services (Facebook)
    • From Other Sources (webpage, SharePoint List, ODBC connection)

Screenshot shows example of connecting to an Excel Workbook.

    • Select From File – From Workbook
    • Popup dialogue allows you to select an Excel file.

Step 2 – Transform the data – common usage

    • “Calculated columns” replace any worksheet formula requirement
    • Power Query has many more formulas than plain old Excel, including VBA type functionality
    • Power Query has been called “vlookup on steroids”
    • “Pivot” and “unpivot” data
    • Group and aggregate data into sums, min, max, averages
    • Simple to combine many files contents in a folder into one table.
    • Filter, sort, replace with complex logic.
    • Merge two or more separate tables of data together on matching values. If you are familiar with SQL, this is a JOIN query.
    • Append two or more separate tables of data into one combined table of data. If you are familiar with SQL, this is a UNION query.

Step 3 – Load the data into Excel

    • Power Query queries are loaded to a worksheet by default.
    • However, you can choose to save a query as a connection only by selecting Load To – Only Create Connection. This query can be used in other Queries eg in Merge or Append, and by other Excel features eg it can be used as datasource for a Pivot Table.

Step 4 – Click and refresh to retrieve new data

    • When data source(s) are updated, just click Refresh to retrieve and transform new data.
    • No copy & paste of data from other files.
    • No worry about formulas being deleted or ranges being moved.
    • All transformations are safely contained inside the Query.

Bonus: Power Query is also in Microsoft Power BI

    • Power Query has been integrated into Power BI as the “Get Data” feature.
    • Has the same functionality and features as the Power Query in Excel.
    • Same programming language ‘M’.