D3.js SVG animation – COVID-19 rate “race” visualization

This visualization shows COVID-19 new cases as a “race” of dots moving from left to right.

The dot’s “speed” or how long it takes to move from left to right is based on the number of cases per day.

If a country has one case per day, it will take an entire day for the dot to move from left to right. Some countries have many 1000’s of new cases daily and the dot moves from left to right in minutes or seconds.

There are three  visualizations for following geographical regions. Click “viz” to view the visualization and “github code” to view the code for the visualization:

The screenshot below shows countries of world. Some countries have not had any new cases over past 7 days so show as gray. Those that have had new cases over past 7 days are shown as white circle (no change from prev 7 days), red (increase from prev 7 days) or green (decrease from prev 7 days).

The visualization is sorted by country by default but can change sorting by average new cases. In addition, you can toggle between showing new cases as actual count or new cases per million (population).

The visualization uses D3.js SVG to create a canvas for each location, the location name text & counts, and circle shape, and transitions, and to retrieve csv file and process data, including filtering to most recent 7 days, group by location to get case count means.

The most important aspect for this visualization was how to use D3.js to animate the movement of the white circle across the canvas, and how to repeat the movement in an ‘endless’ loop.

The code block below hightlights use of a function that uses D3.js .on(“end”, repeat);  to loop through repeat function ‘endlessly’ so that shape is moved across canvas, and then back to original position, to move across canvas again and again. See bl.ocks.org ‘Looping a transition in v5’ example.

The duration() value is the proxy for rate in this visualization and is calculated in another function separately for each location SVG. I also added a counter that would increment an SVG text value to show each loop’s count on canvas.

// repeat transition endless loop
function repeat() {
    svgShape
    .attr("cx", 150)
    .transition()
    .duration(cycleDuration)
    .ease(d3.easeLinear)
    .attr("cx", 600)
    .transition()
    .duration(1)
    .attr("cx", 150)
    .on("end", repeat);
    
    svgTextMetric
    .text(counter + ' / ' + metric);
    counter++;
  };

This visualization was inspired by Jan Willem Tulp’s COVID-19 spreading rates and Dr James O’Donoghue’s  relative rotation periods of planets, and uses same data as Tulp’s spreading rates.

Heat maps of Canadian activity changes due to COVID-19 using Google Community Mobility Reports

During the 2020 COVID-19 pandemic in Canada I wanted to get better understanding of the geographical distribution of COVID-19 related activity changes across Canada.

Google has helpfully provided freely available global “Community Mobility Reporting” which shows Google location history change compared to baseline by country, and country sub-regions. These provide changes in activity by location categories: Workplace, Retail & Recreation, Transit Stations, Grocery & Pharmacy and Parks locations, and Residential locations. For Canada it is available by province. As of April 19, data contained daily values from Feb 15 to Apr 11.

The Community Mobility Reporting data is available as a single csv file for all countries at Google Community Mobility Report site. In addition, Google provides feature to filter for specific country or country sub regions eg state or provinces, etc and download resulting PDF format.

As the COVID-19 lockdowns occurred across Canada you would expect that people were less likely to be in public spaces and more likely to be at home. The Community Mobility Reporting location history allows us to get some insight into whether or not this happened, and if it did, to what degree and how this changed over time.

I used the Community Mobility Report data to create a D3.js heat map visualization which is described in more detail below and in this Github repository.

I also created an Excel version of this heat map visualization using Pivot Table & Chart plus conditional formatting. This Excel file, described in more detail below, is available in the Github repository.

More detail and screenshots of visualizations is provided below:

Heatmaps
Heatmaps are grids where columns represent date and rows province/territory. Each heatmap is a grid representing a single mobility report category. The grid cell colors represent value of percent change which could be positive or negative. Changes can be observed as lockdowns occurred where locations in public areas decreased relative to baseline. Inversely, residential location increased relative to baseline as people sheltered in place at their homes.

1) Heatmap created using Excel / Power Query: For this heatmap visualization the global csv data was transformed using Excel Power Query. The Excel file has two Pivot Table and Chart combos. The Excel files and Power Query M Code are in the repository. Excel files are available in Github repository.

2) Heatmap created using D3.js: For this heatmap visualization the global csv data was transformed using Excel Power Query. The heatmap visualization was created using slightly modified code from ONSvisual.

Bar charts
These were created using Excel to visualize percent change by Province/Territory and location category using Excel / Power Query. These allow comparison between provinces by date and category. This Excel / Power Query file can be used for analytical purposes to slice and dice global data by date, country, sub region 1 & 2 and category. Excel files are available in Github repository.

Choropleth map of Canada COVID-19 cases by health region using Leaflet and D3.js

During the early days of the 2020 COVID-19 pandemic in Canada, I wanted to get better understanding of the geographical distribution of COVID-19 cases across Canada.

At the time, government or news agencies were only mapping case counts by province. However Canadian provinces are so big compared to population centers that it doesn’t accurately reflect actual geographic distribution of cases. It would be better to use the provincial “health regions” which correspond much better to population centers.

So I set about to create for myself a choropleth map visualization by health regions.

View the finished choropleth map at the following link. The source data is updated daily each evening.
https://sitrucp.github.io/canada_covid_health_regions/index.html

I used Leaflet.js open-source JavaScript mapping library to create the interactive choropleth map, D3.js to retrieve and transform the csv format data, and Javascript to retrieve the JSON geographic boundary files and also to manipulate and present the data with HTML and CSS.

The COVID-19 case count data are obtained as csv file format from the “COVID-19 Canada Open Data Working Group” who are an amazing group of volunteers.  They have been tirelessly collating data from the various provincial and territory government agencies daily since early March. This group saves the collated and cleaned data as csv files in a Github repository https://github.com/ishaberry/Covid19Canada.

The health region geographical boundary descriptions are from Statistics Canada’s Statscan ArcGIS Health region boundary Canada dataset. These had very detailed boundaries so I simplified them using QGIS which also dramatically reduced the dataset size.

However, there were some data issues that needed to be addressed first. The Statscan health regions shape file boundary names are different than those used by the provincial and territory government agencies reporting the data.

The Statscan seems to have full-form ,”official” health region names, while the provincial and territory government agency names are common, more familar, short-hand names. Also, names may change over time.

Provinces may also add or remove health regions from time to time due to administrative changes or population changes etc. So either set may have health regions that the other doesn’t have.

From a data governance perspective, in a perfect world, everyone uses a single set of health region boundary names. COVID-19 reporting has made a lot of people aware of this issue which is a silver lining in the COVID-19 dark cloud!

Addressing these name differences was actually quite simple, requiring creation of a lookup table with two columns, one for each dataset, to match the names in the boundary data files to the names in the counts data file. The lookup table can then be used dynamically when getting data each time the map is refreshed. This is described in more detail in Github repository README linked below.

Code for this project is maintained in Github:  github.com/sitrucp/canada_covid_health_regions.

I also created a separate choropleth map for Montreal, where I was living at the time, which was Canada’s COVID-19 “hotspot” with about 25-30% of Canada’s total COVID-19 cases. However, the Montreal data source has since been discontinued so the map is archived now.

View archived Montreal map here:
https://sitrucp.github.io/canada_covid_health_regions/montreal/index.html

AWS S3 csv file as D3 report data source

This is an example of how to read a csv file retrieved from an AWS S3 bucket as a data source for a D3 javascript visualization.

The D3 visualization would be an HTML document hosted on a web server. 

You will use the AWS SDK to get the csv file from the S3 bucket and so you need to have an AWS S3 bucket key and secret but I won’t cover that in this post.

The key point of this post is to highlight that the bucket.getObject function data is read into D3 using  d3.csv.parse(data.Body.toString());  

Another note is that d3.csv.parse is for D3 version 3. Older versions use d3.csvParse. 

Once implemented, whenever the webpage is refreshed it retrieves latest csv file from the S3 bucket and the D3 visualization is updated.

<script src="https://sdk.amazonaws.com/js/aws-sdk-2.6.3.min.js"></script>

<script type="text/javascript">

// aws key and secret (note these should be retrieved from server not put as plain text into html code)
AWS.config.accessKeyId = 'xxxxxxxxxxxxxxxxxxxxxxx';
AWS.config.secretAccessKey = 'xxxxxxxxxxxxxxxxxxxxxxx';
AWS.config.region = 'us-east-1';

// create the AWS.Request object
var bucket = new AWS.S3();

// use AWS SDK getobject to retrieve csv file
bucket.getObject({
    Bucket: 'my-S3-bucket', 
    Key: 'myfile.csv'
}, 

// function to use the data retrieve 
function awsDataFile(error, data) {
    if (error) {
        return console.log(error);
    }

        // this where magic happens using d3.csv.parse to read myCSVdata.Body.toString()
    myCSVdata = d3.csv.parse(data.Body.toString()); 

        // now loop through data and get fields desired for visualization 
    var counter = 0;
    myCSVdata.forEach(function(d) {
            d.field1= +d.field1;
            d.field2= +d.field2;
            countLoop = counter++;
    });

        // now you can create rest of D3 vizualization here 
        // for example like this example https://gist.github.com/d3noob/4414436

        my D3 vizualization code here

// this closes bucket.getObject 
});

</script>

 

 

Visualizing human arterial blood flow with a D3.js sankey chart

This Sankey chart represents human arterial blood flow from the heart down into the smallest named arteries. I couldn’t find anything else like this online so spent a few hours cleaning up data and creating this visualization.

A Sankey chart visualizes directional connections between “nodes”. In this case the nodes are the artery names and the connections are flow of blood through the arteries.

You can view the complete Sankey chart here.  It is quite big so take a look at the zoomed out screenshot of it below to get an overview of it first.  The data used to create the visualization can also be viewed in tabular format here.

arterial

Two snippets below from the visualization’s top level branches highlight where blood flows from heart to body and lungs. The grey connections represent blood flow. The colored ‘blocks’ represent the arteries.

You are able to click on the nodes which will highlight the connections and blood flow. This is a great way to see how blood flows from artery to artery.

The data was sourced from this UAMS web page 
http://anatomy.uams.edu/anatomyhtml/arteries_alpha.html.

Quite a bit of cleaning was required to get the data ready to be used by D3 to create the Sankey charts. Some of the data transformations required are described below:

    • Edited artery names to remove commas eg, alveolar, anterior superior to anterior superior alveolar. This was done using Excel and VBA.
    • Creating new table rows to link arteries with their sources eg, Left Ventricle is source for Aorta.
    • Creating new table rows for both right and left arterial pairs. This was done by duplicating entire table, to create two sets (one for left, one for right) then manually editing to remove where there are no pairs eg Aorta.
    • Occasionally modified artery name by adding its source artery name to differentiate name from others. Eg Sigmoid Ascending Br. instead of Ascending Br. This was done because sankey.js requires unique node names and some arteries had same name in data source eg Ascending Br.
    • Created new table rows to include arterial branches. Used Excel and VBA to separate branches separated by comma into new rows.

Some interesting observations about the artery data used to create chart and possibilities to extend the data and chart:

    • There are missing arteries or errors in naming and linkages so please don’t study for your anatomy exam using this : )
    • The Sankey chart termination nodes are major arteries. However,  the UAMS arterial tables indicates that ‘unnamed branches’ are actually the terminating nodes. If I included these, I would have had to create unique names for each ‘unnamed’ arterial reference but I was too lazy to do that 🙂
    • The termination nodes are assumed to be capillaries eg other than the ‘unnamed’ arteries, the blood flows into capillaries, then goes back to heart via the Venous system.
    • I was surprised to learn that in reality there can be a large number of ‘arterial anastomoses’ where hierarchically unrelated arteries join together. I didn’t include any of these.
    • The ‘other half’ of the body’s blood flow eg the Venous is not included in this chart. One day I might revisit and include the Venous system.
    • It would be cool to add accurate blood flow ‘value’ to each artery-source connection which would be an additional column in the data table for blood flow volume for each artery.
      • The Sankey chart would represent these volumes by varying artery-source connection thickness.
      • These would be proportionate values. For example the blood flow from Left Ventricle to Aorta (Ascending Aorta) would be 100% or whatever relative numerical value we could assign (I used 10). From that point, say 30% goes to brachiocephalic trunk and 70% continues into Aortic Arch and down into the body, and so on with each further branching into other arteries.
    • It would be even more interesting to setup a dynamic system that changes these blood flow volumes based on heart rate.
      • Could also represent blood flow changes, for example, if a femoral artery were cut, what would be the effect of flow on rest of system?
      • The Sankey chart values could change to reflect changes. We could change artery-source link color eg red (increase), green (normal) or yellow(decrease) to indicate increase/decrease in blood flow resulting from these system perturbations.

The Sankey chart was created using a D3 JavaScript library Sankey plugin.

This arterial Sankey chart includes some additional D3 JavaScript that modified the chart formatting and includes feature to highlight node links and to allow node drag and drop re-positioning. You can view the JavaScript for my Sankey chart simply by viewing the Sankey chart’s page source.

The code used to create this is available in this Github repository.