@SirPatStew #ASonnetADay dashboard

Sir Patrick Stewart @SirPatStew was doing Shakespeare Sonnet readings from his home during the COVID-19 lockdown and they were really good.

I wanted to track the sonnet reading tweets’ like and retweet counts over time and show this to other @SirPatStew fans. I also thought it would be very helpful to provide an automatically updated list of links to the sonnet tweets.

So I created a daily automated job that retrieved Sir Patrick’s Twitter data, saved it in the cloud. I also created a web page that included a visualization showing the #ASonnetADay hashtag tweets like and retweet counts (screenshot below) along with a table listing the tweets in chronological order  with like and retweet counts and as well as a link to the tweet.

@SirPatStew finished has long finished posting new #ASonnetADay tweets and his tweets continue to get visitors and like and retweet counts continue increasing.  The automated daily job is still ongoing and the visualization continues to be updated.

View visualization: https://sitrucp.github.io/sir_pat_sonnet_a_day_tweets/

Code hosted on Github: https://github.com/sitrucp/sir_pat_sonnet_a_day_tweets

Data was retrieved from Twitter using the Twitter API and Tweepy and the visualization was created using Plotly.js dot plot and is hosted on Github.io

The Tweet data was used to create the following:

    • A categorical dot plot of each sonnet’s tweet like and retweet counts.
    • A tabular list of #ASonnetADay tweets with links to tweet to allow others to easily find and watch them.

 

Twitter Followers and List Membership Tracking

I have created my own historical reporting of my Twitter account followers and list memberships. I have hosted the reporting on this open webpage.

Most people know what followers are but the list memberships aren’t as well known so here are definitions of them.

Twitter Followers – Twitter followers are other Twitter accounts who follow my account. A Twitter Follower may follow my account for a few days and then unfollow me at any time afterwards.

Twitter List Memberships – Twitter accounts can also create their own Lists and then add other Twitter accounts to the lists. For example a Twitter account may create a List called “Tweet about Big Data” to track Twitter accounts they believe “Tweet about Big Data”. Twitter accounts may add my account to their List and remove it at any time afterwards.

The Twitter data retrieval, writing to database, and data querying are all done on a web server.

In order to record changes in counts of these you need to have daily historical data. The Twitter API doesn’t provide historical data so you need create it yourself by retrieving and saving daily data somewhere.

Three Python scripts using Twitter API, Python Tweepy and AWS SDK are scheduled to run daily using cron jobs.

Two scripts retrieve followers and list memberships and insert the data into a PostgreSQL database. This daily retrieval builds the daily history.

Another script queries the database table to create reporting datasets of new, active and dropped followers and list memberships that are exported as csv files to a AWS S3 folder which also has files for a AWS S3 hosted static website.

The AWS S3 hosted static website uses Chart.js and D3.js Javascript charting libraries to read and visualize the data. This post does not describe how to read the csv files but I have written another post that describes this AWS S3 csv file as D3 report data source

Screenshots of visualizations are shown below.

Daily Follower Counts

Weekly New and Dropped Follow Counts

Follower Duration Days

The Python code to retrieve the Twitter data, transform it and create csv files and upload them to AWS is below.

The code to retrieve Twitter followers:

import sys, os
import csv
from datetime import datetime, date
import tweepy
from dateutil import tz
import psycopg2

## get secrets
sys.path.insert(0, '/home/secrets/')
from secrets import secrets
from twitter_keys_tokens import twitter_api_keys

conn = psycopg2.connect(
    host = secrets['host'],
    dbname = secrets['dbname'],
    user = secrets['user'],
    password = secrets['password']
    )
cursor = conn.cursor()

## twitter consumer key and secret
consumer_key = twitter_api_keys['consumer_key']
consumer_secret = twitter_api_keys['consumer_secret']

#get twitter auth
auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

today = datetime.now().date()

def get_followers():

    twitter_followers = []
    
    for user in tweepy.Cursor(api.followers, screen_name="twitterusername").items():
        twitter_followers.append(user)
        
    for user in twitter_followers:
        query = "INSERT INTO twitter_followers \
                (date, \
                follower_id, \
                name, \
                screen_name) \
                VALUES \
                (%s, %s, %s, %s)";
        data = (today,
                str(user.id).strip(),
                str(user.name.encode('utf8','ignore')).replace(',','').strip(),
                str(user.screen_name.encode('utf8','ignore')).strip()
                )
        cursor.execute(query, data)
        conn.commit()

    conn.close()
    
    ## print job status to log
    print str(datetime.now()) + ' twitter followers'    
            
if __name__ == '__main__':
	get_followers()

The code to retrieve Twitter list memberships:

import sys, os
import csv
from datetime import datetime, date
import tweepy
from dateutil import tz
import psycopg2

## get database creds
sys.path.insert(0, '/home/secrets/') 
from secrets import secrets
from twitter_keys_tokens import twitter_api_keys

conn = psycopg2.connect(
    host = secrets['host'],
    dbname = secrets['dbname'],
    user = secrets['user'],
    password = secrets['password']
    )
cursor = conn.cursor()

## twitter consumer key and secret
consumer_key = twitter_api_keys['consumer_key']
consumer_secret = twitter_api_keys['consumer_secret']

#get twitter auth
auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

today = datetime.now().date()

def get_list_memberships():

    twitter_list_memberships = []
    
    for list in tweepy.Cursor(api.lists_memberships, screen_name="twitterusername").items():
        twitter_list_memberships.append(list)
        #print list.full_name
        
    for list in twitter_list_memberships:

        query = "INSERT INTO twitter_list_memberships \
                (date, \
                id_str, \
                name, \
                full_name, \
                member_count, \
                subscriber_count, \
                mode, \
                following, \
                user_screen_name) \
                VALUES \
                (%s, %s, %s, %s, %s, %s, %s, %s, %s)";
        data = (today,
                list.id_str.encode('utf8','ignore'),
                list.name.encode('utf8','ignore'),
                list.full_name.encode('utf8','ignore'),
                list.member_count,
                list.subscriber_count,
                list.mode,
                list.following,
                list.user.screen_name.encode('utf8','ignore'))
        cursor.execute(query, data)
        conn.commit()
        
    conn.close()
    
    ## print status for log
    print str(datetime.now()) + ' twitter list memberships'    
            
if __name__ == '__main__':
	get_list_memberships()

The code to create csv files and upload to AWS S3 bucket:

from boto.s3.connection import S3Connection
from boto.s3.key import Key
from datetime import datetime
from os import listdir
from os.path import isfile, join
import sys
import csv
import psycopg2
import re
from collections import Counter

upload_path = '/home/twitter/aws_s3_site/'

sys.path.insert(0, '/home/secret/')
from aws_keys import aws_keys
from secrets import secrets

## create aws S3 connection
conn = S3Connection(aws_keys['AWS_KEY'], aws_keys['AWS_SECRET'])
bucket = conn.get_bucket('twitter-bucket', validate=False)

## create db connection
conn = psycopg2.connect(
    host = secrets['host'],
    dbname = secrets['dbname'],
    user = secrets['user'],
    password = secrets['password']
    )
cur = conn.cursor()

## get data sets from db

## followers
cur.execute ("select something here;")
followers = list(cur.fetchall())

## lists
cur.execute ("select something here;")
lists = list(cur.fetchall())

conn.close()

## write db data to csv files, save in upload folder

## followers
with open(upload_path + 'followers.csv', 'wb') as file:
    writer = csv.writer(file, delimiter = ',', lineterminator='\n')
    for row in followers:
        writer.writerow(row)

## lists
with open(upload_path + 'lists.csv', 'wb') as file:
    writer = csv.writer(file, delimiter = ',', lineterminator='\n')
    for row in lists:
        writer.writerow(row)
            	
## upload csv files to S3 twitter-bucket
    
upload_files = [f for f in listdir(upload_path) if isfile(join(upload_path, f))]

# delete existing bucket keys to reset
for key in bucket.list():
    if '.csv' in key:
        bucket.delete_key(key)
  
# complete uploading to AWS
for file in upload_files:
    k = Key(bucket)
    k.key = file
    k.set_contents_from_filename(upload_path + file)
    
## write success to cron log
print str(datetime.now()) + ' aws s3 twitter upload'

Canadian TCS FDI Officers Twitter list member analysis

Blog post updated to add the following:

    • Updated Python “TCS Members Details” code to get additional information from the List members’ profiles.
    • New Python code “TCS Members Tweets” to download all List members’ Tweets.

————————————————–

The Canadian Trade Commissioner Service maintains a Twitter List named CDN TCS FDI Officers that has a bunch of Canadian Trade Commissioners as members. Investment Canada also has a similar list.

I was interested to learn how many people were on these lists, how long they had been on the lists, what they were Tweeting about.  To undertake this analysis, I used the Twitter API and Python Tweepy to retrieve data about list members including:

    • screen_name
    • name
    • followers_count
    • friends_count
    • statuses_count (# of tweets)
    • favourites_count
    • created_at (Twitter account create date)
    • account_age_days (calculated as # days from July 9, 2016)
    • listed_count
    • verified
    • profile_image_url
    • profile_sidebar_fill_color
    • profile_text_color
    • profile_image_url_https
    • profile_use_background_image
    • default_profile_image
    • profile_sidebar_border_color
    • profile_background_color
    • profile_link_color

You can get find complete definitions about these fields over at Twitter.

The data was output as a simple csv file that I used to create a Tableau Public dashboard which is embedded below. The csv file is attached here.

The Tableau dashboard is interactive and can be sorted by any of the columns by using the sort icon which appears when you hover over the top of column as illustrated in screenshot below.

tcs_click_to_sort

 

 

 

 

It would be interesting to try to determine if this Twitter activity has measurable impacts on FDI and Canadian Trade.  For example perhaps foreign investment finds it way to Canada after reading Tweet by one of our Trade Commissioners.

This would require that TCS maintains a CRM (client relationship manager) system and process that records lead sources.

There is some disparity between use of Twitter by the CDN TCS FDI Officers list members as shown by Tweets/Day which total Tweets divided by # days since the account was created. If there is a measurable lift in lead generation by Twitter use then this would be actionable metric.

For the technically minded the Python code is shown below. Note that you need an API account to use with this code.

There is another file tweet_auth import not shown here that contains Twitter OAuth credentials that looks like the following. You just have to replace the ‘xxx…’ with your credentials.

Here is the “TCS Member Details” code:

 

 #twitter api oauth credentials - replace with yours
consumer_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
  import sys, os
    import csv
    from datetime import datetime, date
    import tweepy
    from dateutil import tz

    ## get twitter auth key file
    sys.path.insert(0, 'path/to/folder/with/your/twitter_creds/')
    from ppcc_ca_app_key import keys

    ## this is consumer key and secret from the ppcc-ca app
    consumer_key = keys['consumer_key']
    consumer_secret = keys['consumer_secret']

    ## don't need to access token bc not tweeting on this timeline, just reading #access_token = keys['access_token']
    #access_token_secret = keys['access_token_secret']

    ## get twitter auth
     #auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
    #auth.set_access_token(access_token, access_token_secret)
    a
    pi = tweepy.API(auth)

    today = datetime.now().date()
            
    tcs_list_members = tweepy.Cursor(api.list_members, 'invest_canada', 'cdn-tcs-fdi-officers')

    member_details_csv = csv.writer(open('tcs_member_details.csv', 'wb'))
    member_details_csv.writerow([
        'screen_name',
        'name',
        'followers_count',
        'friends_count',
        'statuses_count',
        'favourites_count',
        'created_at',
        'account_age_days',
        'time_zone',
        'listed_count',
        'profile_image_url',
        'profile_sidebar_fill_color',
        'profile_text_color',
        'profile_image_url_https',
        'profile_use_background_image',
        'default_profile_image',
        'verified',
        'profile_sidebar_border_color',
        'profile_background_color',
        'profile_link_color'])

    members = []
    member_tweets = []
        
    for member in tcs_list_members.items():
        
        member_details_csv.writerow([
            member.screen_name.encode('utf-8', 'ignore'),
            member.name.encode('utf-8', 'ignore'),
            member.followers_count,
            member.friends_count,
            member.statuses_count,
            member.favourites_count,
            member.created_at,
            (today-member.created_at.date()).days,
            member.time_zone,
            member.listed_count,
            member.profile_image_url,
            member.profile_sidebar_fill_color,
            member.profile_text_color,
            member.profile_image_url_https,
            member.profile_use_background_image,
            member.default_profile_image,
            member.verified,
            member.profile_sidebar_border_color,
            member.profile_background_color,
            member.profile_link_color
            ])
            

It would be interesting to see what Tweet topics, other Twitter user mentions, links to webpages, etc. So the next step is to loop through each of the list member’s api.user_timeline to retrieve their Tweet content and do some analysis on them.  For now here is the code and some analysis and visualization in Tableau later.

Here is the “TCS Members’ Tweets” Python code:

    import sys, os
    import csv
    from datetime import datetime, date
    import tweepy
    from dateutil import tz

    ## get twitter auth key file
    sys.path.insert(0, '/path/to/your/folder/with/twitter_creds/')
    from ppcc_ca_app_key import keys

    ## this is consumer key and secret from the ppcc-ca app
    consumer_key = keys['consumer_key']
    consumer_secret = keys['consumer_secret']

    ## don't need to access token bc not tweeting on this timeline, just reading
    #access_token = keys['access_token']
    #access_token_secret = keys['access_token_secret']

    ## get twitter auth
    #auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
    #auth.set_access_token(access_token, access_token_secret)
    #api = tweepy.API(auth)

    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

    today = datetime.now().date()

    def get_list_member_tweets():
        ## get invest_canada and cdn-tcs-fdi-officers list members
        tcs_list_members = tweepy.Cursor(api.list_members, 'invest_canada', 'cdn-tcs-fdi-officers')
      
        ## open csv file 
        member_tweets_csv = csv.writer(open('tcs_member_tweets.csv', 'wb'))
        
        ## write header row column names
        member_tweets_csv.writerow([
            'status_id',
            'date_time_UTC',
            'screen_name',
            'hashtags',
            'user_mentions',
            'tweet_text'
            ])
        
        ## loop through list members and get their tweets
        for member in tcs_list_members.items():
            ## get list member tweets
            member_tweets = get_member_tweets(member.screen_name)
            
            for status in member_tweets:
                
                ## check tweets for hashtags
                if status.entities['hashtags']:
                        hastags=[]
                        for hashtag in status.entities['hashtags']:
                            hastags.append(hashtag['text'].encode('utf8','ignore'))
                
                ## check tweets for user_mentions
                if status.entities['user_mentions']:
                        user_mentions=[]
                        for user_mention in status.entities['user_mentions']:
                            user_mentions.append(user_mention['screen_name'].encode('utf8','ignore'))
                
                ## write to csv file      
                member_tweets_csv.writerow([
                    status.id,
                    str(status.created_at.replace(tzinfo=tz.gettz('UTC')).astimezone(tz.gettz('America/Los_Angeles')).replace(tzinfo=None)),
                    member.screen_name,
                    hastags,
                    user_mentions,
                    status.text.replace('\n',' ').replace('\r',' ').encode('utf8','ignore')
                    ])

    def get_member_tweets(screen_name):
        ## get list member's tweets
        
        alltweets = []

        ## can only get max 200 tweets at a time
        new_tweets = api.user_timeline(screen_name = screen_name, count=200)
        alltweets.extend(new_tweets)

        ## get oldest tweet already retrieved
        oldest = alltweets[-1].id - 1

        ## iteratively get remaining tweets
        while len(new_tweets) > 0:
            new_tweets = api.user_timeline(screen_name = screen_name, count=200, max_id=oldest)
            alltweets.extend(new_tweets)
            oldest = alltweets[-1].id - 1
        
        ## print out member and # tweets retrieved to show progress
        print screen_name + " %s tweets downloaded" % (len(alltweets))
        
        ## return all tweets
        return alltweets

    if __name__ == '__main__':
        get_list_member_tweets()

Full code on Github:

https://github.com/sitrucp/tcs_fdi_list_twitter_data

Tracking Cuba Gooding Jr’s Twitter follower count

Happened to see Cuba Gooding Jr’s first tweet about 30 minutes or so after he created it.

Update: @cubagoodingjr is no longer active so not getting tweets from it any longer

At the time his profile said he had 559 followers. A few minutes later I refreshed his profile and saw the follower count had increased to 590 and every few minutes it kept increasing. By the end of the day he had about 4,000 followers.

I thought it would be interesting to track how his follower count changed going forward. So I used the Twitter API to get his follower count once per day, save the data, and then created a web page to visualize his follower count per day.

cubatweet

After 2 days Cuba had about 7,000 followers which averaged out to about 175 new followers per hour.  However, the number of new followers slowed down quickly to 30 or so new followers per day and after about 3 months he only gets about 10 new followers per day.  In fact, some days, he has net loss of followers, eg more people unfollow than him, than new follows on that day.

For the technically inclined, I setup an automatic retrieval of Cuba’s Tweets using Twitter’s API and the Tweepy Python module scheduled to run daily using a cron job.

The follower counts get put into a database. I created a PHP web page application to retrieve the data from the database, and create a web page that includes the Google Charts API to create a simple line chart to show Cuba’s regularly updated follower count by day.

You can get the cron job and PHP web page code from my  Github repository. 

If you want to run this code yourself you will need a Twitter developer account and an OAuth file.

How to setup and and use a Twitter API application

The Twitter API allows you to read and write tweets on your own Twitter account.

But you can also create an ‘app’ that can be granted permission by other Twitter accounts to access their information and even tweet on their behalf.

If you have ever granted this permission you will see the app in your Twitter account’s Settings, Apps options where you can see what apps have been granted permissions as well as revoke permissions.

This comes in handy if you have multiple Twitter accounts and you want to use the Twitter API to tweet to all of them but do not want to create multiple Twitter apps. Instead you can create one Twitter app and grant it permission to use all of the other Twitter accounts.

This is also very handy because a Twitter app must be associated with a mobile phone number. It is rare that you will have multiple mobile phone numbers.

Note there is nothing special about the ‘dev’ account. It is a normal Twitter account but you just choose it in which to create the Twitter app that will tweet on all of your accounts’ behalves.

Step 1. Make sure your ‘dev’ account has a mobile phone on the account otherwise you cannot proceed.

Step 2. While logged into your dev account, open your web browser in another tab to https://apps.twitter.com/ which will open the Twitter API app management page.

Step 3. If you haven’t already set up an API app there will be nothing to manage and it will simply show you a ‘create an app’ button. Click that button and follow instructions to setup the API app.

Step 4. Save your new API app’s consumer_key, consumer_secret, access_token, access_token_secret credentials for use in following steps.

Step 5. Log out of your dev account once the dev API app is setup and running properly.

Step 6. Log into another Twitter account that you want to add the dev account API app as a 3rd party app. You will successively log into each additional Twitter account that you want to use with dev API app with. Note you do not have to login in advance, as the code you run in next steps will popup web page to prompt you enter user and pw for Twitter account you want your new API app to work with.

Step 7. Go get this code which is what will do some magic to grant permission for your new Twitter API app to tweet on behalf of your other Twitter accounts. Save this code on your computer and keep it open in your code/text editor so you can make some changes.

Step 8. Get your app’s consumer_key, consumer_secret, access_token, access_token_secret credentials that you got from Twitter API management and paste them into the appropriate place in code you just downloaded and saved.

Step 9. Now run the code using command line interface and it will generate a URL in the command line window and show a URL that contains tokens that were generated specifically for your app and the other Twitter account. You just have to copy and paste the url into a web browser. Copy the url from command line window.

command-line-run-python-code1

 

Step 10. Paste it into a browser which also has your other twitter account logged in.  This is a crucial part of this process. The page that opens will ask you to click button to authorize this the dev app to use your other Twitter account. Click Authorize App button.

paste-url-into-browser1-authorize-app

 

Step 11.  After clicking Authorize App button you will see a web page that shows a special PIN code generated just for this one time use. I have hidden the PIN in screenshot but you will have a number where it says “PIN number is here”. Copy this PIN.

paste-url-into-browser2-get-code

Step 12. Now go back to the command line window. Python code command line also has a prompt to select ‘y’ once you have copied the PIN code and then prompt you to paste/enter the PIN code in the command line. Enter ‘y’ and hit enter and paste the PIN into the next command line.

Entering PIN code automatically registers/adds your dev API app as a 3rd party app in the Twitter account you are currently logged into and generates two strings of characters in the command line window. These are tokens you need to copy and save so you can use them later. You will need to enter them as credentials for the dev API app to read and write Tweets for the currently logged in Twitter account.

command-line-run-python-code2-paste-pin

 

If you go to your Twitter account and select Settings – Apps you will see that your dev App is now registered as an authorized app that can Tweet and read Tweets on behalf of your Twittter account.

twitter-settings-apps-new-app-just-added

 

Now its up to you to create Twitter API code to use these new authorizations. Try using Tweepy or any of the other many Twitter Python modules that are available. Happy Tweeting.

Your dev app can be added as authorized app to multiple other Twitter accounts. You just have to repeat process for each of the other Twitter accounts one by one. Simply log into each in succession (log out of previous account and then log into the next).

You will end up with a bunch of tokens that you can use to read and tweet for all of your Twitter accounts.

So just to summarize what you have done is authorized one of your Twitter accounts, which has a verified mobile phone number, to read and write Tweets for any number of other Twitter accounts you have, using the Twitter API.

For more information on Twitter API check out their developer site at https://dev.twitter.com/