The Province of British Columbia publishes public agency executive compensation each year. The data is grouped by:
- Sector – Employers are categorized in one of six sectors – Colleges, Crown Corporations, Education, Health, Public Service, Universities
- Employer – Employers belong to one Sector. There are about 190 employers such as BC Assessment Authority, BC Housing Management Commission, BC Hydro, etc
- Employee – An employee belongs to one Employer and they publish the full name of employee.
The employee’s salary is given as:
- Base Salary
- Total Compensation
The data is provided each year as pdf and csv data files. I downloaded csv data files for each year from 2012 to 2016 and combined them into one dataset.
Here is link to Tableau Public Dashboard visualizing the data and the chart is embedded below.
I used Python to combine the data. Some of the Sector values where missing spaces, one of the years had a “$” symbol in salary column values which had to be removed, and I added a new year column based on the data file’s name.
Here is the Python code used to combine and clean up the data:
import csv import pandas as pd from datetime import date, datetime import glob import os def main(): ## path for data files data_path = 'C:/bc_salary_data/' allFiles = glob.glob(data_path + "*.csv") frame = pd.DataFrame() list_ =  for file_ in allFiles: df = pd.read_csv(file_,index_col=None, header=0) df['filename'] = os.path.basename(file_).split('.') df['Sector'] = df['Sector'].str.replace(' ','').str.replace('CrownCorporations', 'Crown Corporations').str.replace('CrownCorporation', 'Crown Corporations').str.replace('PublicService', 'Public Service') df['year'] = df['filename'].str[-4:] df['Base Salary'] = df['Base Salary'].apply(str).str.replace('$','') df['Total Compensation'] = df['Total Compensation'].apply(str).str.replace('$','') list_.append(df) frame = pd.concat(list_) ## Create csv file to write results to frame.to_csv('merged.csv') print str(datetime.now()) + ' data files merged' return if __name__ == '__main__': main()