Introduction to Excel Power Query aka Power BI Query

What is Power Query?

Power Query is an Microsoft tool for Excel that is used to do the following:

    • Connect to a data source
    • Extract data from source
    • Transform the extracted data
    • Load the transformed data into Excel

This functionality is called ETL (Extract, Transform and Load).

Why use Power Query?

Excel already has another tab of features called Get External Data, why not use that?

    • Excel’s other methods for connecting to and retrieving data may still be useful, but they are not able to transform the data.
    • Power Query connections are non-techie, user friendly, and provide much more versatile and robust functionality.

 

Where is Power Query in Excel?

You can find Power Query in the Data tab Get & Transform icon group in Excel 2016

Note that in previous versions of Excel, Power Query is an add-in that has its own tab. However, in Excel 2016 Power Query is built into Excel.

Get & Transform has 4 icons:

    • Show Queries – this hides/shows the Query pane docked on right side of Excel.
    • New Query – choose data source & modify Power Query settings and options.
    • From Table – choose data source as currently selected Excel table.
    • Recent Sources – choose from recently selected data sources.

Step 1 – Connect to a data source and retrieve data

Select the New Query icon to select which type of data you want to connect to

    • FromFile (csv, Excel)
    • FromDatabase (SQL Server db, Access db)
    • FromOnline Services (Facebook)
    • From Other Sources (webpage, SharePoint List, ODBC connection)

Screenshot shows example of connecting to an Excel Workbook.

    • Select From File – From Workbook
    • Popup dialogue allows you to select an Excel file.

Step 2 – Transform the data – common usage

    • “Calculated columns” replace any worksheet formula requirement
    • Power Query has many more formulas than plain old Excel, including VBA type functionality
    • Power Query has been called “vlookup on steroids”
    • “Pivot” and “unpivot” data
    • Group and aggregate data into sums, min, max, averages
    • Simple to combine many files contents in a folder into one table.
    • Filter, sort, replace with complex logic.
    • Merge two or more separate tables of data together on matching values. If you are familiar with SQL, this is a JOIN query.
    • Append two or more separate tables of data into one combined table of data. If you are familiar with SQL, this is a UNION query.

Step 3 – Load the data into Excel

    • Power Query queries are loaded to a worksheet by default.
    • However, you can choose to save a query as a connection only by selecting Load To – Only Create Connection. This query can be used in other Queries eg in Merge or Append, and by other Excel features eg it can be used as datasource for a Pivot Table.

Step 4 – Click and refresh to retrieve new data

    • When data source(s) are updated, just click Refresh to retrieve and transform new data.
    • No copy & paste of data from other files.
    • No worry about formulas being deleted or ranges being moved.
    • All transformations are safely contained inside the Query.

Bonus: Power Query is also in Microsoft Power BI

    • Power Query has been integrated into Power BI as the “Get Data” feature.
    • Has the same functionality and features as the Power Query in Excel.
    • Same programming language ‘M’.

 

How to use Hadoop HDFS as a datasource for Power BI

I have a local Hadoop server running on my Windows laptop. To connect Power BI to this Hadoop file system I used Hadoop’s WebHDFS REST API which is very well documented.

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html

The WebHDFS API specifies a url in which you specify the Hadoop root file system directory that you want to get files from as follows. In my case I had folder named “myhdfsfolder“:

http://localhost:50070/webhdfs/v1/myhdfsfolder?op=LISTSTATUS

The myhdfsfolder folder contained a csv file which had previously been imported into Hadoop.

In Power BI, select “Get Data”, “Other”, then “Hadoop File (HDFS)” which will pop a window that asks you to enter a url. The url requested is the WebHDFS REST API url specified above.

After entering your url and clicking ok you should see a list of objects that are in the Hadoop directory. In my case I can see my csv file along with bunch of Hadoop log files.

Since I only want the csv file, I clicked the Power BI Edit button so I could filter the list to only my csv file. This part is just standard Power BI/Power Query stuff. If you had multiple csv files, you could filter to csv file types, or some other feature of the file name, or file attribute such as create date etc.

After filtering, only the csv file I am interested in shows in the list. There is one more step to do which is also just standard Power BI/Power Query stuff. Since the file is a binary file you first need to extract its contents. To do this, select the Content column double down arrow selector. This will extract the binary file content’s into a new table which is the data you want.

Now you can continue transforming the csv file data in Power BI/Power  Query or just load the query to start using the file’s data in Power BI data modeling and reporting!

How to use Python as a datasource for Power BI

Step 1 – Download latest version of Power BI to get new features (version that came out in early August 2018)

Step  2 – Enable Python in the “Options – Preview Features” check the “Python support” option. Close and restart Power BI.

Step 3 – Click “Get Data”, select “Other” and “Python script” and click “Connect” button.

Step 4 – Write or paste Python script in the popup Python script window and click Ok button.

I chose to use Pandas to get csv file:

import numpy as np
import pandas as pd
data = pd.read_csv(‘C:/mydatafile.csv’, delimiter=’,’)

Note if you are using Windows as I am, you have to write file path so Windows can recognize it otherwise you will get an error. I used foreslashes but you could also escape the backslashes.

You could also write any Python script that gets or generates data from any data source eg a database, from AWS S3 buckets, etc. Of course this also includes transforming data. The important thing to note is your script must output a named dataset(s). The named dataset(s) will be thing Power BI will use when you are finished coding.

Step 5 – After clicking ok, you get the familiar Power Query data source selection window. Select Python and you will see your named data set(s). Just check the one(s) you want and click ok.

Now just continue working with Power BI as normal! You can always go back to edit the Python script by select “Source” in query editor.

How to update Office 365 password in Power BI dataset refresh

I recently changed my Office 365 user password for an account that I was using for a Power BI Dataset Scheduled Refresh.

The result was that my Power BI Refresh failed which looked like screenshot below.

powerbi sharepoint dataset - update password-01

So all I had to do was update the authentication Power BI was using to access my Office 365 Sharepoint folder.

It was clear that the Edit credentials link was where I needed to update the password.

That link got me the following page where I selected oAuth2 which is referring to the authentication that Power BI uses with my Office 365 user credentials.

powerbi sharepoint dataset - update password-02

Selecting oAuth2 popped a new browser window where I could enter my Office 365 user and new password and authenticate Power BI.

powerbi sharepoint dataset - update password-1

After clicking Sign In I was returned to the Power BI page and the credential errors above were gone and I could successfully refresh the dataset from my Office 365 Sharepoint files.

powerbi sharepoint dataset - update password-4

The Refresh Schedule log showed the previous failed refresh attempt and the just completed successful refresh. Back in business!

powerbi sharepoint dataset - update password-3

How to avoid wide margins on a Power BI dashboard

A Power BI Report with multiple charts or other objects can be added to a Dashboard in Power BI Online using the pin to dashboard feature.

However this results in a dashboard with very wide margins. This is especially problematic on a mobile device as the screenshot from the Power BI Android application shows. There is a lot of wasted white space.

wide margins

The Desktop app view is a bit better but there is still a lot of white space around the edges.

wide margins desktop

 

The resolution, until Power BI team make the margins smaller or add feature to change margin width, is to pin charts one by one to the dashboard in order to have them fill out width.

The screenshot below highlights where you can pin your report to the dashboard using different pins.

You can select an individual report’s pin (the one to the right) which will give you dashboard without the wide margins.

Using the pin on the top toolbar will add the entire report with the multiple reports to the dashboard and results in the wide margins seen above.
pin to dashboard