Power Query is an Microsoft tool for Excel that is used to do the following:
Connect to a data source
Extract data from source
Transform the extracted data
Load the transformed data into Excel
This functionality is called ETL (Extract, Transform and Load).
Why use Power Query?
Excel already has another tab of features called Get External Data, why not use that?
Excel’s other methods for connecting to and retrieving data may still be useful, but they are not able to transform the data.
Power Query connections are non-techie, user friendly, and provide much more versatile and robust functionality.
Where is Power Query in Excel?
You can find Power Query in the Data tab Get & Transform icon group in Excel 2016
Note that in previous versions of Excel, Power Query is an add-in that has its own tab. However, in Excel 2016 Power Query is built into Excel.
•Get & Transform has 4 icons
Show Queries – this hides/shows the Query pane docked on right side of Excel.
New Query – choose data source & modify Power Query settings and options.
From Table – choose data source as currently selected Excel table.
Recent Sources – choose from recently selected data sources.
Step 1 – Connect to a data source and retrieve data
Select the New Query icon to select which type of data you want to connect to
FromFile (csv, Excel)
FromDatabase (SQL Server db, Access db)
FromOnline Services (Facebook)
From Other Sources (webpage, SharePoint List, ODBC connection)
Screenshot shows example of connecting to an Excel Workbook.
Select From File – From Workbook
Popup dialogue allows you to select an Excel file.
Step 2 – Transform the data – common usage
“Calculated columns” replace any worksheet formula requirement
Power Query has many more formulas than plain old Excel, including VBA type functionality
Power Query has been called “vlookup on steroids”
“Pivot” and “unpivot” data
Group and aggregate data into sums, min, max, averages
Simple to combine many files contents in a folder into one table.
Filter, sort, replace with complex logic.
Merge two or more separate tables of data together on matching values. If you are familiar with SQL, this is a JOIN query.
Append two or more separate tables of data into one combined table of data. If you are familiar with SQL, this is a UNION query.
Step 3 – Load the data into Excel
Power Query queries are loaded to a worksheet by default.
However, you can choose to save a query as a connection only by selecting Load To – Only Create Connection. This query can be used in other Queries eg in Merge or Append, and by other Excel features eg it can be used as datasource for a Pivot Table.
Step 4 – Click and refresh to retrieve new data
When data source(s) are updated, just click Refresh to retrieve and transform new data.
No copy & paste of data from other files.
No worry about formulas being deleted or ranges being moved.
All transformations are safely contained inside the Query.
Power Query is in Microsoft Power BI
Power Query has been integrated into Power BI as the “Get Data” feature.
Has the same functionality and features as the Power Query in Excel.
The myhdfsfolder folder contained a csv file which had previously been imported into Hadoop.
In Power BI, select “Get Data”, “Other”, then “Hadoop File (HDFS)” which will pop a window that asks you to enter a url. The url requested is the WebHDFS REST API url specified above.
After entering your url and clicking ok you should see a list of objects that are in the Hadoop directory. In my case I can see my csv file along with bunch of Hadoop log files.
Since I only want the csv file, I clicked the Power BI Edit button so I could filter the list to only my csv file. This part is just standard Power BI/Power Query stuff. If you had multiple csv files, you could filter to csv file types, or some other feature of the file name, or file attribute such as create date etc.
After filtering, only the csv file I am interested in shows in the list. There is one more step to do which is also just standard Power BI/Power Query stuff. Since the file is a binary file you first need to extract its contents. To do this, select the Content column double down arrow selector. This will extract the binary file content’s into a new table which is the data you want.
Now you can continue transforming the csv file data in Power BI/Power Query or just load the query to start using the file’s data in Power BI data modeling and reporting!
Step 1 – Download latest version of Power BI to get new features (version that came out in early August 2018)
Step 2 – Enable Python in the “Options – Preview Features” check the “Python support” option. Close and restart Power BI.
Step 3 – Click “Get Data”, select “Other” and “Python script” and click “Connect” button.
Step 4 – Write or paste Python script in the popup Python script window and click Ok button.
I chose to use Pandas to get csv file:
import numpy as np import pandas as pd data = pd.read_csv(‘C:/mydatafile.csv’, delimiter=’,’)
Note if you are using Windows as I am, you have to write file path so Windows can recognize it otherwise you will get an error. I used foreslashes but you could also escape the backslashes.
You could also write any Python script that gets or generates data from any data source eg a database, from AWS S3 buckets, etc. Of course this also includes transforming data. The important thing to note is your script must output a named dataset(s). The named dataset(s) will be thing Power BI will use when you are finished coding.
Step 5 – After clicking ok, you get the familiar Power Query data source selection window. Select Python and you will see your named data set(s). Just check the one(s) you want and click ok.
Now just continue working with Power BI as normal! You can always go back to edit the Python script by select “Source” in query editor.
A Power BI Report with multiple charts or other objects can be added to a Dashboard in Power BI Online using the pin to dashboard feature.
However this results in a dashboard with very wide margins. This is especially problematic on a mobile device as the screenshot from the Power BI Android application shows. There is a lot of wasted white space.
The Desktop app view is a bit better but there is still a lot of white space around the edges.
The resolution, until Power BI team make the margins smaller or add feature to change margin width, is to pin charts one by one to the dashboard in order to have them fill out width.
The screenshot below highlights where you can pin your report to the dashboard using different pins.
You can select an individual report’s pin (the one to the right) which will give you dashboard without the wide margins.
Using the pin on the top toolbar will add the entire report with the multiple reports to the dashboard and results in the wide margins seen above.