Use Excel Power Query to scrape & combine Wikipedia tables

Power Query is quick and easy way to scrape HTML tables on web pages. Here is step by step on getting multiple tables from Wikipedia article and appending them into one Power Query Excel table.

The Wikipedia article List of national and international statistical services has multiple tables with lists of countries’ statistics agencies and their website urls. These are separated by world region.

https://en.wikipedia.org/wiki/List_of_national_and_international_statistical_services

Step 1: Click the Power Query From Web menu icon and paste the url above into the URL address field and click OK.

stat_scrape_1

Step 2: Power Query finds and shows you a list of tables that are in this web page. Select the tables you want and click Load.

By default you can only select one of the tables.

stat_scrape_2

However you can check the Select multiple items checkbox and you can then select more than one of the tables.

stat_scrape_3

You can also select any of the tables to get a preview of the data they contain. After you have selected all the tables you want, click Load.

stat_scrape_4

After you click Load Power Query will create a new query for each of the selected tables.

stat_scrape_5

Step 3: Append all of the tables into one new combined dataset. You do this with the Table.Combine feature which is confusingly also called Append in the Power Query menu icons.  The menu icon feature combines only two tables by default, however you can simply manually edit the resulting code in the address bar to include all of the tables in the Table.Combine formula.

stat_scrape_6

After you update the Table.Combine formula you will have a new query with all of the tables combined into one dataset.

stat_scrape_7

You can refresh the queries to get any changes to the Wikipedia tables.

Here is M code for the Power Query that gets the Wikipedia table

 

let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/List_of_national_and_international_statistical_services")),
Data0 = Source{0}[Data]
in
Data0

Here is M code for the Power Query that combines the tables into one dataset:

 
let
Source = Table.Combine({#"Africa[edit]",#"Americas[edit]",#"Asia[edit]",#"Europe[edit]",#"Oceania[edit]"})
in
Source

2 thoughts on “Use Excel Power Query to scrape & combine Wikipedia tables”

  1. Hi,

    Web.Page(Web.Contents(“https://en.wikipedia.org/wiki/List_of_national_and_international_statistical_services”)),

    My source range A1 and how to I get link put Web.Page(Web.Contents. Thank.

    1. Hey Ninh,

      Did you follow steps in blog?

      If yes which step are you stuck on?

      It looks like you might not have Power Query add-in installed. You need to go to Microsoft and download Power Query add-in and install it.

      Then you can follow steps in blog post.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.