How to use Hadoop HDFS as a data source for Power BI

I have a local Hadoop server running on my Windows laptop. To connect Power BI to its Hadoop file system (HDFS), I used Hadoop's WebHDFS REST API, which is well documented:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html

The WebHDFS API uses a URL in which you specify the HDFS directory you want to list files from, as follows. In my case, I had a folder named “myhdfsfolder”:

http://localhost:50070/webhdfs/v1/myhdfsfolder?op=LISTSTATUS
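Before going into Power BI, you can sanity-check this endpoint with a quick Power Query (M) snippet. This is a minimal sketch using my folder name and the default port; it relies on the FileStatuses/FileStatus structure that the WebHDFS documentation describes for the LISTSTATUS response:

    let
        // Call the WebHDFS LISTSTATUS operation directly
        Response = Json.Document(
            Web.Contents(
                "http://localhost:50070/webhdfs/v1/myhdfsfolder",
                [Query = [op = "LISTSTATUS"]]
            )
        ),
        // The file list is nested under FileStatuses.FileStatus
        Files = Table.FromRecords(Response[FileStatuses][FileStatus])
    in
        Files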

The myhdfsfolder folder contained a csv file that had previously been imported into HDFS.

In Power BI, select “Get Data”, “Other”, then “Hadoop File (HDFS)”, which pops up a window asking you to enter a URL. The URL it asks for is the WebHDFS REST API URL described above.
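Under the hood, the connector builds its source step around Power Query's Hdfs.Files function. A minimal sketch of what the generated query looks like, assuming my URL:

    let
        // Hadoop File (HDFS) connector source step: lists the files under the folder
        Source = Hdfs.Files("http://localhost:50070/webhdfs/v1/myhdfsfolder")
    in
        Source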

After entering the URL and clicking OK, you should see a list of the objects in the HDFS directory. In my case, I could see my csv file along with a bunch of Hadoop log files.

Since I only wanted the csv file, I clicked the Power BI Edit button so I could filter the list down to just that file. This part is standard Power BI/Power Query stuff. If you had multiple csv files, you could filter on the csv file extension, on some other feature of the file name, or on a file attribute such as the created date.
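Expressed as M, the filter is a single step added after the source. A sketch, assuming the file list exposes the standard Extension column:

    let
        Source = Hdfs.Files("http://localhost:50070/webhdfs/v1/myhdfsfolder"),
        // Keep only csv files, dropping the Hadoop log files
        CsvOnly = Table.SelectRows(Source, each [Extension] = ".csv")
    in
        CsvOnly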

After filtering, only the csv file I was interested in shows in the list. There is one more step, which is also standard Power BI/Power Query stuff: since the file shows up as binary content, you first need to extract its contents. To do this, click the double down-arrow selector on the Content column header. This extracts the binary file's contents into a new table, which is the data you want.
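As M, the extraction boils down to parsing the binary into a csv document. A sketch, where the comma delimiter and UTF-8 encoding are assumptions about my file:

    let
        Source = Hdfs.Files("http://localhost:50070/webhdfs/v1/myhdfsfolder"),
        CsvOnly = Table.SelectRows(Source, each [Extension] = ".csv"),
        // Take the binary content of the first (and only) remaining file
        FileContent = CsvOnly{0}[Content],
        // Parse the binary as csv and promote the first row to column headers
        Csv = Csv.Document(FileContent, [Delimiter = ",", Encoding = 65001]),
        Result = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
    in
        Result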

Now you can continue transforming the csv data in Power BI/Power Query, or just load the query and start using the file's data in Power BI data modeling and reporting!
