It turned out to be quite simple to install Hadoop on Windows 8.1
I have been using Hadoop Virtual Machine images from Cloudera, Hortonworks or MapR when I wanted to work with Hadoop on my laptop. However, these VMs are big files and take up precious hard drive space. So I thought it would be nice to install Hadoop directly on my Windows 8.1 laptop.
After some searches I found excellent instructions on installing Hadoop on Windows 8.1 on Mariusz Przydatek’s blog.
Note that these instructions are not for complete beginners. They assume the reader understands basic Windows environment configuration, the Java SDK, Java applications, and building binaries from source files. Beginners will want to follow more detailed step-by-step instructions and use Mariusz’s instructions as a guide to make sure they are doing things correctly.
The most important thing to note is that you do not need Cygwin to install Hadoop on Windows, even though other tutorials and how-to blog posts insist that Cygwin is required.
Also, because you need to build Hadoop on your computer, you need MS Visual Studio. A trial version is enough, since you don’t need it once the Hadoop binaries are built. Other tutorials and how-to blog posts vary on which version of MS Visual Studio is required, but this blog makes it clear.
At a high level, the Hadoop installation follows these steps:
- Download and extract Hadoop source files
- Download and install packages and helper software
- Make sure System Environment Path and Variables are correct
- Make sure configuration files and package paths are correct
- Execute command that uses packages and helper software to build Hadoop binaries
- Copy newly created Hadoop binaries into new Hadoop production folder
- Execute command to run the new Hadoop binaries
- Hadoop is running and ready to be used
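Before kicking off the build step, it can save time to confirm that the environment variables and helper tools are actually visible from the command prompt. Below is a minimal sketch of such a check in Python. The function name `find_missing` is hypothetical, and the names checked at the bottom (`JAVA_HOME`, `java`, `mvn`) are my assumptions about a typical Java build setup rather than the exact list from Mariusz’s instructions, so adjust them to whatever the guide you follow requires.

```python
import os
import shutil


def find_missing(env_vars, tools, environ=None, which=shutil.which):
    """Return (missing_env_vars, missing_tools) for the given names.

    `environ` and `which` can be overridden, which makes the check easy to test.
    """
    environ = os.environ if environ is None else environ
    # A variable counts as missing if it is unset or set to an empty string.
    missing_vars = [name for name in env_vars if not environ.get(name)]
    # shutil.which returns None when an executable is not found on PATH.
    missing_tools = [tool for tool in tools if which(tool) is None]
    return missing_vars, missing_tools


if __name__ == "__main__":
    # Assumed names for illustration; replace with the ones your build guide lists.
    missing_vars, missing_tools = find_missing(["JAVA_HOME"], ["java", "mvn"])
    for name in missing_vars:
        print(f"Environment variable not set: {name}")
    for tool in missing_tools:
        print(f"Not found on PATH: {tool}")
    if not missing_vars and not missing_tools:
        print("All checked variables and tools were found.")
```

Run it from the same command prompt you plan to build from, because that is the environment the build will inherit.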
It’s pretty obvious but worth stating that this tutorial installs only a single-node Hadoop cluster, which is useful for learning and development. I quickly found out that I had to increase its memory limits so that it could successfully run medium-sized jobs.
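Memory limits in Hadoop are set as name/value properties in its XML configuration files. As a rough illustration, here is a small Python sketch that renders such properties in that `<configuration>` format. The helper name `render_properties` is hypothetical. The two property names are real YARN settings, but choosing them is my assumption, since I am not recording here exactly which limits were raised, and the 4096 MB values are placeholders to size for your own laptop.

```python
import xml.etree.ElementTree as ET


def render_properties(properties):
    """Render a dict of settings as Hadoop-style <configuration> XML text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-printing; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed settings and placeholder values, for illustration only.
    print(render_properties({
        "yarn.nodemanager.resource.memory-mb": 4096,
        "yarn.scheduler.maximum-allocation-mb": 4096,
    }))
```

The printed properties would go into the matching configuration file (`yarn-site.xml` for these two), and Hadoop needs a restart before new values take effect.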
After Hadoop is successfully installed, you can then install Hive, Spark, and other big data ecosystem tools that work with Hadoop.