Hi, this blog will help you set up a single-node Hadoop environment on your Linux machine.
To learn more about Hadoop, follow these links:
https://en.wikipedia.org/wiki/Apache_Hadoop
http://www.tutorialspoint.com/hadoop/
https://www.mapr.com/products/apache-hadoop
http://findnerd.com/list/view/What-is-Hadoop/14171/
You must have Java 6 or later (Java 7 or newer recommended), ssh, and rsync installed in order to install and use Hadoop.
The link below has detailed information about which Java versions can be used with Hadoop.
https://wiki.apache.org/hadoop/HadoopJavaVersions
Confirm that the correct Java version is properly installed on your system by executing the following command in a terminal:
java -version
If Java is installed, the command prints the installed version; verify that it is a supported one, otherwise you need a fresh Java installation.
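For example, with Oracle Java 7 installed the output looks roughly like this (the exact build numbers will differ on your machine):
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)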
To install the Java version of your choice, follow the link below.
http://findnerd.com/list/view/Install-Oracle-JDK-with-apt-get/2944/
If ssh is not installed on your machine, you need to install it:
sudo apt-get install ssh
Install rsync using the following command:
sudo apt-get install rsync
Next, allow SSH public key authentication. First, we have to generate an SSH key:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
The above command creates an RSA key pair with an empty passphrase. We don't want to enter a passphrase every time Hadoop interacts with its nodes, so we create the key with an empty one.
After this, enable SSH access to your local machine with the newly created key:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
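You can verify that passwordless login now works; the first connection may ask you to confirm the host fingerprint:
ssh localhost
exit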
Download your desired version of the Hadoop binary tarball from the Apache Hadoop website:
http://hadoop.apache.org/releases.html
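If you prefer the command line, you can fetch the tarball directly from the Apache archive instead; for example, for version 2.6.3 (adjust the version to your choice):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz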
Now, using the terminal, go to the directory into which the tarball was downloaded.
In my case I downloaded hadoop-2.6.3.tar.gz to ~/Downloads/
cd ~/Downloads/
You need to extract the Hadoop package and move the extracted content to a location of your choice. In my case I chose /usr/local/hadoop.
sudo tar -zxvf hadoop-2.6.3.tar.gz
sudo mv hadoop-2.6.3 /usr/local/hadoop
Note down your current Java path. To find it, you can run the following command:
update-alternatives --config java
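Alternatively, you can resolve the Java home in one step; this is a small sketch that strips the trailing /bin/java from the resolved binary path (on some setups the result points at the jre subdirectory, whose parent is the JDK home):
readlink -f $(which java) | sed 's:/bin/java::'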
To edit your .bashrc, run the following command:
sudo gedit ~/.bashrc
Append the following lines at the end of your .bashrc file:
#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Make sure JAVA_HOME points to your actual Java installation path, without quotes; the value above assumes Oracle Java 7 installed under /usr/lib/jvm/java-7-oracle.
Reload .bashrc so the changes take effect in your current terminal:
source ~/.bashrc
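To check that the Hadoop binaries are now on your PATH, you can print the installed version:
hadoop version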
Now comes the Hadoop configuration part.
Go to Hadoop's configuration directory:
cd /usr/local/hadoop/etc/hadoop
Now update your hadoop-env.sh
sudo gedit hadoop-env.sh
Find the line that sets JAVA_HOME and replace its value with the Java home path you noted in the earlier step.
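For example, assuming Oracle Java 7 under /usr/lib/jvm/java-7-oracle, the line would become:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle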
Update core-site.xml
sudo gedit core-site.xml
Replace the empty <configuration></configuration> tags with the following and save the file:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Update yarn-site.xml
sudo gedit yarn-site.xml
Replace the empty <configuration></configuration> tags with the following and save the file:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Make a copy of mapred-site.xml.template named mapred-site.xml:
sudo cp mapred-site.xml.template mapred-site.xml
Now edit your mapred-site.xml
sudo gedit mapred-site.xml
Replace the empty <configuration></configuration> tags with the following and save the file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit hdfs-site.xml
sudo gedit hdfs-site.xml
Replace the empty <configuration></configuration> tags with the following and save the file:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  </property>
</configuration>
Go back to your home directory
cd
You need to create the directories for the namenode and the datanode, i.e. the storage locations you specified in hdfs-site.xml above:
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
Change the ownership of the directory to your user (replace abhishek with your own username):
sudo chown -R abhishek:abhishek /usr/local/hadoop
Follow this link to learn more about changing ownership: http://www.techonthenet.com/linux/commands/chown.php
The Hadoop file system needs to be formatted before we can start using it. Issue the format command as a user with write permission, since it creates a current directory under the /usr/local/hadoop/hadoop_data/hdfs/namenode folder:
hdfs namenode -format
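Among the log output, look for a line similar to the following, which confirms the format succeeded (the exact prefix can differ between versions):
INFO common.Storage: Storage directory /usr/local/hadoop/hadoop_data/hdfs/namenode has been successfully formatted.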
Now we can start the Hadoop services. To start them, run this command (in Hadoop 2.x, start-all.sh is deprecated in favour of running start-dfs.sh followed by start-yarn.sh, but it still works):
start-all.sh
To list the Java processes running after executing the above command, run jps in your terminal:
jps
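The output should look something like this; the process IDs will differ on your machine:
12051 NameNode
12183 DataNode
12362 SecondaryNameNode
12519 ResourceManager
12648 NodeManager
12987 Jps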
SecondaryNameNode, NodeManager, ResourceManager, NameNode, and DataNode must all be running to ensure that the installation is fine and will work for our further tasks.
Now go to the following URLs to reach the web UI of Hadoop:
http://localhost:8088/
http://localhost:50070/
http://localhost:50090/
http://localhost:50075/
Port 8088 is for all applications on your Hadoop system (the ResourceManager UI),
Port 50070 is for NameNode information,
Port 50075 is for DataNode information,
Port 50090 is for Secondary NameNode information.
To stop Hadoop, execute the command:
stop-all.sh
You can later set a passphrase on the SSH key that was left blank in the earlier step by following these links:
https://www.sophos.com/en-us/support/knowledgebase/115708.aspx
http://www.cyberciti.biz/faq/ssh-password-less-login-with-dsa-publickey-authentication/
To run a word count program on your single-node cluster, search for the Word Count program on Hadoop on FindNerd.
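As a quick smoke test of the whole setup, you can also run the word count example that ships with Hadoop against its own config files. This is a sketch assuming version 2.6.3 installed under /usr/local/hadoop; adjust the jar name to your version, and note that relative HDFS paths resolve under /user/<your-username>:
hdfs dfs -mkdir -p /user/$USER/input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml /user/$USER/input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount input output
hdfs dfs -cat output/part-r-00000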