Hi, this blog post shows how to run an application on a Hadoop system, using the wordcount program as an example.
First of all we need some data to run wordcount on. You can generate it for testing in gedit or any editor of your choice. I took my text data from http://www.lipsum.com/
Put that file in a location of your choice, for example a directory named hadoopTestData on your Desktop.
cd ~/Desktop
Create a directory named hadoopTestData:
mkdir hadoopTestData
Then put your file in the hadoopTestData directory. My file is named loremIpsum.
Below is a snapshot of my file loremIpsum, whose content will be the input for my wordcount application.
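If you would rather not copy text from the web, a small test file can also be created straight from the shell. This is only a minimal sketch: the /tmp location is a stand-in for ~/Desktop/hadoopTestData, and the sample text is made up for illustration.

```shell
# Hypothetical scratch location; use ~/Desktop/hadoopTestData to match this post.
DATA_DIR="${DATA_DIR:-/tmp/hadoopTestData}"
mkdir -p "$DATA_DIR"

# Write a few lines of made-up sample text into the input file.
cat > "$DATA_DIR/loremIpsum" <<'EOF'
lorem ipsum dolor sit amet
ipsum dolor sit
lorem lorem
EOF

# Quick sanity check: print the line and word counts of the file.
wc -l -w "$DATA_DIR/loremIpsum"
```

Any plain text file works as input, so the content itself does not matter much for testing.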
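Now run start-all.sh to start the Hadoop services. (On Hadoop 2.x this script still works, but it is marked as deprecated in favor of start-dfs.sh and start-yarn.sh.)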
start-all.sh
Go to Hadoop's installation directory
cd /usr/local/hadoop
Now create a directory in HDFS to hold the data:
bin/hdfs dfs -mkdir /avish
Now put the data into HDFS (Hadoop Distributed File System):
bin/hdfs dfs -put ~/Desktop/hadoopTestData/loremIpsum /avish/input
The general form takes your local source(s) followed by the destination:
bin/hdfs dfs -put <local src> ... <destination>
Go to localhost:50070 in your browser.
Click on Utilities and then click Browse the file system.
You will find your data in HDFS under /avish with the name input.
Now you need to run wordcount on your data.
The Hadoop package ships with some example jars that are useful here. We are going to use one of them.
Enter this command in your terminal to run the word count job on your data. The input path must match the location you uploaded to, and the output directory must not already exist, or the job will fail.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /avish/input /avish/output
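To get a feel for what the wordcount job computes, roughly the same result can be produced locally with standard Unix tools. This is only a sketch for small files, not how Hadoop implements the job. The sample text, the /tmp path and the function name wordcount_local are made up for illustration. It splits the text on whitespace and prints each word with its count, separated by a tab.

```shell
# Hypothetical sample input; any small text file works.
INPUT="${INPUT:-/tmp/wordcount_sample.txt}"
printf 'lorem ipsum dolor\nipsum dolor\nlorem lorem\n' > "$INPUT"

# "Map":     split on whitespace so there is one word per line.
# "Shuffle": sort so that identical words end up next to each other.
# "Reduce":  count each run of identical words, then print word<TAB>count.
wordcount_local() {
  tr -s '[:space:]' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

wordcount_local "$INPUT"
```

Running the same small file through both this pipeline and the Hadoop job is a handy way to check that the job read the input you expected.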
Now in the All Applications dashboard (the YARN ResourceManager web UI, by default at localhost:8088) you will see your application running with the name wordcount.
After the job completes, you can view the output using the command
bin/hdfs dfs -cat /avish/output/*
You will get output like that depicted in the image below.
Alternatively, you can download the output file from the Browse the file system utility in the GUI and view it in any text editor of your choice.
The output lists the words present in your input file along with their frequencies.
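Since each output line is a word and its count separated by a tab, the most frequent words can be listed by sorting on the second column. A small sketch, assuming the output has been downloaded to a local file. The file name, its contents and the function name top_words are made up for illustration.

```shell
# Hypothetical local copy of the job output; your real counts will differ.
OUT="${OUT:-/tmp/wordcount_output.txt}"
printf 'amet\t1\ndolor\t2\nipsum\t2\nlorem\t3\n' > "$OUT"

# Sort by the count column (numeric, descending), break ties by word,
# and keep the first N lines (default 3).
top_words() {
  sort -t "$(printf '\t')" -k2,2nr -k1,1 "$1" | head -n "${2:-3}"
}

top_words "$OUT" 3
```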