Word Count program on Hadoop


@kumar.abhishek


over 9 years ago
Hi, this blog will help you learn how to run an application on a Hadoop system. As an example, we are going to run a wordcount program on some text data.
First of all, we need the data on which the wordcount process will run. For testing purposes you can also write it yourself in gedit or any other editor of your choice. I chose to get my text data from http://www.lipsum.com/

Put that file in a location of your choice. Here we put it in a directory named hadoopTestData on the Desktop:
```
cd ~/Desktop
```
Create a directory named hadoopTestData:
```
mkdir hadoopTestData
```
Then put your file in the hadoopTestData directory. My file is named loremIpsum.
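If you would rather not copy text by hand, a sketch like the following creates the directory and a small sample file from the shell. The DATA_DIR variable and the two lines of placeholder text are assumptions for illustration; any plain text file will do.

```shell
#!/usr/bin/env bash
# Sketch: create the test-data directory and a small sample input file.
# DATA_DIR and the placeholder text are illustrative assumptions.
set -euo pipefail

DATA_DIR="${DATA_DIR:-$HOME/Desktop/hadoopTestData}"
mkdir -p "$DATA_DIR"

# Write a few lines of lorem ipsum text into the file loremIpsum
cat > "$DATA_DIR/loremIpsum" <<'EOF'
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Lorem ipsum dolor sit amet.
EOF

# Print the number of lines and words in the new file
wc -lw "$DATA_DIR/loremIpsum"
```

Set DATA_DIR before running it if you want the file written somewhere other than the Desktop.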

Here is a snapshot of my file loremIpsum, whose content will be the input for my wordcount application.

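Now execute start-all.sh to start the Hadoop services (HDFS and YARN). The script is located in the sbin directory of your Hadoop installation, so that directory must be on your PATH for the command below to work.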
```
start-all.sh
```
Go to Hadoop's installation directory (in my case /usr/local/hadoop):
```
cd /usr/local/hadoop
```
Now create a directory in HDFS to hold the data:
```
bin/hdfs dfs -mkdir /avish
```
Now put the data into HDFS (the Hadoop Distributed File System):
```
bin/hdfs dfs -put ~/Desktop/hadoopTestData/loremIpsum /avish/input
```
In general, the local source path(s) come first and the HDFS destination comes last:

`bin/hdfs dfs -put <local src> ... <destination>`

To check the upload, open the NameNode web UI at localhost:50070 (the default port in Hadoop 2.x) in your browser.

Click Utilities and then Browse the file system.

Inside the /avish directory you will find your data, stored in HDFS under the name input.

Now you need to run wordcount on your data.

The Hadoop distribution ships with a jar of example MapReduce programs, hadoop-mapreduce-examples, and one of them is wordcount. We are going to use it.

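Enter this command in your terminal to run the word count job on your data. The first path is the HDFS input and the second is the HDFS output directory, which must not exist yet or the job will fail. Adjust the version number in the jar name (2.6.3 here) to match your Hadoop installation.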
```
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /avish/input /avish/output
```
While the job is running, you will see it listed on the All Applications page of the YARN ResourceManager web UI (by default at localhost:8088).
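To see what the wordcount job actually does, it can help to reproduce its data flow on a single machine. As far as the bundled example goes, the map step splits each line on whitespace and emits every token with a count of 1, the framework sorts and groups those pairs by word, and the reduce step sums the counts for each word. The pipeline below is a rough local illustration of those three steps in plain shell, not Hadoop itself; the function name wordcount_local and the sample text are hypothetical.

```shell
#!/usr/bin/env bash
# Rough local illustration of the map -> shuffle -> reduce flow of wordcount.
# Plain shell, not Hadoop; wordcount_local is a hypothetical helper name.
set -euo pipefail

wordcount_local() {
  # Map: treat \r and \f as spaces, then emit "<word>\t1" for every token.
  # Shuffle: a byte-order sort places identical words next to each other.
  # Reduce: add up the 1s for each run of identical words. Words are
  # compared as strings so numeric-looking tokens are never merged.
  tr '\r\f' '  ' < "$1" \
    | awk '{ for (i = 1; i <= NF; i++) printf "%s\t1\n", $i }' \
    | LC_ALL=C sort \
    | awk '
        { word = $1 "" }
        NR == 1 || word != prev { if (NR > 1) printf "%s\t%d\n", prev, sum; prev = word; sum = 0 }
        { sum += $2 }
        END { if (NR > 0) printf "%s\t%d\n", prev, sum }
      '
}

# Demo on a throwaway file; point wordcount_local at your own input instead
sample="$(mktemp)"
printf 'Lorem ipsum dolor\nipsum dolor\ndolor\n' > "$sample"
result="$(wordcount_local "$sample")"
printf '%s\n' "$result"
rm -f "$sample"
```

Running wordcount_local on ~/Desktop/hadoopTestData/loremIpsum should give the same word and count pairs as the Hadoop job on a small plain-text input, which makes it a handy cross-check.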

Once the job has finished, you can view the output with this command:
```
bin/hdfs dfs -cat /avish/output/*
```
You will get output like that depicted in the image below, with one word per line followed by a tab and its count.

Alternatively, open the /avish/output directory in the Browse the file system utility, which offers an option to download the output file (part-r-00000). Download it and view the result in any text editor of your choice.

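The output lists every word present in your input file together with its frequency. Because the example splits text on whitespace only, punctuation stays attached to words, so amet and amet, are counted as different words.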
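Once you have downloaded the result file (for a job like this one it is normally named part-r-00000), a short pipeline can list the most frequent words. The function top_words and the sample lines written to a temporary file are hypothetical, included only so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Sketch: show the N most frequent words in a wordcount result file.
# top_words and the sample data below are hypothetical illustrations.
set -eu

top_words() {
  # $1 = file of "<word>\t<count>" lines, $2 = number of words to show.
  # Sort by count (numeric, descending), break ties alphabetically.
  sort -t "$(printf '\t')" -k2,2nr -k1,1 "$1" | head -n "$2"
}

# Made-up result lines in the same tab-separated format as the job output
out_file="$(mktemp)"
printf 'amet\t2\ndolor\t5\nipsum\t3\nsit\t1\n' > "$out_file"
top="$(top_words "$out_file" 2)"
printf '%s\n' "$top"
rm -f "$out_file"
```

For a real run, call it as `top_words part-r-00000 10` in the directory where you saved the download.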
Tags
wordcount wordcount on hadoop wordcount example on hadoop