What is Hadoop?

@rohit.arora

about 10 years ago
Hadoop is a free, Java-based programming framework that is used for the processing of large data sets in a distributed computing environment.

At its core, Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them across the nodes in a cluster. To process the data, it ships packaged code to those nodes, so each node works in parallel on the blocks it stores.
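
To make the block idea concrete, here is a minimal sketch in plain Python. It is a toy model and not Hadoop code: the function names and node names are made up for illustration, the 128 MB block size is an assumption (it matches the HDFS default in recent Hadoop versions but is configurable), and the round-robin placement is a deliberate simplification of what HDFS really does.

```python
# Toy model: split a file into fixed-size blocks and spread them over nodes.
# Round-robin placement is a simplification, not HDFS's actual policy.

def split_into_blocks(file_size, block_size):
    """Return a list of (offset, length) pairs covering the whole file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_blocks(blocks, nodes):
    """Assign each block index to a node in round-robin order."""
    placement = {node: [] for node in nodes}
    for index, _ in enumerate(blocks):
        node = nodes[index % len(nodes)]
        placement[node].append(index)
    return placement


MB = 1024 * 1024
blocks = split_into_blocks(file_size=300 * MB, block_size=128 * MB)
placement = place_blocks(blocks, ["node-a", "node-b"])
print(len(blocks))   # 3 blocks: 128 MB, 128 MB, 44 MB
print(placement)     # {'node-a': [0, 2], 'node-b': [1]}
```

Each node can then run the same code against only the blocks it holds, which is the sense in which the processing happens in parallel.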

Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting was a Yahoo! employee during much of Hadoop's early development.

Hadoop enables applications to run on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates fast data transfer rates among nodes and allows the system to continue running in case of a node failure. It lowers the risk of massive system failure, even if a significant number of nodes go down.
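
Surviving node failures comes down to keeping several copies of every block on different nodes. The sketch below is a toy illustration of that idea only: the helper names are hypothetical, the rotation-based placement is an assumption made for simplicity, and the replication factor of 3 is used because it is the usual HDFS default.

```python
def replicate(num_blocks, nodes, replication=3):
    """Place each block on `replication` distinct nodes (simple rotation)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(block + r) % len(nodes)] for r in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(replicas, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return sorted(block for block, holders in replicas.items() if holders - failed)


nodes = ["n0", "n1", "n2", "n3", "n4"]
replicas = replicate(num_blocks=5, nodes=nodes, replication=3)

# Two of five nodes fail: every block keeps at least one replica.
print(readable_blocks(replicas, ["n0", "n1"]))        # [0, 1, 2, 3, 4]
# Three failures can wipe out all replicas of some block.
print(readable_blocks(replicas, ["n0", "n1", "n2"]))  # [1, 2, 3, 4]
```

With three copies per block, losing any two nodes never makes data unreadable in this model; a real cluster also re-creates the lost copies on healthy nodes.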

Applications of Hadoop:
- Analytics for marketing
- Machine learning and data mining
- Image processing
- XML Message processing
The Hadoop framework is used by several companies, including Google, Yahoo, and IBM.

1 Comment(s)

@Datflair

almost 8 years ago
Hadoop is an open-source tool from the Apache Software Foundation (ASF). Being open source means it is freely available and its source code can be changed to suit your requirements: if some functionality does not meet your needs, you can modify it. Much of the Hadoop code has been contributed by companies such as Yahoo, IBM, Facebook, and Cloudera.

It provides an efficient framework for running jobs across the nodes of a cluster, where a cluster is a group of systems connected via a LAN. Because it works on multiple machines simultaneously, Apache Hadoop processes data in parallel.
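
The split, process, and combine pattern behind this can be tried on one machine. The following is only a rough sketch: threads from Python's standard `concurrent.futures` module stand in for cluster nodes, and the function names are invented for the example, so it shows the shape of the idea and not how Hadoop schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Work done by one 'node': count the words in its own chunk of lines."""
    return sum(len(line.split()) for line in chunk)


def parallel_word_total(lines, workers=3):
    """Split the lines into chunks, process them concurrently, add the results."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


lines = ["hadoop runs jobs", "on many nodes", "at the same time"]
print(parallel_word_total(lines))  # 10
```

Each worker sees only its own chunk, and the partial results are merged at the end, which is the same division of labour a cluster applies across machines.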

Hadoop was inspired by papers Google published about its own technologies: the MapReduce programming model and the Google File System (GFS). Hadoop was originally written for the Nutch search engine project, which Doug Cutting and his team were working on, and it soon became a top-level project because of its huge popularity.

Apache Hadoop is an open-source framework written in Java. Java is its primary programming language, but that does not mean you can code only in Java: you can also write jobs in C, C++, Perl, Python, Ruby, and other languages. Java is still usually the better choice, because it gives you lower-level control over the code.
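
Non-Java jobs typically go through Hadoop Streaming, which runs any program that reads lines from standard input and writes tab-separated key/value lines to standard output. Below is a minimal word-count sketch in that style. The function names are illustrative, and the `sorted` call is an assumption standing in for the sort Hadoop performs between the map and reduce stages; nothing here calls a Hadoop API.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


text = ["Hadoop stores data", "Hadoop processes data"]
shuffled = sorted(mapper(text))  # stands in for Hadoop's sort between stages
print(list(reducer(shuffled)))
# ['data\t2', 'hadoop\t2', 'processes\t1', 'stores\t1']
```

In a real streaming job the mapper and reducer would be separate scripts reading `sys.stdin`; keeping them as generator functions here makes the logic easy to test locally.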

Hadoop efficiently processes large volumes of data on a cluster of commodity hardware. Commodity hardware means low-end, inexpensive machines, which makes Hadoop very economical.

Hadoop can be set up on a single machine (pseudo-distributed mode), but it shows its real power on a cluster of machines. It can scale to thousands of nodes on the fly, that is, without any downtime, so no system has to be taken down to add more machines to the cluster.

Hadoop consists of three key parts:
- Hadoop Distributed File System (HDFS): the storage layer of Hadoop.
- MapReduce: the data processing layer of Hadoop.
- YARN: the resource management layer of Hadoop.