-
What is Hadoop?
almost 9 years ago
-
over 6 years ago
Hadoop is an open-source tool from the ASF – Apache Software Foundation. Open source project means it is freely available and we can even change its source code as per the requirements. If certain functionality does not fulfill your need then you can change it according to your need. Most of Hadoop code is written by Yahoo, IBM, Facebook, Cloudera.
It provides an efficient framework for running jobs on multiple nodes of clusters. Cluster means a group of systems connected via LAN. Apache Hadoop provides parallel processing of data as it works on multiple machines simultaneously.
By getting inspiration from Google, which has written a paper about the technologies. It is using technologies like Map-Reduce programming model as well as its file system (GFS). As Hadoop was originally written for the Nutch search engine project. When Doug Cutting and his team were working on it, very soon Hadoop became a top-level project due to its huge popularity.
Apache Hadoop is an open source framework written in Java. The basic Hadoop programming language is Java, but this does not mean you can code only in Java. You can code in C, C++, Perl, Python, ruby etc. You can code the Hadoop framework in any language but it will be more good to code in java as you will have lower level control of the code.
Big Data and Hadoop efficiently processes large volumes of data on a cluster of commodity hardware. Hadoop is for processing huge volume of data. Commodity hardware is the low-end hardware, they are cheap devices which are very economical. Hence, Hadoop is very economic.
Hadoop can be setup on a single machine (pseudo-distributed mode, but it shows its real power with a cluster of machines. We can scale it to thousand nodes on the fly ie, without any downtime. Therefore, we need not make any system down to add more systems in the cluster. Follow this guide to learn Hadoop installation on a multi-node cluster.
Hadoop consists of three key parts –
- Hadoop Distributed File System (HDFS) – It is the storage layer of Hadoop.
- Map-Reduce – It is the data processing layer of Hadoop.
- YARN – It is the resource management layer of Hadoop.
-
1 Comment(s)