A Closer Look at Hadoop Architecture

Hadoop has become a driving force behind the growth of the big data industry. But what does it actually do? In a previous Hadoop tutorial we covered a few of the basics; in this article, let us take a closer look at the Hadoop architecture.

What is Hadoop Architecture?

The Hadoop architecture enables distributed processing of large data sets across clusters of computers and can scale from a single machine to thousands of servers. It is designed to detect and handle failures at the application layer, so the cluster continues operating even when individual servers fail. It is also efficient with network bandwidth: computation is scheduled close to where the data is stored, so applications do not have to move large volumes of data across the network.

Hadoop Architecture Diagram

[Figure: Hadoop architecture diagram]

Based on the Hadoop architecture diagram above, the main components of Hadoop are:

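Name Node: This is the master of the Hadoop Distributed File System (HDFS). It holds the file system metadata (the directory tree and the list of blocks that make up each file) and keeps track of which data nodes store those blocks.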

Data Node: This writes data in blocks to local storage. It also replicates data blocks to other data nodes.
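
To make the block and replica idea concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API: the function names are invented for illustration, and the round-robin placement is an assumption standing in for the rack-aware policy that real HDFS uses.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, data_nodes: List[str],
                   replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct data nodes.

    Simple round-robin placement (an illustrative assumption, not the
    actual HDFS placement policy).
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


# A 10-byte "file" with a tiny 4-byte block size, just to keep the demo small.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)

for block_id, nodes in placement.items():
    print(block_id, len(blocks[block_id]), nodes)
```

Because every block lives on several data nodes, losing one node does not lose the block; the name node only needs to know which nodes hold which block.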

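Secondary Name Node: Despite its name, this is not a standby that takes over when the primary name node goes offline. It periodically merges the name node's edit log into the file system image (a checkpoint), which keeps the edit log small and lets the name node restart faster.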

JobTracker: In Hadoop 1 (classic MapReduce), this accepts MapReduce jobs from clients and assigns their tasks to nodes in the cluster.

TaskTracker: This runs on each worker node, accepts tasks from the JobTracker and executes them.
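
The jobs that the JobTracker hands out follow the MapReduce pattern. The sketch below shows that pattern as a word count in plain Python. It is not the Hadoop MapReduce API: the function names are hypothetical, and each inner list in `splits` is assumed to stand for the input that one TaskTracker would process.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    """Run map over every split, then shuffle, then reduce per key."""
    mapped = [pair
              for split in splits
              for line in split
              for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two input splits, as if they were handled by two different worker nodes.
splits = [["big data big cluster"], ["data node name node"]]
counts = run_job(splits)
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the data, and only the much smaller intermediate pairs travel over the network.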

YARN: This is the resource management layer introduced in Hadoop 2. It separates resource management from MapReduce, so other applications can also run on a distributed Hadoop cluster. In YARN, there are at least three actors:

  • the Job Submitter (the client)
  • the Resource Manager (the master)
  • the Node Manager (the slave)

Client Application: This is whatever program you have written, or some other client such as Apache Pig.

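Application Master: YARN starts one of these for each application. It requests containers from the Resource Manager and works with the Node Managers to run the application's tasks (for example, shell commands) in those containers.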

Hadoop Architecture Processes and Properties

Type “jps” (the JVM process status tool that ships with the JDK) to see which Hadoop processes are running. It should list NameNode on the master and SecondaryNameNode on the backup master. DataNode should appear on each data node.
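
If you want to automate that check, the sketch below parses jps-style output (one "process-id class-name" pair per line) and reports which expected daemons are missing. The sample output and its process IDs are made up for illustration, and the helper names are hypothetical.

```python
from typing import Iterable, List, Set


def parse_jps(output: str) -> Set[str]:
    """Return the set of process names from jps-style '<pid> <name>' lines."""
    names = set()
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # skip blank lines and entries without a name
            names.add(parts[1].strip())
    return names


def missing_daemons(output: str, expected: Iterable[str]) -> List[str]:
    """List the expected daemon names that do not appear in the output."""
    return sorted(set(expected) - parse_jps(output))


# Made-up sample of what jps might print on a master node.
sample_output = """4821 NameNode
5012 Jps
"""

missing = missing_daemons(sample_output, ["NameNode", "SecondaryNameNode"])
print(missing)
```

To check a live machine instead of the sample, the output could be captured with `subprocess.run(["jps"], capture_output=True, text=True).stdout`, assuming `jps` is on the PATH.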

When you first install Hadoop, there is no need to change any configuration. There are, however, many configuration options for tuning it and setting up a cluster.
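
Hadoop reads these options from XML files such as core-site.xml and hdfs-site.xml, each holding a list of name/value properties. As a rough sketch, the Python below builds that XML layout for three commonly used properties: `fs.defaultFS`, `dfs.replication` and `dfs.blocksize`. The values and the host name are placeholders chosen for illustration, so check the defaults documentation for your Hadoop version before using them.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_site_xml(properties: Dict[str, object]) -> str:
    """Render properties in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# hdfs-site.xml: replication factor and block size (illustrative values).
hdfs_site = build_site_xml({
    "dfs.replication": 3,
    "dfs.blocksize": 134217728,  # 128 MB expressed in bytes
})

# core-site.xml: where clients find the file system.
# The host name and port below are hypothetical placeholders.
core_site = build_site_xml({
    "fs.defaultFS": "hdfs://namenode.example.com:9000",
})

print(hdfs_site)
print(core_site)
```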

Final Recap

A very small team was able to build the Hadoop architecture, including the Hadoop Distributed File System, and make it stable and robust enough to use in production.
