Learn Python
Learn Data Structures & Algorithms
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine Learning
Learn GitHub
Learn OpenCV
Learn Deep Learning
Learn MySQL
Learn MongoDB
Learn Web Scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Hadoop Introduction
Hadoop HBase
Hadoop HDFS
Hadoop Hive
Hadoop MapReduce
Today, big data is a major challenge, and Hadoop is a widely used solution to it.
Hadoop is a framework that can handle a huge amount of data on a cluster of low-cost, simple hardware. A cluster
here means a collection of multiple computers, and low cost means commodity servers, which keeps the hardware
bill down; that is why Hadoop gives us an inexpensive solution to big data problems.
Hadoop is scalable: the workload can be divided across multiple servers. It is also fault-tolerant: if a
server or node goes down, other servers or nodes can still process the data. Hadoop is a storage system, and we
can also process data with it.
1. Hadoop dominates the big data market, with a share often cited at 90% or more.
2. The cost is low: Hadoop runs on cheap, commodity hardware.
3. Hadoop is fault-tolerant and scalable (the sketch after this list shows how block replication provides the fault tolerance).
4. It can store structured, unstructured, and semi-structured data.
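As a concrete example of that fault tolerance: HDFS keeps several copies (replicas) of every data block on different nodes, so losing one node does not lose data. The sketch below is a minimal illustration, assuming the hdfs command-line tool is on the PATH and that a file already exists at the hypothetical path /data/sales.csv; it uses only Python's standard library.

    import subprocess

    # Hypothetical HDFS path, used for illustration only.
    PATH = "/data/sales.csv"

    def run(cmd):
        """Run a shell command and return its stdout as text."""
        return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

    # Ask HDFS to keep 3 replicas of every block of this file and
    # wait (-w) until replication actually reaches that level.
    run(["hdfs", "dfs", "-setrep", "-w", "3", PATH])

    # fsck reports how many replicas each block has; if a DataNode dies,
    # HDFS automatically re-replicates the missing blocks on other nodes.
    print(run(["hdfs", "fsck", PATH, "-files", "-blocks"]))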
Hadoop Distributed File System (HDFS):
HDFS is the primary storage system of Hadoop, so whatever data we store on Hadoop is stored in HDFS.
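To get a feel for how HDFS is used, the sketch below talks to it through WebHDFS, the REST API the NameNode exposes over HTTP. The address localhost:9870 (the default NameNode web port in Hadoop 3.x) and the paths /user/demo and notes.txt are assumptions for illustration only.

    import requests

    # Assumed NameNode WebHDFS endpoint; adjust host/port for your cluster.
    BASE = "http://localhost:9870/webhdfs/v1"

    # List a directory, like `hdfs dfs -ls /user/demo`.
    resp = requests.get(f"{BASE}/user/demo", params={"op": "LISTSTATUS"})
    resp.raise_for_status()
    for entry in resp.json()["FileStatuses"]["FileStatus"]:
        print(entry["type"], entry["length"], entry["pathSuffix"])

    # Read a file, like `hdfs dfs -cat`; WebHDFS redirects the read to a
    # DataNode that holds the data, and requests follows the redirect.
    data = requests.get(f"{BASE}/user/demo/notes.txt", params={"op": "OPEN"})
    print(data.text)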
MapReduce v2:
Hadoop MapReduce is used to process the data in HDFS, and we can write different types of
applications that work on that data. The input data is split into multiple
portions, and the portions are processed in parallel across the distributed environment.
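The classic way to see this split-and-process-in-parallel idea is a word count written for Hadoop Streaming, which lets the mapper and reducer be plain Python scripts that read stdin and write stdout. The file names mapper.py and reducer.py and the paths in the launch command are illustrative, not fixed.

    # mapper.py -- runs once per input split; emits "word<TAB>1" per word.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py -- Hadoop sorts mapper output by key first, so all
    # counts for the same word arrive on consecutive lines.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

A job like this is usually launched with the Hadoop Streaming jar (the jar's exact path varies by installation), roughly:

    hadoop jar hadoop-streaming-*.jar -input /in -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py

Hadoop splits /in across the nodes, runs the mapper on each split in parallel, shuffles and sorts by word, then runs the reducer.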
Yet Another Resource Negotiator (YARN):
YARN is the resource management and job scheduling technology in Hadoop. It allocates system
resources to the various applications running in the Hadoop cluster, and it schedules all the
tasks to be executed on the different cluster nodes.
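To see what YARN is managing, you can query the ResourceManager's REST API, which reports cluster applications as JSON. The sketch below assumes the ResourceManager web address localhost:8088 (the usual default); the fields printed come from the standard /ws/v1/cluster/apps response.

    import requests

    # Assumed ResourceManager address; its web/REST port defaults to 8088.
    RM = "http://localhost:8088"

    # List applications YARN currently knows about, with their state and
    # the resources (memory, vcores) the scheduler has allocated to them.
    resp = requests.get(f"{RM}/ws/v1/cluster/apps", params={"states": "RUNNING"})
    resp.raise_for_status()
    apps = resp.json().get("apps") or {}   # "apps" is null when nothing runs
    for app in apps.get("app", []):
        print(app["id"], app["name"], app["state"],
              app["allocatedMB"], "MB,", app["allocatedVCores"], "vcores")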
Common Utilities:
These are shared modules and libraries that the other modules of the Hadoop system depend on and call into.