HBase is a non-relational, distributed, column-oriented NoSQL database. Data is organized into tables, and each
table has multiple rows and columns. HBase provides random, real-time read/write access to data.
The rows of a table are sorted by row key. A table can have multiple column families, and each column family
can have multiple columns.
In other words, an HBase table is a collection of rows --> a row is a collection of column
families --> a column family is a collection of columns --> a column is a collection of key:value
pairs.
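The nesting described above can be pictured with plain Python dictionaries. This is only a toy model for intuition, not an HBase API; the second student, the `age` column, and the school names are invented sample values.

```python
# Toy model of the HBase data model using nested dicts (not an HBase API).
# Layout: table -> row key -> column family -> column -> value
student = {
    "2": {
        "student_information": {"name": "Karim"},       # invented sample value
        "school_data": {"school": "B School"},          # invented sample value
    },
    "1": {
        "student_information": {"name": "Rafsun", "age": "20"},
        "school_data": {"school": "A School"},          # invented sample value
    },
}


def read_cell(table, row_key, family, column):
    """Look up one cell by walking row -> column family -> column."""
    return table[row_key][family][column]


def sorted_row_keys(table):
    """HBase keeps rows sorted by row key; sorting here mimics that."""
    return sorted(table)


print(read_cell(student, "1", "student_information", "name"))
print(sorted_row_keys(student))
```

Row keys are compared as strings here, so a key such as "10" would sort before "2".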
HBase has three main parts:
1. HMaster:
HMaster assigns regions to region servers. Load balancing in HBase is done by distributing regions across the
region servers. A region is a horizontal slice of an HBase table: a contiguous range of rows, sorted by row key,
that includes every column family of the table for those rows.
HMaster also manages the cluster and handles creating, modifying, and deleting tables.
2. Region Server:
Region servers are the worker nodes of HBase. They handle client read, write, and modify requests. A region
server runs on each worker node and stores the actual data on disk. Each region server has a read cache, also
called the block cache. Data read by clients is kept in the block cache, and when the cache is full the least
recently used data is evicted. There is a second cache called the MemStore, which is used for write operations.
Each column family has its own MemStore, which holds new data that has not yet been written to disk.
3. ZooKeeper:
ZooKeeper is an open-source coordination service. It maintains configuration information and provides
distributed synchronization. It also keeps track of all HBase region servers.
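The two region server caches described above can be sketched in a few lines of Python. This is a simplified model, not HBase code: the class names `BlockCache` and `MemStore` mirror the HBase terms, but the capacities are counted in entries (an assumption made for brevity), and "disk" is just a Python list.

```python
from collections import OrderedDict


class BlockCache:
    """Toy read cache with least-recently-used (LRU) eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, value):
        self.blocks[key] = value
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used


class MemStore:
    """Toy write buffer for one column family; flushes to 'disk' when full."""

    def __init__(self, flush_size):
        self.flush_size = flush_size
        self.buffer = {}  # new data that is not on disk yet
        self.disk = []    # each flush appends one sorted snapshot

    def write(self, row_key, column, value):
        self.buffer[(row_key, column)] = value
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        # Write the entries sorted by key, then clear the in-memory buffer.
        self.disk.append(dict(sorted(self.buffer.items())))
        self.buffer = {}


cache = BlockCache(capacity=2)
cache.put("block-a", "A")
cache.put("block-b", "B")
cache.get("block-a")       # block-a is now the most recently used
cache.put("block-c", "C")  # cache is full, so block-b is evicted

store = MemStore(flush_size=2)
store.write("1", "name", "Rafsun")  # stays in memory
store.write("2", "name", "Karim")   # buffer is full, so it is flushed
```

After these calls `block-b` is gone from the cache, and the MemStore buffer is empty because both writes were flushed together.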
RDBMS | HBase |
---|---|
Row-oriented | Column-oriented |
Adding processing power or storage requires upgrading the server | Adding processing power or storage does not require upgrading a machine |
Columns cannot be added easily | Columns can be added easily |
Data size is limited by the disk size of the server | Data size is limited by the number of machines |
Only structured data is supported | Both structured and unstructured data are supported |
HBase provides a shell for working with it.
Let's look at some HBase shell commands.
To start the HBase shell, type: hbase shell
Command | Description |
---|---|
status | Shows the status of HBase |
version | Shows the current version of HBase |
list | Lists all the tables |
create | Creates a table |
disable | Disables a table |
enable | Enables a table |
describe | Shows the description of a table |
alter | Changes the settings of a table |
drop | Deletes a table from HBase |
drop_all | Drops all tables whose names match a given regex |
put | Puts a cell value into a table |
get | Fetches the contents of a row or cell |
delete | Deletes a cell value from a table |
deleteall | Deletes all the cells in a given row |
scan | Shows the data in a table |
count | Counts the number of rows in a table |
table_help | Shows help for the table-related commands |
To create a table, the create command is used. You have to specify the table name and at least one column
family name.
Syntax:
create '<table_name>','<column_family_name>'
Suppose you want to create a student table with two column families, student_information and school_data.
Code:
create 'student','student_information','school_data'
Let's insert data:
To insert data, the put command is used.
Syntax:
put '<table_name>','<row_number>','<column_family:column_name>','<value>'
Now suppose you want to insert a student name in row 1.
Code:
put 'student','1','student_information:name','Rafsun'
To view the data in a table, the scan command is used.
Syntax:
scan '<table_name>'
Suppose you want to view the student table.
Code:
scan 'student'
Let's read data:
To read data, the get command is used. With get you can fetch the data of a single row at a time.
Syntax:
get '<table_name>','<row_number>'
You can also get a single column.
Syntax:
get '<table_name>','<row_number>',{COLUMN => '<column_family:column_name>'}
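To make the difference between put, get, and scan concrete, here is a small in-memory sketch in Python. `ToyTable` is a hypothetical class written for this illustration and is not an HBase client; the school name and the second student are invented sample values.

```python
class ToyTable:
    """In-memory stand-in for an HBase table (illustration only)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # row key -> {"family:column": value}

    def put(self, row_key, column, value):
        """Store one cell; the column family must exist, as in HBase."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column=None):
        """Return one row, or a single column of that row."""
        row = self.rows.get(row_key, {})
        if column is None:
            return dict(row)
        return {column: row[column]} if column in row else {}

    def scan(self):
        """Return every row, ordered by row key."""
        return [(key, dict(self.rows[key])) for key in sorted(self.rows)]


student = ToyTable("student", "student_information", "school_data")
student.put("1", "student_information:name", "Rafsun")
student.put("1", "school_data:school", "A School")     # invented sample value
student.put("2", "student_information:name", "Karim")  # invented sample value

print(student.get("1"))                              # whole row
print(student.get("1", "student_information:name"))  # single column
print(student.scan())                                # all rows
```

The sketch mirrors the shell commands: get works on one row key at a time, while scan walks the whole table in row-key order.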
How to disable a table?
If you want to delete a table or change its settings, you first need to disable it. To do this, the disable
command is used. While a table is disabled, you can't perform scan or insert operations on it.
Syntax:
disable '<table_name>'
To check whether a table is disabled, use the is_disabled command.
Syntax:
is_disabled '<table_name>'
How to enable a table?
To enable a disabled table, the enable command is used.
Syntax:
enable '<table_name>'
To check whether a table is enabled, use the is_enabled command.
Syntax:
is_enabled '<table_name>'
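The effect of disabling and enabling a table can be modelled with a simple flag. The Python sketch below is a toy, not an HBase API: `ToyTable` and its error message are made up for this illustration, and it only shows that scan and insert are refused while the table is disabled.

```python
class ToyTable:
    """Toy table that tracks an enabled/disabled flag (not an HBase API)."""

    def __init__(self):
        self.enabled = True  # assumption: tables start out enabled
        self.rows = {}

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

    def is_enabled(self):
        return self.enabled

    def is_disabled(self):
        return not self.enabled

    def put(self, row_key, column, value):
        self._require_enabled()
        self.rows.setdefault(row_key, {})[column] = value

    def scan(self):
        self._require_enabled()
        return [(key, self.rows[key]) for key in sorted(self.rows)]

    def _require_enabled(self):
        # Reads and writes are refused while the table is disabled.
        if not self.enabled:
            raise RuntimeError("table is disabled")


student = ToyTable()
student.put("1", "student_information:name", "Rafsun")

student.disable()
try:
    student.scan()
    scan_error = None
except RuntimeError as err:
    scan_error = str(err)

student.enable()
rows_after_enable = student.scan()
```

The data is untouched by the disable/enable cycle; only access to it is blocked in between.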
To delete a specific cell from a table, the delete command is used.
Syntax:
delete '<table_name>','<row_number>','<column_family:column_name>',<time_stamp>
Here the time_stamp argument is optional. It is a number, so it is written without quotes.
Let's delete an entire row:
Syntax:
deleteall '<table_name>','<row_number>'
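The difference between delete and deleteall can be shown on a plain Python dictionary. This is a minimal sketch with hypothetical helper names (`delete_cell`, `delete_row`); it ignores timestamps and is not how HBase removes data internally.

```python
def delete_cell(rows, row_key, column):
    """Remove one cell, like delete; drop the row if it becomes empty."""
    row = rows.get(row_key)
    if row is None:
        return
    row.pop(column, None)
    if not row:
        del rows[row_key]


def delete_row(rows, row_key):
    """Remove every cell of a row, like deleteall."""
    rows.pop(row_key, None)


# Sample data: row key -> {"family:column": value}; values are invented.
rows = {
    "1": {"student_information:name": "Rafsun", "school_data:school": "A School"},
    "2": {"student_information:name": "Karim"},
}

delete_cell(rows, "1", "school_data:school")  # only one cell of row 1 goes away
delete_row(rows, "2")                         # row 2 disappears completely
print(rows)
```

After both calls only row 1 remains, and it still holds the `student_information:name` cell.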