Saturday, October 22, 2016

HBase Table and Storage Design

HBase is a set of tables defined by Key-Values

Definitions:

  • Row KeyEach row has a unique Row Key
  • Column FamilyUse to group different columns, defined when building the table
  • Column QualifierIdentify each column, can be added in run-time
    Expandable
    , may have different numbers of Qualifiers in each row
  • timestampA cell can have different versions of data based on different timestamps(for consistency)

A Row is referred by a Row Key
So a Column is referred by a Column Family and a Column Qualifier
A Cell is referred by a Row and a Column
A Cell will contain multiple version of values based on different timestamps

The above four things combines together to become a Key in the following format:
    [Row Key]/[Column Family]:[Column Qualifier]/[timestamp]

With this Key, we can find the unique value we want to read.

Every value we get will be in byte[] format, developer need to know the structure of value themselves.

Rows are strongly consistency.






How does HBase db stores in file system?

  1. Group continues rows into the same Region
  2. Region is divided by each Column Family to be several units
  3. Each one of these divided unit stored as a single file in file system
  4. The values in each file are in lexicographical order(I think order by Column Qualifier)
So if we open one HBase db file, it will only have
  1. Continues rows
  2. Values in the same Column Family
  3. (I think)Values are ordered by Column Qualifier




Special Table

Table .META.
This table is using the same format as regular tables.
It can be in any one node of HBase cluster, it stores information for where to find the target region.











No comments:

Post a Comment