4.3 HBase Implementation Principles

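Library functions: linked into every client. The client calls the functions they provide to access data in HBase.

The Master server acts as the housekeeper. Its main roles:

a) Tracking which Region servers in the cluster are working and which have failed

b) A table is divided into multiple Regions; the Master server decides which Region server each Region is assigned to

Region servers: a whole table is divided into multiple Regions, and Region servers are responsible for maintaining and managing them.

Region size: before 2006 a Region was usually sized at 100-200 MB; today it is generally configured at 1-2 GB. The right size depends on what a single server can handle (access speed, memory, and so on).

Once a Region grows past a certain size it is split into several smaller Regions. For storage, however, a single Region (before it is split) is never spread across different Region servers.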

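Three-level addressing corresponds to three layers of tables:

(1) -ROOT- table: stores information about the metadata table, i.e. the META table. Its location is fixed ("hard-coded") in a ZooKeeper file; it is unique and can never be split

(2) META table: records which Region servers hold the user data. It splits into more Regions as the amount of stored data grows.

(3) User data table: holds the actual user data. It is the bottom layer and can be split

HBase uses three-level addressing:

(1) Get the address of the -ROOT- table from ZooKeeper

(2) Look up the address of the required META table in the -ROOT- table

(3) Look up the address of the required user data table in the META table

(4) Finally, read the target data from the user data table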
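The four steps above can be sketched as a toy lookup in plain Python. This is only an illustration under stated assumptions: the table contents, the server names (rs-a to rs-d), and the helper names find_entry, locate, and read are all hypothetical, and each table is modeled as a sorted list of (start_key, target) pairs rather than a real HBase Region.

```python
from bisect import bisect_right

# Hypothetical cluster state. Every table is a sorted list of
# (start_key, target) pairs; a pair covers keys from its start_key up to
# the next pair's start_key.
ZOOKEEPER = {"root_location": "-ROOT-"}
TABLES = {
    "-ROOT-": [("", "META-1"), ("m", "META-2")],
    "META-1": [("", "rs-a"), ("g", "rs-b")],
    "META-2": [("m", "rs-c"), ("t", "rs-d")],
}
REGION_SERVERS = {
    "rs-a": {"alice": "row for alice"},
    "rs-b": {"harry": "row for harry"},
    "rs-c": {"nina": "row for nina"},
    "rs-d": {"zoe": "row for zoe"},
}


def find_entry(entries, key):
    """Return the target of the last entry whose start_key is <= key."""
    starts = [start for start, _ in entries]
    index = bisect_right(starts, key) - 1
    if index < 0:
        raise KeyError(key)
    return entries[index][1]


def locate(row_key):
    """Steps 1-3: ZooKeeper -> -ROOT- -> META -> Region server name."""
    root_table = TABLES[ZOOKEEPER["root_location"]]   # step 1
    meta_name = find_entry(root_table, row_key)        # step 2
    return find_entry(TABLES[meta_name], row_key)      # step 3


def read(row_key):
    """Step 4: fetch the row from the server that owns its Region."""
    return REGION_SERVERS[locate(row_key)][row_key]
```

With this toy data, locate("alice") walks -ROOT- to META-1 and ends at rs-a, while locate("nina") goes through META-2 to rs-c.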

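In addition, to speed up addressing, the client caches the location of data it has already looked up (in the client's own cache), so the next request for the same data is fast. As Regions change, however, cached entries can go stale. HBase handles this lazily: the client first uses the cached location, and only if the target data cannot be found there does it repeat the three-level lookup and update the cache.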
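The lazy strategy can be sketched as follows. This is a simplified model, not the HBase client itself: the class name CachingClient and the two callables locate and fetch are hypothetical, and the cache is keyed by row key for brevity, whereas a real client would cache per Region.

```python
class CachingClient:
    """Toy client that caches locations and repairs stale entries lazily."""

    def __init__(self, locate, fetch):
        # locate(row_key) -> server name, standing in for the full
        # three-level lookup.
        # fetch(server, row_key) -> value, or None when that server no
        # longer holds the row.
        self._locate = locate
        self._fetch = fetch
        self._cache = {}
        self.full_lookups = 0  # counts how often the slow path was taken

    def get(self, row_key):
        server = self._cache.get(row_key)
        if server is not None:
            value = self._fetch(server, row_key)
            if value is not None:
                return value  # cached location was still valid
        # Cache miss or stale entry: redo the lookup and refresh the cache.
        self.full_lookups += 1
        server = self._locate(row_key)
        self._cache[row_key] = server
        return self._fetch(server, row_key)
```

The point of the design is that nothing invalidates the cache proactively; a stale entry costs one failed read and is then corrected.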

Assume a Region is at most 128 MB (note that -ROOT-, META, and user tables are all stored as Regions) and each mapping entry is 1 KB:
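Under exactly those two assumptions, the capacity of the three-level scheme works out as below. The variable names are mine; the only inputs are the 128 MB Region limit and the 1 KB entry size stated above, plus the fact that -ROOT- is a single Region.

```python
REGION_SIZE = 128 * 1024 ** 2  # 128 MB in bytes
ENTRY_SIZE = 1024              # 1 KB per mapping entry

# One Region can hold this many mapping entries.
entries_per_region = REGION_SIZE // ENTRY_SIZE      # 131072 = 2**17

# -ROOT- is a single Region, so it can point at 2**17 META Regions.
meta_regions = entries_per_region

# Each META Region can in turn point at 2**17 user Regions.
user_regions = meta_regions * entries_per_region    # 2**34

# Every user Region stores up to 128 MB of data.
addressable_bytes = user_regions * REGION_SIZE      # 2**61 bytes

print(entries_per_region, user_regions, addressable_bytes)
```

So three levels are enough to address 2^34 user Regions, or 2^61 bytes (2 EB) of user data.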


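Spark SQL is the successor to Shark, i.e. SQL on Spark. If I remember correctly, Shark was developed on top of Hive's API, so it supports reading HBase. Spark also accepts a wider range of data types than Hadoop and covers every data type Hadoop supports.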

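Using happybase to insert, delete, update, and query data in HBase

Prerequisites: the happybase library is installed (pip install happybase), and an HBase environment is available with the Thrift port open (nohup hbase thrift start &). Thrift's default port is 9090, and 101030200 is the IP of the HBase host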

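scan method:

Parameters:

row_start, row_stop: the start and stop rowkeys; returns the data between the two rowkeys (row_stop itself is excluded)

row_prefix: a rowkey prefix. Note: row_start and row_stop cannot be used together with row_prefix

filter: the filter to apply (takes effect on HBase 0.92 and later)

timestamp: query by the given timestamp

reverse: defaults to False. When True, scan results are returned in descending rowkey order

Example: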
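The original example did not survive, so here is a minimal sketch of two common scans. The host "localhost", the table name "student", and the helper names connect, scan_by_prefix, and scan_by_range are assumptions for illustration; the happybase calls used are Connection, Connection.table, and Table.scan.

```python
def connect(host="localhost", port=9090, table_name="student"):
    """Open a Thrift connection and return (connection, table)."""
    # Imported here so the scan helpers below can be exercised with a
    # stand-in table object on machines without happybase installed.
    import happybase

    connection = happybase.Connection(host, port=port)
    return connection, connection.table(table_name)


def scan_by_prefix(table, prefix):
    """Return every (rowkey, columns) pair whose rowkey starts with prefix."""
    # row_prefix must not be combined with row_start / row_stop.
    return list(table.scan(row_prefix=prefix))


def scan_by_range(table, start, stop):
    """Return the rows with start <= rowkey < stop."""
    return list(table.scan(row_start=start, row_stop=stop))
```

Typical use would be connection, table = connect() followed by scan_by_prefix(table, b"student"); each returned item is a (rowkey, dict) pair whose dict maps column names such as b"info:name" to values.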

put method:

Example:

△ If the rowkey passed to put already exists, the call updates the existing data
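A minimal sketch of put, standing in for the lost example. It assumes table is a happybase Table (for instance happybase.Connection(host).table("student")); the column family "info" and the helper name upsert_student are hypothetical.

```python
def upsert_student(table, rowkey, name, age):
    """Write two cells of one row with a single put call."""
    # Column names take the form b"family:qualifier"; the "info" family
    # is an assumption and must already exist on the table.
    table.put(rowkey, {b"info:name": name, b"info:age": age})
```

Calling upsert_student(table, b"student1", b"Ann", b"20") and later upsert_student(table, b"student1", b"Ann", b"21") leaves one row whose info:age is b"21", which is the update behavior described in the note above.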

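delete method:

row: deletes the data whose rowkey is row

columns: when columns is given, only the listed columns are deleted

Example:

Delete the name data for rowkey student2:

Deletion succeeded: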
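The code and output for this example were lost, so the following is a hedged reconstruction rather than the author's snippet. It assumes table is a happybase Table and that the name cell lives in a column family called "info"; the helper name delete_column is hypothetical.

```python
def delete_column(table, rowkey=b"student2", column=b"info:name"):
    """Delete one column of a row, then read the row back to check."""
    # Without the columns argument the whole row would be removed.
    table.delete(rowkey, columns=[column])
    remaining = table.row(rowkey)
    return column not in remaining, remaining
```

The first element of the returned pair is True when the column is gone; the second is whatever the row still contains.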

batch method:

1. Batch operations

2. Managing a batch with a with block
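Both styles can be sketched as below, again assuming table is a happybase Table. The helper names load_students and replace_student and the "info" column family are hypothetical; the calls used are Table.batch, Batch.put, Batch.delete, and Batch.send.

```python
def load_students(table, students):
    """Style 1: build a batch by hand and send it explicitly."""
    batch = table.batch()
    for rowkey, name in students:
        batch.put(rowkey, {b"info:name": name})
    batch.send()  # nothing reaches the server until send() is called


def replace_student(table, old_key, new_key, name):
    """Style 2: let a with block send the batch when it exits."""
    with table.batch() as batch:
        batch.delete(old_key)
        batch.put(new_key, {b"info:name": name})
```

The with form is usually the safer choice because the send cannot be forgotten.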

row() and rows() methods: retrieve the data for the given rowkey(s)

Retrieve one row:

Retrieve several rows:

Returned result:

Example:

Result:
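Since the snippets and screenshots for this part are missing, here is a small sketch of both calls. It assumes table is a happybase Table; fetch_one and fetch_many are hypothetical helper names.

```python
def fetch_one(table, rowkey):
    """row() returns a dict mapping column names to values."""
    return table.row(rowkey)


def fetch_many(table, rowkeys):
    """rows() returns a list of (rowkey, dict) pairs; index it by rowkey."""
    return dict(table.rows(rowkeys))
```

For example, fetch_many(table, [b"student1", b"student2"]) gives a dict keyed by rowkey, each value being that row's column dict.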

That's all for now 0v0

Here is the code for one class; read through it and you will see how the connection is made

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;

public class Htable {

    public static void main(String[] args) throws IOException {
        // Configuration hbaseConf = HBaseConfiguration.create();
        Configuration HBASE_CONFIG = new Configuration();
        // Same value as hbase.master in hbase/conf/hbase-site.xml (use your own host:port)
        HBASE_CONFIG.set("hbase.master", "91868927:60000");
        // Same value as hbase.zookeeper.quorum in hbase/conf/hbase-site.xml
        HBASE_CONFIG.set("hbase.zookeeper.quorum", "91868927,91868929,91868931,91868933,91868934");
        // Same value as hbase.zookeeper.property.clientPort in hbase/conf/hbase-site.xml
        HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181");
        Configuration hbaseConf = HBaseConfiguration.create(HBASE_CONFIG);
        HBaseAdmin admin = new HBaseAdmin(hbaseConf);

        // Set the table name
        HTableDescriptor htableDescriptor = new HTableDescriptor("test11".getBytes());
        // Add a column family
        htableDescriptor.addFamily(new HColumnDescriptor("cf1"));
        // Drop the table first if it already exists
        if (admin.tableExists(htableDescriptor.getName())) {
            admin.disableTable(htableDescriptor.getName());
            admin.deleteTable(htableDescriptor.getName());
        }
        // Create the table
        admin.createTable(htableDescriptor);

        // Get a handle on the table
        HTable table = new HTable(hbaseConf, "test11");
        // The loop bound is the number of rows to write
        for (int i = 0; i < 3; i++) {
            // The i-th row
            Put putRow = new Put(("row" + i).getBytes());
            // Column family, column name, value
            putRow.add("cf1".getBytes(), (i + "col1").getBytes(), (i + "value1").getBytes());
            putRow.add("cf1".getBytes(), (i + "col2").getBytes(), (i + "value2").getBytes());
            putRow.add("cf1".getBytes(), (i + "col3").getBytes(), (i + "value3").getBytes());
            table.put(putRow);
        }

        // Scan the column family and print every column/value pair
        for (Result result : table.getScanner("cf1".getBytes())) {
            for (Map.Entry<byte[], byte[]> entry : result.getFamilyMap("cf1".getBytes()).entrySet()) {
                String column = new String(entry.getKey());
                String value = new String(entry.getValue());
                System.out.println(column + "," + value);
            }
        }
    }
}

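Client: the entry point for accessing the whole HBase cluster. It uses the HBase RPC mechanism to communicate with HMaster and HRegionServer: it talks to HMaster for table-management operations and to HRegionServer for data reads and writes. It contains the interfaces for accessing HBase and maintains a cache to speed up access.

RPC (Remote Procedure Call) is a way for processes on different hosts to communicate. The protocol follows a client-server architecture: the requesting program is the client and the program providing the service is the server. HBase also uses RPC for communication between client and server, and implements the concrete protocol on both the client side and the server side.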
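To make the request/response shape concrete, here is a deliberately tiny RPC sketch. It is not HBase's wire protocol: the JSON encoding, the in-process transport, and the names RegionServerStub, handle_request, and RpcClient are all assumptions chosen only to show the client-server pattern.

```python
import json


class RegionServerStub:
    """Server side: owns the data and exposes procedures to call remotely."""

    def __init__(self):
        self._rows = {"row1": "value1"}

    def get(self, row):
        return self._rows.get(row)

    def put(self, row, value):
        self._rows[row] = value
        return True


def handle_request(service, request_bytes):
    """Decode a request, run the named procedure, encode the response."""
    request = json.loads(request_bytes.decode("utf-8"))
    name = request["method"]
    if name.startswith("_"):
        raise ValueError("private methods are not callable remotely")
    result = getattr(service, name)(*request["params"])
    return json.dumps({"result": result}).encode("utf-8")


class RpcClient:
    """Client side: turns a method call into bytes and back."""

    def __init__(self, transport):
        # transport: callable taking request bytes and returning response
        # bytes; a real client would send them over a socket.
        self._transport = transport

    def call(self, method, *params):
        request = json.dumps({"method": method, "params": list(params)})
        response = self._transport(request.encode("utf-8"))
        return json.loads(response.decode("utf-8"))["result"]
```

Wiring the two together with RpcClient(lambda data: handle_request(server, data)) lets client.call("get", "row1") look like a local call even though only bytes cross the boundary.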


Original article: https://54852.com/sjk/10136213.html
