The key to HBase cluster load balancing and high performance: region splitting and merging
Copyright notice: comments and suggestions are welcome. Original article: https://blog.csdn.net/qq_31598113/article/details/80572535

HBase implements a simple form of load balancing based on the number of regions each server holds; the project considers this approach simple and effective, sufficient for the vast majority of needs. Below we look at both automatic balancing and manually controlled balancing (a shell sketch follows the list below). The cluster will not run the balancer in any of the following situations:
1. The master has not been initialized.
2. A balancing operation is already in progress.
3. A region is in the splitting state.
4. The cluster has a dead region server.
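
The balancer can be toggled and triggered from the hbase shell. A minimal sketch (return values are what recent shell versions print; details vary by release):

hbase(main):001:0> balance_switch false   # disable the automatic balancer; returns its previous state
hbase(main):002:0> balance_switch true    # re-enable it
hbase(main):003:0> balancer               # ask the master for a balancing round now; returns true if it ran,
                                          # false if one of the conditions above prevented it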


Part 1: Splitting (split)

The term split describes the act of dividing a region in two.

【Storage structures related to regions】
A region in HBase is a subset of a table; in other words, a table is cut horizontally into a number of regions. A table starts out with a single region, and as the data grows, more and more regions are split off automatically (or by hand). HBase range-partitions each table: the complete rowkey interval is cut into individual key ranges, each key range is one region, so a region is really a contiguous, sorted interval of rowkeys.
Different regions are distributed to different region servers. The region is the smallest unit by which an HBase cluster distributes data, i.e. the smallest unit of distributed storage and load balancing, but it is not the smallest unit of storage. That is the store file (also called an HFile). A store file is where the data lives: it holds one column family's data as key-value pairs, sorted by rowkey within the file, with no ordering between store files.

【Why split regions】
Larger regions mean fewer regions overall, and fewer regions let the cluster run more smoothly. Too many regions degrade read/write performance and add load on ZooKeeper; too few regions hinder scalability and reduce read/write concurrency, because the load is not spread widely enough. Weighing cluster load against performance, oversized regions should be split into smaller ones and scattered over more region servers, relieving the pressure on overloaded region servers and evening out the load across machines. Depending on write volume, keeping the region count per region server roughly between 20 and 200 improves cluster stability and read/write performance (a way to check per-server counts is sketched below).
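
To see how regions are currently spread out, the shell's status command reports a per-server region count. A sketch only; the host names are borrowed from later in this article and the exact fields vary by version:

hbase(main):004:0> status 'simple'
4 live servers
    tony-host-001.ksc.com:16020 ...
        requestsPerSecond=0.0, numberOfOnlineRegions=3, usedHeapMB=..., maxHeapMB=...
...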

【Configuration related to region splitting】
For online (latency-sensitive) applications, the store file size limit (hbase.hregion.max.filesize) is typically set to about twice the value used for offline applications.
Watch whether the total size of all memstores on a region server reaches its ceiling (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize); exceeding it has unpleasant consequences such as an unresponsive server or a compaction storm. hbase.regionserver.global.memstore.upperLimit defaults to 0.4, i.e. 40% of the JVM heap. A per-table size override is sketched below.
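
hbase.hregion.max.filesize is normally set cluster-wide in hbase-site.xml, but it can also be overridden per table through the MAX_FILESIZE attribute. A hedged sketch (the table name reuses this article's example; the 20 GB value is illustrative):

hbase(main):005:0> alter 'default:test_tony', MAX_FILESIZE => '21474836480'   # 20 GB, for this table only
hbase(main):006:0> describe 'default:test_tony'                               # confirm the attribute took effect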

【Region split policies】
Three policies are in common use:
(1) ConstantSizeRegionSplitPolicy: the only split policy in HBase before 0.94. A region is split in two when one of its stores (one store per column family, backed by HFiles) exceeds the hbase.hregion.max.filesize threshold. The split line is the middle rowkey of the largest store file.
(2) IncreasingToUpperBoundRegionSplitPolicy: the default policy from 0.94 through 1.x. The upper bound on region size is taken first from the table's MAX_FILESIZE attribute; if that is not set, from hbase.hregion.max.filesize in hbase-site.xml, whose default was raised in 0.94 to 10 * 1024 * 1024 * 1024L, i.e. 10 GB (the many articles quoting a 1 GB default are describing HBase 0.92). The effective split threshold grows with the number of the table's regions on the current region server, roughly min(regionCount^3 * 2 * memstore flush size, upper bound); once one of a region's stores exceeds that threshold, the region is split. The split point is chosen on the "split the data in half" principle: the middle rowkey of the region's largest store. All three policies, ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy and SteppingSplitPolicy, define the split point the same way; only a manually requested split may name an explicit split point.
(3) SteppingSplitPolicy: the default policy in 2.x. Its threshold is simpler than IncreasingToUpperBoundRegionSplitPolicy's, though it still depends on how many of the table's regions live on the current region server: with exactly one region the threshold is flush size * 2, otherwise it is MaxRegionFileSize. In a large cluster this treats big and small tables more fairly: small tables no longer spawn piles of tiny regions, but stop at a sensible count. (Selecting a policy is sketched below.)
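
Which policy is active is controlled by the hbase.regionserver.region.split.policy property; the classes above all live in org.apache.hadoop.hbase.regionserver. Setting it to DisabledRegionSplitPolicy switches automatic splitting off entirely, which ties in with the manual-split approach described later. A comment-style sketch of the hbase-site.xml entries (applied with a rolling restart):

# hbase-site.xml:
#   hbase.regionserver.region.split.policy = org.apache.hadoop.hbase.regionserver.SteppingSplitPolicy
# or, to rely purely on manual splits:
#   hbase.regionserver.region.split.policy = org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy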

【Steps of a region split】
A region split is not driven by the master; the master does not take part, and the region server is responsible for the whole split. The region server's procedure is:
take the region to be split offline ==> split it into 2 daughter regions ==> add the daughters to hbase:meta, then open them on the original region server ==> the region server reports its latest region layout to the master.

[The split process as described in the official documentation]

[Figure: the region split sequence, from the official HBase reference guide]
1. RegionServer decides locally to split the region, and prepares the split. As a first step, it creates a znode in zookeeper under /hbase/region-in-transition/region-name in SPLITTING state.
2. The Master learns about this znode, since it has a watcher for the parent region-in-transition znode.
3. RegionServer creates a sub-directory named ".splits" under the parent's region directory in HDFS.
4. RegionServer closes the parent region, forces a flush of the cache and marks the region as offline in its local data structures. At this point, client requests coming to the parent region will throw NotServingRegionException. The client will retry with some backoff.
5. RegionServer creates the region directories under the .splits directory for daughter regions A and B, and creates necessary data structures. Then it splits the store files, in the sense that it creates two Reference files per store file in the parent region. Those reference files will point to the parent region's files.
6. RegionServer creates the actual region directory in HDFS, and moves the reference files for each daughter.
7. RegionServer sends a Put request to the .META. table, and sets the parent as offline in the .META. table and adds information about daughter regions. At this point, there won’t be individual entries in .META. for the daughters. Clients will see the parent region is split if they scan .META., but won’t know about the daughters until they appear in .META.. Also, if this Put to .META. succeeds, the parent will be effectively split. If the RegionServer fails before this RPC succeeds, Master and the next region server opening the region will clean dirty state about the region split. After the .META. update, though, the region split will be rolled-forward by Master.
8. RegionServer opens daughters in parallel to accept writes.
9. RegionServer adds the daughters A and B to .META. together with information that it hosts the regions. After this point, clients can discover the new regions, and issue requests to the new region. Clients cache the .META. entries locally, but when they make requests to the region server or .META., their caches will be invalidated, and they will learn about the new regions from .META..
10. RegionServer updates znode /hbase/region-in-transition/region-name in zookeeper to state SPLIT, so that the master can learn about it. The balancer can freely re-assign the daughter regions to other region servers if it chooses so.
11. After the split, meta and HDFS will still contain references to the parent region. Those references will be removed when compactions in daughter regions rewrite the data files. Garbage collection tasks in the master periodically check whether the daughter regions still refer to the parent's files. If not, the parent region will be removed.

Here is another description of the region split process, from an article by an HBase expert:

How Region Split works in HBase
The article explains a region split in HBase through the following sequence of events:
1. A region is decided to be split when its store file size goes above hbase.hregion.max.filesize, or according to the defined region split policy.
2. At this point the region is divided into two by the region server.
3. The region server creates two reference files for these two daughter regions.
4. These reference files are stored in a new directory called splits under the parent directory.
5. Exactly at this point, the parent region is marked as closed or offline so no client tries to read or write to it.
6. Now the region server creates two new directories in the splits directory for these daughter regions.
7. If the steps up to 6 completed successfully, the region server moves both daughter region directories under the table directory.
8. The META table is now informed of the creation of two new regions, along with an update to the entry of the parent region recording that it has been split and is offline (OFFLINE=true, SPLIT=true).
9. The reference files are very small files containing only the key at which the split happened and whether they represent the top half or the bottom half of the parent region.
10. There is a class called "HalfHFileReader" which then utilizes these two reference files to read the original data file of the parent region and to decide which half of the file has to be read.
11. Both regions are now brought online by the region server and start serving requests to clients.
12. As soon as the daughter regions come online, a compaction is scheduled which rewrites the HFile of the parent region into two independent HFiles, one for each daughter region.
13. As the process in step 12 completes, both HFiles cleanly replace their respective reference files. The compaction activity happens under the .tmp directory of the daughter regions.
14. With the successful completion of step 13, the parent region is removed from META, and all its files and directories are marked for deletion.
15. Finally, the Master server is informed by this region server about the two new regions. The Master then decides the fate of the two regions: whether to let them run on the same region server or to move one to another server.

【Disable automatic region splits, split manually instead】
Everything above describes load balancing through automatic splits. In production it is advisable to disable automatic splitting (or configure a rather large region size) and split regions by hand, at night or whenever the business is quiet (e.g. no heavy write traffic).
Once a split is requested, HBase's internal logic is:
In the first round, iterate over all regions and split every region whose size exceeds some threshold (say 4 GB);
if a round finds no region above the threshold, stop; otherwise run another round, repeating until no region exceeds the threshold.
How to tell a split has completed: check whether the old region's directory on HDFS has been cleaned away. The sketch below walks through that routine.
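
A minimal sketch of the manual routine, using only commands that appear elsewhere in this article (the table, the region name and the 4 GB threshold are illustrative; in practice the loop is scripted):

# 1. find oversized regions by summing each region directory on HDFS
[root@tony-master-1-001 ~]# hdfs dfs -du -h /apps/hbase/data/data/default/test_tony
# 2. split any region whose directory exceeds the threshold (~4 GB), by region name
hbase(main):001:0> split 'test_tony,6E79D9382F405FB9132FA20EDF15CB0B,1524732581444.48f77b080e02cb8e9e652ec7467bfa76.'
# 3. the split is finished once the old region directory disappears
[root@tony-master-1-001 ~]# hdfs dfs -ls /apps/hbase/data/data/default/test_tony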

【Merging regions in brief】
Since regions can be split, they can also be merged.
Why: too many regions degrade performance and add load on ZooKeeper.
Region merging from the hbase shell (a fuller sketch follows):
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true # forced merge
# 'ENCODED_REGIONNAME' is the encoded region name, the globally unique hash in the region name, e.g. 353a385f28af52ed47e675f18242bbf8
# only regions adjacent in startkey/endkey can be merged normally; non-adjacent regions can only be force-merged (append the true argument)
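
The encoded names can be read from the web UI or from the info:regioninfo column of hbase:meta. A hedged sketch that merges the first two (adjacent) regions from the listing later in this article:

hbase(main):001:0> scan 'hbase:meta', {STARTROW => 'test_tony', COLUMN => 'info:regioninfo', LIMIT => 2}
# suppose the two adjacent regions have ENCODED values e94599b92dee9874d9dd76dbdbc2aee3 and 751c339edf3a9e508ed2d607ab6d286e
hbase(main):002:0> merge_region 'e94599b92dee9874d9dd76dbdbc2aee3', '751c339edf3a9e508ed2d607ab6d286e'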

【A worked example of manual region splitting】
Splitting regions by hand in the hbase shell to balance the load.
The split syntax in the hbase shell:
split 'tableName'
split 'namespace:tableName'
split 'regionName' # format: 'tableName,startKey,id'
split 'tableName', 'splitKey'
split 'regionName', 'splitKey'
where splitKey is the rowkey at which the split is made; see the examples right after this list
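
For instance, to cut at an explicit rowkey (both split keys below are hypothetical; the region name reuses one from later in this article):

hbase(main):001:0> split 'default:test_tony', '8FFFFFFF'   # split whichever region contains rowkey 8FFFFFFF, at that key
hbase(main):002:0> split 'test_tony,,1527665424956.e94599b92dee9874d9dd76dbdbc2aee3.', '07FFFFFF'   # split one specific region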

The walkthrough below splits the table default:test_tony by hand. The HBase web UI shows that this table has 4 regions, one on each of 4 region servers.

hbase(main):022:0> list_namespace
NAMESPACE
default
hbase
users
3 row(s) in 0.0100 seconds

hbase(main):023:0> list_namespace_tables 'hbase'
TABLE
meta
namespace
2 row(s) in 0.0100 seconds

Before the manual split, hbase:meta holds 231 rows. The count changes after the split; read on.

hbase(main):024:0> count 'hbase:meta', INTERVAL => 23
Current count: 23, row: users:usertb,7FF70F148964809C8475FEE655005CE3,1523397818177.8c487f00838ae5e9e19ffced7f8914e8.
Current count: 46, row: users:usertb,2FFD4264F7EC7DED66CACB1A2D3AE8EA,1511941432094.8b142ad572b93d65e303a6a5ec5da5a3.
Current count: 69, row: users:usertb,7BF84538F6A96E274651966AF3A6EA87,1515183873848.695d98fb0a9cac2d640c2906444a32e2.
Current count: 92, row: users:usertb,BC0049FADE2ABE2F54CBD8244A861B85,1519967361809.8a7c89d7063118fe9b0e0dd1c9eafd2e.
Current count: 115, row: users:my_table,001|,1511873110944.69e71f4ba7a0f420eb287431b925b377.
Current count: 138, row: users:my_table,0DF7986,1515671013211.2157bae66326fc85b88d2f796f59613a.
Current count: 161, row: users:my_table,59F81618,1515652416016.28d28f08bef06894870ef39ac72482b6.
Current count: 184, row: users:my_table,93FBFF13995C6E8BF2C5ABB244E16B98,1512522023490.9fb718ce9e15a1bf8e4824248b761b45.
Current count: 207, row: users:my_table,E8128B65E40AA34CDC10918689DB0839,1512136535400.74654b61bb4257f7b4fdabc08dceecdd.
Current count: 230, row: test_tony,6E79D9382F405FB9132FA20EDF15CB0B,1524732581444.48f77b080e02cb8e9e652ec7467bfa76.
231 row(s) in 0.0590 seconds
=> 231

Now run the manual split from the hbase shell. In this example we name neither a region nor a split-point rowkey; we simply split the table at a coarse grain:

hbase(main):001:0> split 'default:test_tony'
0 row(s) in 1.0350 seconds

Check the row count immediately:

hbase(main):002:0> count 'hbase:meta', INTERVAL => 23
Current count: 23, row: users:usertb,7FF70F148964809C8475FEE655005CE3,1523397818177.8c487f00838ae5e9e19ffced7f8914e8.
Current count: 46, row: users:usertb,2FFD4264F7EC7DED66CACB1A2D3AE8EA,1511941432094.8b142ad572b93d65e303a6a5ec5da5a3.
Current count: 69, row: users:usertb,7BF84538F6A96E274651966AF3A6EA87,1515183873848.695d98fb0a9cac2d640c2906444a32e2.
Current count: 92, row: users:usertb,BC0049FADE2ABE2F54CBD8244A861B85,1519967361809.8a7c89d7063118fe9b0e0dd1c9eafd2e.
Current count: 115, row: users:my_table,001|,1511873110944.69e71f4ba7a0f420eb287431b925b377.
Current count: 138, row: users:my_table,0DF7986,1515671013211.2157bae66326fc85b88d2f796f59613a.
Current count: 161, row: users:my_table,59F81618,1515652416016.28d28f08bef06894870ef39ac72482b6.
Current count: 184, row: users:my_table,93FBFF13995C6E8BF2C5ABB244E16B98,1512522023490.9fb718ce9e15a1bf8e4824248b761b45.
Current count: 207, row: users:my_table,E8128B65E40AA34CDC10918689DB0839,1512136535400.74654b61bb4257f7b4fdabc08dceecdd.
Current count: 230, row: test_tony,0E88B0027716FDCB9EBCC25EB7BA81E5,1527665424956.751c339edf3a9e508ed2d607ab6d286e.
239 row(s) in 0.0590 seconds
=> 239

Check the row count again a moment later:

hbase(main):003:0> count 'hbase:meta', INTERVAL => 23
Current count: 23, row: users:usertb,7FF70F148964809C8475FEE655005CE3,1523397818177.8c487f00838ae5e9e19ffced7f8914e8.
Current count: 46, row: users:usertb,2FFD4264F7EC7DED66CACB1A2D3AE8EA,1511941432094.8b142ad572b93d65e303a6a5ec5da5a3.
Current count: 69, row: users:usertb,7BF84538F6A96E274651966AF3A6EA87,1515183873848.695d98fb0a9cac2d640c2906444a32e2.
Current count: 92, row: users:usertb,BC0049FADE2ABE2F54CBD8244A861B85,1519967361809.8a7c89d7063118fe9b0e0dd1c9eafd2e.
Current count: 115, row: users:my_table,001|,1511873110944.69e71f4ba7a0f420eb287431b925b377.
Current count: 138, row: users:my_table,0DF7986,1515671013211.2157bae66326fc85b88d2f796f59613a.
Current count: 161, row: users:my_table,59F81618,1515652416016.28d28f08bef06894870ef39ac72482b6.
Current count: 184, row: users:my_table,93FBFF13995C6E8BF2C5ABB244E16B98,1512522023490.9fb718ce9e15a1bf8e4824248b761b45.
Current count: 207, row: users:my_table,E8128B65E40AA34CDC10918689DB0839,1512136535400.74654b61bb4257f7b4fdabc08dceecdd.
Current count: 230, row: test_tony,0E88B0027716FDCB9EBCC25EB7BA81E5,1527665424956.751c339edf3a9e508ed2d607ab6d286e.
243 row(s) in 0.4780 seconds
=> 243

Check this table in the web UI: its compaction state and region count.

Table test_tony
Table Attributes
Attribute Name Value Description
Enabled true Is the table enabled
Compaction MINOR Is the table compacting

The compaction state is MINOR: this split triggered a minor compaction.
The reason: each split region's HFile is rewritten as 2 smaller HFiles (which back the 2 new daughter regions),
pushing the store file count of one of the region's HStores past hbase.hstore.blockingStoreFiles (configurable);
once that value is exceeded, a minor compaction is kicked off. For the fuller story see the compaction parameters at the end of Part 2; a manual trigger is sketched below.
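
A minor compaction can also be requested by hand at any time, without waiting for the store file count to trip the threshold (not to be confused with major_compact, covered in Part 2):

hbase(main):003:0> compact 'default:test_tony'   # queue a minor compaction for every region of the table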

Before the manual split there were 4 regions, and the 4 region directories are visible on HDFS as well:

[root@tony-master-1-001 ~]# hdfs dfs -ls /apps/hbase/data/data/default/test_tony
drwxr-xr-x - hbase hdfs 0 2018-04-26 16:12 /apps/hbase/data/data/default/test_tony/.tabledesc
drwxr-xr-x - hbase hdfs 0 2018-04-26 16:12 /apps/hbase/data/data/default/test_tony/.tmp
drwxr-xr-x - hbase hdfs 0 2018-05-07 09:32 /apps/hbase/data/data/default/test_tony/2b470c5b3b37105443d50b2300711a46
drwxr-xr-x - hbase hdfs 0 2018-05-07 09:09 /apps/hbase/data/data/default/test_tony/48f77b080e02cb8e9e652ec7467bfa76
drwxr-xr-x - hbase hdfs 0 2018-05-07 09:32 /apps/hbase/data/data/default/test_tony/d80b7f116883a820608e7c63880a4c20
drwxr-xr-x - hbase hdfs 0 2018-05-07 09:32 /apps/hbase/data/data/default/test_tony/ef52212a815676f394278729e47748fc

The result is more regions: once the split completes successfully there are 10:

[root@tony-master-1-001 ~]# hdfs dfs -ls /apps/hbase/data/data/default/test_tony
Found 12 items
drwxr-xr-x - hbase hdfs 0 2018-04-26 16:12 /apps/hbase/data/data/default/test_tony/.tabledesc
drwxr-xr-x - hbase hdfs 0 2018-04-26 16:12 /apps/hbase/data/data/default/test_tony/.tmp
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/20f73f44db08fd7279515c8f89a99a35
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/41b72f8f61c425072cd75f988f865f54
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/5d830c556ffb9a36bd80e7a5e80c3c02
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:30 /apps/hbase/data/data/default/test_tony/751c339edf3a9e508ed2d607ab6d286e
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/c76afae02e1978cae057402a6dbc2948
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/cf3fcc71b31cfafb59d902ed84713bea
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/e34634b07b5d37a7adc867b0c36151b4
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:30 /apps/hbase/data/data/default/test_tony/e94599b92dee9874d9dd76dbdbc2aee3
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:31 /apps/hbase/data/data/default/test_tony/fb40ea9d9d0f08a2c880fa6eadeb4814
drwxr-xr-x - hbase hdfs 0 2018-05-30 15:30 /apps/hbase/data/data/default/test_tony/fbc22fc4b6b7f4ca348623468303ae28

A glance at the Regions by Region Server section of the web UI shows that where each of the 4 region servers (RS) previously held one region, after the split the hmaster has assigned each RS 2 or 3 regions:

Region Server Region Count
tony-host-001.ksc.com:16020 3
tony-host-003.ksc.com:16020 2
tony-host-005.ksc.com:16020 3
tony-host-006.ksc.com:16020 2

Back in the hbase shell, query hbase:meta for this table's region info: 10 info:regioninfo records come back.

hbase(main):011:0> scan 'hbase:meta', {STARTROW => 'test_tony', COLUMN => 'info:regioninfo', LIMIT => 20}
ROW COLUMN+CELL
test_tony,,1527665424956.e94599b92dee9874d9dd76dbdbc2aee3. column=info:regioninfo, timestamp=1527665426209, value={ENCODED => e94599b92dee9874d9dd76dbdbc2aee3, NAME => 'test_tony,,1527665424956.e94599b92dee9874d9dd76dbdbc2aee3.', STARTKEY => '', ENDKEY => '0E88B0027716FDCB9EBCC25EB7BA81E5'}
test_tony,0E88B0027716FDCB9EBCC25EB7BA81E5,1527665424956.751c339edf3a9e508ed2d607ab6d286e. column=info:regioninfo, timestamp=1527665426209, value={ENCODED => 751c339edf3a9e508ed2d607ab6d286e, NAME => 'test_tony,0E88B0027716FDCB9EBCC25EB7BA81E5,1527665424956.751c339edf3a9e508ed2d607ab6d286e.', STARTKEY => '0E88B0027716FDCB9EBCC25EB7BA81E5', ENDKEY => '23FBF01'}
test_tony,23FBF01,1527665425124.e34634b07b5d37a7adc867b0c36151b4. column=info:regioninfo, timestamp=1527665426209, value={ENCODED => e34634b07b5d37a7adc867b0c36151b4, NAME => 'test_tony,23FBF01,1527665425124.e34634b07b5d37a7adc867b0c36151b4.', STARTKEY => '23FBF01', ENDKEY => '493C2C'}
test_tony,493C2C,1527665467717.5d830c556ffb9a36bd80e7a5e80c3c02. column=info:regioninfo, timestamp=1527665468342, value={ENCODED => 5d830c556ffb9a36bd80e7a5e80c3c02, NAME => 'test_tony,493C2C,1527665467717.5d830c556ffb9a36bd80e7a5e80c3c02.', STARTKEY => '493C2C', ENDKEY => '5BDAEA91'}
test_tony,5BDAEA91,1527665467717.41b72f8f61c425072cd75f988f865f54. column=info:regioninfo, timestamp=1527665468342, value={ENCODED => 41b72f8f61c425072cd75f988f865f54, NAME => 'test_tony,5BDAEA91,1527665467717.41b72f8f61c425072cd75f988f865f54.', STARTKEY => '5BDAEA91', ENDKEY => '6E79D9382F405FB9132FA20EDF15CB0B'}
test_tony,6E79D9382F405FB9132FA20EDF15CB0B,1527665425251.cf3fcc71b31cfafb59d902ed84713bea. column=info:regioninfo, timestamp=1527665426209, value={ENCODED => cf3fcc71b31cfafb59d902ed84713bea, NAME => 'test_tony,6E79D9382F405FB9132FA20EDF15CB0B,1527665425251.cf3fcc71b31cfafb59d902ed84713bea.', STARTKEY => '6E79D9382F405FB9132FA20EDF15CB0B', ENDKEY => '944CAF'}
test_tony,944CAF,1527665467939.fb40ea9d9d0f08a2c880fa6eadeb4814. column=info:regioninfo, timestamp=1527665468183, value={ENCODED => fb40ea9d9d0f08a2c880fa6eadeb4814, NAME => 'test_tony,944CAF,1527665467939.fb40ea9d9d0f08a2c880fa6eadeb4814.', STARTKEY => '944CAF', ENDKEY => 'A6AD2E4'}
test_tony,A6AD2E4,1527665467939.c76afae02e1978cae057402a6dbc2948. column=info:regioninfo, timestamp=1527665468183, value={ENCODED => c76afae02e1978cae057402a6dbc2948, NAME => 'test_tony,A6AD2E4,1527665467939.c76afae02e1978cae057402a6dbc2948.', STARTKEY => 'A6AD2E4', ENDKEY => 'B914483'}
test_tony,B914483,1527665425463.20f73f44db08fd7279515c8f89a99a35. column=info:regioninfo, timestamp=1527665426209, value={ENCODED => 20f73f44db08fd7279515c8f89a99a35, NAME => 'test_tony,B914483,1527665425463.20f73f44db08fd7279515c8f89a99a35.', STARTKEY => 'B914483', ENDKEY => 'DCEF239'}
test_tony,DCEF239,1527665425463.fbc22fc4b6b7f4ca348623468303ae28. column=info:regioninfo, timestamp=1527665426209, value={ENCODED => fbc22fc4b6b7f4ca348623468303ae28, NAME => 'test_tony,DCEF239,1527665425463.fbc22fc4b6b7f4ca348623468303ae28.', STARTKEY => 'DCEF239', ENDKEY => ''}
10 row(s) in 0.0520 seconds

Counting 'hbase:meta' after the successful split shows the row count went from 231 before the split to 237 after (6 more rows), which matches going from 4 regions before the split to 10 after (6 new regions):

hbase(main):004:0> count 'hbase:meta', INTERVAL => 23
Current count: 23, row: users:usertb,7FF70F148964809C8475FEE655005CE3,1523397818177.8c487f00838ae5e9e19ffced7f8914e8.
Current count: 46, row: users:usertb,2FFD4264F7EC7DED66CACB1A2D3AE8EA,1511941432094.8b142ad572b93d65e303a6a5ec5da5a3.
Current count: 69, row: users:usertb,7BF84538F6A96E274651966AF3A6EA87,1515183873848.695d98fb0a9cac2d640c2906444a32e2.
Current count: 92, row: users:usertb,BC0049FADE2ABE2F54CBD8244A861B85,1519967361809.8a7c89d7063118fe9b0e0dd1c9eafd2e.
Current count: 115, row: users:my_table,001|,1511873110944.69e71f4ba7a0f420eb287431b925b377.
Current count: 138, row: users:my_table,0DF7986,1515671013211.2157bae66326fc85b88d2f796f59613a.
Current count: 161, row: users:my_table,59F81618,1515652416016.28d28f08bef06894870ef39ac72482b6.
Current count: 184, row: users:my_table,93FBFF13995C6E8BF2C5ABB244E16B98,1512522023490.9fb718ce9e15a1bf8e4824248b761b45.
Current count: 207, row: users:my_table,E8128B65E40AA34CDC10918689DB0839,1512136535400.74654b61bb4257f7b4fdabc08dceecdd.
Current count: 230, row: test_tony,23FBF01,1527665425124.e34634b07b5d37a7adc867b0c36151b4.
237 row(s) in 0.0770 seconds
=> 237


Part 2: Merging (compaction)

【Where HFiles live on HDFS】
Take the table users:my_table as an example. It has 2 column families, action and basic_info, and on HDFS each column family directory holds exactly one HFile: 4a8c4ae1868d4a4bb98590c78ce388f9 under action and cfa59c72c8b44790870345e53169760d under basic_info:
[root@tony-host-master-1-001 ~]# hdfs dfs -ls -h /apps/hbase/data/data/users/my_table/c8faec091a4b87a6b1c7270840fd3b5f/action
Found 1 items
-rw-r--r-- 3 hbase hdfs 4.2 G 2018-04-27 16:55 /apps/hbase/data/data/users/my_table/c8faec091a4b87a6b1c7270840fd3b5f/action/4a8c4ae1868d4a4bb98590c78ce388f9
[root@tony-host-master-1-001 ~]# hdfs dfs -ls -h /apps/hbase/data/data/users/my_table/c8faec091a4b87a6b1c7270840fd3b5f/basic_info/
Found 1 items
-rw-r--r-- 3 hbase hdfs 5.0 G 2018-04-27 17:01 /apps/hbase/data/data/users/my_table/c8faec091a4b87a6b1c7270840fd3b5f/basic_info/cfa59c72c8b44790870345e53169760d

An HFile's path on HDFS decodes as: {hbase.rootdir}/data/<namespace>/<table>/<region>/<column family>/<hfile> (here hbase.rootdir is /apps/hbase/data).

【How a client GET is served】
1. The client asks ZooKeeper for the location of the metadata table hbase:meta.
2. It queries hbase:meta, comparing the rowkey to be fetched against each region's start key and end key to find the right region, then looks up which region server holds that region.
3. It contacts that region server, which locates the region and returns the matching data to the client.
4. If the memstore holds no match, the persisted StoreFiles are read next. A StoreFile is also a key-sorted, tree-structured file, optimized specifically for range and block queries; HBase reads a disk file in units of its basic I/O unit, the HBase block. Concretely: if the wanted data is found in the BlockCache, it is returned directly; otherwise a block of data is read from the relevant StoreFile. If that block does not contain the wanted data either, the block is placed in the region server's BlockCache and the next block is read, looping over blocks until the data is found and returned; if nothing in the region matches, null is returned. Once the BlockCache exceeds a size threshold (heapsize * hfile.block.cache.size * 0.85), an LRU-based eviction kicks in and discards the oldest, least-used blocks.
5. The hbase:meta information is cached on the client side for the next lookup; a shell-level example follows.
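
From the client's point of view all five steps hide behind a single call, e.g. in the shell (the row key is hypothetical; the table and column family are this article's example):

hbase(main):001:0> get 'users:my_table', 'some_rowkey'
hbase(main):002:0> get 'users:my_table', 'some_rowkey', {COLUMN => 'basic_info'}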

【Memory layout of a region server】
The memory of each region server is made up of the BlockCache and the memstores. A region server has exactly one BlockCache, but may have many memstores.
A query consults memStore --> blockCache --> HFile: look in the memstore first, fall back to the BlockCache, and failing that read the HFile from disk into memory. Data read out of HFiles is kept in the BlockCache, and caching is configured per column family (the block is the smallest unit of data that gets indexed; its size is tunable, 64 KB by default).

【Why compaction is needed】
The cells of a single row may be spread over several HFiles, so to read a complete row HBase may have to consult every HFile that holds part of that row. Compaction exists to cut down the number of HFiles that must be read, and thereby speed up queries.
Each region consists of 1 or more HStores.
Each column family maps to one HStore, and each HStore = 1 memstore + 0..n StoreFiles.

【Why it improves read performance】
Minor and major compactions alike re-read and re-write HBase's data several times, so the process itself generates a lot of I/O,
but the direct result is fewer store files, which means read requests need fewer disk I/Os and network I/Os; put differently, a client has to touch fewer files to get its data.

【What compaction accomplishes】
1. Purges records marked for deletion, expired records, and surplus versions of records.
2. Merges small store files into larger ones; fewer store files gives a marked boost to cluster performance and throughput.

【Types of compaction】
There are two kinds: (1) minor compaction and (2) major compaction.
(1) Minor compaction
A minor compaction reads several adjacent small store files (HFiles) and merges their contents into a single larger HFile, marks the new file active, and deletes the small ones.
Deleting a record through the hbase shell or a client delete merely stamps a marker on it; the marked record becomes a tombstone, invisible to get and scan but still present in the HFile.
A minor compaction also clears out expired data where a TTL is set and minVersions=0, but it does not remove deleted records (tombstones), nor does it prune surplus versions (major_compact does).

(2) Major compaction
The usual operational practice is to manage major compactions by hand rather than leave them to HBase. Automatic major compaction can be switched off by setting HConstants.MAJOR_COMPACTION_PERIOD to 0.
A major compaction merges all HFiles of each column family of a region into one HFile, dropping cells marked deleted or expired along the way.
Concretely, the hbase shell command is major_compact: it compacts all StoreFiles of every HStore under the region, and the end state is one HFile per HStore.
(When the largest HFile among all of a region's store files exceeds hbase.hregion.max.filesize, a region split is triggered: 1 region ==> 2 new regions.)
The standard recipe is to disable automatic major compaction by setting hbase.hregion.majorcompaction=0 and run major_compact by hand when the business is quiet (a sketch follows the examples below).
Common major_compact invocations:
Compact all regions of a table:
hbase> major_compact 'table_name'
hbase> major_compact 'namespace:table_name'
Compact a whole region:
hbase> major_compact 'region_name'
Compact a single column family within a region:
hbase> major_compact 'region_name', 'cf_name'
Compact a single column family of a table:
hbase> major_compact 'table_name', 'cf_name'
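
Disabling the periodic major compaction can be done cluster-wide in hbase-site.xml or per table. A hedged sketch (the per-table CONFIGURATION override works on recent shell versions; verify on yours):

# hbase-site.xml, cluster-wide:
#   hbase.hregion.majorcompaction = 0
hbase(main):001:0> alter 'default:test_tony', CONFIGURATION => {'hbase.hregion.majorcompaction' => '0'}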

【What a compaction actually does】
Only a major compaction (major_compact) removes tombstone records from HFiles. major_compact merges all HFiles of one column family of a region; when it finishes, that column family's HFiles have been merged into a single HFile.
A major compaction can be triggered by hand in the hbase shell, but it is very resource-hungry; run it from a script when the cluster is quiet (e.g. in the small hours).

【Digression: compaction-related parameters】
hbase.hstore.compaction.max: default 10
// the maximum number of store files one minor compaction will merge, to avoid OOM

hbase.hstore.blockingStoreFiles: default 7
// If any HStore of a region (stores of the .META. table excepted) holds more StoreFiles than this value (7), then a split or compaction must run before the memstore may flush; i.e. flushing the memstore's data to disk is blocked,
and the region is added to the flushQueue for a delayed flush. Writes are held up during this period until the compaction (or region split) completes, or until hbase.hstore.blockingWaitTime (default 90 s) runs out. Raising the value to, say, 30 avoids memstores missing timely flushes.
When the region server log fills up with "Region has too many store files; delaying flush up to 90000ms", this value needs adjusting.
// In other words, once the StoreFile count of any HStore in a region exceeds hbase.hstore.blockingStoreFiles (7),
flushing from memstore to StoreFile is blocked. Why block? Because too many store files degrade reads: the data then sits in store files on HDFS (disk) rather than in the in-memory BlockCache or memstore.
The block lasts at most hbase.hstore.blockingWaitTime; if within that window a store file compaction or a region split brings the HStore's file count back under the limit (7), the block is lifted early.
Once the wait expires, flushing resumes regardless. This mechanism throttles heavy write bursts effectively, but it is also one of the main things that slows writes down. A sketch of spotting and reacting to this follows.
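
When those log lines show up, the limit can be raised per table without a cluster restart. A hedged sketch (the log path and the value 30 are illustrative):

# check the region server log for flush-delay warnings
[root@tony-host-001 ~]# grep 'too many store files' /var/log/hbase/hbase-regionserver-*.log
# raise the blocking threshold for a hot table
hbase(main):001:0> alter 'default:test_tony', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '30'}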

That's all for this post. If you feel you learned something, please give it a like. ^_^
