
HBase: Regions in Transition Problem
2018-12-30 01:49:45

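I. Problem description

One region of a table in the HBase cluster stayed in the SPLITTING state for a long time without finishing. A full GC on the HMaster node still reclaimed memory.

Create-table and drop-table operations submitted during this period had no effect: the tables stayed in the ENABLING or DISABLING state, and the create and delete operations never completed.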

test_mgq        ENABLING
test_userfriend       DISABLING
test_userfriendlwj    ENABLING

The HMaster log showed the following:

2014-04-09 00:17:52,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:17:59,706 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions.
2014-04-09 00:18:02,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:18:12,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:18:22,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:18:32,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:18:39,602 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
2014-04-09 00:18:39,602 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Advancing internal scanner to startKey at ''
2014-04-09 00:18:40,220 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
2014-04-09 00:18:41,549 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 16924 catalog row(s) and gc'd 0 unreferenced parent region(s)
2014-04-09 00:18:42,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ-134.opi.com,60020,1395937706714
2014-04-09 00:18:52,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ-134.opi.com,60020,1395937706714
2014-04-09 00:19:02,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:19:12,008 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:19:22,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:19:32,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:19:42,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
2014-04-09 00:19:47,451 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {50aa699775f00b753aaaa6e029fed887=mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714}
2014-04-09 00:19:52,008 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,1396415717740.50aa699775f00b753aaaa6e029fed887. state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714
^C
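When the same line repeats every ten seconds, it helps to pull out just the fields that matter. The following is a minimal sketch, assuming the line layout shown in the log above; the function name `parse_rit_line` and the regular expression are illustrative, not part of HBase.

```python
import re

# Matches the AssignmentManager "timed out" lines in the layout shown above.
# The region name and the server name contain no whitespace, so \S+ is enough.
RIT_PATTERN = re.compile(
    r"Regions in transition timed out:\s+"
    r"(?P<region>\S+)\s+"
    r"state=(?P<state>\w+), ts=(?P<ts>\d+), server=(?P<server>\S+)"
)


def parse_rit_line(line):
    """Return region, state, ts (ms) and server from one log line, or None."""
    match = RIT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "region": match.group("region"),
        "state": match.group("state"),
        "ts": int(match.group("ts")),
        "server": match.group("server"),
    }


# First log line from the excerpt above, kept verbatim (raw strings preserve \xF2).
sample = (
    r"2014-04-09 00:17:52,007 INFO org.apache.hadoop.hbase.master.AssignmentManager: "
    r"Regions in transition timed out:  "
    r"mau_selecteduser,2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF2\xF1QQQQQQQQQ,"
    r"1396415717740.50aa699775f00b753aaaa6e029fed887. "
    r"state=SPLITTING, ts=1396415877142, server=YZ18-134.opi.com,60020,1395937706714"
)

info = parse_rit_line(sample)
print(info["state"], info["ts"], info["server"])
```

The `ts` field is the one to look at: it does not change between lines, which means the region has been in the same state since that moment.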



Looking up the ts value from the log:

Unix timestamp (in milliseconds)

1396415877142

This converts to 13:17:57 on April 2, 2014, Beijing time.
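The conversion can be reproduced with a few lines of Python. This is a small sketch; the helper name `ms_to_beijing` is made up for illustration, and the comparison with the first log line assumes the HMaster log was written in Beijing time, which the source does not state.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is a fixed UTC+8 offset with no daylight saving.
BEIJING = timezone(timedelta(hours=8))


def ms_to_beijing(ts_ms):
    """Convert a millisecond Unix timestamp to Beijing time (milliseconds dropped)."""
    return datetime.fromtimestamp(ts_ms // 1000, tz=BEIJING)


stuck_since = ms_to_beijing(1396415877142)
print(stuck_since.strftime("%Y-%m-%d %H:%M:%S"))

# Assumption: the log timestamps are Beijing local time.
first_log_line = datetime(2014, 4, 9, 0, 17, 52, tzinfo=BEIJING)
stuck_for = first_log_line - stuck_since
print("stuck for", stuck_for.days, "days")
```

Under that assumption the region had already been in SPLITTING for more than six days when the log excerpt was captured.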
The regionserver log on YZ18-134.opi.com showed:


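Fix:

1. Quick and dirty:

Restart the regionserver on host 134 directly, so that the region information on that machine is reported again. The SPLITTING state then clears.

After the restart the mau_selecteduser table returned to normal, and the two test tables also reached their expected states shortly afterwards: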

test_userfriend_community       DISABLED
test_userfriend_communitylwj    ENABLED

HBase Region Split in detail

1. When the need for a region split is checked:

After every flush or compaction, the regionserver checks whether the split condition is met.
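The text does not say what the split condition is. The sketch below assumes a purely size-based rule, in the spirit of HBase's `hbase.hregion.max.filesize` setting: a region asks for a split once any of its stores grows past a configured limit. The `Region` class, the threshold value and the function names are hypothetical and only illustrate where the check sits.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical limit for this sketch (10 GiB); a real cluster takes it from configuration.
MAX_STORE_BYTES = 10 * 1024 ** 3


@dataclass
class Region:
    name: str
    store_sizes: List[int] = field(default_factory=list)  # bytes per store


def should_split(region, max_store_bytes=MAX_STORE_BYTES):
    """Assumed rule: split when any single store exceeds the limit."""
    return any(size > max_store_bytes for size in region.store_sizes)


def after_flush_or_compact(region):
    """Hook run after every flush or compaction; True means a split is requested."""
    return should_split(region)


small = Region("r1", [2 * 1024 ** 3, 3 * 1024 ** 3])
large = Region("r2", [2 * 1024 ** 3, 11 * 1024 ** 3])
print(after_flush_or_compact(small), after_flush_or_compact(large))
```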

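2. The region split proceeds as follows:

(1) The RegionServer creates a znode /hbase/region-in-transition/region-name in ZooKeeper and sets its content to SPLITTING.

(2) Because the Master watches /hbase/region-in-transition, it is notified as soon as (1) happens.

(3) The RegionServer creates a .splits directory under the parent region's directory in HDFS.

(4) The RegionServer closes the parent region, forces a flush, and marks the region as offline in its local data structures. From this point on, client requests to the parent region get a NotServingRegionException, and the client keeps retrying.

(5) The RegionServer creates two subdirectories under .splits, one each for daughter regions A and B, and creates the corresponding data structures. It then splits every StoreFile of the parent region. At this stage the split only means that the RegionServer creates two reference files for each StoreFile in the parent region; no data is copied.

(6) The RegionServer creates the actual storage directories in HDFS for daughter A and daughter B.

(7) The RegionServer sends a Put request to the .META. table. The request marks the parent region as offline in .META. and adds the information about daughter A and daughter B to the parent's row. At this point .META. still has no separate entries for daughter A and daughter B: a scan of .META. shows that the parent region is being split, but the daughters are not yet visible. If the Put succeeds, the parent region counts as split. If the Put fails, the Master and the next RegionServer that opens the parent region clean up the dirty state left behind by the split.

(8) The RegionServer opens daughter A and daughter B, and the daughter regions start accepting writes.

(9) The RegionServer adds daughter A and daughter B to the .META. table as entries of their own. Only after this can clients discover the daughter regions and send requests to them.

(10) The RegionServer updates the state of the znode /hbase/region-in-transition/region-name in ZooKeeper to SPLIT. The Master is notified of the change, and the Balancer is then free to assign the daughter regions to other RegionServers.

(11) After the split has finished, HDFS and .META. still hold references to the parent region's files. These reference files are removed when the daughter regions run a major compaction that rewrites their StoreFiles. A garbage-collection task in the Master (the CatalogJanitor that appears in the log above) periodically checks whether the daughter regions still contain reference files pointing to the parent region. If they do not, the Master deletes the parent region.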
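The eleven steps explain why a region can sit in SPLITTING indefinitely: the znode is set to SPLITTING in step (1) and only changes to SPLIT in step (10), so if the RegionServer stalls anywhere in between, the Master keeps seeing SPLITTING. The toy model below walks through the steps to show this. It makes no ZooKeeper or HBase calls, and every name in it (`ZkState`, `SPLIT_STEPS`, `run_split`) is invented for illustration.

```python
from enum import Enum


class ZkState(Enum):
    SPLITTING = "SPLITTING"
    SPLIT = "SPLIT"


# Steps (1) to (10) from the list above, in order. Step (11) is later cleanup.
SPLIT_STEPS = [
    "create znode in SPLITTING state",
    "master is notified through its watch",
    "create .splits directory under the parent region",
    "close parent, flush, mark offline locally",
    "create daughter dirs and reference files under .splits",
    "create the actual daughter directories in HDFS",
    "Put to .META.: parent offline, daughters recorded on parent row",
    "open daughters for writes",
    "add daughters to .META. as their own entries",
    "update znode to SPLIT",
]


def run_split(stall_before_step=None):
    """Return the znode state the Master sees after the RegionServer stops.

    stall_before_step=N means steps 1..N-1 ran and the server then hung.
    None means all steps ran.
    """
    znode = None
    for number, _description in enumerate(SPLIT_STEPS, start=1):
        if stall_before_step is not None and number >= stall_before_step:
            break  # the RegionServer makes no further progress
        if number == 1:
            znode = ZkState.SPLITTING
        elif number == 10:
            znode = ZkState.SPLIT
    return znode


print(run_split())                      # complete split
print(run_split(stall_before_step=5))   # stalled while creating reference files
```

A stall at any step from (2) through (10) leaves the znode in SPLITTING, which is consistent with the repeated "Regions in transition timed out" lines in the incident. Restarting the regionserver forces the region state to be reported again.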


