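This article walks through the HMaster startup flow, based on HBase 0.94.1.
1. Starting HMaster from the command line: an overview
The HMaster startup flow can be summarized as follows:
The user's "hbase-daemon.sh start master" operation is wrapped in an HMasterCommandLine object (a Tool instance) and handed to the static method run(conf, tool, args) of org.apache.hadoop.util.ToolRunner for execution, where args is "start". The detailed flow is as follows:
When the master is started with $HBASE_HOME/bin/hbase-daemon.sh start master, that script in turn calls $HBASE_HOME/bin/hbase start master: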
$HBASE_HOME/bin/hbase start master
The relevant line of $HBASE_HOME/bin/hbase is:
"$JAVA" -XX:OnOutOfMemoryError="kill -9 %p" $JAVA_HEAP_MAX $HBASE_OPTS -classpath "$CLASSPATH" $CLASS "$@"
In other words, it runs the following method:
org.apache.hadoop.hbase.master.HMaster.main("start")
HMaster's main method creates an HMasterCommandLine object and calls its doMain(args) method.
/**
* @see org.apache.hadoop.hbase.master.HMasterCommandLine
*/
public static void main(String [] args) throws Exception {
VersionInfo.logVersion();
new HMasterCommandLine(HMaster.class).doMain(args);
}
ServerCommandLine is the parent class of HMasterCommandLine. It implements the Tool interface and runs commands such as start and stop through Hadoop's ToolRunner mechanism.
/**
* Parse and run the given command line. This may exit the JVM if
* a nonzero exit code is returned from <code>run()</code>.
*/
public void doMain(String args[]) throws Exception {
int ret = ToolRunner.run(
HBaseConfiguration.create(), this, args);
if (ret != 0) {
System.exit(ret);
}
}
2. HMaster startup uses the ToolRunner mechanism
ToolRunner's run method does the following:
(1). Wraps conf and args in a GenericOptionsParser object, parser, and obtains toolArgs from it.
(2). Returns tool.run(toolArgs).
public static int run(Configuration conf, Tool tool, String[] args)
throws Exception{
if(conf == null) {
conf = new Configuration();
}
GenericOptionsParser parser = new GenericOptionsParser(conf, args);
//set the configuration back, so that Tool can configure itself
tool.setConf(conf);
//get the args w/o generic hadoop args
String[] toolArgs = parser.getRemainingArgs();
return tool.run(toolArgs);
}
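The two steps above can be tried out without Hadoop on the classpath. The following is only a minimal sketch, not Hadoop's code: MiniTool and MiniToolRunner are hypothetical stand-ins for Tool and ToolRunner, a plain Map stands in for Configuration, and only the "-D key=value" generic option is recognized, which is a deliberate simplification of what GenericOptionsParser handles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.util.Tool.
interface MiniTool {
    void setConf(Map<String, String> conf);
    int run(String[] args);
}

// Hypothetical stand-in for ToolRunner: strip generic options, then delegate.
final class MiniToolRunner {
    private MiniToolRunner() {}

    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new HashMap<>();
        }
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            // Assumption: only "-D key=value" counts as a generic option here.
            if ("-D".equals(args[i]) && i + 1 < args.length && args[i + 1].contains("=")) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv[1]);
            } else {
                remaining.add(args[i]);
            }
        }
        // Set the configuration back so the tool can configure itself.
        tool.setConf(conf);
        // The tool only sees the arguments left after generic options are removed.
        return tool.run(remaining.toArray(new String[0]));
    }
}
```

With arguments such as "-D hbase.master.info.port=60010 start", the tool in this sketch receives only "start", while the key/value pair ends up in the configuration map.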
During HMaster startup, tool.run(toolArgs) is HMasterCommandLine.run(toolArgs), shown below:
public int run(String args[]) throws Exception {
.......
if ("start".equals(command)) {
return startMaster();
} else if ("stop".equals(command)) {
return stopMaster();
} else {
usage("Invalid command: " + command);
return -1;
}
}
That is, HMasterCommandLine's startMaster() method is executed:
private int startMaster() {
Configuration conf = getConf();
try {
// If 'local', defer to LocalHBaseCluster instance. Starts master
// and regionserver both in the one JVM.
if (LocalHBaseCluster.isLocal(conf)) {
....
} else {
HMaster master = HMaster.constructMaster(masterClass, conf);
...
master.start();
master.join();
...
return 0;
}
} catch (Throwable t) {
...
}
}
Here HMaster.constructMaster(masterClass, conf) builds the master thread, and then master.start() and master.join() are called.
At this point we have mapped the command that starts HMaster to the code that starts it.
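The construct, start, join sequence can be sketched in isolation. This is an illustration only: MasterSketch is a hypothetical stand-in for HMaster, a Map stands in for Configuration, and the sketch assumes the instance is built reflectively through a constructor that takes the configuration.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

// Hypothetical stand-in for HMaster: a thread constructed from a configuration.
class MasterSketch extends Thread {
    protected final Map<String, String> conf;
    volatile boolean ran = false;

    public MasterSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    @Override
    public void run() {
        ran = true; // a real master would spend its whole lifetime in run()
    }

    // Assumption: build the instance through its public (Map) constructor.
    static MasterSketch construct(Class<? extends MasterSketch> masterClass,
                                  Map<String, String> conf) {
        try {
            Constructor<? extends MasterSketch> c = masterClass.getConstructor(Map.class);
            return c.newInstance(conf);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Failed construction of " + masterClass.getName(), e);
        }
    }

    // Mirrors the start()/join() sequence: returns only after run() has finished.
    static int startMaster(Class<? extends MasterSketch> masterClass,
                           Map<String, String> conf) {
        MasterSketch master = construct(masterClass, conf);
        master.start();
        try {
            master.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return 0;
    }
}
```

Because join() waits for the thread to finish, the calling thread stays parked for as long as the master thread is alive, which is why the launching process does not exit right after start().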
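3. Internal details of HMaster startup
First, look at HMaster's constructor. What it does can be summarized as follows:
1. Initialize the relevant configuration 2. Initialize the rpcServer 3. Initialize the ZooKeeper watcher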
public HMaster(final Configuration conf)
throws IOException, KeeperException, InterruptedException {
this.conf = new Configuration(conf); // 1. Set up the configuration
// (1.1) The block cache must be disabled on the HMaster side.
this.conf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, 0.0f);
// (1.2) Set how many times to retry talking to another server over HConnection.
HConnectionManager.setServerSideHConnectionRetries(this.conf, LOG);
// 2. Initialize the hostname parameters
...
// 3. Initialize the rpcServer, wrapping the HMaster itself as an RPC server.
...
this.rpcServer = HBaseRPC.getServer(this,
new Class<?>[]{HMasterInterface.class, HMasterRegionInterface.class},
initialIsa.getHostName(), // This is bindAddress if set else it's hostname
initialIsa.getPort(),
numHandlers,
0, // we dont use high priority handlers in master
conf.getBoolean("hbase.rpc.verbose", false), conf,
0); // this is a DNC w/o high priority handlers
// 4. ZooKeeper security setup
ZKUtil.loginClient(this.conf, "hbase.zookeeper.client.keytab.file",
"hbase.zookeeper.client.kerberos.principal", this.isa.getHostName());
...
// 5. Adjust the master configuration to add replication-related features.
Replication.decorateMasterConfiguration(this.conf);
if (this.conf.get("mapred.task.id") == null) {
this.conf.set("mapred.task.id", "hb_m_" + this.serverName.toString());
}
// 6. Start the ZooKeeper client and the RPC server
this.zooKeeper = new ZooKeeperWatcher(conf, MASTER + ":" + isa.getPort(), this, true);
this.rpcServer.startThreads();
this.metrics = new MasterMetrics(getServerName().toString());
// 7. Start the health-check thread
...
}
Next, look at HMaster's run method. It first starts the infoServer and then blocks in becomeActiveMaster().
(Note: once the current HMaster has successfully changed from backup master to active master, it performs finishInitialization.)
@Override
public void run() {
MonitoredTask startupStatus =
TaskMonitor.get().createStatus("Master startup");
startupStatus.setDescription("Master startup");
masterStartTime = System.currentTimeMillis();
try {
this.registeredZKListenersBeforeRecovery = this.zooKeeper.getListeners();
// 1. Put up info server.
int port = this.conf.getInt("hbase.master.info.port", 60010);
...
this.infoServer.start();
// 2. Try to become the active master. A backup master spends its whole waiting period inside becomeActiveMaster().
becomeActiveMaster(startupStatus);
// 3. We are either the active master or we were asked to shut down.
if (!this.stopped) {
// Having become the active master, finish the master initialization.
finishInitialization(startupStatus, false);
// Enter the main loop.
loop();
}
} catch (Throwable t) {
...
} finally {
startupStatus.cleanup();
...
}
}
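The control flow of run() can be condensed into a small sketch. MasterRunSketch is a hypothetical class that only records the order of the steps; its becomeActiveMaster() returns immediately instead of blocking, and a lost election is simulated by setting the stopped flag, which is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the run() flow: it records steps instead of doing real work.
final class MasterRunSketch {
    final List<String> steps = new ArrayList<>();
    private final boolean winsElection;
    private volatile boolean stopped;

    MasterRunSketch(boolean winsElection) {
        this.winsElection = winsElection;
    }

    void run() {
        try {
            steps.add("start info server");
            becomeActiveMaster();          // in HMaster this call blocks
            if (!stopped) {
                steps.add("finish initialization");
                steps.add("main loop");
            }
        } finally {
            steps.add("cleanup");          // always runs, like startupStatus.cleanup()
        }
    }

    // Assumption: a master that never becomes active has been asked to stop.
    private boolean becomeActiveMaster() {
        if (!winsElection) {
            stopped = true;
        }
        return winsElection;
    }
}
```

A master that becomes active records all four steps, while one that is stopped while waiting skips initialization and the main loop and goes straight to cleanup.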
The becomeActiveMaster method builds an ActiveMasterManager from the current HMaster and calls blockUntilBecomingActiveMaster(startupStatus), which blocks until this master becomes the active master:
private boolean becomeActiveMaster(MonitoredTask startupStatus)
throws InterruptedException {
...
this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
this);
this.zooKeeper.registerListener(activeMasterManager);
// Block; this returns true only once this master has become the active master.
return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus);
}
Code excerpt from ActiveMasterManager:
/**
* Block until this master becomes the active master.
*/
boolean blockUntilBecomingActiveMaster(MonitoredTask startupStatus) {
while (true) {
...
try {
// 1. Get the backup znode, by default /hbase/backup-masters/${SERVER-NAME}
String backupZNode = ZKUtil.joinZNode(this.watcher.backupMasterAddressesZNode, this.sn.toString());
// 2. Try to create the master znode, by default /hbase/master.
if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, this.sn.getVersionedBytes())) {
// Creation succeeded: the current HMaster is now the active master.
// 2.1 Having become active, delete the znode this HMaster created under /hbase/backup-masters/
ZKUtil.deleteNodeFailSilent(this.watcher, backupZNode);
...
this.clusterHasActiveMaster.set(true);
...
return true;
}
// 3. The current HMaster could not create /hbase/master because another master already has. The cluster therefore has an active master, so set the flag to true.
this.clusterHasActiveMaster.set(true);
// 4. Since the current HMaster did not become the active master, create a znode under /hbase/backup-masters to mark it as a backup master.
ZKUtil.createEphemeralNodeAndWatch(this.watcher, backupZNode, this.sn.getVersionedBytes());
// 5. Read the data of the current active master's znode.
String msg;
byte [] bytes = ZKUtil.getDataAndWatch(this.watcher, this.watcher.masterAddressZNode);
if (bytes == null) {
// (5.1) The active master's znode has no data: the active master has died.
...
} else {
// (5.2) The active master's znode is readable.
ServerName currentMaster = ServerName.parseVersionedServerName(bytes);
if (ServerName.isSameHostnameAndPort(currentMaster, this.sn)) {
// (5.2.1) The active master's address matches this HMaster's, so the master was probably just restarted.
// Delete the old active master's znode so that all backup masters keep competing to become master.
ZKUtil.deleteNode(this.watcher, this.watcher.masterAddressZNode);
} else {
// (5.2.2) The active master is healthy.
...
}
}
LOG.info(msg);
startupStatus.setStatus(msg);
}catch (KeeperException ke) {
...
return false;
}
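// 6. Synchronize on clusterHasActiveMaster; while it is true and the current HMaster has not been stopped, release the lock and wait to be woken.
synchronized (this.clusterHasActiveMaster) {
// Note: nodeCreated, nodeDeleted, and stop may wake this method.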
while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
try {
this.clusterHasActiveMaster.wait();
}catch (InterruptedException e) {
}
}
if (clusterShutDown.get()) {
this.master.stop("...");
}
if (this.master.isStopped()) {
return false;
}
// Try to become active master again now that there is no active master
}
}
}
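The election loop above can be modeled without ZooKeeper. The following is a rough sketch under stated assumptions: ElectionSketch is a hypothetical class in which a single field stands in for the ephemeral /hbase/master znode, and Java's wait/notifyAll replaces ZooKeeper watch events. It does not model backup znodes, sessions, or znode data.

```java
// Hypothetical in-memory model of the active-master election.
final class ElectionSketch {
    private String activeMaster;   // null means the master "znode" does not exist
    private boolean shutDown;

    // Loops until this server becomes active (true) or must give up (false).
    synchronized boolean blockUntilActive(String serverName) {
        while (true) {
            if (shutDown) {
                return false;
            }
            if (activeMaster == null) {
                // Equivalent to successfully creating the ephemeral master node.
                activeMaster = serverName;
                return true;
            }
            try {
                // Another master is active: release the lock and wait to be woken.
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Models the node-deleted event: the master node vanishes and waiters retry.
    synchronized void activeMasterDied() {
        activeMaster = null;
        notifyAll();
    }

    // Models a cluster shutdown: every waiting candidate gives up.
    synchronized void shutDown() {
        shutDown = true;
        notifyAll();
    }

    synchronized String getActiveMaster() {
        return activeMaster;
    }
}
```

The first caller of blockUntilActive wins immediately. A later caller parks in wait() until activeMasterDied() or shutDown() wakes it, then re-checks the state from the top of the loop, which is the same retry shape as the while (true) loop in blockUntilBecomingActiveMaster.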
That completes the walkthrough of the HMaster startup flow.