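Starting with this section, we analyze some common behaviors of the ResourceManager and look at how specific key operations flow through the RM. This section digs into the source code to trace the concrete flow of "launching the ApplicationMaster".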
1. Overall Flow
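This subsection walks through the entire process from application submission to the launch of the ApplicationMaster. The main components involved are the Client, ClientRMService, RMAppManager, RMAppImpl, RMAppAttemptImpl, RMNode and ResourceScheduler. After the client calls the RPC method ApplicationClientProtocol#submitApplication, the ResourceManager handles the request as shown in the figure below.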
2. Step-by-Step Analysis
Following the flow chart above, let's dig into the source code and see how each step is carried out:
The client first submits the application by calling submitApplication(). The request passes through ClientRMService to RMAppManager, which sends an RMAppEventType.START event that is then handled by RMAppImpl.
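The following minimal sketch in plain Java makes the direction of this hand-off concrete. MiniClientRMService, MiniRMAppManager, SubmitEvent and SubmitEventType are hypothetical, simplified stand-ins and are not YARN classes. The real components carry much more logic, such as creating the RMAppImpl, parsing credentials and running various checks, and the sketch leaves all of that out. It keeps only the shape described above: the RPC entry point calls the app manager directly, and the app manager emits a START event instead of driving the application itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins for the real YARN classes; only the hand-off order is modeled.
enum SubmitEventType { START }

final class SubmitEvent {
    final String appId;
    final SubmitEventType type;

    SubmitEvent(String appId, SubmitEventType type) {
        this.appId = appId;
        this.type = type;
    }

    @Override
    public String toString() { return type + "(" + appId + ")"; }
}

class MiniRMAppManager {
    // Plays the role of rmContext.getDispatcher().getEventHandler()
    private final Consumer<SubmitEvent> eventHandler;

    MiniRMAppManager(Consumer<SubmitEvent> eventHandler) { this.eventHandler = eventHandler; }

    void submitApplication(String appId) {
        // The real RMAppManager first creates the RMAppImpl and parses credentials.
        eventHandler.accept(new SubmitEvent(appId, SubmitEventType.START));
    }
}

class MiniClientRMService {
    private final MiniRMAppManager appManager;

    MiniClientRMService(MiniRMAppManager appManager) { this.appManager = appManager; }

    // RM-side entry point for the client's submitApplication RPC (simplified).
    void submitApplication(String appId) {
        // The real ClientRMService runs its own checks here before delegating.
        appManager.submitApplication(appId);
    }
}

class SubmitDemo {
    // Pushes one submission through the chain and returns the events it produced.
    static List<SubmitEvent> submit(String appId) {
        List<SubmitEvent> sent = new ArrayList<>();
        new MiniClientRMService(new MiniRMAppManager(sent::add)).submitApplication(appId);
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(submit("application_1_0001")); // [START(application_1_0001)]
    }
}
```

Because the app manager emits an event instead of calling RMAppImpl directly, it does not need to know what happens next to the application. That decision belongs to RMAppImpl's state machine.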
```java
protected void submitApplication(
    ApplicationSubmissionContext submissionContext, long submitTime,
    String user) throws YarnException {
  ApplicationId applicationId = submissionContext.getApplicationId();
  RMAppImpl application =
      createAndPopulateNewRMApp(submissionContext, submitTime, user, false);
  Credentials credentials = null;
  try {
    credentials = parseCredentials(submissionContext);
    if (UserGroupInformation.isSecurityEnabled()) {
      // With security enabled, DelegationTokenRenewer sends the START event
      // asynchronously after it has handled the application's tokens.
      this.rmContext.getDelegationTokenRenewer()
          .addApplicationAsync(applicationId, credentials,
              submissionContext.getCancelTokensWhenComplete(),
              application.getUser());
    } else {
      // Dispatcher is not yet started at this time, so these START events
      // enqueued should be guaranteed to be first processed when dispatcher
      // gets started.
      // The RMAppEventType.START event is sent here
      this.rmContext.getDispatcher().getEventHandler()
          .handle(new RMAppEvent(applicationId, RMAppEventType.START));
    }
  } catch (Exception e) {
    // ... (exception handling omitted in this excerpt)
  }
}
```
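The comment in the else branch relies on a property of the dispatcher: events handed to it before it starts are queued rather than lost, and they are delivered first once it starts. The sketch below illustrates that property with a hypothetical, single-threaded BufferingDispatcher. It is not Hadoop's AsyncDispatcher, which drains a blocking queue on a dedicated thread. Here draining happens on the caller's thread so the behavior is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, single-threaded simplification of an AsyncDispatcher-style event bus.
// handle() only enqueues; delivery begins with start() and follows FIFO order.
class BufferingDispatcher {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Consumer<String> handler;
    private boolean started = false;
    private boolean draining = false;

    BufferingDispatcher(Consumer<String> handler) { this.handler = handler; }

    void handle(String event) {
        queue.add(event);        // always enqueue; never deliver before start()
        if (started) drain();
    }

    void start() {
        started = true;
        drain();                 // events queued before start() are delivered first
    }

    private void drain() {
        if (draining) return;    // a handler may enqueue more events; the outer loop picks them up
        draining = true;
        try {
            String event;
            while ((event = queue.poll()) != null) handler.accept(event);
        } finally {
            draining = false;
        }
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        BufferingDispatcher d = new BufferingDispatcher(delivered::add);
        d.handle("START app_1");
        d.handle("START app_2");
        System.out.println(delivered); // []
        d.start();
        d.handle("KILL app_1");
        System.out.println(delivered); // [START app_1, START app_2, KILL app_1]
    }
}
```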
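RMAppImpl is a state machine: when it receives an event, it switches to the next state on its own and runs the corresponding logic.
(If you are not yet familiar with state machines, see my earlier article "2-4 Yarn Basic Library - State Machine Library".)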
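As a refresher, the core idea of such a state machine fits in a few lines. A table maps (current state, event) to a target state plus a hook, and handling an event runs the hook and then switches state. The sketch below uses made-up ToyState and ToyEvent enums and a hypothetical ToyStateMachine class. YARN's real StateMachineFactory is generic, immutable and far more complete, and it reports invalid events with its own exception type instead of IllegalStateException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy states and events, not the real RMAppState/RMAppEventType.
enum ToyState { NEW, NEW_SAVING, KILLED }
enum ToyEvent { START, KILL }

class ToyStateMachine {
    // One registered arc: the target state plus the logic to run on the way there.
    private static final class Arc {
        final ToyState to;
        final Runnable hook;
        Arc(ToyState to, Runnable hook) { this.to = to; this.hook = hook; }
    }

    private final Map<ToyState, Map<ToyEvent, Arc>> table = new HashMap<>();
    private ToyState current;

    ToyStateMachine(ToyState initial) { this.current = initial; }

    // Chainable registration, in the spirit of StateMachineFactory#addTransition.
    ToyStateMachine addTransition(ToyState from, ToyState to, ToyEvent event, Runnable hook) {
        table.computeIfAbsent(from, s -> new HashMap<>()).put(event, new Arc(to, hook));
        return this;
    }

    ToyState handle(ToyEvent event) {
        Map<ToyEvent, Arc> row = table.get(current);
        Arc arc = (row == null) ? null : row.get(event);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        arc.hook.run();     // run the transition logic first
        current = arc.to;   // then move to the target state
        return current;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ToyStateMachine app = new ToyStateMachine(ToyState.NEW)
            .addTransition(ToyState.NEW, ToyState.NEW_SAVING, ToyEvent.START, () -> log.add("store app"))
            .addTransition(ToyState.NEW, ToyState.KILLED, ToyEvent.KILL, () -> log.add("kill app"));
        System.out.println(app.handle(ToyEvent.START) + " " + log); // NEW_SAVING [store app]
    }
}
```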
Here is an excerpt of the state transition code:
```java
private static final StateMachineFactory<RMAppImpl,
                                         RMAppState,
                                         RMAppEventType,
                                         RMAppEvent> stateMachineFactory
    = new StateMachineFactory<RMAppImpl,
                              RMAppState,
                              RMAppEventType,
                              RMAppEvent>(RMAppState.NEW)

    // Transitions from NEW state
    .addTransition(RMAppState.NEW, RMAppState.NEW,
        RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
    // On receiving RMAppEventType.START
    .addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
        RMAppEventType.START, new RMAppNewlySavingTransition())
    .addTransition(RMAppState.NEW, EnumSet.of(RMAppState.SUBMITTED,
            RMAppState.ACCEPTED, RMAppState.FINISHED, RMAppState.FAILED,
            RMAppState.KILLED, RMAppState.FINAL_SAVING),
        RMAppEventType.RECOVER, new RMAppRecoveredTransition())
    .addTransition(RMAppState.NEW, RMAppState.KILLED, RMAppEventType.KILL,
        new AppKilledTransition())
    .addTransition(RMAppState.NEW, RMAppState.FINAL_SAVING,
        RMAppEventType.APP_REJECTED,
        new FinalSavingTransition(new AppRejectedTransition(),
            RMAppState.FAILED))
    // ... (remaining transitions omitted)
    .installTopology();
```
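Most arcs above lead to a single fixed state. The RECOVER arc is different: it declares a whole EnumSet of possible targets. In that case the transition itself decides where to go, and the state machine checks that the chosen state is one of the declared ones. The sketch below shows this "multiple arc" idea with a hypothetical MultipleArc class and AppState enum. The decision rule inside recover() is invented for illustration and is not what RMAppRecoveredTransition actually does.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical mirror of the RMAppState values that appear in the excerpt.
enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, FINISHED, FAILED, KILLED, FINAL_SAVING }

// A "multiple arc": the hook chooses the target state at runtime, and the
// machine accepts it only if it was declared when the arc was registered.
final class MultipleArc {
    private final Set<AppState> validTargets;
    private final Supplier<AppState> hook;

    MultipleArc(Set<AppState> validTargets, Supplier<AppState> hook) {
        this.validTargets = validTargets;
        this.hook = hook;
    }

    AppState fire() {
        AppState target = hook.get();
        if (!validTargets.contains(target)) {
            throw new IllegalStateException("Undeclared target state " + target);
        }
        return target;
    }
}

class RecoverDemo {
    // The target set declared for NEW + RECOVER in the excerpt.
    static final Set<AppState> RECOVER_TARGETS = EnumSet.of(AppState.SUBMITTED,
        AppState.ACCEPTED, AppState.FINISHED, AppState.FAILED,
        AppState.KILLED, AppState.FINAL_SAVING);

    // Hypothetical decision rule, NOT the real RMAppRecoveredTransition logic:
    // go back to the stored final state if there is one, else resume as SUBMITTED.
    static AppState recover(AppState storedFinalState) {
        MultipleArc arc = new MultipleArc(RECOVER_TARGETS,
            () -> storedFinalState != null ? storedFinalState : AppState.SUBMITTED);
        return arc.fire();
    }

    public static void main(String[] args) {
        System.out.println(recover(null));              // SUBMITTED
        System.out.println(recover(AppState.FINISHED)); // FINISHED
    }
}
```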
(1) RMAppImpl - START
On receiving the RMAppEventType.START event, RMAppImpl executes RMAppNewlySavingTransition and moves from NEW to NEW_SAVING:
```java
private static final class RMAppNewlySavingTransition extends RMAppTransition {
  @Override
  public void transition(RMAppImpl app, RMAppEvent event) {
    // If recovery is enabled then store the application information in a
    // non-blocking call so make sure that RM has stored the information
    // needed to restart the AM after RM restart without further client
    // communication
    LOG.info("Storing application with id " + app.applicationId);
    app.rmContext.getStateStore().storeNewApplication(app);
  }
}
```
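The comment stresses that storing is a non-blocking call. The transition hands the work to RMStateStore and returns at once, and the store reports back once the data is persisted. As far as I know, in the Hadoop source that report is an RMAppEventType.APP_NEW_SAVED event sent to the application. The sketch below is a hypothetical, single-threaded ToyStateStore that illustrates the pattern. processPending() stands in for the store's own event loop, and a plain callback stands in for the APP_NEW_SAVED event.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, single-threaded sketch of a non-blocking state store.
class ToyStateStore {
    private final Queue<String> pending = new ArrayDeque<>();       // queued STORE_APP requests
    private final Map<String, String> persisted = new HashMap<>();  // stands in for real storage
    private final Consumer<String> onSaved;                          // stands in for the "saved" event to the app

    ToyStateStore(Consumer<String> onSaved) { this.onSaved = onSaved; }

    // Called from the app's transition: only enqueues, so the caller never blocks on I/O.
    void storeNewApplication(String appId) { pending.add(appId); }

    // Runs later on the store's own event loop (a separate dispatcher thread in the real RM).
    void processPending() {
        String appId;
        while ((appId = pending.poll()) != null) {
            persisted.put(appId, "submission context of " + appId);
            onSaved.accept(appId);   // notify the app only after the data is stored
        }
    }

    boolean isPersisted(String appId) { return persisted.containsKey(appId); }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        ToyStateStore store = new ToyStateStore(saved::add);
        store.storeNewApplication("app_1");
        System.out.println(store.isPersisted("app_1") + " " + saved); // false []
        store.processPending();
        System.out.println(store.isPersisted("app_1") + " " + saved); // true [app_1]
    }
}
```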
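Following the call further, we find that it sends an RMStateStoreEventType.STORE_APP event. Looking for the handler of this event in RMStateStore, we find that RMStateStore is a state machine as well: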
.addTransition(RMStat