A Brief Look at the Hadoop Source Code: Job Submission

The Configuration class first loads Hadoop's configuration files core-default.xml and core-site.xml in a static initializer block. The relevant code is as follows:

static {
  // print deprecation warning if hadoop-site.xml is found in classpath
  ClassLoader cL = Thread.currentThread().getContextClassLoader();
  if (cL == null) {
    cL = Configuration.class.getClassLoader();
  }
  if (cL.getResource("hadoop-site.xml") != null) {
    LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
        "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
        + "mapred-site.xml and hdfs-site.xml to override properties of " +
        "core-default.xml, mapred-default.xml and hdfs-default.xml " +
        "respectively");
  }
  addDefaultResource("core-default.xml");
  addDefaultResource("core-site.xml");
}

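defaultResources is an ArrayList that holds the paths of the default configuration files. If a default configuration file path is not yet in defaultResources, it is added to the list. This logic is implemented in the addDefaultResource method.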
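The add-if-absent behaviour described above can be sketched in plain Java. This is a simplified model rather than the Hadoop source: the class name DefaultResourceRegistry and the snapshot helper are hypothetical, and anything the real method does beyond maintaining the list is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the static state kept by Configuration.
class DefaultResourceRegistry {
    // Default resource names, kept in the order they were registered.
    private static final List<String> defaultResources = new ArrayList<String>();

    // Registers the name only if it is not already present.
    // Returns true when the list actually changed.
    static synchronized boolean addDefaultResource(String name) {
        if (defaultResources.contains(name)) {
            return false;
        }
        defaultResources.add(name);
        return true;
    }

    // Returns a copy so callers cannot modify the shared list.
    static synchronized List<String> snapshot() {
        return new ArrayList<String>(defaultResources);
    }
}
```

Registering core-default.xml twice in this model leaves a single entry, so repeated class initialisation or repeated calls from subclasses do not cause the same file to be listed more than once.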

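properties is a Properties object that holds the configuration properties parsed from the configuration files. If several configuration files define the same key, the value from the file loaded later overrides the earlier one.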
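The override rule can be illustrated with java.util.Properties alone. The sketch below is only a model under stated assumptions: OverrideSketch and merge are hypothetical names, and properties marked final, which Hadoop treats specially, are not modelled.

```java
import java.util.Properties;

// Hypothetical helper that models "later resource wins" for duplicate keys.
class OverrideSketch {
    // Merges the given resources in load order into one Properties object.
    static Properties merge(Properties... resourcesInLoadOrder) {
        Properties merged = new Properties();
        for (Properties resource : resourcesInLoadOrder) {
            // putAll replaces the value of any key that is already present.
            merged.putAll(resource);
        }
        return merged;
    }
}
```

For example, if a Properties object standing in for core-default.xml maps some key to one value and a second object standing in for core-site.xml maps the same key to another, merge(defaults, site) returns the value from the second object, while keys that appear only in the first are kept unchanged.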

The JobConf class is used to configure a Map/Reduce job and extends the Configuration class.

The JobConf class likewise uses a static initializer block to load the mapred-default.xml and mapred-site.xml configuration files.

DEFAULT_MAPRED_TASK_JAVA_OPTS = "-Xmx200m": by default, the Java command-line options for Map/Reduce tasks set the maximum JVM heap size to 200 MB.
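A default like this is typically used as a fallback when the job configuration does not set the option itself. The sketch below models that lookup with java.util.Properties. The class name TaskJavaOptsSketch is hypothetical, and the property key mapred.child.java.opts is not named in the text above; it is assumed here from the Hadoop 1.x line.

```java
import java.util.Properties;

// Hypothetical model of resolving the task JVM options with a fallback.
class TaskJavaOptsSketch {
    // Assumed property key (Hadoop 1.x naming); not stated in the text above.
    static final String MAPRED_TASK_JAVA_OPTS = "mapred.child.java.opts";
    // Default value quoted in the text above.
    static final String DEFAULT_MAPRED_TASK_JAVA_OPTS = "-Xmx200m";

    // Returns the configured options, or the default when none are set.
    static String resolveJavaOpts(Properties conf) {
        return conf.getProperty(MAPRED_TASK_JAVA_OPTS, DEFAULT_MAPRED_TASK_JAVA_OPTS);
    }
}
```

In this model, a job that needs a larger heap would set the key to a value such as -Xmx512m, and every job that leaves it unset gets the 200 MB default.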

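The JobClient class is the primary interface through which users interact with the JobTracker. With it you can submit jobs, track their progress, access the logs of task components, query cluster status information, and so on.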

Jobs are submitted through the runJob method. The relevant code is as follows:

public static RunningJob runJob(JobConf job) throws IOException {
  JobClient jc = new JobClient(job);
  RunningJob rj = jc.submitJob(job);
  try {
    if (!jc.monitorAndPrintJob(job, rj)) {
      LOG.info("Job Failed: " + rj.getFailureInfo());
      throw new IOException("Job failed!");
    }
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
  }
  return rj;
}
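The control flow of runJob (wait for the submitted job, fail with an IOException if it did not succeed, restore the interrupt flag if the wait was interrupted) can be modelled without Hadoop on the classpath. Everything in this sketch is hypothetical: SketchJob is a minimal stand-in and not Hadoop's RunningJob, and monitor only approximates what monitorAndPrintJob does by polling.

```java
import java.io.IOException;

// Hypothetical minimal view of a submitted job; not Hadoop's RunningJob.
interface SketchJob {
    boolean isComplete();
    boolean isSuccessful();
}

class RunJobSketch {
    // Polls until the job reports completion, then returns whether it succeeded.
    static boolean monitor(SketchJob job, long pollMillis) throws InterruptedException {
        while (!job.isComplete()) {
            Thread.sleep(pollMillis);
        }
        return job.isSuccessful();
    }

    // Same shape as runJob above: a failed job surfaces as an IOException,
    // and an interrupt is recorded on the thread instead of being swallowed.
    static SketchJob runJob(SketchJob job, long pollMillis) throws IOException {
        try {
            if (!monitor(job, pollMillis)) {
                throw new IOException("Job failed!");
            }
        } catch (InterruptedException ie) {
            // Restore the flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
        }
        return job;
    }
}
```

One consequence of this shape is visible in both the original method and the sketch: if the waiting thread is interrupted, the method still returns the job handle without throwing, so a caller that needs certainty should check the returned job's state rather than assume it finished successfully.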