设为首页 加入收藏

TOP

Hadoop Commands是怎样被执行的?
2019-02-19 12:35:54 】 浏览:81
Tags:Hadoop Commands 怎样 执行

在命令行执行 hadoop fs -ls / 这种 Hadoop Commands 时,系统内部是怎样处理的呢?

1. bash 处理

可以用 linux 的whichll命令查看hadoop命令的源头:

$ which hadoop
/usr/bin/hadoop
$ ll /usr/bin/hadoop
lrwxrwxrwx 1 root root 24 11-19 18:55 /usr/bin/hadoop -> /etc/alternatives/hadoop
$ ll /etc/alternatives/hadoop
lrwxrwxrwx 1 root root 64 11-19 18:55 /etc/alternatives/hadoop -> /app/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin/hadoop

可以发现,在CDH集群中,hadoop命令最终指向了/app/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin/hadoop文件,使用vim查看该文件,可以发现命令又被指向了/app/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop/bin/hadoop,使用vim查看该文件,以下截取其核心部分:

  #core commands
  *)
    # the core commands
    if [ "$COMMAND" = "fs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
    elif [ "$COMMAND" = "version" ] ; then
      CLASS=org.apache.hadoop.util.VersionInfo
    elif [ "$COMMAND" = "jar" ] ; then
      CLASS=org.apache.hadoop.util.RunJar
    elif [ "$COMMAND" = "key" ] ; then
      CLASS=org.apache.hadoop.crypto.key.KeyShell
    elif [ "$COMMAND" = "checknative" ] ; then
      CLASS=org.apache.hadoop.util.NativeLibraryChecker
    elif [ "$COMMAND" = "distcp" ] ; then
      CLASS=org.apache.hadoop.tools.DistCp
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "daemonlog" ] ; then
      CLASS=org.apache.hadoop.log.LogLevel
    elif [ "$COMMAND" = "archive" ] ; then
      CLASS=org.apache.hadoop.tools.HadoopArchives
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "credential" ] ; then
      CLASS=org.apache.hadoop.security.alias.CredentialShell
    elif [ "$COMMAND" = "s3guard" ] ; then
      CLASS=org.apache.hadoop.fs.s3a.s3guard.S3GuardTool
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "trace" ] ; then
      CLASS=org.apache.hadoop.tracing.TraceAdmin
    elif [ "$COMMAND" = "classpath" ] ; then
      if [ "$#" -eq 1 ]; then
        # No need to bother starting up a JVM for this simple case.
        echo $CLASSPATH
        exit
      else
        CLASS=org.apache.hadoop.util.Classpath
      fi
    elif [[ "$COMMAND" = -*  ]] ; then
        # class and package names cannot begin with a -
        echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
        exit 1
    else
      CLASS=$COMMAND
    fi
    shift

    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    ;;

可以发现,shell脚本对不同的命令行输入(如,fsdistcp)指定了不同的$CLASS变量,映射到不同的类,并最终用Java执行这些类(后续的参数作为Java参数被传入应用中)。

2. Java 处理

hadoop fs命令为例:
a. fs映射到org.apache.hadoop.fs.FsShell类,该类对应的main方法如下:

  public static void main(String argv[]) throws Exception {
    FsShell shell = newShellInstance();
    Configuration conf = new Configuration();
    conf.setQuietMode(false);
    shell.setConf(conf);
    int res;
    try {
      res = ToolRunner.run(shell, argv);
    } finally {
      shell.close();
    }
    System.exit(res);
  }

在main方法中初始化了Configuration,并通过 ToolRunner 的run()方法执行命令。
b. ToolRunner的run()方法代码如下

  public static int run(Configuration conf, Tool tool, String[] args) 
    throws Exception{
    if(conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    //set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    
    //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

ToolRunner的run方法会实例化一个GenericOptionsParser,用于解析通用配置,如果命令行中包括fsjtconflibjarsfilesarchivesDtokenCacheFile这些关键字,就会在这个通用配置解析阶段被解析,并进行相应的配置。
之后,利用getRemainingArgs()方法获取其它参数。
最后,调用tool.run(),这里的run方法是工具类(即FsShell)本身实现的。
c. FsShell的run方法代码如下:

  public int run(String argv[]) throws Exception {
    // initialize FsShell
    init();
    int exitCode = -1;
    ...
        try {
          exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
        } 
    ...
    return exitCode;
  }

沿着代码可以继续深入,研究fs命令对应的执行逻辑,这篇文章主要是讲述命令是如何被提交执行的,更细节的过程就不继续展开了。

】【打印繁体】【投稿】【收藏】 【推荐】【举报】【评论】 【关闭】 【返回顶部
上一篇Hadoop 2.7.2   linux分布.. 下一篇Hadoop提交作业------>hadoop..

最新文章

热门文章

Hot 文章

Python

C 语言

C++基础

大数据基础

linux编程基础

C/C++面试题目