Hadoop Java Program -files Feature Test (Part 1)

2014-11-24 09:10:20

It later turned out that GenericOptionsParser can parse a set of generic command-line options and handle them accordingly; for example, when it encounters the -files option, it ships the listed files to the mapper nodes. Testing showed that the -files option must come immediately after the hadoop jar part of the command. The usage convention can be checked through the streaming jar's help output, as follows:


Usage: $HADOOP_HOME/bin/hadoop jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>      The streaming command to run
  -combiner <cmd|JavaClassName>      The streaming command to run
  -reducer  <cmd|JavaClassName>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file.
                       Deprecated. Use generic option "-files" instead
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName  Optional.
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName  Optional.
  -numReduceTasks <num>  Optional.
  -inputreader <spec>  Optional.
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -mapdebug <path>  Optional. To run this script when a map task fails
  -reducedebug <path>  Optional. To run this script when a reduce task fails
  -io <identifier>  Optional.
  -lazyOutput  Optional. Lazily create Output
  -verbose



Generic options supported are
  -conf <configuration file>     specify an application configuration file
  -D <property=value>            use value for given property
  -fs <local|namenode:port>      specify a namenode
  -jt <local|jobtracker:port>    specify a job tracker
  -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
  -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
  -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.


The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
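
For illustration, a streaming invocation following this order might look like the sketch below (all paths and commands here are placeholders, not from the original article):


hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -files /local/path/dict.txt \
    -input /user/test/input \
    -output /user/test/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc


Note that the generic -files option precedes the streaming-specific options, matching the [genericOptions] [commandOptions] order above.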


Java programs executed with hadoop must follow the same command-line convention, in particular for generic options such as -D, -libjars, and -files.
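
For example, assuming the test program below is packaged as wordcount.jar (the jar name is an assumption) and the local file to ship is named test_upload_file, a sketch of the invocation would be:


hadoop jar wordcount.jar wordcount.com.cn.WordCount \
    -files /local/path/test_upload_file \
    /user/test/input /user/test/output


The generic -files option comes right after the main class and before the program's own input/output arguments; the local and HDFS paths here are placeholders.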


Test code:


package wordcount.com.cn;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

@SuppressWarnings("deprecation")
public class WordCount {



    static class SimpleMapper extends Mapper<LongWritable, Text, Text, Text>
    {
        BufferedReader reader = null;
        List<String> lines = new ArrayList<String>(); // simple test, no real business logic

        @Override
        public void setup(Context context) throws IOException
        {
            // The name must match the name of the file shipped with -files.
            FileReader fr = new FileReader("test_upload_file");
            reader = new BufferedReader(fr);

            String line = null;
            while ((line = reader.readLine()) != null)
                lines.add(line);
            reader.close();
            System.out.println(lines);
        }

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            for (String line : lines)
                context.write(new Text("key"), new Text(line));
        }
    }

    static class SimpleReducer extends Reducer<Text, Text, Text, Text>
    {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
        {
            for (Text value : values)
            {
                context.write(key, value);
            }
        }
    }
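
    // The listing is cut off at this point in the original article; the driver
    // below is a minimal sketch of the missing main method, assuming the
    // standard pattern implied by the imports: GenericOptionsParser consumes
    // the generic options (-files, -D, -libjars, ...) and the remaining
    // arguments are used as the input and output paths.
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2)
        {
            System.err.println("Usage: WordCount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "wordcount"); // deprecated constructor, hence @SuppressWarnings
        job.setJarByClass(WordCount.class);
        job.setMapperClass(SimpleMapper.class);
        job.setReducerClass(SimpleReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


When the job is launched with -files, the shipped file is placed in each task's working directory via the distributed cache, which is why SimpleMapper can open test_upload_file by its bare name.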