Number of reduce tasks not specified. Defaulting to jobconf value of: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201307151509_15499, Tracking URL = http://mwtec-50:50030/jobdetails.jsp?jobid=job_201307151509_15499
Kill Command = /home/hadoop/hadoop-0.20.2/bin/hadoop job -Dmapred.job.tracker=mwtec-50:9002 -kill job_201307151509_15499
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2013-08-05 18:37:14,681 Stage-1 map = 0%, reduce = 0%
2013-08-05 18:37:16,691 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:17,697 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:18,703 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:19,710 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:20,717 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:21,727 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:22,733 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
2013-08-05 18:37:23,739 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 3.1 sec
2013-08-05 18:37:24,745 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.89 sec
2013-08-05 18:37:25,751 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.89 sec
2013-08-05 18:37:26,757 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.89 sec
MapReduce Total cumulative CPU time: 4 seconds 890 msec
Ended Job = job_201307151509_15499
Copying data to local directory /tmp/hivetest/distributeby
Copying data to local directory /tmp/hivetest/distributeby
7 Rows loaded to /tmp/hivetest/distributeby
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 2 Cumulative CPU: 4.89 sec HDFS Read: 458 HDFS Write: 112 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 890 msec
OK
Time taken: 16.785 seconds
View the query data written to the output directory:
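The files written to the local directory can be inspected from the shell. This is only a sketch: the per-reducer file names shown below (000000_0, 000001_0) are Hive's default naming for this setup and may differ in your environment.

$ ls /tmp/hivetest/distributeby
$ cat /tmp/hivetest/distributeby/000000_0
$ cat /tmp/hivetest/distributeby/000001_0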
Explanation of results: distribute by uses a hash algorithm to write the query results into different reduce output files. Which reduce file each row is assigned to is decided on the map side.
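Conceptually, each row is routed to reducer number hash(key) mod numReducers. As a rough sketch (using Hive's built-in hash() and pmod() functions; the exact hash the partitioner applies may differ across Hive versions), the bucket each job_time value would land in with two reducers can be previewed with:

select job_time, pmod(hash(job_time), 2) as reducer_bucket from tb_in_base;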
-- Distribute by job_time, with set mapred.reduce.tasks=2;
hive> set mapred.reduce.tasks=2;
hive> insert overwrite local directory '/tmp/hivetest/distributeby' select id,devid,job_time from tb_in_base where job_time=030729 distribute by job_time;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201307151509_15500, Tracking URL = http://mwtec-50:50030/jobdetails.jsp?jobid=job_201307151509_15500
Kill Command = /home/hadoop/hadoop-0.20.2/bin/hadoop job -Dmapred.job.tracker=mwtec-50:9002 -kill job_201307151509_15500
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2013-08-05 18:42:07,764 Stage-1 map = 0%, reduce = 0%
2013-08-05 18:42:10,778 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.61 sec
201