A Study of Hive Sorting Behavior (Part 2)
2014-11-24 07:25:19
In strict mode, with a LIMIT clause after ORDER BY and the partition specified:
hive> select * from tb_in_base where job_time=030729 order by devid limit 2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201307151509_15432, Tracking URL = http://mwtec-50:50030/jobdetails.jsp?jobid=job_201307151509_15432
Kill Command = /home/hadoop/hadoop-0.20.2/bin/hadoop job -Dmapred.job.tracker=mwtec-50:9002 -kill job_201307151509_15432
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-08-05 16:47:32,900 Stage-1 map = 0%, reduce = 0%
2013-08-05 16:47:34,920 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:35,927 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:36,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:37,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:38,946 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:39,953 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:40,959 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:41,965 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2013-08-05 16:47:42,971 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.05 sec
2013-08-05 16:47:43,977 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.05 sec
2013-08-05 16:47:44,983 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.05 sec
MapReduce Total cumulative CPU time: 3 seconds 50 msec
Ended Job = job_201307151509_15432
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.05 sec HDFS Read: 458 HDFS Write: 44 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 50 msec
OK
1 121212 test1 030729
2 131313 test2 030729
Time taken: 17.597 seconds
Conclusion: in strict mode, ORDER BY requires not only a LIMIT clause; if the table is partitioned, a partition must be specified as well.
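For contrast, a minimal sketch of the queries that strict mode rejects (table and partition names follow the article's example; the exact error messages vary by Hive version):

```sql
set hive.mapred.mode=strict;

-- Rejected: ORDER BY on a partitioned table without LIMIT
select * from tb_in_base where job_time=030729 order by devid;

-- Rejected: partitioned table scanned without a partition predicate
select * from tb_in_base order by devid limit 2;
```

Both statements fail at compile time rather than launching a MapReduce job, which is the point of strict mode: it blocks queries that would force a single reducer over an unbounded result or a full scan of every partition.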
2.2 sort by
SORT BY guarantees that the file produced by each reducer is sorted; a global order can then be obtained by merge-sorting the sorted files. SORT BY has the following characteristics:
1) SORT BY is largely unaffected by whether hive.mapred.mode is strict or nonstrict, but a partition must still be specified when the table is partitioned.
2) Within a single reducer, the data is sorted by the specified column(s).
3) The number of reducers can be set explicitly, e.g. set mapred.reduce.tasks=5; merge-sorting the resulting outputs yields the complete, globally ordered result.
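Point 3 can be sketched as follows, assuming the same tb_in_base table from the article:

```sql
-- Run SORT BY with five reducers; each of the five output files is
-- sorted by devid, but the files are not globally ordered relative
-- to one another.
set mapred.reduce.tasks=5;
select id, devid from tb_in_base sort by devid;
```

A final single-pass merge of the five sorted files (for example an external merge sort, or one more ORDER BY over the combined output) then produces the fully ordered result, which is cheaper than forcing all data through one reducer from the start.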
-- with hive.mapred.mode set to nonstrict
hive> select id,devid from tb_in_base sort by devid;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201307151509_15434, Tracking URL = http://mwtec-50:50030/jobdetails.jsp?jobid=job_201307151509_15434
Kill Command = /home/hadoop/hadoop-0.20.2/bin/hadoop job -Dmapred.job.tracker=mwtec-50:9002 -kill job_201307151509_15434
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-08-