A Study of Hive Sorting Features (Part 1)
2014-11-24 07:25:19
Tags: hive, sorting, features, study
1. Definition of sorting
Sorting arranges a sequence of records in ascending or descending order according to one or more of their keys.
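The definition above can be sketched in a few lines of Python. This is only an illustration of sorting records by one or several keys; the sample rows are assumptions modeled on the tb_in_base columns (id, devid, devname) used in the tests later in this article, and nothing here involves Hive itself.

```python
from operator import itemgetter

# Sample records shaped like the tb_in_base rows (assumed data).
records = [
    {"id": 3, "devid": 141414, "devname": "test3"},
    {"id": 1, "devid": 121212, "devname": "test1"},
    {"id": 2, "devid": 131313, "devname": "test2"},
]

# Ascending order on a single key.
ascending = sorted(records, key=itemgetter("devid"))

# Descending order on the same key.
descending = sorted(records, key=itemgetter("devid"), reverse=True)

# Several keys: devname ascending, then id descending as a tie-breaker.
by_two_keys = sorted(records, key=lambda r: (r["devname"], -r["id"]))

print([r["id"] for r in ascending])
print([r["id"] for r in descending])
```

Python's sorted is stable, so rows that compare equal on the key keep their input order.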
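2. Sorting in Hive
2.1 order by
order by sorts its input globally, so the job runs with a single reducer; when the data set or result set is large, the query can take a long time. As in a relational database, order by in Hive sorts a result set; what differs is the underlying architecture. The hive.mapred.mode parameter in hive-site.xml controls how Hive executes such queries. In strict mode, order by must be accompanied by limit (and for a partitioned table the query must also specify which partition to read). In nonstrict mode, order by behaves much as it does in a relational database.
Note: if you would rather not edit the configuration file, you can run set hive.mapred.mode=nonstrict; or set hive.mapred.mode=strict; as a temporary setting, which has the same effect for the current session.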
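Why a global sort ends up on a single reducer can be seen in a toy model. The sketch below is not Hive's implementation: the function name reduce_side_sort and the modulo partitioning rule are assumptions made for illustration. Each reducer sorts only the keys routed to it, so the concatenated output is globally ordered only when there is exactly one reducer.

```python
def reduce_side_sort(keys, num_reducers):
    """Toy model: route each key to a reducer, then sort inside each reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for k in keys:
        # Assumed partitioning rule, chosen only to keep the example deterministic.
        partitions[k % num_reducers].append(k)
    # Concatenate the reducer outputs in reducer order.
    return [k for part in partitions for k in sorted(part)]


# devid values taken from the test table in this article, in shuffled order.
devids = [151515, 121212, 191919, 131313, 171717, 141414, 161616]

one_reducer = reduce_side_sort(devids, 1)
two_reducers = reduce_side_sort(devids, 2)

print(one_reducer == sorted(devids))   # a single reducer sees every key
print(two_reducers == sorted(devids))  # each reducer is sorted only locally
```

With two reducers the even and odd keys are each in order, but the combined output is not, which is the price of spreading the sort across reducers.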
Test:
-- strict mode off, i.e. nonstrict mode
hive> select id,devid,devname from tb_in_base order by devid;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201307151509_15431, Tracking URL = http://mwtec-50:50030/jobdetails.jsp?jobid=job_201307151509_15431
Kill Command = /home/hadoop/hadoop-0.20.2/bin/hadoop job -Dmapred.job.tracker=mwtec-50:9002 -kill job_201307151509_15431
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-08-05 16:33:21,817 Stage-1 map = 0%, reduce = 0%
2013-08-05 16:33:23,828 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:24,834 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:25,843 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:26,849 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:27,855 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:28,860 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:29,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2013-08-05 16:33:30,880 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 0.94 sec
2013-08-05 16:33:31,888 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.51 sec
2013-08-05 16:33:32,893 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.51 sec
2013-08-05 16:33:33,899 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.51 sec
MapReduce Total cumulative CPU time: 2 seconds 510 msec
Ended Job = job_201307151509_15431
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.51 sec HDFS Read: 559 HDFS Write: 138 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 510 msec
OK
1 121212 test1
2 131313 test2
3 141414 test3
4 151515 test5
5 161616 test6
6 171717 test7
8 191919 test9overwrite
8 191919 test9overwrite
Time taken: 16.872 seconds
Result: with strict mode off, order by behaves much as it does in a relational database.
-- strict mode on, no limit after order by
hive> select id,devid from tb_in_base order by devid;
FAILED: Error in semantic analysis: 1:41 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'devid'
Note: the query fails because no limit was specified.
-- strict mode on, partitioned table queried without specifying a partition
hive> select * from tb_in_base;
FAILED: Error in semantic analysis: No partition predicate found for Alias "tb_in_base" Table "tb_in_base"
Result: in strict mode, the table cannot be queried directly; the query needs a partition predicate.
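The two strict-mode failures above can be summarized as a small rule check. The sketch below is a toy model, not Hive's semantic analyzer: the function strict_mode_errors and its parameters are hypothetical, the messages are paraphrases, and it covers only the two rules demonstrated in these tests.

```python
def strict_mode_errors(mode, has_order_by=False, has_limit=False,
                       table_is_partitioned=False, has_partition_filter=False):
    """Return the reasons a query would be rejected (empty list = accepted)."""
    errors = []
    if mode != "strict":
        # nonstrict mode: neither rule is enforced.
        return errors
    if has_order_by and not has_limit:
        errors.append("ORDER BY requires LIMIT")
    if table_is_partitioned and not has_partition_filter:
        errors.append("partitioned table requires a partition predicate")
    return errors


# select ... order by devid, on a partitioned table, no limit, no partition filter
print(strict_mode_errors("strict", has_order_by=True, table_is_partitioned=True))
print(strict_mode_errors("nonstrict", has_order_by=True, table_is_partitioned=True))
```

Under this model a query passes in strict mode once it both adds a limit to its order by and filters on a partition.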
-- Enable