Hive on Tez, MR, and Spark: performance test
2019-02-12 01:01:28
Tags: hive spark performance test

Component versions

Hadoop cluster

  • hive 2.1.1
  • hive on spark 1.6.3
  • tez 0.8.5

Data preparation

Create report.data_security_lab with the same table structure as on the Yangquan (阳泉) cluster.

CREATE EXTERNAL TABLE `report.data_security_lab`(
  xxx  -- column definitions elided in the source
)
PARTITIONED BY (
  `stat_date` string,
  `log_id` string)
stored as ORC;

Create the partition

alter table report.data_security_lab add partition (stat_date=20170614,log_id=xxxxxxx)
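
As a hedged sketch of the partition step: both partition columns are declared as string, so the values can be quoted explicitly, and `if not exists` makes the statement safe to re-run. The `xxxxxxx` value is the placeholder from the statement above, not a real log_id.

```sql
-- Sketch only: xxxxxxx is the elided placeholder from the original statement.
alter table report.data_security_lab
  add if not exists partition (stat_date='20170614', log_id='xxxxxxx');

-- List the partitions registered for that day to confirm the metadata exists.
show partitions report.data_security_lab partition (stat_date='20170614');
```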

Performance test: hive cli

test 1: count pv

select count(1) from report.data_security_lab where stat_date=20170614;
mr                tez                                                yq01-mr
90.228 seconds    69.559 seconds (container.reuse: 30-40 seconds)    127.341 seconds
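
The post does not show how each column was produced. A minimal sketch, assuming the engine was switched per session with the standard `hive.execution.engine` property, and that "container.reuse" may refer to Tez container reuse (`tez.am.container.reuse.enabled`), which mainly helps when queries run back to back in the same session. The position-alias setting is also an assumption, included because the later queries use `group by 1,2`.

```sql
-- Assumption: the engine is switched per session; run one block at a time.
set hive.execution.engine=mr;
select count(1) from report.data_security_lab where stat_date=20170614;

set hive.execution.engine=tez;
-- Assumption: "container.reuse" in the results refers to this Tez property.
set tez.am.container.reuse.enabled=true;
select count(1) from report.data_security_lab where stat_date=20170614;

-- Assumption: on Hive 2.1, positional group by (group by 1,2) needs this setting.
set hive.groupby.orderby.position.alias=true;
```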

test 2: uv per log_id

select log_id,count(1) from( select cuid,log_id from report.data_security_lab where stat_date=20170614 group by 1,2)tmp group by 1;
mr                 tez                                               yq01-mr
368.222 seconds    324.259 seconds (container.reuse: 300 seconds)    229.018 seconds

test 3: pv and uv per log_id

select log_id,count(1) ,sum(pv) from( select cuid,log_id,count(1) pv from report.data_security_lab where stat_date=20170614 group by 1,2)tmp group by 1;
mr                 tez                                               yq01-mr
392.168 seconds    352.286 seconds (container.reuse: 330 seconds)    218.734 seconds

The yq cluster is faster on this query. The execution plans of hive 2.1.1 and hive 1.2.1 differ: hive 1.2.1 runs 2 stages, hive 2.1.1 runs 1 stage.
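
The stage counts can be checked with `EXPLAIN`. A minimal sketch, assuming the same test 3 query is run on each Hive version and the `STAGE DEPENDENCIES` sections of the two outputs are compared:

```sql
-- Print the plan without running the query; compare the output across Hive versions.
explain
select log_id, count(1), sum(pv)
from (
  select cuid, log_id, count(1) pv
  from report.data_security_lab
  where stat_date=20170614
  group by 1,2
) tmp
group by 1;
```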

test 4: uv

select count(1) from (select cuid from report.data_security_lab where stat_date=20170614 group by 1) tmp;
mr                tez                yq01-mr
146.33 seconds    129.805 seconds    193.618 seconds

test 5: join

select count(1) from 
(select cuid,stat_date from report.data_security_lab where stat_date=20170614  and log_id=1003003 ) a
join 
(select cuid from report.data_security_lab where stat_date=20170614  and log_id=1011105 ) b
on a.cuid = b.cuid
join
(select cuid,stat_date from report.data_security_lab where stat_date=20170614  and log_id=1007102 ) c
on a.cuid= c.cuid
mr                tez                                               yq01-mr
360.74 seconds    318.365 seconds (container.reuse: 290 seconds)    475.085 seconds

Performance test: hiveserver2 (performance is the same as hive cli)

join

select count(1) from 
(select cuid,stat_date from report.data_security_lab where stat_date=20170614  and log_id=1003003 ) a
join 
(select cuid from report.data_security_lab where stat_date=20170614  and log_id=1011105 ) b
on a.cuid = b.cuid
join
(select cuid,stat_date from report.data_security_lab where stat_date=20170614  and log_id=1007102 ) c
on a.cuid= c.cuid
mr                tez                                        yq01-mr
5 mins 54 secs    5 mins 12 secs (container not reusable)    (no value given)

Performance test: small data volume (10 G)

Data (7.6 G): hdfs://szth-ns1/user/hive/warehouse/report.db/data_security_lab/stat_date=20170614/log_id=1003123

test 1: pv

select count(1) from report.data_security_lab where stat_date=20170614 and log_id=1003123
mr                tez                                               spark             yq01-mr
24.151 seconds    18.18 seconds (container reuse: 6.131 seconds)    16.125 seconds    35.223 seconds
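
This section adds a spark column. A minimal sketch of pointing a Hive session at Spark, assuming Hive on Spark is already installed on the cluster. The property names are standard Hive on Spark settings, but every value below is a hypothetical placeholder, not the configuration used in this test.

```sql
-- Assumption: Hive on Spark is already set up; all values are placeholders.
set hive.execution.engine=spark;
set spark.master=yarn-client;
set spark.executor.memory=4g;
set spark.executor.cores=2;
set spark.executor.instances=10;

select count(1) from report.data_security_lab where stat_date=20170614 and log_id=1003123;
```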

test 2: uv

select count(1) from(select cuid from report.data_security_lab where stat_date=20170614 and log_id=1003123 group by 1)tmp
mr                tez                                                 spark              yq01-mr
62.181 seconds    26.182 seconds (container reuse: 16.198 seconds)    296.828 seconds    96.494 seconds

test 3: join, multiple stages

select count(1) from 
(select cuid,stat_date from report.data_security_lab where
stat_date=20170614  and log_id=1003003 group by 1,2) a
join 
(select cuid from report.data_security_lab where stat_date=20170614
and log_id=1011105 group by 1) b
on a.cuid = b.cuid
join
(select cuid,stat_date from report.data_security_lab where
stat_date=20170614  and log_id=1007102 group by 1,2) c
on a.cuid= c.cuid
            mr                 tez                                      spark              yq01-mr
duration    142.052 seconds    98.273 seconds (some tasks failed)       296.828 seconds    837.784 seconds
result      13501072           13501072                                 46225003           13501072

The spark result is wrong: spark returns 46225003, while mr, tez, and yq01-mr all return 13501072.
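
One way to sanity-check the join result, sketched here as a suggestion rather than something the original test did: with a single stat_date, each subquery yields one row per distinct cuid, so the three-way inner join can return at most as many rows as the smallest per-log_id distinct count. Running the query below once per engine shows whether the engines already disagree before the join.

```sql
-- Distinct cuid per log_id used in the join; run once per engine and compare.
-- The join count from test 3 should not exceed the smallest of these three values.
select log_id, count(distinct cuid)
from report.data_security_lab
where stat_date='20170614'
  and log_id in ('1003003', '1011105', '1007102')
group by log_id;
```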
