Integrating Hive with HBase for Data Analysis
Copyright notice: this is an original post by the author; reposts are welcome, but the source address must be clearly indicated: https://blog.csdn.net/py_123456/article/details/89283937

(1) Hive Overview

http://ke.dajiangtai.com/content/6912/1.png

(2) Hive's Position in the Hadoop Ecosystem

http://ke.dajiangtai.com/content/6912/2.png

(3) Hive Architecture

http://ke.dajiangtai.com/content/6912/3.png

http://ke.dajiangtai.com/content/6912/4.png

(4) Hive Advantages and Use Cases

http://ke.dajiangtai.com/content/6912/5.png

(5) Downloading and Installing Hive

1. Download Hive

The Apache release of Hive.

The Cloudera (CDH) release of Hive.

Here we use the CDH build hive-0.13.1-cdh5.3.0.tar.gz (this is the tarball used in the commands below, and it matches the CDH builds of Hadoop and HBase already on the cluster) and upload it to the bigdata-pro03.kfk.com node.

2. Extract and install Hive

[kfk@bigdata-pro03 modules]$ ls

flume-1.7.0-bin hbase-0.98.6-cdh5.3.0 jdk1.8.0_60 kafka_2.11-0.9.0.0 zookeeper-3.4.5-cdh5.10.0

hadoop-2.6.0 hive-0.13.1-cdh5.3.0.tar.gz kafka_2.11-0.8.2.1 producer.properties

[kfk@bigdata-pro03 modules]$ tar -zxf hive-0.13.1-cdh5.3.0.tar.gz

[kfk@bigdata-pro03 modules]$ ls

flume-1.7.0-bin hbase-0.98.6-cdh5.3.0 hive-0.13.1-cdh5.3.0.tar.gz kafka_2.11-0.8.2.1 producer.properties

hadoop-2.6.0 hive-0.13.1-cdh5.3.0 jdk1.8.0_60 kafka_2.11-0.9.0.0 zookeeper-3.4.5-cdh5.10.0

[kfk@bigdata-pro03 modules]$ mv hive-0.13.1-cdh5.3.0.tar.gz /opt/softwares/

[kfk@bigdata-pro03 modules]$ ls

flume-1.7.0-bin hbase-0.98.6-cdh5.3.0 jdk1.8.0_60 kafka_2.11-0.9.0.0 zookeeper-3.4.5-cdh5.10.0

hadoop-2.6.0 hive-0.13.1-cdh5.3.0 kafka_2.11-0.8.2.1 producer.properties

[kfk@bigdata-pro03 modules]$ cd hive-0.13.1-cdh5.3.0/

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ ll

total 312

drwxr-xr-x 3 kfk kfk 4096 Dec 17 2014 bin

drwxr-xr-x 2 kfk kfk 4096 Dec 17 2014 conf

drwxr-xr-x 3 kfk kfk 4096 Dec 17 2014 data

drwxr-xr-x 6 kfk kfk 4096 Dec 17 2014 docs

drwxr-xr-x 4 kfk kfk 4096 Dec 17 2014 examples

drwxr-xr-x 7 kfk kfk 4096 Dec 17 2014 hcatalog

drwxr-xr-x 4 kfk kfk 4096 Dec 17 2014 lib

-rw-r--r-- 1 kfk kfk 23828 Dec 17 2014 LICENSE

-rw-r--r-- 1 kfk kfk 277 Dec 17 2014 NOTICE

-rw-r--r-- 1 kfk kfk 3838 Dec 17 2014 README.txt

-rw-r--r-- 1 kfk kfk 253839 Dec 17 2014 RELEASE_NOTES.txt

drwxr-xr-x 3 kfk kfk 4096 Dec 17 2014 scripts

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ cd scripts/

[kfk@bigdata-pro03 scripts]$ ls

metastore

[kfk@bigdata-pro03 scripts]$ cd metastore/

[kfk@bigdata-pro03 metastore]$ ls

upgrade

[kfk@bigdata-pro03 metastore]$ cd upgrade/

[kfk@bigdata-pro03 upgrade]$ ls

derby mssql mysql oracle postgres    # the five metastore database types Hive supports by default

3. Modify the hive-log4j.properties configuration file

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ cd conf/

[kfk@bigdata-pro03 conf]$ ll

total 120

-rw-r--r-- 1 kfk kfk 109185 Dec 17 2014 hive-default.xml.template

-rw-r--r-- 1 kfk kfk 2378 Dec 17 2014 hive-env.sh.template

-rw-r--r-- 1 kfk kfk 2662 Dec 17 2014 hive-exec-log4j.properties.template

-rw-r--r-- 1 kfk kfk 3505 Dec 17 2014 hive-log4j.properties.template

[kfk@bigdata-pro03 conf]$ mv hive-log4j.properties.template hive-log4j.properties

[kfk@bigdata-pro03 conf]$ vi hive-log4j.properties

# the log directory must be created in advance

hive.log.dir=/opt/modules/hive-0.13.1-cdh5.3.0/logs
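The log directory referenced above has to exist before Hive is started; a minimal sketch (adjust the path if your install location differs):

[kfk@bigdata-pro03 conf]$ mkdir -p /opt/modules/hive-0.13.1-cdh5.3.0/logs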

4. Modify the hive-env.sh configuration file

[kfk@bigdata-pro03 conf]$ mv hive-env.sh.template hive-env.sh

[kfk@bigdata-pro03 conf]$ vi hive-env.sh

export HADOOP_HOME=/opt/modules/hadoop-2.6.0

export HIVE_CONF_DIR=/opt/modules/hive-0.13.1-cdh5.3.0/conf

5. First start HDFS (the startup procedure was covered in earlier posts and is not repeated here), then create the Hive directories in HDFS as required by the official documentation.

The /tmp directory already exists, so it does not need to be created.

bin/hdfs dfs -mkdir -p /user/hive/warehouse
bin/hdfs dfs -chmod g+w /user/hive/warehouse
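The official getting-started guide also recommends making the /tmp scratch directory group-writable. This step is not in the original post, but if permissions on your cluster are too restrictive, the following fixes it:

bin/hdfs dfs -chmod g+w /tmp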

6. Start Hive

[kfk@bigdata-pro03 zookeeper-3.4.5-cdh5.10.0]$ cd ../hive-0.13.1-cdh5.3.0/

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ bin/hive

Logging initialized using configuration in file:/opt/modules/hive-0.13.1-cdh5.3.0/conf/hive-log4j.properties

hive> show databases;

OK

default

Time taken: 0.872 seconds, Fetched: 1 row(s)

Hive started successfully!

(6) Integrating Hive with MySQL

1. Create a hive-site.xml file under the /opt/modules/hive-0.13.1-cdh5.3.0/conf directory and configure MySQL as the metastore database.

vi hive-site.xml

 <property>

 <name>javax.jdo.option.ConnectionURL</name>

 <value>jdbc:mysql://bigdata-pro01.kfk.com/metastore?createDatabaseIfNotExist=true</value>

 </property>

 <property>

 <name>javax.jdo.option.ConnectionDriverName</name>

 <value>com.mysql.jdbc.Driver</value>

 </property>

<property>

 <name>javax.jdo.option.ConnectionUserName</name>

 <value>root</value>

 </property>

 <property>

 <name>javax.jdo.option.ConnectionPassword</name>

 <value>root</value>

 </property>
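Note that these <property> elements must sit inside the <configuration> root element of hive-site.xml. A minimal skeleton of the file (the comment marks where the properties shown above go):

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- the javax.jdo.option.* properties shown above go here -->
</configuration>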

2. Configure MySQL user access

1) View the existing user entries

[kfk@bigdata-pro01 hadoop-2.6.0]$ mysql -uroot -proot

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| mysql |

| test |

+--------------------+

3 rows in set (0.00 sec)



mysql> use mysql;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

mysql> show tables;

+---------------------------+

| Tables_in_mysql |

+---------------------------+

| columns_priv |

| db |

| event |

| func |

| general_log |

| help_category |

| help_keyword |

| help_relation |

| help_topic |

| host |

| ndb_binlog_index |

| plugin |

| proc |

| procs_priv |

| servers |

| slow_log |

| tables_priv |

| time_zone |

| time_zone_leap_second |

| time_zone_name |

| time_zone_transition |

| time_zone_transition_type |

| user |

+---------------------------+

23 rows in set (0.00 sec)

mysql> select User,Host,Password from user;

+------+-----------------------+-------------------------------------------+

| User | Host | Password |

+------+-----------------------+-------------------------------------------+

| root | localhost | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B |

| root | bigdata-pro01.kfk.com | |

| root | 127.0.0.1 | |

| | localhost | |

| | bigdata-pro01.kfk.com | |

+------+-----------------------+-------------------------------------------+

5 rows in set (0.00 sec)

2) Update the user entry

update user set Host='%' where User = 'root' and Host='localhost';

3) Delete the redundant user entries

delete from user where user='root' and host='127.0.0.1';

delete from user where user='root' and host='bigdata-pro01.kfk.com';

delete from user where host='localhost';

delete from user where host='bigdata-pro01.kfk.com';

select User,Host,Password from user;

4) Flush privileges

flush privileges;

3. Upload the MySQL JDBC driver (mysql-connector-java-5.1.27.jar) into Hive's lib directory.
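For example, assuming the driver jar was uploaded to /opt/softwares (a hypothetical path, matching where the Hive tarball was moved earlier):

[kfk@bigdata-pro03 softwares]$ cp mysql-connector-java-5.1.27.jar /opt/modules/hive-0.13.1-cdh5.3.0/lib/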

4. Make sure the third node can log in to the other cluster nodes over SSH without a password.

Once the above is done, the other nodes in the cluster can also connect directly to the Hive environment.
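If passwordless SSH from the third node is not yet in place, it can be set up roughly like this (a sketch, assuming an SSH key pair already exists for the kfk user):

[kfk@bigdata-pro03 ~]$ ssh-copy-id bigdata-pro01.kfk.com
[kfk@bigdata-pro03 ~]$ ssh-copy-id bigdata-pro02.kfk.com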

(7) Starting and Testing the Hive Service

1. Start the HDFS and YARN services
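The standard Hadoop scripts can be used here, for example (run from $HADOOP_HOME on the node hosting the NameNode and ResourceManager):

sbin/start-dfs.sh
sbin/start-yarn.sh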

2. Start Hive

./hive

3. Create a table through the Hive service

CREATE TABLE test(id INT,name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;

4. Create a data file

[kfk@bigdata-pro03 lib]$ cd /opt/datas/

[kfk@bigdata-pro03 datas]$ ls

weblog-flume.log weblog.log weblog-shell.sh weblogs.log

[kfk@bigdata-pro03 datas]$ vi test.txt

0001 mary   (the two columns are separated by a Tab character)

0002 jacky

0003 monica

0004 tom

5. Load the data into the Hive table

hive> load data local inpath '/opt/datas/test.txt' into table test;

Copying data from file:/opt/datas/test.txt

Copying file: file:/opt/datas/test.txt

Loading data to table default.test

Table default.test stats: [numFiles=1, numRows=0, totalSize=43, rawDataSize=0]

OK

Time taken: 2.568 seconds

hive> select * from test;

OK

1 mary

2 jacky

3 monica

4 tom

Time taken: 0.4 seconds, Fetched: 4 row(s)

Now check the MySQL side on node 1:

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| metastore |

| mysql |

| test |

+--------------------+

4 rows in set (0.00 sec)

mysql> use metastore;

mysql> select * from TBLS;

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

| 2 | 1553240538 | 1 | 0 | kfk | 0 | 2 | test | MANAGED_TABLE | NULL | NULL |

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

1 row in set (0.00 sec)

Next, make one more configuration change:

Copy the two properties below (they can be found in hive-default.xml.template) into hive-site.xml and set their values to true.

<property>

 <name>hive.cli.print.header</name>

 <value>true</value>

</property>

<property>

 <name>hive.cli.print.current.db</name>

 <value>true</value>

</property>

Then quit and restart Hive; the CLI will now display the current database name and the column headers of query results.

This shows that the databases and tables created on the Hive side are recorded in the MySQL metastore, i.e. the MySQL-Hive integration works correctly.

(8) Integrating Hive with HBase

1. Configure ZooKeeper in hive-site.xml; Hive uses this property to connect to the HBase cluster.

<property>

 <name>hbase.zookeeper.quorum</name>
 <value>bigdata-pro01.kfk.com,bigdata-pro02.kfk.com,bigdata-pro03.kfk.com</value>

</property>

2. Make the nine required HBase jars available under hive/lib (the commands below create symlinks rather than copies).

Run the following commands from the Hive lib directory ([kfk@bigdata-pro03 lib]$):

export HBASE_HOME=/opt/modules/hbase-0.98.6-cdh5.3.0

export HIVE_HOME=/opt/modules/hive-0.13.1-cdh5.3.0

ln -s $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar

ln -s $HBASE_HOME/lib/hbase-client-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-client-0.98.6-cdh5.3.0.jar

ln -s $HBASE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.0.jar

ln -s $HBASE_HOME/lib/hbase-it-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-it-0.98.6-cdh5.3.0.jar

ln -s $HBASE_HOME/lib/htrace-core-2.04.jar $HIVE_HOME/lib/htrace-core-2.04.jar

ln -s $HBASE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar

ln -s $HBASE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.0.jar

ln -s $HBASE_HOME/lib/high-scale-lib-1.1.1.jar $HIVE_HOME/lib/high-scale-lib-1.1.1.jar

ln -s $HBASE_HOME/lib/hbase-common-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-common-0.98.6-cdh5.3.0.jar

After running these commands, use ll to check that the linked jars are all valid before moving on.

A broken symlink (one whose target file does not exist) indicates a problem; in that case, go into HBase's lib directory and copy the problematic jar over with cp instead.
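A quick way to check is to list the linked jars and confirm each one resolves to an existing file, for example:

[kfk@bigdata-pro03 lib]$ cd $HIVE_HOME/lib
[kfk@bigdata-pro03 lib]$ ll hbase-*.jar htrace-core-2.04.jar high-scale-lib-1.1.1.jar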

3. Create a Hive external table backed by HBase

hive (default)> create external table weblogs(

 id string,

 datatime string,

 userid string,

 searchname string,

 retorder string,

 cliorder string,

 cliurl string

 )

 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

 WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,info:datatime,info:userid,info:searchname,info:retorder,info:cliorder,info:cliurl") TBLPROPERTIES("hbase.table.name" = "weblogs");

OK

Time taken: 3.661 seconds

But an error was reported!

The log shows that the root cause is an unconfigured DNS service, so hostnames cannot be resolved correctly. For the fix, see the post "Setting up a DNS service with bind (CentOS 6.5)".

# Check the data records in HBase
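For example, from the HBase shell (a sketch; the rows you see depend on the data already written to the weblogs table):

hbase(main):001:0> scan 'weblogs', {LIMIT => 5}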

After the error is resolved, drop the previously created table, recreate it, and query it again:

select * from weblogs;

Next we run a MapReduce job to see how many rows the table contains, as shown below:
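In Hive this is simply a count query, which is compiled into a MapReduce job (the result depends on how much data has been collected):

hive (default)> select count(*) from weblogs;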

4. Using beeline and hiveserver2

1) Start hiveserver2

bin/hiveserver2
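hiveserver2 runs in the foreground and occupies the terminal; to keep it running in the background, a common pattern (not from the original post) is:

nohup bin/hiveserver2 > /dev/null 2>&1 &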

2) Start beeline

# Connect to the hiveserver2 service

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ bin/beeline

Beeline version 0.13.1-cdh5.3.0 by Apache Hive

beeline> !connect jdbc:hive2://bigdata-pro03.kfk.com:10000

scan complete in 15ms

Connecting to jdbc:hive2://bigdata-pro03.kfk.com:10000

Enter username for jdbc:hive2://bigdata-pro03.kfk.com:10000: kfk

Enter password for jdbc:hive2://bigdata-pro03.kfk.com:10000: ***

Connected to: Apache Hive (version 0.13.1-cdh5.3.0)

Driver: Hive JDBC (version 0.13.1-cdh5.3.0)

Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://bigdata-pro03.kfk.com:10000>

# List the tables

0: jdbc:hive2://bigdata-pro03.kfk.com:10000> show tables;

+-----------+--+

| tab_name |

+-----------+--+

| test |

| weblogs |

+-----------+--+

2 rows selected (1.44 seconds)

# View the first 10 rows

0: jdbc:hive2://bigdata-pro03.kfk.com:10000> select * from weblogs limit 10;

+-------------------------------------------+-------------------+----------------------+---------------------+-------------------+-------------------+----------------------------------------------------------------------------------+--+

| weblogs.id | weblogs.datatime | weblogs.userid | weblogs.searchname | weblogs.retorder | weblogs.cliorder | weblogs.cliurl |

+-------------------------------------------+-------------------+----------------------+---------------------+-------------------+-------------------+----------------------------------------------------------------------------------+--+

| 001436217275865958600:00:011541058199433 | 00:00:01 | 0014362172758659586 | [明星合成] | 64 | 21 | link.44box.com/ |

| 001436217275865958600:00:011541058212288 | 00:00:01 | 0014362172758659586 | [明星合成] | 64 | 21 | link.44box.com/ |

| 001442567877373568400:00:201541058289917 | 00:00:20 | 0014425678773735684 | [www.hntz.net] | 2 | 1 | www.gaywang.com/ |

| 001442567877373568400:00:201541058302167 | 00:00:20 | 0014425678773735684 | [www.hntz.net] | 2 | 1 | www.gaywang.com/ |

| 001982399550216229500:00:081541058229644 | 00:00:08 | 0019823995502162295 | [抗震救灾晚会] | 8 | 17 | news.xinhuanet.com/newscenter/2008-05/31/content_8287444.htm |

| 001982399550216229500:00:081541058242323 | 00:00:08 | 0019823995502162295 | [抗震救灾晚会] | 8 | 17 | news.xinhuanet.com/newscenter/2008-05/31/content_8287444.htm |

| 001982399550216229500:00:341541058380339 | 00:00:34 | 0019823995502162295 | [抗震救灾晚会] | 10 | 18 | news.qq.com/a/20080531/000369.htm |

| 001982399550216229500:00:341541058395143 | 00:00:34 | 0019823995502162295 | [抗震救灾晚会] | 10 | 18 | news.qq.com/a/20080531/000369.htm |

| 00697069831361674100:00:091541058229692 | 00:00:09 | 006970698313616741 | [高成义+厦门] | 4 | 2 | www.chinatally.com.cn/backend/infoview.jspinfoID=1148&infoClass=.002.&comID=17 |

| 00697069831361674100:00:091541058242369 | 00:00:09 | 006970698313616741 | [高成义+厦门] | 4 | 2 | www.chinatally.com.cn/backend/infoview.jspinfoID=1148&infoClass=.002.&comID=17 |

+-------------------------------------------+-------------------+----------------------+---------------------+-------------------+-------------------+----------------------------------------------------------------------------------+--+

10 rows selected (2.816 seconds)

That is all for this part. These notes reflect my own learning process; I hope they offer some guidance. If you find them useful, please give the post a thumbs-up; if not, please bear with me, and do point out any mistakes. Follow the blog to get updates as soon as they are posted. Reposts are welcome, but the original address must be clearly indicated in the repost; the author reserves the right of final interpretation.
