Setting Up Sqoop to Move Data Between Hive and MySQL

After an afternoon of effort, I finally got Sqoop moving data (tables) back and forth between Hive and MySQL.

Prerequisites: a Hadoop platform plus Zookeeper, HBase, and Hive, all installed and working (see other resources for installation steps).
Versions in this demo: jdk1.8.0_171, hadoop-2.7.3, zookeeper-3.4.9, mysql-5.6.40-linux-glibc2.12-x86_64, hbase-1.2.4, apache-hive-2.1.1-bin

1. Install and Deploy Sqoop

Sqoop comes in two flavors, Sqoop1 and Sqoop2.

For the differences, see this article: http://doc.yonyoucloud.com/doc/ae/921008.html

I used Sqoop1: sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz (some report that Sqoop2 does not support transfers between Hive and MySQL, hence the choice of Sqoop1).

You also need this JAR: mysql-connector-java-5.1.45-bin.jar (it is required).

Extract the Sqoop .tar.gz archive; the details are not repeated here.

Copy the MySQL connector JAR into the lib directory of your Sqoop installation ($SQOOP_HOME/lib). Both steps are sketched below.
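A minimal sketch of those two steps, assuming the archive and the connector JAR both sit in /application/software (the directory this demo uses; adjust to your layout):

[root@master software]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@master software]# cp mysql-connector-java-5.1.45-bin.jar sqoop-1.4.7.bin__hadoop-2.6.0/lib/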

2. Sqoop Configuration

Sqoop environment variable configuration:


[root@master software]# more ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
export JAVA_HOME=/application/software/jdk1.8.0_171
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
export HADOOP_HOME=/application/software/hadoop-2.7.3
export HBASE_HOME=/application/software/hbase-1.2.4
export MAVEN_HOME=/application/software/apache-maven-3.5.4
export SQOOP_HOME=/application/software/sqoop-1.4.7.bin__hadoop-2.6.0
export ZOOKEEPER_HOME=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf # important: without this line, a later step errors out (see pitfall 2)
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$HIVE_HOME/lib/*:$HADOOP_HOME/lib/* # likewise required, or Sqoop reports errors

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$MAVEN_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin
[root@master software]#
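After editing, reload the profile in the current shell and run sqoop version as a quick smoke test; it should report Sqoop 1.4.7:

[root@master software]# source ~/.bash_profile
[root@master software]# sqoop version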

sqoop-env.sh configuration:

[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh

[root@master conf]# more sqoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/application/software/hadoop-2.7.3

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/application/software/hadoop-2.7.3

#set the path to where bin/hbase is available
export HBASE_HOME=/application/software/hbase-1.2.4

#Set the path to where bin/hive is available
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf
[root@master conf]#

Note: set HADOOP_COMMON_HOME, HADOOP_MAPRED_HOME, and HIVE_HOME to match your actual installation paths.

I did not modify sqoop-site.xml at all.
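Before touching any tables, you can sanity-check the JDBC connection with Sqoop's built-in list-databases tool (same connect string and credentials as the commands further down; a successful run lists the MySQL schemas):

[root@master conf]# sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password root123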

3. With the environment in place, we can run the experiments.

1) First, export data from Hive to MySQL

Contents of the Hive table:

hive> show tables;
OK
blog_hive_hbase
brand_dimension
record
record_orc
record_parquet
record_partition
sogoutest
stocks
stu
stu2
stu3
stu_hive
user_dimension
Time taken: 0.031 seconds, Fetched: 13 row(s)
hive> desc sogoutest;
OK
rid string
uid string
price int
Time taken: 0.058 seconds, Fetched: 3 row(s)
hive> select * from sogoutest;
OK
0000000005 00000008 120
0000000004 00000002 429
0000000003 00000004 347
0000000002 00000004 697
0000000001 00000001 252
0000000000 00000001 625
Time taken: 0.102 seconds, Fetched: 6 row(s)
hive>

Create a table with the same structure in MySQL.

mysql> create table sogoutest1(
    -> rid varchar(50), uid varchar(50), price int);
Query OK, 0 rows affected (0.02 sec)

mysql>

mysql> desc sogoutest1 ;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| rid | varchar(50) | YES | | NULL | |
| uid | varchar(50) | YES | | NULL | |
| price | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

Run the export:

[root@master conf]# sqoop export --connect jdbc:mysql://master:3306/test --username root --password root123 --table sogoutest1 --export-dir '/Hive/sogoutest' --fields-terminated-by '\t' -m 1

--table sogoutest1 # the table you created in MySQL

--export-dir # the HDFS directory backing the Hive table

You can find that directory with this command:

hive> show create table sogoutest;
OK
CREATE TABLE `sogoutest`(
`rid` string,
`uid` string,
`price` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://master:9000/Hive/sogoutest' # this is the location
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='1',
'numRows'='6',
'rawDataSize'='138',
'totalSize'='144',
'transient_lastDdlTime'='1530754256')
Time taken: 0.069 seconds, Fetched: 22 row(s)
hive>
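You can also inspect that location directly on HDFS to confirm the data files and the tab delimiter before exporting (the part-file names are whatever Hive wrote):

[root@master conf]# hdfs dfs -ls /Hive/sogoutest
[root@master conf]# hdfs dfs -cat '/Hive/sogoutest/*'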


Results (the data is now queryable in MySQL):

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sogoutest1 |
| stu_mysql |
+----------------+
2 rows in set (0.00 sec)

mysql> select * from sogoutest1;
+------------+----------+-------+
| rid | uid | price |
+------------+----------+-------+
| 0000000005 | 00000008 | 120 |
| 0000000004 | 00000002 | 429 |
| 0000000003 | 00000004 | 347 |
| 0000000002 | 00000004 | 697 |
| 0000000001 | 00000001 | 252 |
| 0000000000 | 00000001 | 625 |
+------------+----------+-------+
6 rows in set (0.00 sec)

mysql>

2) Next, import data from MySQL into Hive.

Contents of the MySQL table:

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sogoutest1 |
| stu_mysql |
+----------------+
2 rows in set (0.00 sec)

mysql> desc stu_mysql ;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| stu_id | int(11) | YES | | NULL | |
| name | varchar(15) | YES | | NULL | |
| sex | varchar(8) | YES | | NULL | |
| score | int(11) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> select * from stu_mysql;
+--------+------------+------+-------+
| stu_id | name | sex | score |
+--------+------------+------+-------+
| 1 | mafeng | male | 899 |
| 2 | mazhuoxing | male | 1000 |
| 3 | qizhuoxing | male | 1000 |
+--------+------------+------+-------+
3 rows in set (0.00 sec)

mysql>

There is no need to create the table in Hive beforehand; Sqoop can create it for you.

Run the import directly (this is a full-table import; you can also import only selected columns, as sketched right after the command below):

[root@master conf]# sqoop import --connect jdbc:mysql://master:3306/test --username root --password root123 --table stu_mysql --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite --create-hive-table --hive-table stu3 -m 1
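To import only selected columns, Sqoop's --columns flag takes a comma-separated list. A sketch pulling just stu_id and name (the target table name stu3_cols is made up for illustration):

[root@master conf]# sqoop import --connect jdbc:mysql://master:3306/test --username root --password root123 --table stu_mysql --columns "stu_id,name" --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --create-hive-table --hive-table stu3_cols -m 1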

Results:

hive> show tables;
OK
blog_hive_hbase
brand_dimension
record
record_orc
record_parquet
record_partition
sogoutest
stocks
stu
stu2
stu3
stu_hive
user_dimension
Time taken: 0.037 seconds, Fetched: 13 row(s)
hive> desc stu3;
OK
stu_id int
name string
sex string
score int
Time taken: 0.07 seconds, Fetched: 4 row(s)
hive> select * from stu3;
OK
1 mafeng male 899
2 mazhuoxing male 1000
3 qizhuoxing male 1000
Time taken: 0.117 seconds, Fetched: 3 row(s)
hive>

4. Pitfalls encountered:

1) Error: Caused by: java.lang.RuntimeException: java.sql.SQLException: null, message from server: "Host 'datanode1' is not allowed to connect to this MySQL server"

Solution: change the MySQL grants.

For example, to let myuser connect to the MySQL server from any host using mypassword:

mysql> GRANT ALL PRIVILEGES ON *.* TO 'myuser'@'%' IDENTIFIED BY 'mypassword' WITH GRANT OPTION;

mysql> FLUSH PRIVILEGES;
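Granting everything on every host is convenient but broad. A narrower alternative, assuming only host datanode1 runs Sqoop tasks and only the test database is needed (hostname and credentials as used elsewhere in this post), would be:

mysql> GRANT ALL PRIVILEGES ON test.* TO 'root'@'datanode1' IDENTIFIED BY 'root123';
mysql> FLUSH PRIVILEGES;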

2) Error:

18/07/05 12:03:55 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.

Solution: add the following to ~/.bash_profile:
export JAVA_HOME=/application/software/jdk1.8.0_171
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
export HADOOP_HOME=/application/software/hadoop-2.7.3
export HBASE_HOME=/application/software/hbase-1.2.4
export MAVEN_HOME=/application/software/apache-maven-3.5.4
export SQOOP_HOME=/application/software/sqoop-1.4.7.bin__hadoop-2.6.0
export ZOOKEEPER_HOME=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$HIVE_HOME/lib/*:$HADOOP_HOME/lib/*
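Then reload the profile and confirm that the Hive jars (hive-common is the one containing HiveConf) show up on Hadoop's classpath:

[root@master ~]# source ~/.bash_profile
[root@master ~]# hadoop classpath | tr ':' '\n' | grep -i hive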

