Hive: Installation and Usage (metastore, top 10, beeline, transaction support)


2019-04-26 00:42:27

Tags: hive, installation and usage, metastore, top10, beeline, transaction support

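Key points:

How to install and use Hive (Hive + MySQL)
In Hive: how to create tables (data types, modifying columns)
In Hive: how to insert data into a table
How Hive relates to MySQL: MySQL stores Hive's metadata
In Hive: the available query statements and functions (wordcount, union, top-K)
Hive's JDBC interface: HiveServer2 (Java API, beeline command line)
Configuring transaction support in Hive

(Prerequisite: table operations in Hive depend on HDFS, so start HDFS first.)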

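Step 1: Install and configure Hive

Install Hive as Hive + MySQL. (By default Hive keeps its metadata in an embedded Derby database, which allows only one session at a time, so MySQL is used instead.)

Install MySQL (set the login password and allow remote logins)
Unpack the Hive package and edit the configuration file conf/hive-site.xml

MySQL installation: shell script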

#!/bin/bash
# MySQL installation script
sudo yum -y install wget;
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
sudo yum -y install mysql-server

# Change the configuration
sudo chown -R root:root /var/lib/mysql
service mysqld restart

# Set the root password to "root" and allow root to log in remotely
mysql -uroot -p -e '
update mysql.user set password=password("root") where user="root";
flush privileges;
grant all privileges on *.* to "root"@"%" identified by "root";
flush privileges;'

Hive installation: shell script

(Prerequisite: place two files in ~/ : mysql-connector-java-5.1.44.jar and apache-hive-2.1.1-bin.tar.gz)

Run the following script as root:

#!/bin/bash
mkdir -p /soft
tar -xzvf apache-hive-2.1.1-bin.tar.gz -C /soft/
mv /soft/*hive*/  /soft/hive/

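# Configure environment variables
echo  -e 'export HIVE_HOME=/soft/hive \n  export PATH=$PATH:$HIVE_HOME/bin' >>/etc/profile
source /etc/profile

# Prepare the configuration files: strip the .template suffix
cd /soft/hive/conf; 
rename '.template' '' *.template

# Start hive-site.xml from the defaults and replace the ${system:...} placeholders
cp hive-default.xml hive-site.xml
sed -i 's@${system:user.name}@centos@g' hive-site.xml
sed -i 's@${system:java.io.tmpdir}@/home/centos/hive@g' hive-site.xml

# Copy the MySQL JDBC driver
cp ~/mysql-connector-java-5.1.44.jar /soft/hive/lib/

# In MySQL: create the database that Hive will use (the statement must be quoted)
mysql -uroot -proot -e 'create database hive;'
# The new metastore database is initialized with schematool at the end of this script,
# once hive-site.xml points at MySQL

# Verify that Hive was installed successfully
hive --version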

# Edit hive-site.xml: point the metastore at the MySQL database (driver, user name, password, URL).
# The properties must sit inside <configuration>, so drop the closing tag, append them, then close it again.
sed -i '/<\/configuration>/d' /soft/hive/conf/hive-site.xml
echo -e '
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://s101:3306/hive</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
</configuration>
' >> /soft/hive/conf/hive-site.xml

# Hand ownership over to the centos user
chown -R centos:centos /soft/ 

# Initialize the metastore schema in MySQL
schematool -dbType mysql -initSchema
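
To confirm which metastore settings ended up in effect, a property value can be read back from hive-site.xml. The sketch below is one way to do that. The helper name get_hive_prop is hypothetical (it is not part of Hive), and it assumes each <name> and <value> sits on its own line, as in the file written above. Because hive-site.xml was copied from the defaults, a property may appear more than once; the helper reports the last occurrence, on the assumption that a later definition overrides an earlier one.

```shell
#!/bin/bash
# get_hive_prop FILE NAME: print the value of property NAME in a Hadoop-style XML file.
# Hypothetical helper, not part of Hive.
get_hive_prop() {
    awk -v n="$2" '
        # Remember that the wanted <name> was seen; its <value> follows on a later line
        index($0, "<name>" n "</name>") { found = 1; next }
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            val = v                      # keep the most recent definition
        }
        /<\/property>/ { found = 0 }     # never read a value from the next property
        END { print val }
    ' "$1"
}
```

For example, get_hive_prop /soft/hive/conf/hive-site.xml javax.jdo.option.ConnectionURL should print the jdbc:mysql URL if the append worked. The schema itself can also be checked with schematool -dbType mysql -info.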

Step 2: Write the Hive script h1.sql, then run it with: hive -f h1.sql

-- Hive SQL script: create a database and a table, then insert data
-- 1. Create the database
create database big12;

-- 2. Create the table
--==================
use big12;
create table emp2(
     id int,
     name string,

     work_city array<string>,
     sex_age struct<sex:string,age:int>,
     dept_job map<string,string>

)row format delimited
     fields terminated by '|'
     collection items terminated by ','
     map keys terminated by ':'
     lines terminated by '\n'
     stored as textfile;

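-- 3. Insert data, method 1: insert ... select
insert into emp2(
id,name,
work_city,sex_age,dept_job)

select '1','user1',
array('Tianjin','Langfang'),
named_struct('sex','male','age',20),
map('test','emp', 'sail','emp');

-- 3. Insert data, method 2: load a file
-- Prepare the file ~/emp.txt with the following content (start hive from ~ so the relative path resolves) ---
--1|Zhang San|Beijing,Shanghai|male,30|product:development engineer
--2|Li Si|Guangzhou,Nanjing|male,23|product:lead,test:lead
--3|Xiao Li|Chongqing,Xi'an|female,37|test:lead,sail:officer
----------------------------------------
load data local inpath 'emp.txt' into table emp2;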
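
Since load data copies the file as-is, delimiter mistakes only show up later as NULLs or unsplit arrays. The sketch below checks a data file before loading: every line should have five '|'-separated fields, and a full-width comma (，) typed in place of ',' would keep an array from being split. The helper name check_emp_file is hypothetical.

```shell
#!/bin/bash
# check_emp_file FILE: report lines that would not load cleanly into emp2.
# Hypothetical helper; exits non-zero if any problem is found.
check_emp_file() {
    awk -F'|' '
        NF != 5         { printf "line %d: expected 5 fields, found %d\n", NR, NF; bad = 1 }
        index($0, "，") { printf "line %d: contains a full-width comma\n", NR; bad = 1 }
        END             { exit (bad ? 1 : 0) }
    ' "$1"
}
```

A typical use is check_emp_file emp.txt && hive -f h1.sql, so the script only runs when the file passes.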

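Step 3: Look at the Hive metadata stored in MySQL (the actual data lives in HDFS)

The hive database in MySQL is the one defined in hive-site.xml above. Hive uses it to store metadata, and its contents can be inspected directly:

use hive; show tables; --------->lists tables such as VERSION, DBS (Hive database information) and TBLS (Hive table information)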
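
The split between metadata and data can be seen from the command line. The sketch below wraps the two lookups in functions; the function names are hypothetical, the MySQL credentials are the ones configured in step 1, and /user/hive/warehouse is assumed to be the warehouse directory (Hive's default, which this installation may have changed).

```shell
#!/bin/bash
# Both functions are hypothetical wrappers around the standard mysql and hdfs clients.

# List every Hive table with its database, read directly from the metastore schema.
list_hive_tables() {
    mysql -uroot -proot hive -e '
        select d.NAME as db_name, t.TBL_NAME, t.TBL_TYPE
        from TBLS t join DBS d on t.DB_ID = d.DB_ID
        order by d.NAME, t.TBL_NAME;'
}

# Show the data files behind the tables; the default path is an assumption.
list_warehouse_files() {
    hdfs dfs -ls -R "${1:-/user/hive/warehouse}"
}
```

After step 2, list_hive_tables should include emp2 under big12, and list_warehouse_files should show the loaded text file under that table's directory.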

Step 4: Use Hive to run MapReduce jobs: wordcount, top-K

-- Word count Hive SQL script: create the table, load the data, run the query
-- 1. Create the table
create table wc(line string) 
row format delimited 
lines terminated by '\n'
stored as textfile;

-- 2. Load the data: wc.txt contains the following text
--create table stu(
--name string,
--age int)
--fields terminated by '|'
--collection items terminated by ','
--map keys terminated by ':'
--lines terminated by '\n'
--stored as textfile ;

load data local inpath 'wc.txt' into table wc;

-- 3. Run the query: count the words
select word , count(*) count 
from (select explode(split(line,' ')) word from wc)  tmp
group by word
order by count desc;
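
On a small file the Hive result can be cross-checked with standard shell tools. The sketch below does the same split, group and sort outside Hive; the function name wordcount is hypothetical. One difference is assumed and worth noting: this version squeezes repeated spaces, whereas split(line,' ') in Hive yields an empty string between two consecutive spaces, so counts can differ on such input.

```shell
#!/bin/bash
# wordcount FILE: print "word<TAB>count", most frequent first.
# Hypothetical helper for cross-checking the Hive result on a small file.
wordcount() {
    tr -s ' ' '\n' < "$1" |      # one word per line, squeezing repeated spaces
        grep -v '^$' |           # drop empty lines
        sort | uniq -c |         # count identical words
        sort -k1,1nr -k2 |       # highest count first, ties ordered by word
        awk '{ print $2 "\t" $1 }'
}
```

Running wordcount wc.txt on the same file that was loaded into the wc table gives a list to compare against the query output.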

Top 10 problem: write the following SQL script (top10.sql), enter the hive shell, and run it with: source top10.sql

-- Top 10 problem (SQL script): find the 10 most frequent passwords -----------
---- The raw data looks like this (500 MB):
--id,           username,  pwd,        email,   name2
--11768479    lkuj      1162592404  q1162592404
--11768480    13165     911001      auqia6
--11768481    鹤舞       intong      qintongfei
--11768482    侠客	be64b9caf12d1c69cf5a8f0fb541c5d8	
--13665116  233229@qq.com a13712731328
--.....
-- 1. Create the table
create table top10(id int, name1 string, pwd string, email string, name2 string)
row format delimited
fields terminated by '\t'
stored as textfile;

-- 2. Load the data
load data local inpath 'user_pwd.txt' into table top10;

-- 3. SQL statement
select pwd ,count(*) count from top10
group by pwd
order by count desc
limit 10;

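Step 5: beeline: accessing Hive from Java and from the command line

1. Enable beeline: start the SQL service (HiveServer2)

2. Create a Maven project and import the dependencies

3. Write the Java code that talks to Hive over JDBC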

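import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.Test;

// Needs the hive-jdbc driver (org.apache.hive:hive-jdbc) and JUnit 4 on the classpath.
// The class name HiveJdbcTest is arbitrary; the tables stu and per are assumed to exist in big12.
public class HiveJdbcTest {

    private static final String URL = "jdbc:hive2://s101:10000/big12";

    // Query data
    @Test
    public void t1() throws Exception {
        // Class.forName("org.apache.hive.jdbc.HiveDriver");  // left commented out, as in the original
        // try-with-resources closes the result set, statement and connection in reverse order
        try (Connection con = DriverManager.getConnection(URL);
             Statement stm = con.createStatement();
             ResultSet res = stm.executeQuery("select * from stu")) {
            while (res.next()) {
                System.out.println(res.getString(1) + ", " + res.getString(2));
            }
        }
    }

    // Insert data
    @Test
    public void t2() throws Exception {
        try (Connection con = DriverManager.getConnection(URL);
             Statement stm = con.createStatement()) {
            stm.executeUpdate("insert into per values (3,'test',23)");
        }
    }
}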
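
The same JDBC URL can be tried from the command line with beeline, which is a quick way to check that HiveServer2 is reachable before debugging Java code. The sketch below assumes the host s101 and database big12 from the Java code; the user name centos, the log file name hs2.log and both function names are assumptions made for illustration.

```shell
#!/bin/bash
HS2_URL="jdbc:hive2://s101:10000/big12"   # same URL as in the Java code

# Start HiveServer2 in the background; hs2.log is an arbitrary log file name.
start_hs2() {
    nohup hiveserver2 > hs2.log 2>&1 &
}

# Run one statement through beeline and exit.
# -u: JDBC URL, -n: user name (centos is assumed), -e: statement to run.
hs2_query() {
    beeline -u "$HS2_URL" -n centos -e "$1"
}
```

For example, call start_hs2 once, give the server time to open port 10000, then run hs2_query 'show tables;'.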

Hive transaction support (1. create a transactional table, 2. enable transaction support)

create table tx
(id int, msg string) 
clustered by (id) into 3 buckets 
stored as orc TBLPROPERTIES ('transactional'='true');

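Querying the table (select * from tx;) fails with an error, because transaction support has not been configured.

Fix: enable transaction support by entering the following in the hive shell: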

SET hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
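
SET statements only affect the current session, so they have to be repeated each time a new hive shell is started. The sketch below keeps the five settings in one variable and prepends them to whatever SQL is run with hive -e. The variable and function names are hypothetical, and the statements in the usage note are only illustrative.

```shell
#!/bin/bash
# The five settings from the text, kept in one place so every session can reuse them.
TX_SETTINGS='
SET hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
'

# run_tx_sql SQL: run SQL in a fresh hive session with the settings applied first.
# Hypothetical helper; hive -e executes the given statements and exits.
run_tx_sql() {
    hive -e "${TX_SETTINGS}$1"
}
```

For example: run_tx_sql "insert into tx values (1, 'first'); update tx set msg = 'changed' where id = 1; select * from tx;"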

