(*)不需要MySQL的支持,使用Hive的自带的数据库Derby
(*)局限:只支持一个连接
<xml version="1.0" encoding="UTF-8" standalone="no">
<xml-stylesheet type="text/xsl" href="configuration.xsl">
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>file:///root/training/apache-hive-2.3.0-bin/warehouse</value>
</property>
</configuration>
初始化Derby数据库
schematool -dbType derby -initSchema
日志
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
测试数据:员工表和部门表
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
1、内部表:相当于MySQL的表 对应的HDFS的目录 /user/hive/warehouse
create table emp
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int);
插入数据 insert、load语句(从某个HDFS的目录或者本地Linux的目录上,把数据导入Hive的表 本质ctrl+x)
load data inpath '/scott/emp.csv' into table emp; 导入HDFS的数据
load data local inpath '/root/temp/*****' into table emp; 导入本地Linux的数据
客户端的静默模式:不打印日志(hive -S)(Silent)
创建表的时候,一定指定分隔符
create table emp1
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int)
row format delimited fields terminated by ',';
创建部门表 并且导入数据
create table dept
(deptno int,
dname string,
loc string)
row format delimited fields terminated by ',';
2、分区表: 可以提高查询的效率的----> 通过查看SQL的执行计划
*根据员工的部门号创建分区* partitioned by (deptno int)
create table emp_part
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int)
partitioned by (deptno int)
row format delimited fields terminated by ',';
Oracle中SQL的执行计划
查询10号部门的员工
explain plan from select * from emp where deptno-10;
select * from table(dbms)
Hive执行计划
SQL的执行计划
以Oracle数据库为例
补充点知识:Oracle数据库的优化器有两种:RBO、CBO
R是Rule 规则
C就是cost 成本
举例:查询10号部门的员工信息
explain plan for select * from emp where deptno=10; -----> 生成该语句的执行计划
select * from table(dbms_xplan.display);
情况1:没有索引
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 261 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP | 3 | 261 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
情况2:有索引 create index myindex on emp(deptno
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 261 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP | 3 | 261 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | MYINDEX | 3 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Hive的执行计划
1、查询普通的内部表
explain select * from emp1 where deptno=10;
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: emp1
Statistics: Num rows: 1 Data size: 617 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (deptno = 10) (type: boolean)
Statistics: Num rows: 1 Data size: 617 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: int), comm (type: int), 10 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 1 Data size: 617 Basic stats: COMPLETE Column stats: NONE
ListSink
2、查询分区表
explain select * from emp_part where deptno=10;
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: emp_part
Statistics: Num rows: 3 Data size: 121 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: int), comm (type: int), 10 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 3 Data size: 121 Basic stats: COMPLETE Column stats: NONE
ListSink
查询10号部门员工
情况一:查询内部表
select * from empl where deptno=10;
情况二:查询分区表
指明导入的数据的分区(通过子查询导入数据) ----> MapReduce程序
insert into table emp_part partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=10;
insert into table emp_part partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=20;
insert into table emp_part partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=30;