设为首页 加入收藏

TOP

hive学习笔记之--hive index
2018-12-05 01:03:14 】 浏览:41
Tags:hive 学习 笔记 --hive index

Hive中可以创建Index

索引是标准的数据库技术,hive 0.7版本之后支持索引。hive索引采用的不是'one size fites all'的索引实现方式,而是提供插入式接口,并且提供一个具体的索引实现作为参考。

hive索引具有以下特点:

1.索引key冗余存储,提供基于key的数据视图

2.存储设计以优化查询&检索性能

3.对于某些查询减少IO,从而提高性能。

hive索引创建语句:

CREATE INDEX index_name

ON TABLE base_table_name (col_name, ...)

AS 'index.handler.class.name'

[WITH DEFERRED REBUILD]

[IDXPROPERTIES (property_name=property_value, ...)]

[IN TABLE index_table_name]

[PARTITIONED BY (col_name, ...)]

[

[ ROW FORMAT...] STORED AS ...

| STORED BY ...

]

[LOCATION hdfs_path]

[TBLPROPERTIES (...)]

[COMMENT "index comment"]

注意

1.indexpartition默认和数据表一致

2.视图上不能创建index

3. index可以通过stored as配置存储格式

重建索引

ALTER INDEX index_name ON table_name [PARTITION (...)]REBUILD

假如在创建索引是我们使用“ WITH DEFERRED REBUILD”语句,则索引创建是为空,可以通过“Alter index ... REBUILD”在一个partition上或所有partition上构建索引。

注意:

1.hive数据更新时,必须调用该语句更新索引。

2. index rebuild操作时一个原子操作,因此,当rebuild失败时,先前构建的索引也无法使用

删除索引:

DROP INDEX index_name ON table_name

示例:

创建表&索引

hive> create table index_test(id INT, nameSTRING)

>partitioned by (dt STRING)

> row formatdelimited fields terminated by ',';

OK

Time taken: 32.446 seconds

hive> create table index_tmp(id INT, name STRING, dtSTRING)

> row formatdelimited fields terminated by ',';

OK

Time taken: 0.157 seconds

hive> load data local inpath '/home/work/data/alter_test.txt'into table index_tmp;

Copying data fromfile:/home/work/data/alter_test.txt

Copying file: file:/home/work/data/alter_test.txt

Loading data to table default.index_tmp

OK

Time taken: 1.587 seconds

hive> set hive.exec.dynamic.partition.mode=nonstrict;

hive> set hive.exec.dynamic.partition=true;

hive> insert overwrite table index_testpartition(dt)

> selectid,name,dt

> fromindex_tmp;

Total MapReduce jobs = 2

......

OK

Time taken: 56.103 seconds

hive> create index index1_index_test on tableindex_test(id) AS'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITHDEFERRED REBUILD ;

OK

Time taken: 0.764 seconds

hive> alter index index1_index_test on index_testrebuild;

Total MapReduce jobs = 3

......

OK

Time taken: 138.589 seconds

hive> show index on index_test;

OK

index1_index_testindex_test iddefault__index_test_index1_index_test__ compact

Time taken: 0.163 seconds

hive> showpartitions index_test;

OK

dt=2012-08-10

dt=2012-08-11

dt=2012-08-12

Time taken: 0.17 seconds

举例2

--下面是给上节的stocks表创建索引

CREATE INDEX stocks_index

ON TABLE stocks (ymd)

AS'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'

WITH DEFERRED REBUILD

IDXPROPERTIES ('creator' = 'me', 'created_at' ='some_time')

IN TABLE stocks_index_table

COMMENT 'stocks indexed by ymd.';

alter index stocks_index on stocks rebuild;


查看索引数据:

$ hadoop fs -ls/user/hive/warehouse/default__index_test_index1_index_test__

Found 3 items

drwxr-xr-x - worksupergroup 0 2012-08-16 18:27/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-10

drwxr-xr-x - worksupergroup 0 2012-08-16 18:27/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-11

drwxr-xr-x - worksupergroup 0 2012-08-16 18:28/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-12

$ hadoop fs -cat/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-10/000000_0

5,hdfs://localhost:9000/user/hive/warehouse/index_test/dt=2012-08-10/000000_0,0

6,hdfs://localhost:9000/user/hive/warehouse/index_test/dt=2012-08-10/000000_0,10

7,hdfs://localhost:9000/user/hive/warehouse/index_test/dt=2012-08-10/000000_0,20

删除索引:

hive> drop index index1_index_test on index_test;

OK

Time taken: 0.761 seconds

hive> show index on index_test;

OK

Time taken: 0.095 seconds

索引数据也被删除

$ hadoop fs -ls/user/hive/warehouse/default__index_test_index1_index_test__

ls: Cannot access/user/hive/warehouse/default__index_test_index1_index_test__: No such file ordirectory.

】【打印繁体】【投稿】【收藏】 【推荐】【举报】【评论】 【关闭】 【返回顶部
上一篇Hive的客户端操作 下一篇hive学习推荐书籍+官方网址

最新文章

热门文章

Hot 文章

Python

C 语言

C++基础

大数据基础

linux编程基础

C/C++面试题目