Hive: Differences Between Internal (Managed) and External Tables
2019-03-19 12:57:20
Tags: Hive, internal table, external table, differences

hive (hive)> desc formatted ruoze_emp;
OK

col_name data_type comment

empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int

Detailed Table Information

Database: hive
Owner: hadoop
CreateTime: Mon Oct 22 20:11:23 CST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop01:9000/user/hive/warehouse/hive.db/ruoze_emp
Table Type: MANAGED_TABLE ----> this marks an internal (managed) table
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 0
rawDataSize 0
totalSize 700
transient_lastDdlTime 1540210294

Storage Information

SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.087 seconds, Fetched: 39 row(s)
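
The two fields that matter for this comparison are Table Type and Location. As a minimal sketch, assuming the desc formatted output has been captured as plain text (the helper name parse_desc_formatted is hypothetical, not part of Hive), they could be pulled out like this:

```python
def parse_desc_formatted(text):
    """Collect 'Key: value' lines from captured `desc formatted` output."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # hdfs://host:9000/... keep their own colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# A few lines copied from the output above.
sample = """\
Database: hive
Owner: hadoop
Location: hdfs://hadoop01:9000/user/hive/warehouse/hive.db/ruoze_emp
Table Type: MANAGED_TABLE
Table Parameters:
"""

info = parse_desc_formatted(sample)
print(info["Table Type"])
print(info["Location"])
```

Section headers such as "Table Parameters:" carry no value after the colon, so the sketch skips them.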

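After creating an internal table, both HDFS (the data files) and MySQL (the metastore metadata) hold information for it.
After dropping the table, both the HDFS data and the MySQL metadata are gone.

If an internal table is created with CTAS and a file is then uploaded with put (hdfs dfs -put emp.txt /user/hive/warehouse/hive.db/test), the data comes out garbled. A table created without an explicit ROW FORMAT uses Hive's default field delimiter (Ctrl-A, \001), so the tab-separated fields in emp.txt are not split into columns. Creating the table with an explicit delimiter instead: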
CREATE TABLE test (
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
With the tab delimiter declared explicitly, the data is no longer garbled.
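
The delimiter mismatch can be illustrated outside Hive. The following is only a sketch of the splitting behavior, not Hive's SerDe code; the sample row is made up for illustration, since the contents of emp.txt are not shown here.

```python
# Ctrl-A (\x01) is Hive's default field delimiter for text tables
# that do not declare ROW FORMAT DELIMITED.
HIVE_DEFAULT_DELIM = "\x01"


def split_row(line, delim):
    """Split one text line into fields using the given delimiter."""
    return line.rstrip("\n").split(delim)


# Hypothetical tab-separated row with eight fields (comm is empty).
row = "7369\tSMITH\tCLERK\t7902\t1980-12-17\t800.0\t\t20\n"

with_default = split_row(row, HIVE_DEFAULT_DELIM)
with_tab = split_row(row, "\t")

print(len(with_default))  # the whole line lands in a single field
print(len(with_tab))      # one field per declared column
```

With the default delimiter the entire line ends up in the first field, which is why the columns of such a table cannot be read correctly; with '\t' each value lands in its own column.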

External tables:

CREATE EXTERNAL TABLE ruoze_emp_external (
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/ruoze_emp_external'
;
The directory is created automatically.

desc formatted ruoze_emp_external; ----> shows the table type and storage location

hdfs dfs -put emp.txt /ruoze_emp_external

drop table ruoze_emp_external; ----> the metadata in MySQL is deleted, but the data is still there

load data local inpath '/home/hadoop/app/data/emp.txt' overwrite into table ruoze_emp_external;
Loading with load data behaves the same way: on drop, the metadata is deleted and the data remains.

Putting the data in place first and creating the table afterwards ----> also works.
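
The drop behavior of the two table types can be summarized with a toy model. This is a sketch under stated assumptions, not how Hive is implemented: ToyMetastore is a hypothetical class, a dict stands in for the MySQL metastore, a temporary local directory stands in for HDFS, and the trash/PURGE handling is ignored.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy stand-in for the metastore: table name -> (location, is_external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external=False):
        # The table directory is created if it does not exist yet.
        location.mkdir(parents=True, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        # The metadata entry is always removed.
        location, external = self.tables.pop(name)
        # Only a managed (internal) table takes its data with it.
        if not external:
            shutil.rmtree(location)


root = Path(tempfile.mkdtemp())  # stand-in for the HDFS root
ms = ToyMetastore()

managed_dir = root / "user" / "hive" / "warehouse" / "hive.db" / "ruoze_emp"
external_dir = root / "ruoze_emp_external"
ms.create_table("ruoze_emp", managed_dir)
ms.create_table("ruoze_emp_external", external_dir, external=True)

# Place a data file in each table directory (like hdfs dfs -put).
for table_dir in (managed_dir, external_dir):
    (table_dir / "emp.txt").write_text("7369\tSMITH\n")

ms.drop_table("ruoze_emp")
ms.drop_table("ruoze_emp_external")

managed_data_exists = managed_dir.exists()
external_data_exists = (external_dir / "emp.txt").exists()
print(managed_data_exists, external_data_exists)

shutil.rmtree(root)  # clean up the temporary directory
```

After both drops the toy metastore is empty, the managed directory is gone, and the external directory still holds emp.txt, which matches the behavior observed above.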

From the official documentation:

Managed tables
A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /user/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
External tables
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
