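Managed (internal) table: if the table is created with CTAS and a file is then uploaded with put, e.g. hdfs dfs -put emp.txt /user/hive/warehouse/hive.db/test, the data shows up garbled. The likely cause is that a table created without an explicit row format uses Hive's default field delimiter (Ctrl-A, \001), which does not match a tab-delimited emp.txt. Declaring the delimiter when the table is created avoids the problem: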
CREATE TABLE test (
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; -- with this the data is no longer garbled
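The same fix can be applied to CTAS itself. A minimal sketch, assuming a source table already exists: emp_src and test_ctas are hypothetical names used only for illustration.

```sql
-- Sketch: CTAS with an explicit tab delimiter, so the new table
-- does not fall back to the default field delimiter (\001).
-- emp_src and test_ctas are hypothetical names.
CREATE TABLE test_ctas
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
AS
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM emp_src;

-- Inspect the generated DDL to confirm which delimiter the table uses.
SHOW CREATE TABLE test_ctas;
```

If the delimiter in the SHOW CREATE TABLE output matches the one in the file, a file uploaded with put should parse into the expected columns.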
External table:
CREATE EXTERNAL TABLE ruoze_emp_external (
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/ruoze_emp_external'
;
The directory is created automatically.
desc formatted ruoze_emp_external; -- shows the table type and storage location
hdfs dfs -put emp.txt /ruoze_emp_external
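drop table ruoze_emp_external; -- the metadata in MySQL is deleted, but the data is still there
load data local inpath '/home/hadoop/app/data/emp.txt' overwrite into table ruoze_emp_external;
The same holds when the data was loaded with load data: the metadata is deleted, but the data is still there.
Putting the data in place first and creating the table afterwards also works.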
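The data-first order can be sketched as follows, reusing the table and path from above. The upload commands are shown as comments and assume the directory does not exist yet.

```sql
-- Step 1 (shell, run before the DDL): upload the file first.
--   hdfs dfs -mkdir -p /ruoze_emp_external
--   hdfs dfs -put emp.txt /ruoze_emp_external

-- Step 2: create the external table on top of the existing directory.
CREATE EXTERNAL TABLE IF NOT EXISTS ruoze_emp_external (
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/ruoze_emp_external';

-- Step 3: the rows already in the directory are visible immediately.
SELECT * FROM ruoze_emp_external LIMIT 5;
```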
Official documentation:
Managed tables
A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /user/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
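A small sketch of the two drop variants described above, using the managed table test from earlier in these notes. Whether the data actually reaches a trash folder depends on the cluster having HDFS trash enabled, which is assumed here.

```sql
-- Managed table: drops both metadata and data.
-- Without PURGE the data is moved to the trash (assuming trash is enabled).
DROP TABLE IF EXISTS test;

-- Alternative: PURGE skips the trash, so the data cannot be restored from it.
-- DROP TABLE IF EXISTS test PURGE;
```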
External tables
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
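The MSCK REPAIR TABLE step mentioned above can be sketched like this. The partitioned table emp_part, its path, and the file emp_10.txt are hypothetical and only serve as an illustration.

```sql
-- Hypothetical partitioned external table; name and path are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS emp_part (
  empno int,
  ename string,
  sal double
)
PARTITIONED BY (deptno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/emp_part';

-- Suppose a process outside Hive then adds a partition directory, e.g.:
--   hdfs dfs -mkdir -p /emp_part/deptno=10
--   hdfs dfs -put emp_10.txt /emp_part/deptno=10

-- Register partition directories that exist in HDFS but not in the metastore.
MSCK REPAIR TABLE emp_part;

-- The new partition should now be listed.
SHOW PARTITIONS emp_part;
```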