Sqoop is a JDBC-based utility for integrating Hadoop with traditional
relational databases.
A Sqoop import moves data either into HDFS (a delimited file format can
be specified as part of the import definition) or directly into a Hive
table.
The entire source table can be moved into HDFS or Hive using the
“--table” parameter.
sqoop import \
  --connect jdbc:teradata://{host name or ip address}/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc \
  --password dbc \
  --table SOURCE_TBL \
  --target-dir /user/hive/incremental_table -m 1
Note:
--table SOURCE_TBL refers to the source table in the relational database.
--target-dir is the HDFS directory backing the corresponding Hive table.
After the initial import, subsequent imports can leverage Sqoop’s native support for incremental imports via the “--check-column”, “--incremental” and “--last-value” parameters.
sqoop import \
  --connect jdbc:teradata://{host name or ip address}/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc \
  --password dbc \
  --table SOURCE_TBL \
  --target-dir /user/hive/incremental_table -m 1 \
  --check-column modified_date \
  --incremental lastmodified \
  --last-value {last_import_date|last_import_value}
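The effect of the lastmodified mode can be sketched in plain Python: only rows whose check column is newer than the saved last-value are imported, and the maximum value seen becomes the next run’s last-value. This is a minimal sketch, assuming a list of dicts stands in for the source table; the exact boundary semantics (strict vs. inclusive comparison) depend on the Sqoop version.

```python
from datetime import date

# Hypothetical rows standing in for SOURCE_TBL.
rows = [
    {"id": 1, "modified_date": date(2024, 1, 10)},
    {"id": 2, "modified_date": date(2024, 2, 5)},
    {"id": 3, "modified_date": date(2024, 3, 1)},
]

def incremental_import(rows, last_value):
    """Select rows changed since last_value, mimicking --incremental
    lastmodified, and return the checkpoint for the next --last-value."""
    new_rows = [r for r in rows if r["modified_date"] > last_value]
    new_last_value = max((r["modified_date"] for r in new_rows),
                         default=last_value)
    return new_rows, new_last_value

changed, checkpoint = incremental_import(rows, date(2024, 1, 31))
# Only the rows modified after 2024-01-31 are picked up.
```

On the next run, `checkpoint` would be passed back in as the new last-value, so each import picks up only the delta since the previous one.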
Alternatively, you can use the “--query” parameter and supply a SQL SELECT statement that limits the import to new or changed records only.
sqoop import \
  --connect jdbc:teradata://{host name or ip address}/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc \
  --password dbc \
  --target-dir /user/hive/incremental_table -m 1 \
  --query 'select * from SOURCE_TBL where modified_date > {last_import_date} AND $CONDITIONS'
Note: For the initial load, substitute “base_table” for “incremental_table”. For all subsequent loads, use “incremental_table”.
CREATE VIEW reconcile_view AS
SELECT t2.id, t2.field1, t2.field2, t2.field3, t2.field4, t2.field5, t2.modified_date
FROM
  (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY modified_date DESC) rn
   FROM (SELECT * FROM base_table
         UNION ALL
         SELECT * FROM incremental_table) t1) t2
WHERE rn = 1;
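The view’s ROW_NUMBER() filter keeps only the newest version of each id after unioning the base and incremental data. The same reconciliation can be sketched in Python (a minimal sketch; the dict-based rows and field names are illustrative, not the actual table schema):

```python
# Illustrative rows: base_table holds the original import,
# incremental_table the newly sqooped changes; ids overlap
# where a record was updated.
base_table = [
    {"id": 1, "field1": "a", "modified_date": "2024-01-10"},
    {"id": 2, "field1": "b", "modified_date": "2024-01-10"},
]
incremental_table = [
    {"id": 2, "field1": "b2", "modified_date": "2024-02-05"},  # updated record
    {"id": 3, "field1": "c", "modified_date": "2024-02-05"},   # new record
]

def reconcile(base, incremental):
    """UNION ALL both tables, then keep the row with the latest
    modified_date per id (the rn = 1 filter in reconcile_view)."""
    latest = {}
    for row in base + incremental:
        current = latest.get(row["id"])
        if current is None or row["modified_date"] > current["modified_date"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])

view = reconcile(base_table, incremental_table)
```

Querying reconcile_view in Hive yields the same result set: one row per id, reflecting the most recent modification.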