
Using the hadoop distcp command
2018-12-02 08:35:27
Tags: hadoop distcp command usage

hadoop distcp -update -skipcrccheck -m $num_map  $old_table_location  $new_table_location

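This post shows how to use the command above. A brief introduction to distcp: http://blog.csdn.net/stark_summer/article/details/45869945
In the command, -update copies only files that are missing at the target or differ from it, -skipcrccheck skips the checksum comparison between source and target (it only applies together with -update), and -m sets the maximum number of map tasks (simultaneous copies) used for the job.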
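How do you copy a table's data from one cluster to another?
1. Copy the table structure (create the new table with the same schema);
2. Look up the Location of the old table, then the Location of the new table, and copy the data between them with the distcp command given at the top of this post;
3. Run msck repair table new_table to repair the new table's partition metadata (required for partitioned tables).
Below I walk through these steps: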
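Steps 2 and 3 can be scripted roughly as follows. This is a minimal sketch, assuming the hadoop and hive CLIs are on PATH, that the hive client talks to the new cluster's metastore, and that both locations have already been looked up. The function name copy_table_data, the NUM_MAPS variable and its default of 10 are hypothetical choices, not part of the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Hive table's files between clusters (step 2)
# and, for partitioned tables, repair the new table's partition metadata (step 3).
# Assumes: hadoop and hive CLIs on PATH; hive uses the NEW cluster's metastore.
set -euo pipefail

copy_table_data() {
  local old_location="$1"          # e.g. hdfs://hadoop:9000/warehouse/fdm.db/t441
  local new_location="$2"          # e.g. hdfs://hadoop60:9000/warehouse/fdm.db/t444
  local new_table="${3:-}"         # pass only for partitioned tables
  local num_map="${NUM_MAPS:-10}"  # assumed default; tune per cluster

  # Step 2: same flags as the command at the top of the post.
  hadoop distcp -update -skipcrccheck -m "$num_map" "$old_location" "$new_location"

  # Step 3: only needed when the table is partitioned.
  if [[ -n "$new_table" ]]; then
    hive -e "msck repair table ${new_table};"
  fi
}

# Usage (t444 is not partitioned, so no third argument):
#   copy_table_data hdfs://hadoop:9000/warehouse/fdm.db/t441 hdfs://hadoop60:9000/warehouse/fdm.db/t444
```

Keeping step 3 optional mirrors the post: msck repair is only required for partitioned tables.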

hive> select *
    > from t441;
OK
30      beijing dongdong        man
40      shanghai        lisi    woman
Time taken: 0.078 seconds
hive> desc formatted t441;
OK
# col_name              data_type               comment             

id                      int                     None                
city                    string                  None                
name                    string                  None                
sex                     string                  None                

# Detailed Table Information             
Database:               fdm                      
Owner:                  root                     
CreateTime:             Mon May 01 09:09:36 PDT 2017     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop:9000/warehouse/fdm.db/t441         
Table Type:             MANAGED_TABLE            

The output above gives us the old table's Location.
Next, we get the new table's Location.

hive> desc formatted t444;
OK
# col_name              data_type               comment             

id                      int                     None                
city                    string                  None                
name                    string                  None                
sex                     string                  None                

# Detailed Table Information             
Database:               fdm                      
Owner:                  root                     
CreateTime:             Mon May 01 09:56:57 PDT 2017     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop60:9000/warehouse/fdm.db/t444       
Table Type:             MANAGED_TABLE  
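Reading the Location line by eye is fine for one table; in a script it can be extracted from the desc formatted output instead. A minimal sketch, assuming hive -S (silent mode) prints the same "Location:" line shown above; the function names extract_location and table_location are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 2: read a table's HDFS location from Hive.

# Print the value of the first "Location:" line found on stdin.
extract_location() {
  awk '$1 == "Location:" { print $2; exit }'
}

# Ask Hive for the table description and extract its location.
# Assumes the hive CLI is on PATH; -S suppresses the "OK"/"Time taken" messages.
table_location() {
  local table="$1"   # e.g. fdm.t441
  hive -S -e "desc formatted ${table};" | extract_location
}

# Usage:
#   old_location=$(table_location fdm.t441)
#   new_location=$(table_location fdm.t444)
```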

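Finally, we copy the data. With -update, distcp copies the contents of the t441 directory into t444 itself rather than creating a t441 subdirectory under it: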

[root@hadoop local]# hadoop distcp -update -skipcrccheck hdfs://hadoop:9000/warehouse/fdm.db/t441  hdfs://hadoop60:9000/warehouse/fdm.db/t444 ; 
17/05/01 10:09:10 INFO tools.DistCp: srcPaths=[hdfs://hadoop:9000/warehouse/fdm.db/t441]
17/05/01 10:09:10 INFO tools.DistCp: destPath=hdfs://hadoop60:9000/warehouse/fdm.db/t444
17/05/01 10:09:10 INFO tools.DistCp: sourcePathsCount=2
17/05/01 10:09:10 INFO tools.DistCp: filesToCopyCount=1
17/05/01 10:09:10 INFO tools.DistCp: bytesToCopyCount=47.0
17/05/01 10:09:11 INFO mapred.JobClient: Running job: job_201705010710_0010
17/05/01 10:09:12 INFO mapred.JobClient:  map 0% reduce 0%
17/05/01 10:09:17 INFO mapred.JobClient:  map 100% reduce 0%
17/05/01 10:09:17 INFO mapred.JobClient: Job complete: job_201705010710_0010
17/05/01 10:09:17 INFO mapred.JobClient: Counters: 22
17/05/01 10:09:17 INFO mapred.JobClient:   Map-Reduce Framework
17/05/01 10:09:17 INFO mapred.JobClient:     Spilled Records=0
17/05/01 10:09:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=289374208
17/05/01 10:09:17 INFO mapred.JobClient:     Map input records=1
17/05/01 10:09:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=152
17/05/01 10:09:17 INFO mapred.JobClient:     Map output records=0
17/05/01 10:09:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=38797312
17/05/01 10:09:17 INFO mapred.JobClient:     Map input bytes=130
17/05/01 10:09:17 INFO mapred.JobClient:     CPU time spent (ms)=130
17/05/01 10:09:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=16252928
17/05/01 10:09:17 INFO mapred.JobClient:   distcp
17/05/01 10:09:17 INFO mapred.JobClient:     Bytes copied=47
17/05/01 10:09:17 INFO mapred.JobClient:     Bytes expected=47
17/05/01 10:09:17 INFO mapred.JobClient:     Files copied=1
17/05/01 10:09:17 INFO mapred.JobClient:   File Input Format Counters 
17/05/01 10:09:17 INFO mapred.JobClient:     Bytes Read=230
17/05/01 10:09:17 INFO mapred.JobClient:   FileSystemCounters
17/05/01 10:09:17 INFO mapred.JobClient:     HDFS_BYTES_READ=429
17/05/01 10:09:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=53786
17/05/01 10:09:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=47
17/05/01 10:09:17 INFO mapred.JobClient:   File Output Format Counters 
17/05/01 10:09:17 INFO mapred.JobClient:     Bytes Written=0
17/05/01 10:09:17 INFO mapred.JobClient:   Job Counters 
17/05/01 10:09:17 INFO mapred.JobClient:     Launched map tasks=1
17/05/01 10:09:17 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
17/05/01 10:09:17 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/05/01 10:09:17 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4939
17/05/01 10:09:17 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
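The distcp counters near the end of the log offer a quick sanity check: Bytes copied should equal Bytes expected (47 and 47 in the run above). A minimal sketch, assuming the client output was saved to a file (distcp logs to stderr, hence the 2>&1 in the usage comment); the function name check_distcp_log and the file name distcp.log are hypothetical, and the match is case-insensitive in case other distcp versions capitalize the counter names differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the distcp "Bytes copied" and "Bytes expected"
# counters in a saved client log read from stdin.
# Exit status: 0 = match, 1 = mismatch, 2 = counters not found.
check_distcp_log() {
  awk '
    { line = tolower($0) }
    line ~ /bytes copied=/   { split($0, kv, "="); copied = kv[2] + 0; seen_c = 1 }
    line ~ /bytes expected=/ { split($0, kv, "="); expected = kv[2] + 0; seen_e = 1 }
    END {
      if (!seen_c || !seen_e) { print "counters not found"; exit 2 }
      if (copied == expected) { print "OK: " copied " bytes copied"; exit 0 }
      print "MISMATCH: copied=" copied " expected=" expected
      exit 1
    }'
}

# Usage (assumed file name):
#   hadoop distcp -update -skipcrccheck SRC DST 2>&1 | tee distcp.log
#   check_distcp_log < distcp.log
```

For the log above this prints "OK: 47 bytes copied".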

Then we check the data:

hive> select *  
    > from t444;
OK
30      beijing dongdong        man
40      shanghai        lisi    woman
Time taken: 0.069 seconds

OK, the data has been copied successfully.
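Checking with select * works for a two-row table; for larger tables, comparing row counts is a cheaper first check. A minimal sketch, assuming (as in the session above) that both tables can be queried from the same hive client; with separate metastores each count would run against its own cluster. The function names are hypothetical. Statistics-based answers are switched off because files copied by distcp bypass Hive, so the new table's stored statistics may be stale. A matching count does not prove the rows are identical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare row counts of the old and new tables.
# Assumes both tables are visible from one hive CLI, as in the session above.

row_count() {
  # Force a real scan instead of answering count(*) from metastore statistics,
  # which distcp does not update. -S keeps status messages off stdout.
  hive -S -e "set hive.compute.query.using.stats=false; select count(*) from ${1};" | tail -n 1
}

compare_row_counts() {
  local old_table="$1" new_table="$2"
  local old_count new_count
  old_count=$(row_count "$old_table")
  new_count=$(row_count "$new_table")
  if [[ "$old_count" == "$new_count" ]]; then
    echo "OK: both tables have ${old_count} rows"
  else
    echo "MISMATCH: ${old_table}=${old_count} ${new_table}=${new_count}"
    return 1
  fi
}

# Usage: compare_row_counts fdm.t441 fdm.t444
```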


Using the hadoop distcp command https://www.cppentry.com/bencandy.php?fid=114&id=192979
