Best Practices for Recovering Corrupted HDFS Blocks in Production
2019-04-28 00:20:04
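Tags: production, HDFS, Block, corruption, recovery, best practices
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/lin443514407lin/article/details/88099704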

1. Upload the file hello.txt

[root@cdh-node01 apps]# hdfs dfs -mkdir /blockrecover

[root@cdh-node01 apps]# echo "hello word" > hello.txt

[root@cdh-node01 apps]# hdfs dfs -put hello.txt /blockrecover

[root@cdh-node01 apps]# hdfs dfs -ls /blockrecover

Found 1 items
-rw-r--r-- 2 root supergroup 11 2019-03-03 18:26 /blockrecover/hello.txt

[root@cdh-node01 apps]# hdfs fsck /

Connecting to namenode via http://cdh-node01:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /192.168.17.20 for path / at Sun Mar 03 18:27:50 CST 2019

Status: HEALTHY
Number of data-nodes: 3
Number of racks: 1
Total dirs: 40
Total symlinks: 0

Replicated Blocks:
Total size: 108216 B
Total files: 35
Total blocks (validated): 25 (avg. block size 4328 B)
Minimally replicated blocks: 25 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)

Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Sun Mar 03 18:27:50 CST 2019 in 65 milliseconds


The filesystem under path '/' is HEALTHY
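To damage a specific block (and to find it again afterwards), you need the block name behind hello.txt and the DataNodes holding its replicas. Running `hdfs fsck /blockrecover/hello.txt -files -blocks -locations` prints this information. The helper below is a minimal sketch with a hypothetical name, `extract_block_ids`; it simply pulls every `blk_<blockId>_<generationStamp>` token out of fsck output read on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read fsck output on stdin and print each distinct
# block name (blk_<blockId>_<generationStamp>) once, in order of appearance.
extract_block_ids() {
  # -o prints only the matched token, one per line; awk drops repeats.
  grep -oE 'blk_[0-9]+_[0-9]+' | awk '!seen[$0]++'
}

# Usage on a live cluster (requires the hdfs client on PATH):
#   hdfs fsck /blockrecover/hello.txt -files -blocks -locations | extract_block_ids
```

The numeric part before the generation stamp (for example 1073741874) is what appears in the on-disk file name on the DataNode.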

2. Delete one replica of the file's block directly on a DataNode (replication factor 2)

Delete the block file and its meta file.

First locate the block file and the meta file on the DataNode, then remove both:

[root@cdh-node02 subdir0]# rm -rf blk_1073741874 blk_1073741874_1065.meta
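The rm above is run inside one of the DataNode's block directories (the block pool's current/finalized/subdir* tree). A minimal sketch for finding that directory with `find` follows; the function name `find_block_files` is hypothetical, and the data directory must be supplied by you (for example the value of dfs.datanode.data.dir on that DataNode; /dfs/dn in the usage line is only an assumed CDH-style path).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the on-disk block file and its .meta file for a
# block ID under one DataNode data directory supplied by the caller.
find_block_files() {
  local data_dir="$1" block_id="$2"   # block_id like 1073741874 (no blk_ prefix)
  # Match blk_<id> (the data) and blk_<id>_<genstamp>.meta (the checksums),
  # but not other blocks whose IDs merely start with the same digits.
  find "$data_dir" -type f \( -name "blk_${block_id}" -o -name "blk_${block_id}_*.meta" \) \
    | LC_ALL=C sort
}

# Usage on the DataNode (data directory is an assumption, adjust to your config):
#   find_block_files /dfs/dn 1073741874
```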

Restart HDFS to simulate the corruption, then check with fsck:

[root@cdh-node01 ~]# hdfs fsck /

Connecting to namenode via http://cdh-node01:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /192.168.17.20 for path / at Sun Mar 03 19:48:31 CST 2019

/blockrecover/hello.txt: Under replicated BP-794681415-192.168.17.20-1548403311677:blk_1073741874_1065. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).

/user/root/.Trash/Current/blockrecover/hello.txt: MISSING 1 blocks of total size 11 B.
Status: CORRUPT
Number of data-nodes: 3
Number of racks: 1
Total dirs: 45
Total symlinks: 0

Replicated Blocks:
Total size: 108227 B
Total files: 36
Total blocks (validated): 26 (avg. block size 4162 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (3.8461537 %)
MINIMAL BLOCK REPLICATION: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 11 B
********************************
Minimally replicated blocks: 25 (96.15385 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (3.8461537 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.8846154
Missing blocks: 1
Corrupt blocks: 0
Missing replicas: 1 (1.9230769 %)

Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Sun Mar 03 19:48:31 CST 2019 in 100 milliseconds


The filesystem under path '/' is CORRUPT
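On a large namespace this report gets long. fsck can list corrupt blocks directly with `hdfs fsck / -list-corruptfileblocks`, but that option targets corrupt or missing blocks rather than under-replicated ones. As an alternative, the sketch below (hypothetical name `list_problem_files`) filters a saved fsck report and keeps each path flagged MISSING, CORRUPT or Under replicated. It assumes the `<path>: <status>` line shape seen in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs fsck /` output on stdin and print each file
# path flagged as MISSING, CORRUPT or Under replicated, once each.
list_problem_files() {
  grep -E '^/.*: +(MISSING|CORRUPT|Under replicated)' \
    | sed -E 's/: +(MISSING|CORRUPT|Under replicated).*$//' \
    | awk '!seen[$0]++'
}

# Usage:
#   hdfs fsck / | list_problem_files
```

Summary lines such as "Status: CORRUPT" are skipped because they do not start with a path.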

3. Manual repair with hdfs debug

Repair command:

[root@cdh-node01 apps]# hdfs debug recoverLease -path /blockrecover/hello.txt -retries 10
recoverLease SUCCEEDED on /blockrecover/hello.txt

Checking directly on the DataNode shows that the block file and the meta file have been restored:

[root@cdh-node02 subdir0]# ll
total 8
-rw-r--r-- 1 root root 11 Mar 4 10:38 blk_1073741874
-rw-r--r-- 1 root root 11 Mar 4 10:38 blk_1073741874_1065.meta
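The command above handles a single path. According to the Hadoop command documentation, `hdfs debug recoverLease` recovers the lease on the given path, retrying up to `-retries` times. If several files are affected, the same command can be run once per path. This is a minimal sketch assuming one HDFS path per line on stdin (for example, paths taken from the fsck report); the function name `recover_leases` and the `HDFS_CMD` override are hypothetical, and the failure check relies on the command's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run `hdfs debug recoverLease` for every HDFS path read
# from stdin, one per line. HDFS_CMD can point at another client or a stub;
# it defaults to `hdfs`. Returns non-zero if any path failed.
recover_leases() {
  local hdfs_cmd="${HDFS_CMD:-hdfs}" path failed=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue            # skip blank lines
    if ! "$hdfs_cmd" debug recoverLease -path "$path" -retries 10; then
      echo "recoverLease failed for $path" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   printf '%s\n' /blockrecover/hello.txt | recover_leases
```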

4. Automatic repair

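After a block is damaged, the DataNode does not notice the damage until it runs a directory scan, which by default happens every 6 hours:
dfs.datanode.directoryscan.interval : 21600 (seconds)
The block is not recovered until the DataNode sends a block report to the NameNode, which by default also happens every 6 hours:
dfs.blockreport.intervalMsec : 21600000 (milliseconds)
The NameNode starts recovery only after it receives the block report.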
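Because the two properties use different units (seconds versus milliseconds), they are easy to misread. The sketch below converts the effective values to whole hours. The function name `intervals_in_hours` is hypothetical, and it assumes plain integer values; some Hadoop releases also accept a time-unit suffix (such as `6h`) for the directory-scan interval, which this sketch rejects rather than parses. `hdfs getconf -confKey` is used in the usage line to read the configured values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory-scan and block-report intervals in
# whole hours (integer division, so partial hours are truncated).
#   $1: dfs.datanode.directoryscan.interval, in seconds
#   $2: dfs.blockreport.intervalMsec, in milliseconds
intervals_in_hours() {
  local scan_s="$1" report_ms="$2" v
  for v in "$scan_s" "$report_ms"; do
    case "$v" in
      ''|*[!0-9]*) echo "expected plain integers, got '$v'" >&2; return 1 ;;
    esac
  done
  echo "directoryscan: $((scan_s / 3600)) h"
  echo "blockreport: $((report_ms / 3600000)) h"
}

# Usage:
#   intervals_in_hours "$(hdfs getconf -confKey dfs.datanode.directoryscan.interval)" \
#                      "$(hdfs getconf -confKey dfs.blockreport.intervalMsec)"
```

With the default values shown above, both lines report 6 h.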

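Summary:

In production I generally prefer the manual repair approach, but it requires deleting the damaged block by hand first.
Remember: delete the damaged block file and its meta file, not the HDFS file.
Alternatively, you can download the file with get, delete it from HDFS, and then upload it again.
Never clean up with hdfs fsck / -delete: it deletes the corrupted files, which means the data is lost. Only run it if losing the data is acceptable, or if you are confident you can restore the data into HDFS from another source!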
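The get/delete/put alternative only works while at least one replica is still readable, as in the example above where one of the two replicas survived. Below is a minimal sketch with a hypothetical name, `reupload_file`, and a hypothetical `HDFS_CMD` override. It deletes the HDFS file only after the local copy exists, and uses `-rm -skipTrash` so the damaged file is not simply moved into .Trash (the fsck report above shows an earlier copy of hello.txt still tracked under /user/root/.Trash).

```shell
#!/usr/bin/env bash
# Hypothetical helper for "get, delete, put back". HDFS_CMD defaults to `hdfs`.
#   $1: HDFS path of the affected file
#   $2: local directory to hold the temporary copy (must not already contain it)
reupload_file() {
  local hdfs_cmd="${HDFS_CMD:-hdfs}" hdfs_path="$1" local_dir="$2"
  local local_copy="$local_dir/$(basename "$hdfs_path")"
  "$hdfs_cmd" dfs -get "$hdfs_path" "$local_copy" || return 1
  # Never delete from HDFS unless the local copy is really there.
  [ -f "$local_copy" ] || { echo "local copy missing: $local_copy" >&2; return 1; }
  "$hdfs_cmd" dfs -rm -skipTrash "$hdfs_path" || return 1
  "$hdfs_cmd" dfs -put "$local_copy" "$hdfs_path"
}

# Usage:
#   reupload_file /blockrecover/hello.txt /tmp/blockrecover
```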
