The first question to settle is: what counts as a healthy HDFS?
HDFS is considered healthy if, and only if, every file meets its minimum replication requirement, that is, all of its blocks have at least the minimum number of replicas available.
How do you check the health of HDFS?
Hadoop ships with the fsck tool, which can check the health of the entire filesystem or of individual files and directories.
On older releases the command is: sudo -u hdfs hadoop fsck /
On newer releases it is: sudo -u hdfs hdfs fsck /
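In practice the check is often wrapped in a small script so the same option set is used every time. A minimal sketch follows; the path /user/data and the DRY_RUN switch are illustrative assumptions, and with DRY_RUN unset the real command runs and requires a live HDFS cluster.

```shell
#!/bin/sh
# Thin wrapper around fsck for a given path (sketch, not the official tooling).
check_path() {
    cmd="hdfs fsck $1 -files -blocks -locations"
    if [ -n "$DRY_RUN" ]; then
        echo "$cmd"            # just show what would run
    else
        sudo -u hdfs $cmd      # run the actual check against the cluster
    fi
}

DRY_RUN=1
check_path /user/data
```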
<path> start checking from this path
-move move corrupted files to /lost+found
-delete delete corrupted files
-files print out files being checked
-openforwrite print out files opened for write
-includeSnapshots include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
-list-corruptfileblocks print out a list of missing blocks and the files they belong to
-blocks print out a block report (must be used together with -files)
-locations print out the locations of every block (must be used together with -files)
-racks print out the network topology for datanode locations (must be used together with -files)
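Scripts that act on these options usually parse the summary that fsck prints at the end of its run. The sketch below extracts the overall status line; a canned sample report (its format is an assumption based on typical fsck output) is piped in instead of a live cluster, so only the parsing step is demonstrated.

```shell
#!/bin/sh
# Pull the overall "Status:" value out of an fsck-style report (sketch).
sample_report() {
cat <<'EOF'
Status: HEALTHY
 Total size:    1024 B
 Total blocks (validated):      1 (avg. block size 1024 B)
 Corrupt blocks:                0
The filesystem under path '/' is HEALTHY
EOF
}

# Split on "colon plus spaces" and keep the second field of the Status line.
status=$(sample_report | awk -F': *' '/^Status:/ {print $2}')
echo "overall status: $status"
```

Against a real cluster, the same pipeline would read from `sudo -u hdfs hdfs fsck /` instead of the sample function.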
Please Note:
1. By default, fsck ignores files opened for write; use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status.
2. The -includeSnapshots option should not be used for comparing stats; use it only for health checks, because the results may contain duplicates if the same file is present in both the original filesystem tree and inside a snapshot.