Fix Corrupt Blocks on HDFS

Source: http://centoshowtos.org/hadoop/fix-corrupt-blocks-on-hdfs/

How do I know if my hadoop hdfs filesystem has corrupt blocks, and how do I fix it?

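The easiest way to find out is to run fsck on the filesystem. If your Hadoop environment variables are set up, you can use a path of /; if not, pass the full URI, hdfs://ip.or.hostname:8020/. The port in an hdfs:// URI must be the NameNode's RPC port (commonly 8020 or 9000, as set in fs.defaultFS); 50070 is the NameNode web UI port on Hadoop 2.x and will not work here.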
```
hdfs fsck /
```
or
```
hdfs fsck hdfs://ip.or.hostname:8020/
```
If the end of your output looks something like this, you have corrupt blocks on your fs.
```
.............................Status: CORRUPT
 Total size: 3453345169348 B (Total open files size: 664 B)
 Total dirs: 15233
 Total files: 14029
 Total symlinks: 0 (Files currently being written: 8)
 Total blocks (validated): 40961 (avg. block size 84308126 B) (Total open file blocks (not validated): 8)
 ********************************
 CORRUPT FILES: 2
 MISSING BLOCKS: 2
 MISSING SIZE: 15731297 B
 CORRUPT BLOCKS: 2
 ********************************
 Corrupt blocks: 2
 Number of data-nodes: 12
 Number of racks: 2
FSCK ended at Fri Mar 27 XX:03:21 UTC 201X in XXX milliseconds

The filesystem under path '/' is CORRUPT
```
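If you want to check the report from a script instead of reading it by eye, the summary can be reduced to its counters and an exit status. The following is a minimal sketch, assuming the report ends with the "is CORRUPT" line shown above; `fsck_summary` is a hypothetical helper name, not part of Hadoop.

```shell
# Summarize an fsck report read from stdin (hypothetical helper).
# Prints the corruption counters and returns 1 if the report says CORRUPT.
fsck_summary() {
    local report
    report=$(cat)
    # Keep only the counter lines and trim their leading whitespace
    printf '%s\n' "$report" \
        | grep -E '^ *(CORRUPT FILES|MISSING BLOCKS|MISSING SIZE|CORRUPT BLOCKS):' \
        | sed 's/^ *//'
    # The closing line of the report states the overall verdict
    if printf '%s\n' "$report" | grep -q 'is CORRUPT'; then
        return 1
    fi
    return 0
}

# Example (requires a configured hdfs client):
#   hdfs fsck / | fsck_summary || echo "corruption detected"
```

Because the function only reads stdin, it can also be run against a saved copy of an earlier fsck report.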
How do I know which files have blocks that are corrupt?

The fsck output above is very verbose, but it does name the corrupt blocks. Filtering it with grep saves us from "reading through a firehose".
```
hdfs fsck / | grep -Ev '^\.+$' | grep -v replica | grep -v Replica
```
or
```
hdfs fsck hdfs://ip.or.hostname:8020/ | grep -Ev '^\.+$' | grep -v replica | grep -v Replica
```
This lists the affected files while filtering out the lines of progress dots and any files that merely have under-replicated blocks (which isn't necessarily an issue). The output should look something like this, with entries for each of your affected files.
```
/path/to/filename.fileextension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305

/path/to/filename.fileextension: MISSING 1 blocks of total size 15620361 B
```
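A file can appear more than once in that output (once per CORRUPT or MISSING line), so it can help to reduce it to a unique list of paths. This is a rough sketch that assumes the lines are shaped exactly like the sample above; `affected_files` is a made-up helper name.

```shell
# List each affected file once, given filtered fsck output on stdin.
# Assumes lines shaped like "<path>: CORRUPT ..." or "<path>: MISSING ...".
affected_files() {
    sed -n -E 's#^(/.*): (CORRUPT|MISSING) .*$#\1#p' | sort -u
}

# Example (requires a configured hdfs client):
#   hdfs fsck / | grep -Ev '^\.+$' | grep -v replica | grep -v Replica | affected_files
```

Hadoop's fsck also has a `-list-corruptfileblocks` option, which may be a simpler starting point if your version supports it.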
The next step is to determine how important the file is: can it simply be removed and copied back into place, or does it hold sensitive data that would need to be regenerated?

If it's easy enough just to replace the file, that's the route I would take.

Remove the corrupted file from your hadoop cluster

This command will move the corrupted file to the trash (provided trash is enabled on your cluster; otherwise the file is deleted outright).
```
hdfs dfs -rm /path/to/filename.fileextension
```
or
```
hdfs dfs -rm hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension
```
Or you can skip the trash and delete the file permanently (which is probably what you want to do).
```
hdfs dfs -rm -skipTrash /path/to/filename.fileextension
```
or
```
hdfs dfs -rm -skipTrash hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension
```
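If you do have a good copy of the file elsewhere, removing the corrupt one and copying the replacement back can be wrapped in one step. The sketch below is only an illustration: `replace_hdfs_file` and the `DRY_RUN` variable are hypothetical names, and it assumes the replacement sits on the local filesystem. It prints the commands by default so you can review them before anything is deleted.

```shell
# Replace a corrupt HDFS file with a known-good local copy (hypothetical helper).
# DRY_RUN defaults to 1: commands are only printed unless DRY_RUN=0 is set.
replace_hdfs_file() {
    local local_copy="$1"
    local hdfs_path="$2"
    if [ -z "$local_copy" ] || [ -z "$hdfs_path" ]; then
        echo "usage: replace_hdfs_file <local_copy> <hdfs_path>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be run without touching the cluster
        echo "hdfs dfs -rm -skipTrash $hdfs_path"
        echo "hdfs dfs -put $local_copy $hdfs_path"
        return 0
    fi
    # Only upload the replacement if the removal succeeded
    hdfs dfs -rm -skipTrash "$hdfs_path" && hdfs dfs -put "$local_copy" "$hdfs_path"
}

# Example:
#   replace_hdfs_file /tmp/filename.fileextension /path/to/filename.fileextension
#   DRY_RUN=0 replace_hdfs_file /tmp/filename.fileextension /path/to/filename.fileextension
```

Running fsck on the path afterwards is a reasonable way to confirm the new copy is healthy.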
How would I repair a corrupted file if it was not easy to replace?

This may or may not be possible, but the first step is to gather information on the file's blocks and their locations.
```
hdfs fsck /path/to/filename.fileextension -locations -blocks -files
```
or
```
hdfs fsck hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension -locations -blocks -files
```
From this data, you can track down the nodes that hold the corrupt blocks. On those nodes, look through the logs to determine what went wrong: a disk that was replaced, I/O errors on the server, and so on. If you can recover on that machine and bring the partition holding the blocks back online, the blocks will be reported back to Hadoop and the file will be healthy again. If that isn't possible, you will unfortunately have to find another way to regenerate the file.
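When digging through logs on a suspect node, searching for the block ID is usually quicker than reading them top to bottom. Here is a small sketch under a few assumptions: block IDs look like `blk_` followed by digits (as in the fsck output earlier), and the log directory is whatever your installation uses. Both function names and the example directory are hypothetical.

```shell
# Print the unique block IDs (blk_<digits>) found in fsck output on stdin.
block_ids() {
    grep -Eo 'blk_[0-9]+' | sort -u
}

# List the log files under a directory that mention a given block ID.
# The log directory is site-specific and must be supplied by the caller.
logs_mentioning_block() {
    local block_id="$1"
    local log_dir="$2"
    grep -rl -- "$block_id" "$log_dir" 2>/dev/null | sort
}

# Example (the log path is an assumption; adjust it for your installation):
#   hdfs fsck /path/to/filename.fileextension -locations -blocks -files | block_ids
#   logs_mentioning_block blk_1073904305 /var/log/hadoop-hdfs
```

The second function has to run on the node whose logs you want to search, since it only reads local files.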

Original article: https://www.cnblogs.com/sunxucool/p/5497820.html
