Handling disk space on a Hadoop datanode with multiple disks

link: http://hi.baidu.com/wisejenny/item/c199beb87219c0f462388e96

Tested on hadoop-0.20.2: edit hdfs-site.xml and add:

<property>
<name>dfs.datanode.du.reserved</name>
<value>53687091200</value>
<description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
</description>
</property>
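The value above is given in bytes, and 53687091200 works out to 50 GiB. As a minimal sketch for deriving such a value (gib_to_bytes is a hypothetical helper name, not part of Hadoop):

```shell
# Convert a size in GiB to the byte count expected by dfs.datanode.du.reserved.
# gib_to_bytes is a hypothetical helper, not a Hadoop command.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 50   # prints 53687091200
```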

The following is reposted from another user:

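When we first installed the Hadoop cluster, each machine had only a 260+ GB disk. After some time the disks filled up, so we added two 2 TB disks to each datanode and appended them, comma-separated, to the dfs.datanode.data.dir setting in hdfs-site.xml (this property is named dfs.data.dir on 0.20-based releases such as CDH3) so that the datanode would use them.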

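That introduced a new problem. Hadoop does not automatically direct writes to the disks with more free space; it keeps filling the original small disk as well. Once the small disk is full, MapReduce has nowhere to write its temporary files and jobs fail. The small disk therefore needs some free space held back. According to the Hadoop documentation, dfs.datanode.du.reserved reserves the given amount of space, in bytes, on every disk. After I set it, however, it had no effect. The Hadoop version I use is Cloudera's cdh3u3.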
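One way to see whether the reserve is actually being honored is to compare the free space on each data volume with the reserved value. The following is only a sketch: free_bytes and check_volume are hypothetical helper names, RESERVED_BYTES is assumed to match the configured dfs.datanode.du.reserved, and df -Pk is used because its output format is fixed by POSIX.

```shell
# Assumed to match the configured dfs.datanode.du.reserved (50 GiB here).
RESERVED_BYTES=53687091200

# Print the free space, in bytes, of the filesystem holding the given path.
free_bytes() {
    kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    echo $(( kb * 1024 ))
}

# Print "LOW <path> <free>" when free space is below the reserve,
# otherwise "OK <path> <free>".
check_volume() {
    free=$(free_bytes "$1")
    if [ "$free" -lt "$RESERVED_BYTES" ]; then
        echo "LOW $1 $free"
    else
        echo "OK $1 $free"
    fi
}
```

It could be run against each mount point listed in the data directory setting, for example: check_volume /mnt/exdata/dev1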

With no other option, I kept reading. The Hadoop FAQ says:

3.12. On an individual data node, how do you balance the blocks on the disk?

Hadoop currently does not have a method by which to do this automatically. To do this manually:

1) Take down the HDFS
2) Use the UNIX mv command to move the individual blocks and meta pairs from one directory to another on each host
3) Restart the HDFS

For step 1), taking down HDFS only requires stopping the datanode: $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode

For step 2), the directories involved must be subdirectories of the current directory under dfs.data.dir: mv /mnt/exdata/dev1/cloudera/dfs/dn/current/subdir11/* /mnt/exdata/dev2/cloudera/dfs/dn/current/subdir11

For step 3), start the datanode again: $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
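The three steps can be strung together roughly as follows. This is a sketch under stated assumptions, not a tested tool: move_blocks and rebalance_datanode are hypothetical names, HADOOP_HOME is assumed to be set, and both subdirectories are assumed to exist already under the current directory of their data directories.

```shell
# Move all block files and their .meta companions from one subdir to another.
# Both directories must already exist; nothing is created here.
move_blocks() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "missing source: $src" >&2; return 1; }
    [ -d "$dst" ] || { echo "missing destination: $dst" >&2; return 1; }
    mv "$src"/* "$dst"/
}

# Stop the datanode, move the blocks, then start the datanode again.
# The datanode is restarted even if the move fails.
rebalance_datanode() {
    daemon="${HADOOP_HOME:?HADOOP_HOME must be set}/bin/hadoop-daemon.sh"
    "$daemon" stop datanode || return 1
    move_blocks "$1" "$2"
    status=$?
    "$daemon" start datanode || return 1
    return $status
}
```

With the paths from step 2), the call would look like: rebalance_datanode /mnt/exdata/dev1/cloudera/dfs/dn/current/subdir11 /mnt/exdata/dev2/cloudera/dfs/dn/current/subdir11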