hdfs du reports the size of a single copy of the data

As you can see, hadoop fsck and hadoop fs -dus report the effective HDFS storage space used, i.e. they show the “normal” file size (as you would see on a local filesystem) and do not account for replication in HDFS. In this case, the directory path/to/directory has stored data with a size of 16565944775310 bytes (15.1 TB). Now fsck tells us that the average replication factor for all files in path/to/directory is exactly 3.0. This means that the total raw HDFS storage space used by these files – i.e. factoring in replication – is actually:

3.0 x 16565944775310 (15.1 TB) = 49697834325930 bytes (45.2 TB)

This is how much HDFS storage is consumed by files in path/to/directory.
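
The arithmetic in the quoted passage can be checked with a small sketch. The function names `raw_usage_bytes` and `to_tib` are made up for illustration and are not part of any Hadoop tool. The sketch also assumes that the "TB" figures in the quote are binary units (2^40 bytes).

```python
def raw_usage_bytes(logical_bytes: int, avg_replication: float) -> int:
    """Raw HDFS usage = size of one copy times the average replication factor."""
    return round(logical_bytes * avg_replication)


def to_tib(num_bytes: int) -> float:
    """Convert bytes to binary terabytes (assumed unit: 2**40 bytes)."""
    return num_bytes / 2**40


# Values taken from the quoted passage.
logical = 16565944775310       # size reported by fsck / fs -dus (one copy)
avg_replication = 3.0          # average replication reported by fsck

raw = raw_usage_bytes(logical, avg_replication)
print(raw)  # 49697834325930
print(f"{to_tib(logical):.1f} TiB logical, {to_tib(raw):.1f} TiB raw")
```

The same function works for a non-integer average, which can occur when files in a directory use different replication factors.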

If you never change the default value of 3 for the HDFS replication count of any files you store in your Hadoop cluster, this means in a nutshell that you should always multiply the numbers reported by hadoop fsck or hadoop fs -dus times 3 when you want to reason about HDFS space quotas.
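
As a rough illustration of the multiply-by-3 rule, the sketch below checks whether a dataset fits under a space quota. The name `fits_space_quota` and the constant `DEFAULT_REPLICATION` are made up for this example. The check is deliberately simplified: it only compares the replicated size against the quota and ignores any other accounting HDFS may do.

```python
DEFAULT_REPLICATION = 3  # assumed cluster default, as in the quoted text


def fits_space_quota(logical_bytes: int,
                     space_quota_bytes: int,
                     replication: int = DEFAULT_REPLICATION) -> bool:
    """Return True if the replicated size stays within the space quota.

    logical_bytes is the single-copy size, i.e. the number reported
    by hadoop fsck or hadoop fs -dus.
    """
    return logical_bytes * replication <= space_quota_bytes


one_gib = 2**30
# 1 GiB of data needs 3 GiB of quota at replication 3.
print(fits_space_quota(one_gib, 3 * one_gib))   # True
print(fits_space_quota(one_gib, 2 * one_gib))   # False
```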

Reference:

http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/

There is also an answer on Stack Overflow:

https://stackoverflow.com/questions/11574410/how-to-find-the-size-of-a-hdfs-file

hadoop fs -dus /user/frylock/input
and you would get back the total size (in bytes) of all of the files in the "/user/frylock/input" directory.

Also, keep in mind that HDFS stores data redundantly so the actual physical storage used up by a file might be 3x or more than what is reported by hadoop fs -ls and hadoop fs -dus.

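du reports the size of a single copy of the data. To get the storage space actually consumed, obtain the average replication factor (for example from hadoop fsck) and multiply it by the size reported by du.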

Original article: https://www.cnblogs.com/bonelee/p/6955861.html