CephFS reports "No space left on device" when deleting files

ceph Vol 45 Issue 2

CephFS: No space left on device

After upgrading to 10.2.3 we frequently see messages like

'rm: cannot remove '...': No space left on device

The folders we are trying to delete contain approx. 50K files 193 KB each.

The cluster state and storage available are both OK:

cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
health HEALTH_WARN
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
election epoch 11, quorum 0 000-s-ragnarok
fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
osdmap e20203: 16 osds: 16 up, 16 in
flags sortbitwise
pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
23048 GB used, 6745 GB / 29793 GB avail
1085 active+clean
2 active+clean+scrubbing
1 active+clean+scrubbing+deep

Has anybody experienced this issue so far?

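The author of this message hit the problem after upgrading a cluster to Jewel 10.2.3: deleting files failed with No space left on device. Normally you would not expect a delete to fail for lack of space.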

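The cause is the parameter mds_bal_fragment_size_max, which caps the number of entries a directory can hold. The same cap applies during deletion: when a file is unlinked it is first moved into the area that holds files waiting to be deleted, and that area is limited too. If a million or more files are waiting to be deleted, the error can appear, together with the failing to respond to cache pressure warnings. The parameter therefore has to be adjusted, and the value should be set according to your own needs.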

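The default is mds_bal_fragment_size_max=100000, that is, 100,000 files per directory. Left unchanged, writing 100,000 files into a single directory is enough to trigger the problem above, so raise the value as needed.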
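
As a rough sketch of how the limit might be raised, assuming the MDS reads its settings from ceph.conf: the value 500000 below is purely illustrative and does not come from the original post. The option can typically also be changed on a running daemon through the admin socket, with `ceph daemon mds.<id> config set mds_bal_fragment_size_max <value>`, where `<id>` is the MDS name.

```ini
# ceph.conf on the MDS hosts; 500000 is an illustrative value, not a recommendation
[mds]
mds_bal_fragment_size_max = 500000
```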

The current state of the MDS can be monitored with the following command:
```
[root@lab8106 mnt]# ceph daemonperf mds.lab8106
-----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log---- 
rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
  0  163k   5 |  0    0    0 |  0    0   36 |  0    0  145k   0 | 33   29k   0 
  0  163k   5 |  0    0    0 |  6    0   34 |  0    0  145k   6 | 33   29k   6 
  0  163k   5 |  0    0    0 | 24    0   32 |  0    0  145k  24 | 32   29k  24 
  0  163k   5 |  0    0    0 | 42    0   32 |  0    0  145k  42 | 32   29k  42 
  0  159k   5 |  0    0    0 |972    0   32 |  0    0  144k 970 | 33   27k 971 
  0  159k   5 |  0    0    0 |905    0   32 |  0    0  143k 905 | 31   28k 906 
  0  159k   5 |  0    0    0 |969    0   32 |  0    0  142k 969 | 32   29k 970 
  0  159k   5 |  0    0    0 |601    0   31 |  0    0  141k 601 | 33   29k 602
```
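
To compare the stry column with the configured limit in a script, one row of that output can be turned into a plain number. The helper below is a minimal sketch: the name `stray_count` is made up for illustration, it assumes the column layout shown above, and it only expands the "k" suffix. The same counter should also be available under mds_cache in `ceph daemon mds.<id> perf dump`, which may be easier to parse.

```shell
# stray_count: print the "stry" value of one daemonperf data row as a plain number.
# Assumes the layout shown above: mds_cache is the 4th "|"-separated group
# and stry is its 3rd column. Only the "k" suffix is handled.
stray_count() {
  printf '%s\n' "$1" | awk -F'|' '{
    split($4, col, " ")          # recd recy stry purg
    v = col[3]
    if (v ~ /k$/) { sub(/k$/, "", v); v = v * 1000 }
    print v + 0
  }'
}

# Example with the first data row from the output above
row='  0  163k   5 |  0    0    0 |  0    0   36 |  0    0  145k   0 | 33   29k   0 '
stray_count "$row"
```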
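There is also a separate problem: after hard links are deleted, the stry count is not released. The fix (scan_links) has already been merged into the latest master.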

The repair procedure is as follows.
Flush the MDS journal:
```
ceph daemon mds.xxx flush journal 
```
Stop all MDS daemons:
```
# run on every MDS host; assumes the daemons are managed by systemd
systemctl stop ceph-mds.target
```
Run scan_links:
```
cephfs-data-scan scan_links
```
Start the MDS daemons again:
```
# run on every MDS host; assumes the daemons are managed by systemd
systemctl start ceph-mds.target
```
Then run a recursive scrub with repair:
```
ceph daemon mds.x scrub_path / recursive repair
```
Once this has finished, run ll (ls -l) on the directory; the stry value in mds_cache should then be cleaned up.

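That resolves the problem. In actual testing, after switching to the newer version, restarting and then running ll on the directory was also enough to clear stry.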
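
The repair steps above can be collected into one script. This is only a sketch under several assumptions: the names `run`, `repair_strays`, and `DRY_RUN` are made up for illustration, the MDS daemons are assumed to be managed by systemd (the original only says to stop and restart them), and the MDS name is taken from the monitoring example. By default it only prints the commands.

```shell
# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# repair_strays: the five steps from this post, in order.
repair_strays() {
  mds="$1"                                  # MDS daemon name, e.g. lab8106
  run ceph daemon "mds.$mds" flush journal
  run systemctl stop ceph-mds.target        # assumption: systemd; local host only
  run cephfs-data-scan scan_links
  run systemctl start ceph-mds.target       # assumption: systemd; local host only
  run ceph daemon "mds.$mds" scrub_path / recursive repair
}

repair_strays lab8106
```

With more than one MDS host, the stop and start steps have to be repeated on each host, and the scrub should only be issued once the MDS is active again, so running the steps by hand is probably safer than setting DRY_RUN=0.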

Original article: https://www.cnblogs.com/zphj1987/p/13575384.html
