zoukankan      html  css  js  c++  java
  • RHCE ext3文件系统故障一例

    好久没来了,博客长草了,我来除除草。

    给我分了两人,一个统招,一个Java两年开发经验的社招,让我这从工具平台运维往Python开发方向转的工作是举步维艰啊~

    领导看人还是真特么的不准,希望今年招聘的两位童鞋能来啊~

    昨天下午,某客户打来电话,说文件系统只读,无法写入内容,导致系统无法正常使用——说实在的,现在转行做开发,问题接触得少(因为公司主要用Windows系统),所以也没听说过。

    后来客户提供了账号密码,远程再远程登录上去(相当不稳定,看了不到20分钟的日志,我特么就登录了不下5次)查看dmesg和messages文件内容:

    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_free_blocks_sb: bit already cleared for block 154861
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_free_blocks_sb: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_free_blocks_sb: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_truncate: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_orphan_del: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_delete_inode: Journal has aborted
    May 7 01:46:36 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_journal_start_sb: Detected aborted journal
    May 7 13:37:08 hostserver1 kernel: usb 1-1: device not accepting address 2, error -71
    May 7 13:37:35 hostserver1 kernel: EXT3-fs warning (device dm-10): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
    May 7 13:37:35 hostserver1 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_free_blocks_sb: bit already cleared for block 131687
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_free_blocks_sb: Journal has aborted
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_free_blocks_sb: Journal has aborted
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_orphan_del: Journal has aborted
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_truncate: Journal has aborted
    May 7 13:49:23 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_journal_start_sb: Detected aborted journal
    May 7 13:50:26 hostserver1 kernel: batchtrans[6636]: segfault at 0000000000000001 rip 000000361005e587 rsp 00007fffc77df470 error 4
    May 7 13:56:15 hostserver1 kernel: batchtrans[18111]: segfault at 0000000000000001 rip 000000361005e587 rsp 00007fff62ef30e0 error 4
    May 7 14:08:02 hostserver1 kernel: EXT3-fs warning (device dm-10): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
    May 7 14:08:02 hostserver1 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    May 7 14:30:40 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_free_blocks_sb: bit already cleared for block 131731
    May 7 14:30:40 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 14:30:40 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 14:30:40 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_orphan_del: Journal has aborted
    May 7 14:30:40 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_truncate: Journal has aborted
    May 7 14:30:40 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_journal_start_sb: Detected aborted journal
    May 7 16:28:20 hostserver1 kernel: usb 1-1: device not accepting address 2, error -71
    May 7 16:28:40 hostserver1 kernel: EXT3-fs warning (device dm-10): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
    May 7 16:28:40 hostserver1 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    May 7 20:08:24 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_free_blocks_sb: bit already cleared for block 20605365
    May 7 20:08:24 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 20:08:24 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_reserve_inode_write: Journal has aborted
    May 7 20:08:24 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_orphan_del: Journal has aborted
    May 7 20:08:24 hostserver1 kernel: EXT3-fs error (device dm-10) in ext3_truncate: Journal has aborted
    May 7 20:08:24 hostserver1 kernel: EXT3-fs error (device dm-10): ext3_journal_start_sb: Detected aborted journal
    May 7 22:38:11 hostserver1 kernel: EXT3-fs warning (device dm-10): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
    May 7 22:38:11 hostserver1 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended

    外事不决问Google,技术问题当然也是——查得是因为Ext3日志型文件系统的原因,重启后没有进行磁盘检查,日志与文件数据不一致,累积多了文件系统就锁定了,只需要进行e2fsck修复即可。

    device dm-10是什么意思呢?对应的是一个磁盘设备,用下面的命令判断吧:

    lvdisplay|awk  '/LV Name/{n=$3} /Block device/{d=$3; sub(".*:","dm-",d); print d,n;}' | grep dm-10

    实际上就是Block device对应的“:”后的内容。

    处理办法也很简单:

    1. 备份分区数据,并进行修复分区操作,建议使用e2fsck修复。
    2. 如果此方法无效,建议卸载分区后重新格式化使用。
    3. 上述方法无效,估计硬件有问题了?可惜没到这一步。

    很简单的问题处理了6个小时,原因是客户那边的系统管理员对系统不熟,业务系统并没有全关,导致数据备份做了很多次才OK。

    另外,处理过程中客户不断强调要无损数据,但是文件系统已经受损,你让我怎么无损?幸好被开掉的前系统管理员来了,确认这个分区的文件都有备份,这才大胆的进行检查和恢复。

    ps:整个过程中,感受到前系统管理员在某些方面经验都特么的比我丰富……泪奔啊

  • 相关阅读:
    Spring配置文件中的那些标签意味着什么(持续更新)
    转 spring配置文件
    Web.xml配置详解之context-param
    web.xml 中的listener、 filter、servlet 加载顺序及其详解
    spring mvc 中web.xml配置信息解释
    转 一个web项目web.xml的配置中<context-param>配置作用
    在web.xml中通过contextConfigLocation配置spring
    (转)web.xml中的contextConfigLocation在spring中的作用
    Android菜鸟的成长笔记(20)——IntentService
    PhotoSwipe源码解读系列(二)
  • 原文地址:https://www.cnblogs.com/rexkang/p/Linux-logicalvolume-readonly-caused-by-ext3_journal_err.html
Copyright © 2011-2022 走看看