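[Linux Ops -- Hardware] Using smartctl
1. What it is
smartctl is a commonly used disk-checking tool from the smartmontools package. It reads a drive's SMART (Self-Monitoring, Analysis and Reporting Technology) data.
2. Installation
(1) Ubuntu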
$ sudo apt-get install smartmontools
(2) Red Hat & CentOS
$ sudo yum install smartmontools
3. Usage
(1) Check whether the disk supports SMART
$ sudo smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation ES (SATA 6Gb/s)
Device Model: ST1000NM0011
Serial Number: Z1N0EVRZ
LU WWN Device Id: 5 000c50 03f123968
Firmware Version: SN02
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7202 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Aug 23 23:27:54 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The last two lines show whether the disk supports SMART and whether it is currently enabled.
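To make this check scriptable, the two "SMART support is:" lines can be read straight from the output. The following is a minimal sketch; the function name smart_support and its three result words are illustrative choices, not part of smartctl.

```shell
# Read `smartctl -i` output on stdin and report SMART support.
# Prints "enabled", "available" (supported but switched off), or "unsupported".
smart_support() {
    awk '
        /^SMART support is: *Available/ { available = 1 }
        /^SMART support is: *Enabled/   { enabled = 1 }
        END {
            if (enabled)        print "enabled"
            else if (available) print "available"
            else                print "unsupported"
        }
    '
}

# Usage (needs root and a real disk):
#   sudo smartctl -i /dev/sda | smart_support
```

A result of "available" suggests that SMART still has to be switched on before the other commands are useful.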
(2) Enable SMART manually
$ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
The options mean the following:
-s VALUE, --smart=VALUE
    Enable/disable SMART on device (on/off)
-o VALUE, --offlineauto=VALUE (ATA)
    Enable/disable automatic offline testing on device (on/off)
-S VALUE, --saveauto=VALUE (ATA)
    Enable/disable Attribute autosave on device (on/off)
(3) Check the disk's overall health
$ sudo smartctl -H /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
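In scripts, the exit status of smartctl can be easier to work with than the printed text. According to the EXIT STATUS section of the smartctl man page, bit 3 (value 8) is set when the status check returns "DISK FAILING", and bits 0 to 2 indicate that the command itself did not run cleanly. The helper below is a sketch under that assumption; health_verdict and its output words are hypothetical names, and the bit meanings are worth confirming against the man page of the installed version.

```shell
# Map a smartctl exit status to a short verdict.
# Bit 3 (value 8): SMART status check returned "DISK FAILING".
# Bits 0-2 (values 1, 2, 4): command line, device open, or SMART command problems.
health_verdict() {
    status=$1
    if [ $(( status & 8 )) -ne 0 ]; then
        echo "FAILING"
    elif [ $(( status & 7 )) -ne 0 ]; then
        echo "UNKNOWN"
    else
        echo "PASSED"
    fi
}

# Usage (needs root and a real disk):
#   sudo smartctl -H /dev/sda >/dev/null; health_verdict $?
```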
(4) Show the disk's attribute values
$ sudo smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 084 063 044 Pre-fail Always - 238687534
3 Spin_Up_Time 0x0003 099 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 3
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 573183052
9 Power_On_Hours 0x0032 063 063 000 Old_age Always - 33120
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 3
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 075 049 045 Old_age Always - 25 (Min/Max 20/30)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 567
194 Temperature_Celsius 0x0022 025 051 000 Old_age Always - 25 (0 20 0 0 0)
195 Hardware_ECC_Recovered 0x001a 120 099 000 Old_age Always - 238687534
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 2
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
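In short, the SMART attribute table lists the attributes the manufacturer has defined for the drive, together with the failure threshold for each one. The drive firmware generates and updates this table automatically.
- ID#: the attribute ID, usually a decimal number between 1 and 255
- ATTRIBUTE_NAME: the attribute name defined by the manufacturer
- VALUE: one of the most important columns in the table. It is the normalized value of the attribute, between 1 and 253, where 253 is the best case and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE is typically set to 100 or 200.
- WORST: the lowest VALUE recorded so far
- FLAG: the attribute handling flags
- THRESH: the failure threshold for the attribute; once VALUE drops to or below it, the attribute is reported as failed
- TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is a critical one: it takes part in the overall SMART health assessment (PASSED/FAILED), and if any Pre-fail attribute fails, the disk should be considered about to fail. An Old_age attribute is non-critical (normal wear, for example); its failure does not by itself mean the disk is failing.
- UPDATED: when the attribute is updated. Always means it is updated during normal operation as well as during offline tests; Offline means it is updated only when offline tests run.
- WHEN_FAILED: set to "FAILING_NOW" if VALUE is less than or equal to THRESH, to "In_the_past" if WORST is less than or equal to THRESH, and to "-" otherwise. With "FAILING_NOW", back up important files as soon as possible, especially when the attribute is of type Pre-fail. "In_the_past" means the attribute failed at some point but was fine when the check ran. "-" means the attribute has never failed.
- RAW_VALUE: the raw value as defined by the manufacturer. VALUE is derived from it (not the other way round), and its meaning is vendor-specific.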
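The WHEN_FAILED rule described above can be written out as a small helper, which is handy for checking one's reading of a row. This is only a sketch of the rule as stated in this article, not smartctl's own code; the function name when_failed is made up for the example.

```shell
# Reproduce the WHEN_FAILED rule from the three normalized columns.
# Arguments: VALUE WORST THRESH, as printed by `smartctl -A` (leading zeros allowed).
when_failed() {
    awk -v value="$1" -v worst="$2" -v thresh="$3" 'BEGIN {
        # "+ 0" forces a numeric comparison, so "084" is treated as 84
        if (value + 0 <= thresh + 0)      print "FAILING_NOW"
        else if (worst + 0 <= thresh + 0) print "In_the_past"
        else                              print "-"
    }'
}

# Usage with the Raw_Read_Error_Rate row shown above (VALUE 084, WORST 063, THRESH 044):
#   when_failed 084 063 044
```

For that row both VALUE and WORST are above THRESH, so the helper prints "-", matching the WHEN_FAILED column in the table.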
(5) Test the disk
- Short test
$ sudo smartctl -t short /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Mon Aug 24 00:01:22 2015
Use smartctl -X to abort test.
- Long test
$ sudo smartctl -t long /dev/sda
- Check test progress and results
$ sudo smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 33120 -
- Abort a running test
$ sudo smartctl -X /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
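The start, wait, and check steps above can be strung together in a script. The two helpers below are a rough sketch that only parses text in the format shown in this article; the names selftest_minutes and latest_selftest and the result words are invented for the example, and the output layout may differ between smartctl versions.

```shell
# Read the output of `smartctl -t short` or `smartctl -t long` on stdin and
# print the estimated duration in minutes from the "Please wait N minutes" line.
selftest_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p'
}

# Read `smartctl -l selftest` output on stdin and classify the newest entry (# 1).
# Prints "passed", "not-passed" (failed, aborted, or still running), or "none".
latest_selftest() {
    awk '
        /^# *1 / {
            found = 1
            if (/Completed without error/) print "passed"
            else                           print "not-passed"
            exit
        }
        END { if (!found) print "none" }
    '
}

# Usage (needs root and a real disk):
#   mins=$(sudo smartctl -t short /dev/sda | selftest_minutes)
#   sleep $(( mins * 60 ))
#   sudo smartctl -l selftest /dev/sda | latest_selftest
```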
References:
(1) http://linux.cn/article-4682-1.html
(2) http://xmodulo.com/check-hard-disk-health-linux-smartmontools.html
(3) http://chaorenyong.blog.51cto.com/2163445/1051859
(4) http://bbs.chinaunix.net/thread-4132241-1-1.html