  • Replacing a Ceph OSD Disk

    Introduction

    First, it should be noted that Ceph OSDs are not recommended to sit on RAID10 or RAID5; the general advice is to run each OSD on a single disk. In our environment, in order to make full use of the RAID card's cache, we still attach each single disk to the RAID card as a one-disk RAID0.

    The unavoidable problem with this setup is disk failure: when a disk dies we have to remove it from Ceph and also rebuild the RAID volume.

    After the disk has been replaced and the RAID volume rebuilt, the OSD has to be re-added. Once the new OSD joins the cluster, Ceph automatically starts recovery and backfill; we also need to tune a few recovery and backfill parameters to control how fast that happens.

    The detailed steps follow.

    OSD Replacement Procedure

    1. Locating the failed disk

    Generally, hardware monitoring tells us that a disk has failed, but it does not tell us which device name the failed disk maps to in the operating system.

    We can confirm this by checking the dmesg log:

    [4814427.336053] print_req_error: 5 callbacks suppressed
    [4814427.336055] print_req_error: I/O error, dev sdj, sector 0
    [4814427.337422] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
    [4814427.337432] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [4814427.337434] print_req_error: I/O error, dev sdj, sector 0
    [4814427.338901] buffer_io_error: 4 callbacks suppressed
    [4814427.338904] Buffer I/O error on dev sdj, logical block 0, async page read
    [4814749.780689] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
    [4814749.780694] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [4814749.780697] print_req_error: I/O error, dev sdj, sector 0
    [4814749.781903] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
    [4814749.781905] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [4814749.781906] print_req_error: I/O error, dev sdj, sector 0
    [4814749.783105] Buffer I/O error on dev sdj, logical block 0, async page read
    

    From this log we can see that the failed disk is /dev/sdj.
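
    On a busy node the dmesg output can be long; a simple filter like the one below (a minimal sketch, the exact error strings vary between kernel versions) narrows it down to the relevant I/O errors:

    dmesg | egrep -i 'i/o error|buffer i/o error'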

    Next, we need to find out which OSD /dev/sdj belongs to. In Ceph Luminous, BlueStore is the default store, so the OSD mount points now look like this:

    root@ctnr:~# df -hT
    Filesystem     Type      Size  Used Avail Use% Mounted on
    ...
    tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-2
    tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-3
    tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-5
    tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-6
    tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-7
    tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-8
    

    So we cannot tell directly from df which disk backs a given OSD.

    Instead, we can list the LVM volume carried by each disk:

    root@ctnr:~# lsblk 
    NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sdf                                                                                                     8:80   0   1.8T  0 disk 
    └─ceph--295361e9--45ed--4f85--be6a--a3eb06ba8341-osd--block--e2e485b7--65c0--49ad--a37c--24eaefbc3343 253:4    0   1.8T  0 lvm  
    sdd                                                                                                     8:48   0   1.8T  0 disk 
    └─ceph--20b494d7--bcd0--4f60--bee0--900edd843b26-osd--block--620cf64c--e76a--44d4--b308--87a0e78970cb 253:2    0   1.8T  0 lvm  
    sdb                                                                                                     8:16   0   1.8T  0 disk 
    └─ceph--1c9e3474--e080--478c--aa50--d9e2cc9900e1-osd--block--33dccd23--a7c4--416d--8a22--1787f98c243f 253:0    0   1.8T  0 lvm  
    sdk                                                                                                     8:160  0 476.4G  0 disk 
    └─ceph--a3f4913b--d3e1--4c51--9d4d--87340e1d4271-osd--block--f9d7958b--8a66--41e4--8964--8e5cb95e6d09 253:9    0 476.4G  0 lvm  
    sdg                                                                                                     8:96   0   1.8T  0 disk 
    └─ceph--36092d1e--4e85--49a1--8378--14b432d1c3d0-osd--block--9da0cba0--0a12--4e32--bed6--438f4db71e69 253:5    0   1.8T  0 lvm  
    sde                                                                                                     8:64   0   1.8T  0 disk 
    └─ceph--a21e1b26--0c40--4a36--b6ad--39a2b9920fe7-osd--block--b55e0ccd--cd1e--4067--9299--bb709e64765b 253:3    0   1.8T  0 lvm  
    sdc                                                                                                     8:32   0   1.8T  0 disk 
    └─ceph--5ac4fc0f--e517--4a0b--ba50--586707f582b4-osd--block--ab1cb37e--6612--4d18--a045--c2375af9012c 253:1    0   1.8T  0 lvm  
    sda                                                                                                     8:0    0   3.7T  0 disk 
    ├─sda2                                                                                                  8:2    0 279.4G  0 part /
    ├─sda3                                                                                                  8:3    0   3.4T  0 part /home
    └─sda1                                                                                                  8:1    0     1M  0 part 
    sdj                                                                                                     8:144  0 476.4G  0 disk 
    └─ceph--9c93296c--ff24--4ed7--8227--eae40dda38fc-osd--block--5ea3c735--3770--4b42--87aa--12bbe9885bdb 253:8    0 476.4G  0 lvm 
    

    Then we list the LVM volume that each OSD's block device points to:

    root@ctnr:~# ll /var/lib/ceph/osd/ceph-*/block
    lrwxrwxrwx 1 ceph ceph 93 Jun 18 18:49 /var/lib/ceph/osd/ceph-10/block -> /dev/ceph-a3f4913b-d3e1-4c51-9d4d-87340e1d4271/osd-block-f9d7958b-8a66-41e4-8964-8e5cb95e6d09
    lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:18 /var/lib/ceph/osd/ceph-2/block -> /dev/ceph-1c9e3474-e080-478c-aa50-d9e2cc9900e1/osd-block-33dccd23-a7c4-416d-8a22-1787f98c243f
    lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:19 /var/lib/ceph/osd/ceph-3/block -> /dev/ceph-5ac4fc0f-e517-4a0b-ba50-586707f582b4/osd-block-ab1cb37e-6612-4d18-a045-c2375af9012c
    lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:19 /var/lib/ceph/osd/ceph-5/block -> /dev/ceph-20b494d7-bcd0-4f60-bee0-900edd843b26/osd-block-620cf64c-e76a-44d4-b308-87a0e78970cb
    lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:20 /var/lib/ceph/osd/ceph-6/block -> /dev/ceph-a21e1b26-0c40-4a36-b6ad-39a2b9920fe7/osd-block-b55e0ccd-cd1e-4067-9299-bb709e64765b
    lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:20 /var/lib/ceph/osd/ceph-7/block -> /dev/ceph-295361e9-45ed-4f85-be6a-a3eb06ba8341/osd-block-e2e485b7-65c0-49ad-a37c-24eaefbc3343
    lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:21 /var/lib/ceph/osd/ceph-8/block -> /dev/ceph-36092d1e-4e85-49a1-8378-14b432d1c3d0/osd-block-9da0cba0-0a12-4e32-bed6-438f4db71e69
    lrwxrwxrwx 1 ceph ceph 93 Jun 18 18:49 /var/lib/ceph/osd/ceph-9/block -> /dev/ceph-9c93296c-ff24-4ed7-8227-eae40dda38fc/osd-block-5ea3c735-3770-4b42-87aa-12bbe9885bdb
    

    By matching the LVM volume names in the two outputs, we can determine which OSD the failed disk belongs to; in this case /dev/sdj backs osd.9.
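
    Alternatively, on Luminous and later the device-to-OSD mapping can be printed directly with the ceph-volume tool on the OSD node, which avoids the manual comparison:

    # lists each OSD together with its backing LVM volume and physical device
    ceph-volume lvm list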

    2. Removing the failed disk

    Having identified the failed disk and its OSD with the method above, we now remove it:

    1. Remove the OSD from Ceph
    # run on a monitor node
    ceph osd out osd.9
    # stop the OSD service on the node that hosts the disk
    systemctl stop ceph-osd@9
    # run on a monitor node
    ceph osd crush remove osd.9
    ceph auth del osd.9
    ceph osd rm osd.9

    2. Unmount the disk
    umount /var/lib/ceph/osd/ceph-9
    
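    After these steps the removed OSD should no longer appear in the CRUSH map; this can be verified (optionally) before moving on:

    # osd.9 should be gone from the output
    ceph osd tree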

    3. Rebuilding the RAID0 volume

    Rebuilding the RAID volume requires the MegaCLI tool set. Here is an installation example on Ubuntu:

    wget -O - http://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -
    echo "deb http://hwraid.le-vert.net/ubuntu precise main" >> /etc/apt/sources.list
    apt-get update
    apt-get install megacli megactl megaraid-status
    
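    To confirm that the tools are installed and can see the RAID controller, an optional sanity check:

    # should report at least one adapter (Adapter #0 on this host)
    megacli -adpCount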

    Check the RAID status:

    megacli -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry|Firmware'
    
    Adapter #0
    ...
    Enclosure Device ID: 32
    Slot Number: 9
    Enclosure position: 1
    Firmware state: Online, Spun Up
    Device Firmware Level: GS0F
    Inquiry Data: SEAGATE ST2000NM0023    GS0FZ1X2Q5P6     
    
    Enclosure Device ID: 32
    Slot Number: 10
    Enclosure position: 1
    Firmware state: Unconfigured(good), Spun Up
    Device Firmware Level: 004C
    Inquiry Data: PHLA914001Y6512DGN  INTEL SSDSC2KW512G8                      LHF004C
    
    

    Field descriptions:

    • Adapter: the RAID controller number
    • Enclosure Device ID: the drive enclosure ID
    • Slot Number: the slot number
    • Firmware state: the drive's firmware state. "Online, Spun Up" means the drive is configured and healthy; "Unconfigured(good), Spun Up" means the drive is healthy but not yet part of any RAID configuration, which is what the newly inserted replacement disk looks like.

    We now rebuild a RAID0 volume on the unconfigured disk:

    # create a RAID0 volume with the disk in enclosure 32, slot 10
    
    root@ctnr:~# megacli  -CfgLdAdd -r0'[32:10]' -a0 
                                         
    Adapter 0: Created VD 7
    
    Adapter 0: Configured the Adapter!!
    
    

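    Optionally, the state of the newly created virtual drive (VD 7 above) can be checked before going back to the OS; a sketch:

    # the new virtual drive should report State: Optimal
    megacli -LDInfo -L7 -a0
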
    At this point, fdisk -l shows the newly added disk:

    fdisk -l 
    
    ...
    Disk /dev/sdj: 476.4 GiB, 511503761408 bytes, 999030784 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    

    4. Rebuilding the OSD

    Using ceph-deploy (run from the deploy/admin node), zap the replacement disk and create a new OSD on it:

    ceph-deploy disk list ctnr.a1-56-14.pub.unp
    ceph-deploy disk zap ctnr.a1-56-14.pub.unp /dev/sdj
    ceph-deploy osd create --data /dev/sdj ctnr.a1-56-14.pub.unp
    
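    Once the OSD is created it joins the cluster and recovery/backfill starts automatically; its status and the recovery progress can be watched with:

    # the new OSD should show up as "up" and "in"; ceph -s shows recovery/backfill progress
    ceph osd tree
    ceph -s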

    Controlling recovery and backfill speed

    # raise the priority of recovery operations to the highest level (63)
    ceph tell osd.* injectargs "--osd_recovery_op_priority=63"

    # lower the priority of client I/O operations to 3
    ceph tell osd.* injectargs "--osd_client_op_priority=3"

    # raise the number of concurrent backfill operations per OSD from the default of 1 to 50
    ceph tell osd.* injectargs "--osd_max_backfills=50"

    # raise the number of concurrent recovery operations per OSD from the default of 3 to 50
    ceph tell osd.* injectargs "--osd_recovery_max_active=50"

    # raise the number of recovery threads per OSD from the default of 1 to 10
    ceph tell osd.* injectargs "--osd_recovery_threads=10"
    

    Note: all of the settings above exist only to finish data recovery as quickly as possible. Once recovery completes, they need to be changed back. If client service quality must still take priority during recovery, skip these adjustments and keep the default values.
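
    For example, when recovery has finished, the values can be reverted (a sketch, assuming the defaults mentioned above plus the usual defaults of 3 for osd_recovery_op_priority and 63 for osd_client_op_priority):

    ceph tell osd.* injectargs "--osd_recovery_op_priority=3"
    ceph tell osd.* injectargs "--osd_client_op_priority=63"
    ceph tell osd.* injectargs "--osd_max_backfills=1"
    ceph tell osd.* injectargs "--osd_recovery_max_active=3"
    ceph tell osd.* injectargs "--osd_recovery_threads=1"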

  • Original article: https://www.cnblogs.com/breezey/p/11080534.html