  • Fixing CorruptSSTableException: java.io.EOFException caused by an unexpected power loss

    Background

    A server restart after the power loss corrupted Cassandra and made the whole cluster unavailable. The Cassandra version in use was 2.1.9.

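    Symptoms

    Running the startup command fails with the error below. The node cannot open an SSTable of the system keyspace table size_estimates (generation ka-7382), and because the disk failure policy is "stop", it shuts itself down: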
    DEBUG 07:51:03 All segments have been unmapped successfully
    INFO  07:51:03 Opening ./../data/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-ka-7382 (1293711 bytes)
    ERROR 07:51:03 Exiting forcefully due to file system exception on startup, disk failure policy "stop"
    org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
        at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_45]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
    Caused by: java.io.EOFException: null
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.8.0_45]
        at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.8.0_45]
        at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.8.0_45]
        at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9]
        ... 14 common frames omitted
    DEBUG 07:51:03 INDEX LOAD TIME for ./../data/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-ka-7382: 0 ms.
    DEBUG 07:51:03 Load metadata for ./../data/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-ka-7381
    INFO  07:51:03 Opening ./../data/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-ka-7381 (1288730 bytes)
    DEBUG 07:51:03 INDEX LOAD TIME for ./../data/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-ka-7381: 0 ms.
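
    In the trace above, the EOFException comes from readUnsignedShort inside readUTF while CompressionMetadata reads the SSTable's CompressionInfo.db component, i.e. the file ended before even a 2-byte string length could be read. That suggests the power loss left this component empty or truncated; this is an inference from the log, not something the original post states. The sketch below lists CompressionInfo.db files that are too short to contain that length prefix, to give an idea of how many SSTables are affected. find_empty_compression_info is a hypothetical name, and files truncated further in will not be caught.

```bash
# Hypothetical helper: list CompressionInfo.db components that are too short
# to hold even the 2-byte length prefix that readUTF reads first.
# Files truncated later in the file will NOT be listed.
find_empty_compression_info() {
  local data_dir="$1"   # e.g. .../apache-cassandra-2.1.9/data/data
  # -size -2c: fewer than 2 bytes (0 or 1 byte)
  find "$data_dir" -type f -name '*-CompressionInfo.db' -size -2c | sort
}
```

    Usage: find_empty_compression_info /usr/local/cassandra2/apache-cassandra-2.1.9/data/data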

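    Solution

    1. On a healthy node (Cassandra must be running there), list the tokens owned by the damaged node (192.168.66.149 in this example):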

    ./nodetool ring | grep 192.168.66.149 | awk '{print $NF ","}' | xargs
     

    The output looks like this:

    -9175319402198777604, -9170966623369513088, -9066395509261047988, -9061308135820417583, -8987840430409870999, -8923895599110112842, -8831790480138023745, -8707928721987809356, -8572722604267862153, -8547968984009501919, -8491811255912818550, -8309384982272297324, -8283336608511152755, -8160022901417917666, -7968412688633895882, -7910443224642539468, -7837325178316934917, -7687171165864820298, -7669489855885759411, -7555364119816090117, -7540164571402941309, -7531761820743388069, -7374720538004749334, -7358613146959565416, -7321163556942690092, -7296094605964368489, -7191495439272779345, -7183404435626538766, -7162696106328731822, -7143626274246159491, -7010155548945640294, -6988139514282290305, -6986120655310826238, -6951830604413298153, -6934301930217833958, -6866660920654049232, -6829611593598277494, -6783086415918273881, -6764819745683402811, -6661008162116205739, -6620822761899368284, -6572907054252526945,
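
    The step 1 pipeline can be tightened a little: grep matches the address anywhere on the line and treats its dots as wildcards (a shorter address such as 192.168.66.14 would also match 192.168.66.149), and awk '{print $NF ","}' leaves a trailing comma to trim by hand. A minimal sketch, assuming the usual nodetool ring layout where the first column is the address and the last column is the token; ring_tokens_for is a hypothetical name.

```bash
# Hypothetical helper: print one node's tokens from `nodetool ring` output
# read on stdin, as a single ", "-separated list with no trailing comma.
ring_tokens_for() {
  local ip="$1"
  # Compare the Address column exactly instead of grepping the whole line,
  # print the last column (Token), then join the lines with ", ".
  awk -v ip="$ip" '$1 == ip { print $NF }' | paste -sd, - | sed 's/,/, /g'
}
```

    Usage on a healthy node: printf 'initial_token: %s\n' "$(./nodetool ring | ring_tokens_for 192.168.66.149)"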

    2. On the damaged node, edit cassandra.yaml and add the following settings:

    auto_bootstrap: false
    
    initial_token: -9175319402198777604, -9170966623369513088, -9066395509261047988, -9061308135820417583, -8987840430409870999, -8923895599110112842, -8831790480138023745, -8707928721987809356, -8572722604267862153, -8547968984009501919,......(remaining tokens omitted)
     
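    Note: initial_token must be set to the output of the previous step, i.e. the damaged node's own tokens, so that the node rejoins owning the same token ranges as before. auto_bootstrap: false keeps the node from streaming data while it joins; its data is brought back in sync by the repair in step 5.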
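
    Step 2 can also be scripted. The sketch below keeps a backup copy of cassandra.yaml, removes any uncommented auto_bootstrap and initial_token lines, and appends the two settings. It assumes GNU sed (for in-place -i) and a token string in the format produced by step 1; apply_rebuild_settings and the .bak suffix are hypothetical choices, not part of the original procedure.

```bash
# Hypothetical helper: back up cassandra.yaml, then set auto_bootstrap: false
# and initial_token. Assumes GNU sed for in-place editing (-i).
apply_rebuild_settings() {
  local yaml="$1" tokens="$2"
  cp "$yaml" "$yaml.bak"   # keep the original so it can be restored later
  # Drop uncommented lines for either key (commented "# initial_token:" stays).
  sed -i -e '/^auto_bootstrap:/d' -e '/^initial_token:/d' "$yaml"
  printf 'auto_bootstrap: false\ninitial_token: %s\n' "$tokens" >> "$yaml"
}
```

    Usage, assuming the tarball layout where the file lives under conf/: apply_rebuild_settings /usr/local/cassandra2/apache-cassandra-2.1.9/conf/cassandra.yaml "-9175319402198777604, -9170966623369513088, ..."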

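    3. Delete the system directory on the data disk, for example:

    /usr/local/cassandra2/apache-cassandra-2.1.9/data/data/system

    This removes the node's local system keyspace, including the tokens it had saved, which is why step 2 supplies them explicitly through initial_token.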
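
    A more cautious variant, not part of the original post, is to move the directory out of the data directory instead of deleting it, so the damaged files can still be inspected later. Run it only while Cassandra is stopped on this node; move_system_keyspace_aside is a hypothetical name, and placing the backup next to data/data is an arbitrary choice.

```bash
# Hypothetical helper: move the system keyspace directory aside rather than
# deleting it. Cassandra must be stopped on this node.
move_system_keyspace_aside() {
  local data_dir="$1"   # e.g. .../apache-cassandra-2.1.9/data/data
  local dest
  # Put the copy outside data_dir, not among the keyspace directories.
  dest="$data_dir/../system.bak.$(date +%Y%m%d%H%M%S)"
  mv "$data_dir/system" "$dest" && echo "moved to $dest"
}
```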

    4. Start Cassandra

    ./cassandra
     
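    Errors may be reported during startup, but Cassandra carries on and rebuilds the system keyspace. As long as the node finishes starting and joins the cluster, everything is fine.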
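
    One way to confirm that the node has joined the cluster is to check its status code in nodetool status, where UN means Up/Normal (UJ would mean it is still joining). A minimal sketch, assuming the 2.1 nodetool status layout with the status code in the first column and the address in the second; node_state is a hypothetical helper.

```bash
# Hypothetical helper: print the two-letter status/state code (UN, UJ, DN, ...)
# of one address from `nodetool status` output read on stdin.
node_state() {
  awk -v ip="$1" '$2 == ip { print $1 }'
}
```

    Usage: [ "$(./nodetool status | node_state 192.168.66.149)" = UN ] && echo "joined"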

    5. Repair the data

    Run nodetool:
    nodetool repair

    6. Revert the settings added in step 2 and restart Cassandra.
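
    Step 6 is the reverse of step 2. A minimal sketch that deletes the two added lines again, assuming they were added as uncommented lines and that the original file set neither key itself; if a copy of cassandra.yaml was saved before step 2, restoring that copy is simpler. GNU sed is assumed, and revert_rebuild_settings is a hypothetical name.

```bash
# Hypothetical helper: remove the auto_bootstrap: false and initial_token lines
# added in step 2. Assumes GNU sed for in-place editing (-i).
revert_rebuild_settings() {
  local yaml="$1"
  sed -i -e '/^auto_bootstrap: false$/d' -e '/^initial_token:/d' "$yaml"
}
```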

    This solution was adapted from: /usr/local/cassandra2/apache-cassandra-2.1.9/data/data/system

  • Original post: https://www.cnblogs.com/jiyuqi/p/8308663.html