  • Common etcd Operations Commands

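    Common commands

    # These commands use the etcdctl v3 API; with etcdctl 3.3 and earlier, set ETCDCTL_API=3 first.
    # ${EXISTING_CLIENT_URLS} stands for the client URLs of members already in the cluster.

    # List the cluster members

    etcdctl --endpoints=${EXISTING_CLIENT_URLS} member list
    

    # Add a member at runtime (scale out)

    etcdctl --endpoints=${EXISTING_CLIENT_URLS} member add infra4 --peer-urls=${NEW_PEER_URLS}
    

    # Remove a member at runtime (scale in); the argument is the member ID shown by member list

    etcdctl --endpoints=${EXISTING_CLIENT_URLS} member remove ${MEMBER_ID}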
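
Since member remove needs a member ID rather than a name, a small helper can look the ID up from the member list output. The following is a minimal sketch, assuming the comma-separated "simple" output of the v3 API (ID, status, name, peer URLs, client URLs). The function name `member_id_by_name` is hypothetical and the sample IDs are illustrative, not taken from the cluster in this article.

```shell
# Print the ID of the member named $2, given `etcdctl member list` output in $1.
# Assumes the v3 simple format: ID, status, name, peer URLs, client URLs.
member_id_by_name() {
  printf '%s\n' "$1" | awk -F', ' -v name="$2" '$3 == name { print $1 }'
}

# Illustrative sample output (IDs are placeholders for demonstration only).
sample='8211f1d0f64f3269, started, infra1, http://127.0.0.1:12380, http://127.0.0.1:12379
91bc3c398fb3c146, started, infra2, http://127.0.0.1:22380, http://127.0.0.1:22379
fd422379fda50e48, started, infra3, http://127.0.0.1:32380, http://127.0.0.1:32379'

member_id_by_name "$sample" infra2
```

In practice the first argument would be the captured output of a real member list call, and the result would be passed to member remove.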
    

    Common operations

    How do I scale in?

    Use the member remove command to remove the member from the cluster.

    How do I scale out?

    Use the member add command. The console then prints the startup parameters that the new node needs in order to join the cluster.

    When starting the new instance, --name, --initial-advertise-peer-urls, --initial-cluster-state, and --initial-cluster must match the console output; otherwise startup fails.
    
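    Parameter details:
    --initial-cluster-state:
    existing: used when starting a newly added instance. The other members must be alive and reachable on their peer ports at startup; otherwise startup fails.
    new: used when bootstrapping a new cluster whose members are all listed in --initial-cluster.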
    

    Key startup parameters for the new node, as printed by member add. Start the node with exactly these values:

    ETCD_NAME="infra1"
    ETCD_INITIAL_CLUSTER="infra3=http://127.0.0.1:32380,infra2=http://127.0.0.1:22380,infra1=http://127.0.0.1:12380"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:12380"
    ETCD_INITIAL_CLUSTER_STATE="existing"
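
To avoid copying these values by hand, the ETCD_* assignments can be extracted from the member add output and loaded into the environment. This is only a sketch: the helper name `member_add_env` is hypothetical, the first line of the sample output uses placeholder IDs, and `eval` should only be applied to output you trust.

```shell
# Keep only the ETCD_* assignments from `etcdctl member add` output ($1),
# prefixed with `export` so the result can be evaluated or saved and sourced.
member_add_env() {
  printf '%s\n' "$1" | sed -n 's/^\(ETCD_[A-Z_]*=.*\)$/export \1/p'
}

# Sample output; the member and cluster IDs on the first line are placeholders.
add_output='Member <member-id> added to cluster <cluster-id>

ETCD_NAME="infra1"
ETCD_INITIAL_CLUSTER="infra3=http://127.0.0.1:32380,infra2=http://127.0.0.1:22380,infra1=http://127.0.0.1:12380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:12380"
ETCD_INITIAL_CLUSTER_STATE="existing"'

eval "$(member_add_env "$add_output")"
echo "$ETCD_NAME joins as $ETCD_INITIAL_CLUSTER_STATE"
```

Because etcd also reads its flags from ETCD_* environment variables, exporting them this way keeps the startup values identical to what member add printed.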
    

    Example:

    etcd \
      --name "${ETCD_NAME}" \
      --listen-client-urls http://127.0.0.1:42379 \
      --advertise-client-urls http://127.0.0.1:42379 \
      --listen-peer-urls http://127.0.0.1:42380 \
      --initial-advertise-peer-urls "${ETCD_INITIAL_ADVERTISE_PEER_URLS}" \
      --initial-cluster-state "${ETCD_INITIAL_CLUSTER_STATE}" \
      --initial-cluster "${ETCD_INITIAL_CLUSTER}"
    
    

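    The data directory is lost or was deleted by mistake, and the node fails to start or reports an error when joining the cluster?

    Procedure

    Member information is persisted on disk, so a node that has lost its data must rejoin the cluster as a new member. Follow these steps strictly and in order:

    1. Remove the failed node: use member remove to evict the failed member, so that the current cluster stays healthy.

    2. Clean the data directory completely: stop the failed node, then delete its data dir so that no stale member information remains (the member directory must be empty).

    3. Scale out the cluster: use member add to add the node from step 1 back. See the scale-out section above.

    4. Restart: start the node from step 1 with the parameters printed by member add. See the scale-out section above.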
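
The four steps above can be sketched as a dry-run helper that prints the commands in order without executing anything. The function name `recovery_plan` and its arguments are hypothetical. In the usage line, the member ID and data dir come from the logs shown later in this article, while the client endpoint is an assumption.

```shell
# Print (do not run) the recovery commands for a member that lost its data dir.
# Arguments: client endpoints, failed member ID, member name, peer URL, data dir.
recovery_plan() {
  endpoints=$1 member_id=$2 name=$3 peer_url=$4 data_dir=$5
  # Step 1: remove the failed member from the cluster.
  echo "etcdctl --endpoints=$endpoints member remove $member_id"
  # Step 2: on the failed node, after stopping etcd, wipe the data dir.
  echo "rm -rf $data_dir"
  # Step 3: add the node back as a new member.
  echo "etcdctl --endpoints=$endpoints member add $name --peer-urls=$peer_url"
  # Step 4: start etcd with the values printed by member add.
  echo "# start etcd on $name with --initial-cluster-state existing"
}

# The endpoint http://127.0.0.1:22379 is an assumed client URL of a healthy member.
recovery_plan http://127.0.0.1:22379 ddd67b312462fd7b infra1 http://127.0.0.1:12380 ./infra1.etcd
```

Printing the plan first makes it easy to review the order before running each command by hand on the right host.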

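    Common error logs when the procedure is not followed correctly

    After data loss, starting with --initial-cluster-state="new" produces the following error log, which reports: member ddd67b312462fd7b has already been bootstrapped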

    
    2019-07-09 00:24:55.880988 I | etcdmain: etcd Version: 3.3.10
    2019-07-09 00:24:55.881077 I | etcdmain: Git SHA: 27fc7e2
    2019-07-09 00:24:55.881082 I | etcdmain: Go Version: go1.10.4
    2019-07-09 00:24:55.881089 I | etcdmain: Go OS/Arch: darwin/amd64
    2019-07-09 00:24:55.881093 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
    2019-07-09 00:24:55.881099 N | etcdmain: failed to detect default host (default host not supported on darwin_amd64)
    2019-07-09 00:24:55.881106 W | etcdmain: no data-dir provided, using default data-dir ./infra1.etcd
    2019-07-09 00:24:55.881236 I | embed: listening for peers on http://127.0.0.1:12380
    2019-07-09 00:24:55.881254 I | embed: pprof is enabled under /debug/pprof
    2019-07-09 00:24:55.881299 I | embed: listening for client requests on 127.0.0.1:2380
    2019-07-09 00:24:55.883626 C | etcdmain: member ddd67b312462fd7b has already been bootstrapped
    

    After data loss, starting with --initial-cluster-state="existing" produces the following error log, which reports: Was the raft log corrupted, truncated, or lost?

    tocommit(10) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
    panic: tocommit(10) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
    goroutine 135 [running]:
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc42000a660, 0x1c0cad8, 0x5d, 0xc42000a160, 0x2, 0x2)
        /tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x162
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc420277500, 0xa)
        /tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/log.go:191 +0x15c
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc420244300, 0x8, 0xddd67b312462fd7b, 0x9e737febb6b99eee, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:1194 +0x54
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.stepFollower(0xc420244300, 0x8, 0xddd67b312462fd7b, 0x9e737febb6b99eee, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:1140 +0x3ff
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).Step(0xc420244300, 0x8, 0xddd67b312462fd7b, 0x9e737febb6b99eee, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:868 +0x12f1
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*node).run(0xc4201df320, 0xc420244300)
        /tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/node.go:323 +0x1059
    Nov  9 19:14:20 kubernetes-65 systemd: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
    Nov  9 19:14:20 kubernetes-65 systemd: Failed to start Etcd Server.
    Nov  9 19:14:20 kubernetes-65 systemd: Unit etcd.service entered failed state.
    Nov  9 19:14:20 kubernetes-65 systemd: etcd.service failed.
    
    

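    Steps 1 and 3 were performed correctly, but step 2 was skipped and the node was started incorrectly in between, leaving stale member information on disk. The error log then repeatedly reports failed to find member: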

    2019-07-09 01:24:19.311630 E | rafthttp: failed to find member 9e737febb6b99eee in cluster 73841b4a9097c907
    2019-07-09 01:24:19.311710 E | rafthttp: failed to find member 628170c800dbcee in cluster 73841b4a9097c907
    2019-07-09 01:24:19.410573 E | rafthttp: failed to find member 9e737febb6b99eee in cluster 73841b4a9097c907
    2019-07-09 01:24:19.410616 E | rafthttp: failed to find member 628170c800dbcee in cluster 73841b4a9097c907
    2019-07-09 01:24:19.410678 E | rafthttp: failed to find member 9e737febb6b99eee in cluster 73841b4a9097c907
    2019-07-09 01:24:19.410767 E | rafthttp: failed to find member 628170c800dbcee in cluster 73841b4a
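
The three log signatures above can be matched mechanically. The following is a rough sketch that maps a log line to the mistake described in this article. The function name `diagnose_etcd_log` is hypothetical, and the match strings are simply the messages quoted above, so other etcd versions may word them differently.

```shell
# Map one line of etcd log output to the likely procedural mistake.
diagnose_etcd_log() {
  case "$1" in
    *"has already been bootstrapped"*)
      echo "data dir lost, started with --initial-cluster-state=new" ;;
    *"Was the raft log corrupted, truncated, or lost?"*)
      echo "data dir lost, started with --initial-cluster-state=existing" ;;
    *"failed to find member"*)
      echo "step 2 skipped: stale member info left in the data dir" ;;
    *)
      echo "unknown" ;;
  esac
}

diagnose_etcd_log "2019-07-09 00:24:55.883626 C | etcdmain: member ddd67b312462fd7b has already been bootstrapped"
```

In each of the three cases the fix is the same: run the full four-step procedure again, in order.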
    
  • Original source: https://www.cnblogs.com/Serverlessops/p/13289455.html