The Mesos slave failed to start. Its status was as follows:
# systemctl status mesos-slave
● mesos-slave.service - Mesos Slave
   Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Sat 2019-12-28 21:41:50 CST; 13s ago
  Process: 15627 ExecStart=/usr/bin/mesos-init-wrapper slave (code=exited, status=1/FAILURE)
 Main PID: 15627 (code=exited, status=1/FAILURE)

Dec 28 21:41:50 test-003 systemd[1]: Unit mesos-slave.service entered failed state.
Dec 28 21:41:50 test-003 systemd[1]: mesos-slave.service failed.
The mesos-slave log showed the following:
# journalctl -u mesos-slave -f -n 300
...
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.604262 15978 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.604274 15978 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.602254 15961 slave.cpp:615] Agent resources: [{"name":"ports","ranges":{"range":[{"begin":80,"end":60000}]},"type":"RANGES"},{"name":"cpus","scalar":{"value":8.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":30987.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":95544.0},"type":"SCALAR"}]
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.605845 15961 slave.cpp:623] Agent attributes: [ ]
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.605868 15961 slave.cpp:632] Agent hostname: test003
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.605935 15977 task_status_update_manager.cpp:181] Pausing sending task status updates
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.606037 15982 detector.cpp:152] Detected a new leader: (id='79')
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.606160 15981 group.cpp:700] Trying to get '/mesos/json.info_0000000079' in ZooKeeper
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.607014 15975 state.cpp:66] Recovering state from '/var/lib/mesos/meta'
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.607070 15975 state.cpp:742] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.607249 15981 zookeeper.cpp:262] A new leading master (UPID=master@192.168.0.1:5050) is detected
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.646075 15979 slave.cpp:6951] Finished recovering checkpointed state from '/var/lib/mesos/meta', beginning agent recovery
Dec 28 21:42:10 test-003 mesos-slave[15974]: E1228 21:42:10.649549 15979 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
Dec 28 21:42:10 test-003 mesos-slave[15974]: ecovery
Dec 28 21:42:10 test-003 mesos-slave[15974]: ------------------------------------------------------------
Dec 28 21:42:10 test-003 mesos-slave[15974]: Old agent info:
Dec 28 21:42:10 test-003 mesos-slave[15974]: hostname: "test003"
Dec 28 21:42:10 test-003 mesos-slave[15974]: resources {
Dec 28 21:42:10 test-003 mesos-slave[15974]:   name: "ports"
Dec 28 21:42:10 test-003 mesos-slave[15974]:   type: RANGES
Dec 28 21:42:10 test-003 mesos-slave[15974]:   ranges {
Dec 28 21:42:10 test-003 mesos-slave[15974]:     range {
Dec 28 21:42:10 test-003 mesos-slave[15974]:       begin: 80
Dec 28 21:42:10 test-003 mesos-slave[15974]:       end: 60000
Dec 28 21:42:10 test-003 mesos-slave[15974]:     }
Dec 28 21:42:10 test-003 mesos-slave[15974]:   }
Dec 28 21:42:10 test-003 mesos-slave[15974]: }
Dec 28 21:42:10 test-003 mesos-slave[15974]: resources {
Dec 28 21:42:10 test-003 mesos-slave[15974]:   name: "cpus"
Dec 28 21:42:10 test-003 mesos-slave[15974]:   type: SCALAR
Dec 28 21:42:10 test-003 mesos-slave[15974]:   scalar {
Dec 28 21:42:10 test-003 mesos-slave[15974]:     value: 8
Dec 28 21:42:10 test-003 mesos-slave[15974]:   }
Dec 28 21:42:10 test-003 mesos-slave[15974]: }
Dec 28 21:42:10 test-003 mesos-slave[15974]: resources {
Dec 28 21:42:10 test-003 systemd[1]: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Dec 28 21:42:10 test-003 systemd[1]: Unit mesos-slave.service entered failed state.
Dec 28 21:42:10 test-003 systemd[1]: mesos-slave.service failed.
Note the key lines:
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.646075 15979 slave.cpp:6951] Finished recovering checkpointed state from '/var/lib/mesos/meta', beginning agent recovery
Dec 28 21:42:10 test-003 mesos-slave[15974]: E1228 21:42:10.649549 15979 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
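In a long journal the one error line is easy to miss among the informational ones. As a rough sketch, the filter below keeps only lines whose glog prefix marks them as ERROR (`E`) or FATAL (`F`). The function name `filter_agent_errors` is made up for this note and is not part of Mesos; the pattern assumes the journal line layout shown in the log above.

```shell
#!/usr/bin/env bash
# filter_agent_errors (hypothetical helper, not a Mesos tool):
# reads journal text on stdin and prints only mesos-slave lines whose
# glog severity letter is E (error) or F (fatal), e.g. "E1228 21:42:10...".
filter_agent_errors() {
  # "|| true" keeps the exit status at 0 when no error line is present.
  grep -E 'mesos-slave\[[0-9]+\]: [EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}' || true
}

# Possible use on the agent host:
#   journalctl -u mesos-slave -n 300 --no-pager | filter_agent_errors
```

Reading from stdin means the same filter also works on a log that was saved to a file earlier.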
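The agent tried to recover its checkpointed state from /var/lib/mesos/meta, but the recovery failed with "Incompatible agent info detected" and the process exited. Clearing that directory got around the failure: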
# rm -rf /var/lib/mesos/meta/*
After the contents of the meta directory were deleted, the Mesos slave started successfully and the problem was resolved.
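Deleting the checkpointed state cannot be undone, and an agent that starts without it registers as a new agent rather than recovering its previous executors. A slightly more cautious variant of the fix, as a sketch only, archives the directory before emptying it. The helper name `clear_agent_meta` and the backup location `/var/tmp` are assumptions made for this example; the meta path is the one reported in the log.

```shell
#!/usr/bin/env bash
# clear_agent_meta META_DIR BACKUP_DIR (hypothetical helper):
# archives META_DIR into BACKUP_DIR, then empties META_DIR.
# Same effect as `rm -rf META_DIR/*`, except hidden files are removed too
# and nothing is deleted unless the backup succeeded.
clear_agent_meta() {
  local meta_dir="$1" backup_dir="$2"
  [ -d "$meta_dir" ] || { echo "not a directory: $meta_dir" >&2; return 1; }
  mkdir -p "$backup_dir" || return 1

  local archive
  archive="$backup_dir/mesos-meta-$(date +%Y%m%d%H%M%S).tar.gz"

  # Archive first; stop here if the backup cannot be written.
  tar -czf "$archive" -C "$meta_dir" . || return 1

  # Remove everything inside META_DIR but keep the directory itself.
  find "$meta_dir" -mindepth 1 -delete || return 1

  # Print the archive path so the caller knows where the backup went.
  echo "$archive"
}

# Possible use on the agent host, run as root (backup path is an assumption):
#   systemctl stop mesos-slave
#   clear_agent_meta /var/lib/mesos/meta /var/tmp
#   systemctl start mesos-slave
```

Stopping the service first avoids racing against systemd's auto-restart loop while the directory is being emptied.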