  • Problems encountered when trying to run cosbench on CentOS 7, Part 3

    When a cosbench test gets terminated for no obvious reason, intermittently, and the mission log shows nothing useful, remember to take a look at the system log.


    If you see the call stack below, note that this test failure is quite likely caused by clock skew between the controller/drivers and the storage cluster.

    2020-08-13 02:19:28,977 [ERROR] [AbstractAgent] - unexpected exception
    java.lang.ArrayIndexOutOfBoundsException: -9626
         at com.intel.cosbench.bench.Counter.doAdd(Counter.java:65)
         at com.intel.cosbench.driver.model.OperatorContext.doAddSample(OperatorContext.java:76)
         at com.intel.cosbench.driver.model.OperatorContext.addSample(OperatorContext.java:70)
         at com.intel.cosbench.driver.agent.WorkAgent.onSampleCreated(WorkAgent.java:211)
         at com.intel.cosbench.driver.operator.Preparer.operate(Preparer.java:99)
         at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
         at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
         at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
         at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
         at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
         at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at java.lang.Thread.run(Thread.java:748)
    2020-08-13 02:19:28,977 [ERROR] [MissionHandler] - detected workers [19, 20, 21, 22, 23, 24] have encountered errors
    2020-08-13 02:19:28,979 [INFO] [MissionHandler] - mission M2E66EA747D has been terminated


    If you find entries like the following in the controller's system.log, the termination was most likely caused by clock skew between the controller and the drivers.

    2020-08-20 10:44:59,277 [WARN] [PingDriverRunner] - The driver driver1 at http://10.246.21.82:18088/driver is not reachable at the 1 time, with error message: Connection refused (Connection refused)
    2020-08-20 11:22:57,348 [WARN] [AbstractCommandTasklet] - time drift is still longer than tolerable time drift 300 mSec after 3 times of synchronization

    2020-08-20 17:47:37,351 [ERROR] [AbstractCommandTasklet] - driver report error: HTTP 400 - no such key defined: sizes

    2020-08-20 17:47:37,359 [ERROR] [StageRunner] - detected tasks [t7, t8, t9, t10, t11, t12] have encountered errors
    2020-08-20 17:47:37,365 [ERROR] [AbstractCommandTasklet] - driver report error: HTTP 400 - unrecognized request: org.apache.catalina.connector.RequestFacade@28481cc4

    2020-08-20 17:47:37,366 [ERROR] [Aborter] - fail to abort driver
    com.intel.cosbench.controller.tasklet.TaskletException
         at com.intel.cosbench.controller.tasklet.AbstractCommandTasklet.issueCommand(AbstractCommandTasklet.java:81)
         at com.intel.cosbench.controller.tasklet.Aborter.executeAbort(Aborter.java:53)
         at com.intel.cosbench.controller.tasklet.Aborter.execute(Aborter.java:42)
         at com.intel.cosbench.controller.tasklet.AbstractTasklet.call(AbstractTasklet.java:47)
         at com.intel.cosbench.controller.tasklet.AbstractTasklet.call(AbstractTasklet.java:1)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at java.lang.Thread.run(Thread.java:748)


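    To dig further, use the command below so that the controller and the drivers print their local time back to back, making the difference obvious at a glance. Otherwise it is hard to tell whether the gap you see between the machines is real or just the few seconds spent typing the commands one after another.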

    # date && ssh root@10.246.21.82 date && ssh root@10.246.21.83 date
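    If you want actual numbers instead of eyeballing the date output, here is a minimal sketch in bash. It assumes GNU date (for %3N milliseconds, available on CentOS 7) and key-based SSH login as root to each driver; the function names are made up for illustration. Each remote read is bracketed by two local reads, so the real offset is known to lie within a range, and that range is compared with the 300 ms tolerance quoted in the controller log above.

```shell
#!/usr/bin/env bash
# Sketch: estimate each driver's clock offset from the controller (local host).
# Assumptions: GNU date (for %3N milliseconds) and key-based SSH as root.
# Function names are illustrative, not part of cosbench.

TOLERANCE_MS=300   # tolerance quoted in the controller's system.log

# classify_offset T0 REMOTE T1
#   T0/T1: local epoch ms just before/after the remote read; REMOTE: remote epoch ms.
#   The remote clock was read at some local time in [T0, T1], so the true offset
#   (remote - local) lies in [REMOTE - T1, REMOTE - T0]. Prints "LO HI STATUS".
classify_offset() {
  local t0=$1 remote=$2 t1=$3
  local lo=$(( remote - t1 )) hi=$(( remote - t0 )) status
  if (( lo > TOLERANCE_MS || hi < -TOLERANCE_MS )); then
    status=DRIFT    # the whole interval is outside the tolerance
  elif (( lo >= -TOLERANCE_MS && hi <= TOLERANCE_MS )); then
    status=OK       # the whole interval is inside the tolerance
  else
    status=UNSURE   # SSH round trip too slow to decide; rerun
  fi
  echo "$lo $hi $status"
}

# check_hosts HOST...: read each remote clock over SSH, bracketed by local reads.
check_hosts() {
  local h t0 r t1
  for h in "$@"; do
    t0=$(date +%s%3N)
    r=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$h" 'date +%s%3N') || {
      echo "$h unreachable"; continue; }
    t1=$(date +%s%3N)
    echo "$h $(classify_offset "$t0" "$r" "$t1")"
  done
}
```

    Run it on the controller, e.g. check_hosts 10.246.21.82 10.246.21.83; each line shows the host, the lower and upper bound of its offset in ms, and a verdict. UNSURE only means the SSH round trip was too slow to decide, so run it again.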


    First, make sure the controller and the drivers are in the same time zone.

    [Screenshot: time zone settings on the controller]

    Here you can see that this controller's time zone is UTC, whereas it should be America/New_York like the other drivers.

    # timedatectl list-timezones | grep York

    # timedatectl set-timezone America/New_York
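    With several drivers, checking each time zone by hand gets tedious. A small sketch, assuming CentOS 7's timedatectl prints a "Time zone: <zone> (...)" line and the drivers accept key-based SSH as root; zone_of and check_zones are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: confirm every driver uses the same time zone as the controller.
# Assumption: timedatectl prints a line like
#   "       Time zone: America/New_York (EDT, -0400)"
# and the drivers accept key-based SSH as root. Names are illustrative.

# zone_of: read timedatectl output on stdin, print just the zone name.
zone_of() {
  awk -F': ' '/Time zone:/ { split($2, f, " "); print f[1]; exit }'
}

# check_zones HOST...: compare each host's zone with the local (controller) one.
check_zones() {
  local want h got
  want=$(timedatectl | zone_of)
  echo "controller: $want"
  for h in "$@"; do
    got=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$h" timedatectl | zone_of)
    if [[ "$got" == "$want" ]]; then
      echo "$h: $got (same)"
    else
      echo "$h: ${got:-unknown} (DIFFERENT)"
    fi
  done
}
```

    For example, check_zones 10.246.21.82 10.246.21.83 lists each driver's zone and flags the ones that differ from the controller.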


    The following steps sync the time on CentOS 7.

    First, check the NTP status:

    [Screenshot: NTP status]


    Edit the NTP configuration file.

    # vi /etc/ntp.conf

    Add the local NTP servers, as in the following two lines:

    server 172.16.199.1

    server 10.254.140.22
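    Editing by hand works fine; if you have to repeat it on every driver, a sketch like the following appends the server lines only when they are not already present. It is an illustrative helper (add_ntp_servers is not a standard tool), and it only recognizes uncommented lines of the form server <addr>:

```shell
#!/usr/bin/env bash
# Sketch: add "server <addr>" lines to an ntp.conf without creating duplicates.
# add_ntp_servers is a hypothetical helper; back up the file before using it on
# /etc/ntp.conf.

# add_ntp_servers CONF ADDR...
add_ntp_servers() {
  local conf=$1; shift
  local addr
  for addr in "$@"; do
    # Exact field match: "#server x" or "# server x" (commented) does not count,
    # while "server x iburst" (with options) does.
    if awk -v a="$addr" '$1 == "server" && $2 == a { found = 1 } END { exit !found }' "$conf"; then
      echo "already present: server $addr"
    else
      echo "server $addr" >> "$conf"
      echo "added: server $addr"
    fi
  done
}
```

    Back up the file first, e.g. cp /etc/ntp.conf /etc/ntp.conf.bak && add_ntp_servers /etc/ntp.conf 172.16.199.1 10.254.140.22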


    Check the status of the ntpd service:

    # systemctl status ntpd

    For example:

    [Screenshot: systemctl status ntpd output]


    Stop the ntpd service:

    # systemctl stop ntpd

    If ntpd is not stopped, the clock cannot be synced with the server, and ntpdate fails with: "the NTP socket is in use, exiting"


    Check the ntpd service status again:

    [Screenshot: ntpd status after stopping the service]


    Force the clock to sync with the NTP server:

    # ntpdate 10.254.140.22

    or

    # ntpd -gq

    The screenshot below shows the output after a successful sync.

    [Screenshot: output of a successful sync]

    Or:

    [Screenshot: alternative output of a successful sync]


    Then start the ntpd service again.

    # systemctl start ntpd.service
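    The stop, force-sync and start steps above can be wrapped in one function so that ntpd is started again even when the one-off sync fails. A minimal sketch, to be run as root; force_ntp_sync and run are hypothetical names, DRY_RUN=1 only prints the commands, and ntpd -gq could replace the ntpdate line.

```shell
#!/usr/bin/env bash
# Sketch: stop ntpd, force one sync against SERVER, then always restart ntpd.
# Run as root. Set DRY_RUN=1 to print the commands instead of executing them.

# run CMD...: execute the command, or just echo it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# force_ntp_sync SERVER: returns ntpdate's exit status.
force_ntp_sync() {
  local server=$1 rc=0
  run systemctl stop ntpd
  # ntpdate refuses to run while ntpd holds the NTP socket, hence the stop above.
  run ntpdate "$server" || rc=$?
  # Restart ntpd even if the one-off sync failed, so it keeps disciplining the clock.
  run systemctl start ntpd
  return "$rc"
}
```

    DRY_RUN=1 force_ntp_sync 10.254.140.22 shows what would be executed; drop DRY_RUN to actually run it.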

    Check the NTP status once more; the time is now synchronized.

    [Screenshot: NTP status showing the clock synchronized]
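    Besides the status output, ntpq -pn shows which server ntpd has picked and the current offset. In its peer table the server marked with * is the selected one, and the offset column is in milliseconds. A sketch that pulls out that line and compares it with cosbench's 300 ms limit (the helper name is illustrative):

```shell
#!/usr/bin/env bash
# Sketch: read "ntpq -pn" output on stdin and report the peer ntpd has selected.
# In ntpq's peer table a leading '*' marks the system peer; column 9 is the
# offset in milliseconds. TOLERANCE_MS mirrors cosbench's 300 ms drift limit.

TOLERANCE_MS=300

# ntp_selected_peer: prints "PEER OFFSET_MS OK|DRIFT", or "NONE" if no peer is selected.
ntp_selected_peer() {
  awk -v tol="$TOLERANCE_MS" '
    /^\*/ {
      peer = substr($1, 2)          # drop the "*" tally character
      off = $9 + 0
      abs = off < 0 ? -off : off
      print peer, $9, (abs <= tol ? "OK" : "DRIFT")
      found = 1
      exit
    }
    END { if (!found) print "NONE" }'
}
```

    Usage: ntpq -pn | ntp_selected_peer. NONE means ntpd has not selected a server, either not yet or because none is reachable; wait a few polling intervals and check again.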


    References

    ==============

    https://github.com/intel-cloud/cosbench/issues/264

    https://www.thegeekdiary.com/centos-rhel-how-to-configure-ntp-server-and-client/

    https://www.golinuxhub.com/2017/12/how-to-forcefully-sync-date-and-time/

    https://www.thegeekdiary.com/centos-rhel-6-how-to-force-a-ntp-sync-with-the-ntp-servers/

  • Original article: https://www.cnblogs.com/awpatp/p/13588732.html