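1. None of the Cloudera Management Service roles can start
Observed symptoms:
(1) The cm service roles cannot start; the start request times out and is aborted.
(2) Host information cannot be retrieved either; the UI keeps reporting "unable to contact the server".
(3) The cm-server log reports "Authentication failure for user: '__cloudera_internal_user__mgmt-EVENTSERVER-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82", i.e. authentication of the management roles fails.
Suspected causes:
(1) The connection between scm-agent and scm-server is broken;
(2) The MySQL database connection is broken and user authentication fails.
cloudera-scm-server log: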
2019-01-29 08:44:10,188 INFO 780911426@scm-web-776:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-EVENTSERVER-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82
2019-01-29 08:44:10,194 INFO 416547936@scm-web-773:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-HOSTMONITOR-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82
2019-01-29 08:44:11,181 INFO 416547936@scm-web-773:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-SERVICEMONITOR-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82
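To see at a glance which management roles are being rejected, the log lines above can be summarized with a small script. This is only a sketch: the regular expression is inferred from the three lines shown here (an assumption, not a documented log format), and failed_roles is a hypothetical helper name.

```python
import re

# Pattern inferred from the sample log lines above; it is an assumption
# about the format, not something taken from Cloudera documentation.
AUTH_FAILURE = re.compile(
    r"Authentication failure for user: "
    r"'__cloudera_internal_user__mgmt-(?P<role>[A-Z]+)-(?P<id>[0-9a-f]+)' "
    r"from (?P<ip>\S+)"
)


def failed_roles(log_text):
    """Return a sorted list of unique (role, source_ip) pairs that failed to authenticate."""
    return sorted({(m["role"], m["ip"]) for m in AUTH_FAILURE.finditer(log_text)})
```

Passing the contents of the cloudera-scm-server log to failed_roles gives one entry per role type and source address, which makes it easier to tell whether every management role on a host is affected or just one.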
cloudera-scm-agent log:
[02/Jan/2019 16:20:21 +0000] 28617 MainThread agent ERROR Heartbeating to 10.21.48.82:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.0-py2.6.egg/cmf/agent.py", line 1419, in _send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 742, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
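The root cause turned out to be the scm-agent's connection settings for scm-server: they had been changed earlier, so the agent could never reach the server. The fix is to correct the agent's connection settings, and both server_host and server_port must be verified (I had corrected only server_host at first, and the agent still could not connect).
Edit the configuration file on the scm-agent side, /etc/cloudera-scm-agent/config.ini: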
[General]
# Hostname of the CM server.
server_host=10.21.48.82
# Port that the CM server is listening on.
server_port=7182
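Since both server_host and server_port have to be right, it can help to read them back from config.ini and test whether that address actually accepts TCP connections. The following is a minimal sketch, not part of the Cloudera tooling: read_server_endpoint and can_connect are hypothetical helper names, and it assumes the file can be parsed by Python 3's configparser.

```python
import configparser
import socket


def read_server_endpoint(config_text):
    """Return (server_host, server_port) from the [General] section of config.ini text."""
    # interpolation=None so that any '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    general = parser["General"]
    return general["server_host"], general.getint("server_port")


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (Errno 111), timeouts and unreachable hosts.
        return False
```

Reading /etc/cloudera-scm-agent/config.ini, passing its contents to read_server_endpoint, and then calling can_connect with the result reproduces the check the agent's heartbeat depends on. A False result matches the "Connection refused" error in the agent log; a True result only shows that the port is open, not that the heartbeat itself will succeed.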
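After this change the problem was resolved and the cm service started normally.
Tips: When locating a problem, think at the level of the whole system architecture. Know how the components work together, reason about which link in the chain is likely to have failed, and do not get stuck in one local detail too early. Above all, learn to read the logs.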