  • Problems with paramiko, Python's remote-control module, and how I solved them

    Reposted from https://zhang.ge/5122.html

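    I have been building an automated operations and release platform whose command-line and file-transfer channels are based mainly on the paramiko module. I ran into all sorts of problems along the way, so this post collects them together with their fixes for future reference.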

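    I. The "Error reading SSH protocol banner" connection error

    Searching Baidu or Google for this message turns up plenty of questions and a handful of proposed fixes, but none of them worked for me. After repeated experiments and reading the paramiko source, I finally solved it, and I am recording the fix here.

    1. The exact error message: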

     
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "build/bdist.linux-x86_64/egg/paramiko/client.py", line 307, in connect
      File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 465, in start_client
    paramiko.SSHException: Error reading SSH protocol banner

    2. Solution:

    Download the paramiko source again, unpack it, and edit the transport.py file in the source tree:

    vim build/lib/paramiko/transport.py

    Search for self.banner_timeout and raise its value, for example to 300 seconds:

    self.banner_timeout = 300

    Finally, reinstall paramiko.
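    Later paramiko releases accept a banner_timeout argument to SSHClient.connect(), which avoids patching transport.py at all. Whether the installed version supports it has to be checked (the 1.9 source discussed in this post may well not). A minimal sketch under that assumption; build_connect_kwargs and connect are hypothetical helper names, and the other values are borrowed from the test script used later in this post:

```python
def build_connect_kwargs(hostname, username, password, banner_timeout=300):
    """Collect the keyword arguments for paramiko.SSHClient.connect()."""
    return {
        "hostname": hostname,
        "port": 22,
        "username": username,
        "password": password,
        "timeout": 300,                    # TCP connect timeout in seconds
        "banner_timeout": banner_timeout,  # seconds to wait for the SSH banner
        "allow_agent": False,
        "look_for_keys": False,
    }


def connect(hostname, username, password, banner_timeout=300):
    """Open an SSH connection with a longer banner timeout (assumes a
    paramiko version whose connect() accepts banner_timeout)."""
    import paramiko  # third-party; imported here so the helper above works without it

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(**build_connect_kwargs(hostname, username, password, banner_timeout))
    return client
```

    Usage would be along the lines of client = connect('192.168.1.10', 'root', '123456'). Because the timeout is passed per connection, nothing has to be reinstalled when the value changes.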

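    3. The long and winding story of how I got there follows; skip it if you are not interested:

    On Google I found a related question on Stack Overflow. It is about pysftp, but pysftp is itself built on paramiko: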

    https://stackoverflow.com/questions/34288526/pysft-paramiko-grequests-error-reading-ssh-protocol-banner/44493465#44493465

    The asker eventually posted his own solution:

    UPDATE:

    It seems the problem is caused by importing the package grequests. If I do not import grequests, pysftp works as expected. The issue was raised before but has not been solved

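    In other words, his error was caused by importing grequests, and dropping that import fixed it for him. That did not help in my production environment, so the cause there was probably different.

    Still, his description of the problem pointed me to the fix. He wrote: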

    I have already tried changing the banner timeout from 15 seconds to 60 secs in the transport.py, but it did not solve the problem.

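    Seeing "timeout" and transport.py mentioned, I remembered that the production machines reporting Error reading SSH protocol banner were also very sluggish, and that the time from starting the paramiko connection to the error looked roughly the same on every attempt.

    So I searched the system and found this transport.py file: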

    /usr/lib/python2.7/site-packages/paramiko/transport.py

    Searching it for "banner" did turn up a setting, and its value matched the timeout I had observed!


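    So I changed it to 300 s and tested again. It made no difference at all: the connection still timed out after 15 s. I then set breakpoints and even moved the file away, and the problem persisted! Apparently this file was never being loaded...

    Going back to the original error message, I noticed that the path it shows is: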

     
    build/bdist.linux-x86_64/egg/paramiko/transport.py

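    Yet no such file could be found on the system. Then it dawned on me: a Python module installed this way is usually stored as an egg file, so I would have to change the source and reinstall.

    So I changed into the paramiko source directory, ran a search, and found two transport.py files: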

     
    [root@localhost:/data/software/paramiko-1.9]# find . -name transport.py
    ./paramiko/transport.py
    ./build/lib/paramiko/transport.py

    I changed self.banner_timeout to 300 in these files and reinstalled paramiko, and the test passed on the first try!
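    The detour of editing a file that was never loaded can be avoided by asking Python where it would import a module from. A minimal sketch, written for Python 3 (on the Python 2.7 used in this post, printing paramiko.__file__ gives the same information); module_origin is a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the path Python would load top-level module `name` from,
    or None if the module cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" is always available; "paramiko" prints None when it is not installed
for name in ("json", "paramiko"):
    print(name, "->", module_origin(name))
```

    If the printed path points inside an egg or a different site-packages directory than the file being edited, the edit will have no effect.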

    Afterwards I also posted an answer on that Stack Overflow question (please excuse my clumsy English) as a way of giving something back.

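    II. paramiko "blocks" when the remote script starts a background script

    After my remote command channel went live, I found that when a remote script starts another script in the background, the channel does not return until the background script has finished, and sometimes it hangs altogether.

    1. Steps to reproduce:

    (1) Write the test scripts

    Script 1: test.sh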

     
    #!/bin/bash
    sleep 30
    echo test end
    exit 0

    Script 2: run.sh

     
    #!/bin/bash
    bash /tmp/test.sh &
    echo run ok!
    exit 0

    Script 3: test.py

     
    import paramiko

    client = paramiko.SSHClient()
    # Accept unknown host keys automatically (fine for a test, risky in production)
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname='192.168.1.10', port=22, username='root', password='123456',
                   timeout=300, allow_agent=False, look_for_keys=False)

    stdin, stdout, stderr = client.exec_command("bash /tmp/run.sh")

    # readlines() returns only once the remote side reports end-of-file on stdout
    result_info = ""
    for line in stdout.readlines():
        result_info += line

    print(result_info)

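    Copy test.sh and run.sh to the remote server, for example to 192.168.1.10:/tmp/.

    (2) Run it remotely

    Run python test.py locally. The script does not print run ok right away; it waits 30 s and then prints everything at once, including the output of test.sh.

    2. Solution

    Redirect the remote script's standard output (stdout) to standard error (stderr). test.py changes as follows: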

     
    import paramiko

    client = paramiko.SSHClient()
    # Accept unknown host keys automatically (fine for a test, risky in production)
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname='192.168.1.10', port=22, username='root', password='123456',
                   timeout=300, allow_agent=False, look_for_keys=False)

    # 1>&2 sends the standard output of run.sh to standard error
    stdin, stdout, stderr = client.exec_command("bash /tmp/run.sh 1>&2")

    # Read the output from stderr instead of stdout
    result_info = ""
    for line in stderr.readlines():
        result_info += line

    print(result_info)

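    Now the result comes back immediately. As I understand it, the reason is simple: stdout is line-buffered, so output sits in a buffer until a newline arrives and only then is the actual I/O performed, which makes the paramiko remote command wait. stderr is unbuffered, so error messages show up as quickly as possible. Redirecting the script's standard output to standard error (1>&2) therefore lets paramiko read the remote output promptly through stderr. A likelier contributing cause is that the backgrounded test.sh inherits the stdout pipe of run.sh and keeps it open, so stdout.readlines() does not see end-of-file until test.sh exits 30 seconds later.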
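    One way to see how a background job can hold up such a read, independent of SSH, is to reproduce it locally. The sketch below uses subprocess in place of an SSH channel, on the assumption that the same pipe inheritance is at work remotely; timed_read is a hypothetical helper, and a POSIX sh is assumed. It also shows a commonly used alternative: detaching the background job's output with >/dev/null 2>&1.

```python
import subprocess
import time


def timed_read(shell_command):
    """Run shell_command via sh and return (stdout text, seconds until EOF on stdout)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        ["sh", "-c", shell_command],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # read() returns only after every process holding the pipe has closed it
    output = proc.stdout.read().decode()
    proc.wait()
    return output, time.monotonic() - start


# The background sleep inherits the stdout pipe, so the read waits for it to exit
inherited_out, inherited_secs = timed_read("sleep 2 & echo run ok")

# The background sleep writes to /dev/null, so the pipe closes when sh exits
detached_out, detached_secs = timed_read("sleep 2 >/dev/null 2>&1 & echo run ok")

print("inherited: %r after %.1fs" % (inherited_out, inherited_secs))
print("detached:  %r after %.1fs" % (detached_out, detached_secs))
```

    Applied to run.sh, the same idea would be to start the background script as bash /tmp/test.sh >/dev/null 2>&1 &, at the cost of losing its output unless it is redirected to a log file instead.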

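    III. Fixing the "This operation would block forever" error

    While scaling out a paramiko-based automation apiserver, I found that running remote commands or transferring files in the new environment raised the following error: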

     
    2017-08-04 12:38:31,243 [ERROR] Exception: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)
    2017-08-04 12:38:31,244 [ERROR] Traceback (most recent call last):
    2017-08-04 12:38:31,244 [ERROR]   File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1555, in run
    2017-08-04 12:38:31,245 [ERROR]     self._check_banner()
    2017-08-04 12:38:31,245 [ERROR]   File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1681, in _check_banner
    2017-08-04 12:38:31,245 [ERROR]     raise SSHException('Error reading SSH protocol banner' + str(x))
    2017-08-04 12:38:31,245 [ERROR] SSHException: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)
    2017-08-04 12:38:31,245 [ERROR]
    2017-08-04 12:38:31,247 [INFO] Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)

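    I kept assuming that some Python component had been installed incorrectly and checked everything over and over, only to find in the end that the culprit was one extra package that had been installed!

    Solution:

    Remove the installed greenlet package; the reason is given further down: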

     
    rm -r /usr/local/python2.7.5/lib/python2.7/site-packages/greenlet*

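    The "arduous" troubleshooting process follows; skip it if you are not interested:

    1. Being lazy, the first thing I did was search for the keywords [This operation would block forever', <Hub], but that turned up no solution.

    2. From experience, I first looked up the _check_banner function named in the traceback: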

     
    def _check_banner(self):
        # this is slow, but we only have to do it once
        for i in range(100):
            # give them 15 seconds for the first line, then just 2 seconds
            # each additional line. (some sites have very high latency.)
            if i == 0:
                timeout = self.banner_timeout
            else:
                timeout = 2
            try:
                buf = self.packetizer.readline(timeout)
            except ProxyCommandFailure:
                raise
            except Exception, x:
                raise SSHException('Error reading SSH protocol banner' + str(x))
            if buf[:4] == 'SSH-':
                break
            self._log(DEBUG, 'Banner: ' + buf)
        if buf[:4] != 'SSH-':
            raise SSHException('Indecipherable protocol version "' + buf + '"')
        # save this server version string for later
        self.remote_version = buf
        # pull off any attached comment
        comment = ''
        i = string.find(buf, ' ')
        if i >= 0:
            comment = buf[i+1:]
            buf = buf[:i]
        # parse out version string and make sure it matches
        segs = buf.split('-', 2)
        if len(segs) < 3:
            raise SSHException('Invalid SSH banner')
        version = segs[1]
        client = segs[2]
        if version != '1.99' and version != '2.0':
            raise SSHException('Incompatible version (%s instead of 2.0)' % (version,))
        self._log(INFO, 'Connected (version %s, client %s)' % (version, client))

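    3. The exception is clearly raised by the statement buf = self.packetizer.readline(timeout). The crude way I knew to pin it down is to run that statement outside the try block and see what happens: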

     
    def _check_banner(self):
        # this is slow, but we only have to do it once
        for i in range(100):
            # give them 15 seconds for the first line, then just 2 seconds
            # each additional line. (some sites have very high latency.)
            if i == 0:
                timeout = self.banner_timeout
            else:
                timeout = 2
            buf = self.packetizer.readline(timeout)  # added here, outside the try, to see where the exception comes from
            try:
                buf = self.packetizer.readline(timeout)
            except ProxyCommandFailure:
                raise
            except Exception, x:
                raise SSHException('Error reading SSH protocol banner' + str(x))
            if buf[:4] == 'SSH-':
                break
            self._log(DEBUG, 'Banner: ' + buf)
        .....

    The error message then became much more specific:

     
    2017-08-04 13:23:26,085 [ERROR] Unknown exception: ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
    2017-08-04 13:23:26,087 [ERROR] Traceback (most recent call last):
    2017-08-04 13:23:26,088 [ERROR]   File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1555, in run
    2017-08-04 13:23:26,088 [ERROR]     self._check_banner()
    2017-08-04 13:23:26,088 [ERROR]   File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1676, in _check_banner
    2017-08-04 13:23:26,088 [ERROR]     buf = self.packetizer.readline(timeout)
    2017-08-04 13:23:26,088 [ERROR]   File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 280, in readline
    2017-08-04 13:23:26,088 [ERROR]     buf += self._read_timeout(timeout)
    2017-08-04 13:23:26,088 [ERROR]   File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 468, in _read_timeout
    2017-08-04 13:23:26,089 [ERROR]     x = self.__socket.recv(128)
    2017-08-04 13:23:26,089 [ERROR]   File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 280, in recv
    2017-08-04 13:23:26,089 [ERROR]     self._wait(self._read_event)
    2017-08-04 13:23:26,089 [ERROR]   File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 179, in _wait
    2017-08-04 13:23:26,089 [ERROR]     self.hub.wait(watcher)
    2017-08-04 13:23:26,089 [ERROR]   File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 630, in wait
    2017-08-04 13:23:26,089 [ERROR]     result = waiter.get()
    2017-08-04 13:23:26,089 [ERROR]   File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 878, in get
    2017-08-04 13:23:26,090 [ERROR]     return self.hub.switch()
    2017-08-04 13:23:26,090 [ERROR]   File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 609, in switch
    2017-08-04 13:23:26,090 [ERROR]     return greenlet.switch(self)
    2017-08-04 13:23:26,090 [ERROR] LoopExit: ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
    2017-08-04 13:23:26,090 [ERROR]
    2017-08-04 13:23:26,093 [INFO] ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
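    The first log was so much shorter because _check_banner re-raises with only str(x), which discards the inner frames. A less invasive way to keep them than running the statement outside the try block is to capture the traceback text inside the except clause. The sketch below is illustrative only: SSHException and read_banner_line are stand-ins defined here so that it runs without paramiko, not paramiko's real code.

```python
import traceback


class SSHException(Exception):
    """Stand-in for paramiko.SSHException so the sketch runs without paramiko."""


def read_banner_line():
    # Hypothetical stand-in for self.packetizer.readline(timeout)
    raise RuntimeError("This operation would block forever")


def check_banner_terse():
    # Same pattern as _check_banner: only str(x) of the original error survives
    try:
        read_banner_line()
    except Exception as x:
        raise SSHException("Error reading SSH protocol banner" + str(x))


def check_banner_verbose():
    # Keep the original traceback text in the message instead of discarding it
    try:
        read_banner_line()
    except Exception as x:
        detail = traceback.format_exc()
        raise SSHException("Error reading SSH protocol banner: %s\n%s" % (x, detail))


def message_of(func):
    """Call func and return the message of the SSHException it raises."""
    try:
        func()
    except SSHException as err:
        return str(err)
    return ""


print(message_of(check_banner_terse))
print(message_of(check_banner_verbose))
```

    The verbose message names read_banner_line and the original exception type, which is the information that only appeared here after the source had been edited.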

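    This pretty much identified gevent and greenlet as the real culprits! I first assumed that my apiserver was calling gevent, but after a long investigation I confirmed that it was not. As far as I remembered, paramiko does not use gevent either, so where did this exception come from?

    It was only when I searched Google again, this time for [LoopExit: ('This operation would block forever', <Hub at], that I found a blog post, http://www.hongquan.me/?p=178, and finally understood the cause!

    The specific cause, as I understand it: greenlet has a run function that overrides the function of the same name in paramiko's transport.py, so when paramiko executes _check_banner it actually ends up calling greenlet's run function, hence the error. Unbelievable!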
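    The second traceback shows paramiko's self.__socket.recv(128) landing in gevent/_socket2.py, which means the socket object paramiko was using came from gevent. A quick way to check for this in a running process is to look at which package currently provides socket.socket. This is only a sketch: providing_package and socket_is_gevent are hypothetical helpers, not part of paramiko or gevent, and the check assumes that a gevent replacement class lives in a module under the gevent package, as the traceback suggests.

```python
import socket


def providing_package(obj):
    """Return the top-level package name of the module that defines obj."""
    return obj.__module__.split(".")[0]


def socket_is_gevent():
    """True when socket.socket has been replaced by a class from gevent."""
    return providing_package(socket.socket) == "gevent"


print("socket.socket comes from:", providing_package(socket.socket))
print("replaced by gevent:", socket_is_gevent())
```

    Running this inside the affected apiserver process, rather than in a fresh interpreter, is what matters, since the replacement only exists in a process where something has patched the socket module.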

  • Original article: https://www.cnblogs.com/s-seven/p/14604780.html