  • Python download error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time

    import os
    import urllib

    def downloadXml(isExists,filedir,filename):
        if not isExists:
            os.mkdir(filedir)
        local = os.path.join(filedir,filename)
        # url is a module-level variable set elsewhere in the script
        urllib.urlretrieve(url,local)

    This raised the following error:

    Traceback (most recent call last):
      File "C:\Users\william\Desktop\nova xml\New folder\download_xml.py", line 95, in <module>
        downloadXml(isExists,filedir,filename)
      File "C:\Users\william\Desktop\nova xml\New folder\download_xml.py", line 80, in downloadXml
        urllib.urlretrieve(url,local)
      File "E:\Python27\lib\urllib.py", line 98, in urlretrieve
        return opener.retrieve(url, filename, reporthook, data)
      File "E:\Python27\lib\urllib.py", line 245, in retrieve
        fp = self.open(url, data)
      File "E:\Python27\lib\urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "E:\Python27\lib\urllib.py", line 350, in open_http
        h.endheaders(data)
      File "E:\Python27\lib\httplib.py", line 1053, in endheaders
        self._send_output(message_body)
      File "E:\Python27\lib\httplib.py", line 897, in _send_output
        self.send(msg)
      File "E:\Python27\lib\httplib.py", line 859, in send
        self.connect()
      File "E:\Python27\lib\httplib.py", line 836, in connect
        self.timeout, self.source_address)
      File "E:\Python27\lib\socket.py", line 575, in create_connection
        raise err
    IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
    >>>

    I searched Google for an answer, using the query: urlretrieve Errno 10060

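    An answer at https://segmentfault.com/q/1010000004386726 explains: hitting a site too frequently can be treated as a DoS attack, and sites that enforce rate limiting will usually stop responding for a while. You can catch the exception, sleep for a while, and retry; you can also apply exponential backoff based on the number of retries.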
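
    The retry-with-backoff idea from that answer could be sketched roughly as below. This is a minimal sketch, not the code used in this post: `retry_with_backoff`, `max_tries`, `base_delay`, and the injectable `sleep` argument are illustrative names, and doubling the wait after each failure is just one common schedule.

```python
import time


def retry_with_backoff(func, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_tries attempts have failed.

    The wait doubles after each failure: base_delay, 2*base_delay, ...
    All names here are illustrative; none of them come from urllib.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except IOError:
            # In Python 2, urllib.urlretrieve reports socket errors
            # such as Errno 10060 as IOError (see the traceback).
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

    With the script's own variables, a call might look like `retry_with_backoff(lambda: urllib.urlretrieve(url, local))`. Passing `sleep` as an argument is only there so the delays can be checked without actually waiting.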

    I thought of a simple approach: add a delay before each download. The code was changed as follows:

    def downloadXml(isExists,filedir,filename):
        if not isExists:
            os.mkdir(filedir)
        local = os.path.join(filedir,filename)
        time.sleep(1)
        urllib.urlretrieve(url,local)

    I ran it. Previously the timeouts started at around the 80th record; now it got through more than 2300 records. Unfortunately, it still timed out in the end.

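    Lengthening the delay, say from 1 s to 5 s, might avoid the error, but it would waste too much time, because every download would wait 5 s even when nothing fails. It is better to wait and retry only after an error occurs.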
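
    To put rough numbers on that trade-off, here is a small calculation. The count of 2300 is borrowed from the run above purely as an illustrative figure (the real total number of downloads is not stated), and `fixed_delay_overhead` is a made-up helper name.

```python
def fixed_delay_overhead(num_downloads, delay_seconds):
    """Seconds spent sleeping when every download is preceded by a fixed delay."""
    return num_downloads * delay_seconds


for delay in (1, 5):
    seconds = fixed_delay_overhead(2300, delay)
    print("%d s delay: %d s of waiting (about %.1f hours)"
          % (delay, seconds, seconds / 3600.0))
```

    Under that assumption a 1 s delay costs 2300 s of pure waiting, while a 5 s delay costs 11500 s, which is over three hours spent sleeping even if no request ever fails.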

    So:

    def downloadXml(isExists,filedir,filename):
        if not isExists:
            os.makedirs(filedir)
        local = os.path.join(filedir,filename)
        try:
            urllib.urlretrieve(url,local)
        except Exception as e:
            time.sleep(5)
            urllib.urlretrieve(url,local) 

    With this version, the script got stuck on one record and never moved on. So I had to change it to retry each record at most 10 times.

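    def downloadXml(flag_exists,file_dir,file_name,xml_url,cur_try=0,total_try=10):
        if not flag_exists:
            os.makedirs(file_dir)
        local = os.path.join(file_dir,file_name)
        try:
            urllib.urlretrieve(xml_url,local)
        except Exception as e:
            print e
            # cur_try is an argument so the count carries across recursive calls;
            # a local variable reset to 0 on every call would never reach the limit
            if cur_try < total_try:
                time.sleep(15)
                # pass True: the directory already exists by this point
                return downloadXml(True,file_dir,file_name,xml_url,cur_try+1,total_try)
            else:
                # re-raise the original error with its traceback
                raise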

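    After this change it ran to completion with no more errors. But then I realized there was a problem: the URLs that failed to download were not recorded anywhere. So I added code that writes each failed URL to a local text file, so that it can be reviewed later and downloaded manually.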

    def downloadXml(flag_exists,file_dir,file_name,xml_url,cur_try=0,total_try=10):
        if not flag_exists:
            os.makedirs(file_dir)
        local = os.path.join(file_dir,file_name)
        try:
            urllib.urlretrieve(xml_url,local)
        except Exception as e:
            print 'download error: ',e
            # cur_try is an argument so the count carries across recursive calls
            if cur_try < total_try:
                time.sleep(15)
                # pass True: the directory already exists by this point
                return downloadXml(True,file_dir,file_name,xml_url,cur_try+1,total_try)
            else:
                print 'the last error: ',e
                # test_dir is defined elsewhere in the script;
                # write one URL per line so the log can be read back
                with open(test_dir + 'error_url.txt','a') as f:
                    f.write(xml_url + '\n')
                raise

    Unfortunately, this time there were no failed URLs at all, probably because the site's traffic was light at the time.
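
    Had the log contained entries, they could be read back for a second pass along these lines. This is a sketch under assumptions: `load_failed_urls` is a hypothetical helper, and it assumes each failed URL sits on its own line in the file, which requires the logging code to write a newline after each URL.

```python
import os


def load_failed_urls(log_path):
    """Return the unique URLs recorded in log_path, in first-seen order.

    Assumes one URL per line; a missing file is treated as "nothing failed".
    """
    if not os.path.exists(log_path):
        return []
    seen = set()
    urls = []
    with open(log_path) as f:
        for line in f:
            url = line.strip()
            # skip blank lines and URLs that were already logged earlier
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

    Each returned URL could then be passed to `downloadXml` again, or simply inspected by hand as described above.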

  • Original article: https://www.cnblogs.com/guohuino2/p/6211312.html