
    Python FTP Multithreaded Download: Downloading a File in Chunks with Multiple Threads

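    Python's ftplib module handles FTP operations such as downloading and uploading. Downloading a large file from an FTP server with Python can take a long time, so how can it be sped up? This is where multithreading comes in. In this post I share my multithreaded download code; the main Python modules it needs are ftplib and threading.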

    First, the overall download approach:

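    1. Split the file into blocks. For example, to download one file with 20 threads, the file is transferred in binary mode and divided into 20 roughly equal blocks, with one thread assigned to each block: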

    import threading  # module-level import needed by this method

    def setupThreads(self, filePath, localFilePath, threadNumber=20):
        """
        Set up the threads that will download the file.
        Returns the list of (unstarted) threads on success, otherwise None.
        """
        try:
            # many servers only answer SIZE in binary mode
            self.ftp.voidcmd('TYPE I')
            # the SIZE reply looks like "213 <bytes>"; take the second field
            temp = self.ftp.sendcmd('SIZE ' + filePath)
            remoteFileSize = int(temp.split()[1])
            # floor division keeps the block size an integer
            blockSize = remoteFileSize // threadNumber
            rest = None
            threads = []
            # the first threadNumber - 1 threads each take one full block
            for i in range(0, threadNumber - 1):
                beginPoint = blockSize * i
                subThread = threading.Thread(
                    target=self.downloadFileMultiThreads,
                    args=(i, filePath, localFilePath, beginPoint, blockSize, rest))
                threads.append(subThread)

            # the last thread takes one block plus the leftover bytes
            assigned = blockSize * threadNumber
            unassigned = remoteFileSize - assigned
            lastBlockSize = blockSize + unassigned
            beginPoint = blockSize * (threadNumber - 1)
            subThread = threading.Thread(
                target=self.downloadFileMultiThreads,
                args=(threadNumber - 1, filePath, localFilePath, beginPoint, lastBlockSize, rest))
            threads.append(subThread)
            return threads
        except Exception as diag:
            self.recordLog(str(diag), 'error')
            return None
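
    The block arithmetic can be checked without an FTP server by writing the same split as a standalone function. This is only a minimal sketch: the name split_blocks is hypothetical and is not part of the original class.

```python
def split_blocks(file_size, thread_number=20):
    """Return a list of (begin_point, block_size) pairs covering file_size bytes.

    Mirrors the arithmetic in setupThreads: every block has the same size
    except the last one, which also takes the remainder.
    """
    if thread_number < 1:
        raise ValueError("thread_number must be at least 1")
    block_size = file_size // thread_number  # floor division
    blocks = []
    for i in range(thread_number - 1):
        blocks.append((block_size * i, block_size))
    # The last block absorbs the bytes left over by the floor division.
    last_begin = block_size * (thread_number - 1)
    blocks.append((last_begin, file_size - last_begin))
    return blocks
```

    For a 1005-byte file and 4 threads, this yields three 251-byte blocks followed by a final 252-byte block starting at offset 753.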

    The downloadFileMultiThreads function that each thread runs is as follows:

    import os  # module-level imports needed by this method
    from ftplib import FTP, all_errors

    def downloadFileMultiThreads(self, threadIndex, remoteFilePath, localFilePath,
                                 beginPoint, blockSize, rest=None):
        """
        Thread body: download blockSize bytes starting at offset beginPoint
        into the temporary file localFilePath.part.<threadIndex>.
        The rest argument is kept for signature compatibility and is unused.
        """
        try:
            # each thread opens its own connection, changes directory,
            # and switches to binary mode
            myFtp = FTP(self.host, self.user, self.passwd)
            myFtp.cwd(os.path.dirname(remoteFilePath))
            myFtp.voidcmd('TYPE I')

            # Let transfercmd send REST immediately before RETR; a REST sent
            # earlier may be discarded by the PASV/PORT exchange in between.
            beginToDownload = 'RETR ' + os.path.basename(remoteFilePath)
            connection = myFtp.transfercmd(beginToDownload, beginPoint)

            finishedSize = 0
            readSize = self.fixBlockSize
            # temporary local file for this block
            with open(localFilePath + '.part.' + str(threadIndex), 'wb') as fp:
                while True:
                    if blockSize > 0:
                        # never request more than what is left of this block
                        readSize = min(self.fixBlockSize, blockSize - finishedSize)
                    data = connection.recv(readSize)
                    if not data:
                        break
                    fp.write(data)
                    finishedSize += len(data)
                    # stop as soon as the whole block has been received
                    if finishedSize == blockSize:
                        break
            connection.close()
            try:
                myFtp.quit()
            except all_errors:
                # closing the data connection early can make the server
                # report an aborted transfer; the block itself is complete
                myFtp.close()
            return True
        except Exception as diag:
            self.recordLog(str(diag), 'error')
            return False
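
    setupThreads only builds the thread objects; nothing is downloaded until they are started. Below is a minimal sketch of driving them, assuming the caller simply starts every thread and waits for all of them to finish. run_threads is a hypothetical helper name, not part of the original class.

```python
import threading


def run_threads(threads):
    """Start every thread in the list, then block until all have finished."""
    if threads is None:
        # setupThreads returns None when it could not set up the download.
        return False
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return True
```

    Note that threading.Thread discards the return value of its target, so the True/False returned by downloadFileMultiThreads is not visible to the caller. Comparing the size of each .part file with its expected block size is one way to detect a failed block.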

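    2. Once all the downloads have finished, the blocks must be merged. The merge process is covered in part two of this series: Python FTP Multithreaded Download: Merging the Chunks Downloaded by Multiple Threads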
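
    As a rough illustration of what the merge involves (this is not the code from part two of the series), the part files can be concatenated in index order. The sketch assumes the localFilePath + '.part.' + index naming used by the download threads; merge_parts and its remove_parts flag are hypothetical names.

```python
import os
import shutil


def merge_parts(local_file_path, thread_number, remove_parts=True):
    """Concatenate local_file_path.part.0 .. .part.(N-1) into local_file_path.

    Returns the size in bytes of the merged file.
    """
    with open(local_file_path, 'wb') as out:
        for i in range(thread_number):
            part_path = local_file_path + '.part.' + str(i)
            with open(part_path, 'rb') as part:
                # Stream the part so a large block is never held in memory.
                shutil.copyfileobj(part, out)
    if remove_parts:
        # Delete the temporary files only after the merge has succeeded.
        for i in range(thread_number):
            os.remove(local_file_path + '.part.' + str(i))
    return os.path.getsize(local_file_path)
```

    Checking the returned size against the size reported by the server's SIZE command gives a cheap sanity check on the merged file.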

    Thanks for reading. I hope this helps!


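    Author: 薛定谔の喵
    Source: http://www.cnblogs.com/berlin-sun/
    Copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.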

     
     
  • Original article: https://www.cnblogs.com/Leo_wl/p/3293009.html