Preface
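Why use multiple threads? Can't a single thread download a file? It can, but a single connection often fails to use the full bandwidth, whereas several threads downloading in parallel can come much closer to the line's full speed.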
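For example, suppose your broadband is 100 Mbps, so the theoretical maximum download speed is 100/8 = 12.5 MB/s. To download an 843 MB video, a single thread took 560 seconds, while 12 threads finished in 93 seconds, about one sixth of the time.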
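Looking at network utilization, the average download speed was 1.50 MB/s with a single thread and 9.06 MB/s with multiple threads, so the multi-threaded download used most of the available bandwidth. That is the benefit of multi-threading!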
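The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch using the numbers quoted above; the variable names are made up for illustration, and the speeds are computed from the rounded size and times, so they may differ from the measured values in the last digit.

```python
# Figures taken from the example above.
link_mbps = 100                 # advertised bandwidth, megabits per second
file_mb = 843                   # file size, megabytes
single_s, multi_s = 560, 93     # measured download times, seconds

max_mb_per_s = link_mbps / 8    # 8 bits per byte
single_speed = file_mb / single_s
multi_speed = file_mb / multi_s

print(f'theoretical maximum: {max_mb_per_s:.2f} MB/s')
print(f'single thread: {single_speed:.2f} MB/s, '
      f'{single_speed / max_mb_per_s:.0%} of the link')
print(f'12 threads:    {multi_speed:.2f} MB/s, '
      f'{multi_speed / max_mb_per_s:.0%} of the link')
print(f'speed-up: {single_s / multi_s:.1f}x')
```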
Installing dependencies
The requests library is used to request resources from the server.
pip3 install requests
Test sample
An 843 MB video file in MP4 format.
https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo/1250921_c7af3a2b73d03604f6421ef11134af72.mp4
Multi-threaded
The thread pool is created with ThreadPoolExecutor, an Executor subclass from the concurrent.futures module.
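from concurrent.futures import ThreadPoolExecutor
from threading import Lock
import time

from requests import get, head


class downloader:
    def __init__(self, url, num, name):
        self.url = url        # file URL
        self.num = num        # number of worker threads
        self.name = name      # output file name
        self.getsize = 0      # bytes downloaded so far, shared by all threads
        self.lock = Lock()    # guards self.getsize
        # Ask for the total size without downloading the body.
        r = head(self.url, allow_redirects=True)
        self.size = int(r.headers['Content-Length'])

    def down(self, start, end, chunk_size=10240):
        # Request only bytes start..end (inclusive) and write them at offset start.
        headers = {'Range': f'bytes={start}-{end}'}
        r = get(self.url, headers=headers, stream=True)
        r.raise_for_status()
        with open(self.name, 'rb+') as f:
            f.seek(start)
            for chunk in r.iter_content(chunk_size):
                f.write(chunk)
                # Count the bytes actually received; the last chunk may be short.
                with self.lock:
                    self.getsize += len(chunk)

    def main(self):
        start_time = time.time()
        # Pre-allocate the output file so each thread can seek to its own offset.
        with open(self.name, 'wb') as f:
            f.truncate(self.size)
        tp = ThreadPoolExecutor(max_workers=self.num)
        futures = []
        start = 0
        for i in range(self.num):
            # Range ends are inclusive, so the last block ends at size - 1.
            end = (i + 1) * self.size // self.num - 1
            futures.append(tp.submit(self.down, start, end))
            start = end + 1
        while True:
            last = self.getsize
            time.sleep(1)
            curr = self.getsize
            process = curr / self.size * 100
            down = (curr - last) / 1024  # KB received during the last second
            if down > 1024:
                speed = f'{down/1024:6.2f}MB/s'
            else:
                speed = f'{down:6.2f}KB/s'
            # '\r' returns to the start of the line so the progress overwrites itself.
            print(f'process: {process:6.2f}% | speed: {speed}', end='\r')
            # Stop once every worker has finished, even if one of them failed.
            if all(future.done() for future in futures):
                break
        tp.shutdown()
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
        print(f'process: {100:6.2f}% | speed:   0.00KB/s', end=' | ')
        end_time = time.time()
        total_time = end_time - start_time
        average_speed = self.size / total_time / 1024 / 1024
        print(f'total-time: {total_time:.0f}s | average-speed: {average_speed:.2f}MB/s')


if __name__ == '__main__':
    url = 'https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo/1250921_c7af3a2b73d03604f6421ef11134af72.mp4'
    down = downloader(url, 12, 'test.mp4')
    down.main()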
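The core idea of the downloader, splitting the file into contiguous byte ranges and letting each pool worker fill in its own block, can be tried without any network access. The sketch below is an illustration only: `split_ranges` and `copy_block` are hypothetical helpers that are not part of the downloader, an in-memory `bytes` object stands in for the remote file, and a slice copy stands in for one HTTP Range request. It assumes the size is at least the number of blocks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, num):
    """Split `size` bytes into `num` contiguous (start, end) pairs, ends inclusive."""
    ranges = []
    start = 0
    for i in range(num):
        end = (i + 1) * size // num - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def copy_block(source, target, start, end):
    """Stand-in for one Range request: copy bytes start..end into place."""
    target[start:end + 1] = source[start:end + 1]


source = bytes(range(256)) * 40      # 10240 bytes of fake "remote" data
target = bytearray(len(source))      # pre-allocated, like f.truncate(size)

# Leaving the with-block waits for all submitted tasks to finish.
with ThreadPoolExecutor(max_workers=12) as pool:
    for start, end in split_ranges(len(source), 12):
        pool.submit(copy_block, source, target, start, end)

print(split_ranges(10, 3))           # [(0, 2), (3, 5), (6, 9)]
print(bytes(target) == source)       # True
```

Because the ranges are inclusive and never overlap, the blocks can be written in any order and the result is still identical to the source.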
Single-threaded
import time

import requests

start = time.time()
url = 'https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo/1250921_c7af3a2b73d03604f6421ef11134af72.mp4'
# Stream the response so the whole file is never held in memory at once.
res = requests.get(url, stream=True)
with open('test.mp4', 'wb') as f:
    for chunk in res.iter_content(chunk_size=10240):
        f.write(chunk)
end = time.time()
print(end - start)  # elapsed time in seconds
Comparison
Downloading the same 843 MB video, the multi-threaded and single-threaded results compare as follows:
| Item | Multi-threaded | Single-threaded |
|---|---|---|
| Total time | 93 s | 560 s |
| Average speed | 9.06 MB/s | 1.50 MB/s |
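Note
I also compared this with IDM, a multi-threaded download manager, and found that the download speed of the Python multi-threaded downloader was no lower than IDM's. With further development to add resumable downloads and a GUI, it could probably replace IDM for downloading.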
Future work
- Multi-threading
- Resumable downloads
- GUI
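As a rough idea of how resumable downloads might work for a single stream, the sketch below builds a Range header from the size of a partial file already on disk. It is an assumption-laden illustration, not part of the downloader above: `resume_headers` and the `.part` file name are hypothetical, the partial file is assumed to be written by appending (mode 'ab'), and the server is assumed to honor Range requests. The pre-allocated file used by the multi-threaded version would need per-block progress records instead, since its size on disk says nothing about how much has been downloaded.

```python
import os
import tempfile


def resume_headers(path):
    """Return a Range header that asks only for the bytes not yet on disk."""
    done = os.path.getsize(path) if os.path.exists(path) else 0
    return {'Range': f'bytes={done}-'} if done else {}


# Demo with a temporary file standing in for a partial download.
with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, 'test.mp4.part')
    print(resume_headers(part))      # {} because nothing is on disk yet
    with open(part, 'wb') as f:
        f.write(b'x' * 1000)
    print(resume_headers(part))      # {'Range': 'bytes=1000-'}
```

The returned dictionary could be passed as the `headers` argument of `requests.get`, with the response chunks appended to the partial file.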
References
[0] https://blog.csdn.net/qq_41488943/article/details/107118377
[1] https://docs.python.org/zh-cn/3.8/library/concurrent.futures.html#threadpoolexecutor
[2] https://requests.readthedocs.io/zh_CN/latest/user/quickstart.html#id9