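We often search for and download images online, but downloading them one at a time is tedious. This example uses a web crawler to download and save all of the images on a site's page in one pass.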
Downloading and saving images from a website
Download every .jpg and .png image from the specified site and save them in a newly created local images folder.
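One possible way to decide whether a link counts as a .jpg or .png image is to look at the extension of its URL path rather than at the whole string. The sketch below is only an illustration: the helper name image_filename and the constant IMAGE_EXTENSIONS are assumptions introduced here, not names from the script in this article.

```python
import posixpath
from urllib.parse import urlsplit

# The two formats named in the task description (assumed constant name)
IMAGE_EXTENSIONS = ('.jpg', '.png')


def image_filename(link):
    """Return the file name if link points at a .jpg/.png file, else None."""
    path = urlsplit(link).path                  # ignores any ?query or #fragment
    name = posixpath.basename(path)             # last path segment
    ext = posixpath.splitext(name)[1].lower()   # extension, case-insensitive
    return name if ext in IMAGE_EXTENSIONS else None


if __name__ == '__main__':
    print(image_filename('/static/image/logo.png'))
    print(image_filename('http://www.tooopen.com/img/87.aspx'))
```

Checking the path extension keeps a link such as a.jpg?size=2 and skips a page whose URL merely contains the text ".jpg" somewhere else.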
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from urllib.request import urlopen

url = 'http://www.tooopen.com/img/87.aspx'
html = requests.get(url)
html.encoding = 'utf-8'
sp = BeautifulSoup(html.text, 'html.parser')

# Create the images directory that will hold the downloads
# (backslashes are doubled so they do not escape the closing quote)
images_dir = 'E:\\images\\'
if not os.path.exists(images_dir):
    os.mkdir(images_dir)

# Collect every <a> and <img> tag
all_links = sp.find_all(['a', 'img'])
for link in all_links:
    # Read the src and href attributes
    src = link.get('src')
    href = link.get('href')
    attrs = [src, href]
    for attr in attrs:
        # Keep only .jpg and .png files
        if attr is not None and ('.jpg' in attr or '.png' in attr):
            # Build the full image URL; urljoin resolves relative
            # paths such as /static/image/logo.png against the page URL
            full_path = urljoin(url, attr)
            filename = full_path.split('/')[-1]   # image file name
            ext = filename.split('.')[-1]         # extension
            filename = filename.split('.')[-2]    # base name
            if 'jpg' in ext:
                filename = filename + '.jpg'
            else:
                filename = filename + '.png'
            print(attr)
            # Save the image
            try:
                image = urlopen(full_path)
                with open(os.path.join(images_dir, filename), 'wb') as f:
                    f.write(image.read())
            except Exception:
                print('{} could not be read!'.format(filename))
print('Finished downloading the images on the current page')
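Some src values on a page are site-relative paths such as /static/image/logo.png, which urlopen cannot fetch on their own, and the same image may be referenced more than once. A minimal sketch of one way to handle both points is shown here; the helper name resolve_unique is hypothetical and is not part of the script in this article.

```python
from urllib.parse import urljoin


def resolve_unique(page_url, attr_values):
    """Turn raw src/href values into absolute URLs, dropping None and repeats."""
    absolute = (urljoin(page_url, value) for value in attr_values if value)
    # dict keys preserve insertion order, so the first occurrence wins
    return list(dict.fromkeys(absolute))


if __name__ == '__main__':
    page = 'http://www.tooopen.com/img/87.aspx'
    values = ['/static/image/logo.png', None, '/static/image/logo.png']
    for image_url in resolve_unique(page, values):
        print(image_url)
```

urljoin leaves values that are already absolute untouched, so the helper can be fed every attribute value collected from the page.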
Output from a run that processed each src value twice (attrs = [src, src]) and passed it to urlopen unchanged:

/static/image/logo.png
logo.png could not be read!
/static/image/logo.png
logo.png could not be read!
https://www.tooopen.com/static/ad/1500X50-viw.png
https://www.tooopen.com/static/ad/1500X50-viw.png
https://www.tooopen.com/static/ad/1500X50-too.png
https://www.tooopen.com/static/ad/1500X50-too.png
http://img08.tooopen.com/20190807/tooopen_wk_131356135671827.jpg
tooopen_wk_131356135671827.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131356135671827.jpg
tooopen_wk_131356135671827.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131356135689691.jpg
tooopen_wk_131356135689691.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131356135689691.jpg
tooopen_wk_131356135689691.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131355135547978.jpg
tooopen_wk_131355135547978.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131355135547978.jpg
tooopen_wk_131355135547978.jpg could not be read!
http://img08.tooopen.com/20191204/tooopen_sl_135027502737700.jpg
tooopen_sl_135027502737700.jpg could not be read!
http://img08.tooopen.com/20191204/tooopen_sl_135027502737700.jpg
tooopen_sl_135027502737700.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102334233455130.jpg
tooopen_sl_102334233455130.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102334233455130.jpg
tooopen_sl_102334233455130.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_095522552259405.jpg
tooopen_sl_095522552259405.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_095522552259405.jpg
tooopen_sl_095522552259405.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093830383053575.jpg
tooopen_sl_093830383053575.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093830383053575.jpg
tooopen_sl_093830383053575.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093534353474034.jpg
tooopen_sl_093534353474034.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093534353474034.jpg
tooopen_sl_093534353474034.jpg could not be read!
http://img08.tooopen.com/20191205/tooopen_sl_134926492663201.jpg
tooopen_sl_134926492663201.jpg could not be read!
http://img08.tooopen.com/20191205/tooopen_sl_134926492663201.jpg
tooopen_sl_134926492663201.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102328232897349.jpg
tooopen_sl_102328232897349.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102328232897349.jpg
tooopen_sl_102328232897349.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_162428242838278.jpg
tooopen_sl_162428242838278.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_162428242838278.jpg
tooopen_sl_162428242838278.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093827382762634.jpg
tooopen_sl_093827382762634.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093827382762634.jpg
tooopen_sl_093827382762634.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093529352941470.jpg
tooopen_sl_093529352941470.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093529352941470.jpg
tooopen_sl_093529352941470.jpg could not be read!
http://img08.tooopen.com/20191006/tooopen_sl_09550855847444.jpg
tooopen_sl_09550855847444.jpg could not be read!
http://img08.tooopen.com/20191006/tooopen_sl_09550855847444.jpg
tooopen_sl_09550855847444.jpg could not be read!
http://img08.tooopen.com/20191119/tooopen_sl_115948594813304.jpg
tooopen_sl_115948594813304.jpg could not be read!
http://img08.tooopen.com/20191119/tooopen_sl_115948594813304.jpg
tooopen_sl_115948594813304.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_16270727715545.jpg
tooopen_sl_16270727715545.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_16270727715545.jpg
tooopen_sl_16270727715545.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093822382227436.jpg
tooopen_sl_093822382227436.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093822382227436.jpg
tooopen_sl_093822382227436.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092538253863445.jpg
tooopen_sl_092538253863445.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092538253863445.jpg
tooopen_sl_092538253863445.jpg could not be read!
http://img08.tooopen.com/20190924/tooopen_sl_095323532347706.jpg
tooopen_sl_095323532347706.jpg could not be read!
http://img08.tooopen.com/20190924/tooopen_sl_095323532347706.jpg
tooopen_sl_095323532347706.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_101744174437980.jpg
tooopen_sl_101744174437980.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_101744174437980.jpg
tooopen_sl_101744174437980.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_094151415155508.jpg
tooopen_sl_094151415155508.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_094151415155508.jpg
tooopen_sl_094151415155508.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093819381985689.jpg
tooopen_sl_093819381985689.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093819381985689.jpg
tooopen_sl_093819381985689.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092534253435074.jpg
tooopen_sl_092534253435074.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092534253435074.jpg
tooopen_sl_092534253435074.jpg could not be read!
https://www.tooopen.com/static/image/tooopen-2w.png
https://www.tooopen.com/static/image/tooopen-2w.png
Finished downloading the images on the current page
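The log above reports only that a file could not be read, not why. A minimal sketch of a fetch helper that also returns the reason might look like the following; try_fetch and its opener parameter are illustrative names introduced here, not part of the script in this article. Passing a relative path such as /static/image/logo.png to urlopen raises ValueError (unknown url type), which is one failure the helper would label; the cause of the img08.tooopen.com failures is not visible in the log, so the sketch makes no assumption about it.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def try_fetch(url, opener=urlopen, timeout=10):
    """Return (data, None) on success or (None, reason) on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read(), None
    except HTTPError as err:          # server answered with an error status
        return None, 'HTTP {} {}'.format(err.code, err.reason)
    except URLError as err:           # DNS failure, refused connection, ...
        return None, 'connection problem: {}'.format(err.reason)
    except ValueError as err:         # e.g. a relative path without a scheme
        return None, 'bad URL: {}'.format(err)


if __name__ == '__main__':
    data, reason = try_fetch('/static/image/logo.png')
    print(reason)
```

HTTPError is caught before URLError because it is a subclass of URLError; reversing the order would hide the status code.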