
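Scraping Images from a Web Page with Python

While searching for wallpapers I wanted to save them locally. Saving them one at a time was too tedious, so I decided to scrape them with Python.

Approach:

1. First, find a web page that has wallpapers: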

http://www.acfun.cn/a/ac3345210

2. Then use urllib.request to fetch the page and BeautifulSoup to parse its source:

html = urlopen(url)
bs0bj = BeautifulSoup(html, "lxml")
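
The fetch half of this step can be sketched on its own. urlopen returns a response object whose body is raw bytes, so the sketch below decodes it to text before parsing. The helper name fetch_html, the browser-like User-Agent header, and the 10-second timeout are all assumptions added for illustration and are not part of the original script.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Fetch a URL and return its body decoded as text (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; some sites reject urllib's default.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        # Use the charset declared in the response headers, else fall back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned string can be passed straight to the parser, for example BeautifulSoup(fetch_html(url), "lxml").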

3. Then use a regular expression to find the image links:

imglist = bs0bj.find_all("img", {"src": re.compile("http://imgs.*?live.*?jpg")})
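
To see which links this pattern keeps, it can be tried on a few strings with the re module alone. The src values below are invented for illustration (example.com is a placeholder host). The sketch uses search rather than match, on the understanding that BeautifulSoup applies a compiled pattern to an attribute value that way.

```python
import re

# Same pattern as in step 3: "http://imgs", later "live", later "jpg".
IMG_PATTERN = re.compile("http://imgs.*?live.*?jpg")

# Hypothetical src values, made up for this demonstration.
candidates = [
    "http://imgs.example.com/live/2017/05/wall01.jpg",  # kept
    "http://imgs.example.com/avatar/user.png",          # no "live", no "jpg"
    "https://imgs.example.com/live/wall02.jpg",         # https, not http
    "http://static.example.com/live/wall03.jpg",        # host does not start with "imgs"
    "http://imgs.example.com/live/wall04.jpg?size=big", # kept: pattern is not anchored at the end
]

matched = [src for src in candidates if IMG_PATTERN.search(src)]
print(matched)
```

Only the first and last entries pass. The pattern is tied to plain http and to this site's URL layout, so it would need adjusting for pages served over https.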

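4. Finally, download the images to the local disk (a raw string keeps the "\t" in the path from being read as a tab):

urlretrieve(url, r'e:\test\%s.jpg' % name)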
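
Windows paths in ordinary string literals are easy to get wrong, because a backslash followed by t is an escape sequence for a tab. A small sketch of the pitfall and one way around it follows. The helper target_path is hypothetical, and ntpath is used only so that the demonstration behaves the same on any operating system (a real script on Windows would normally use os.path.join).

```python
import ntpath  # Windows path rules, importable on every platform

plain = 'e:\test'   # "\t" becomes a single TAB character
raw = r'e:\test'    # raw string: the backslash is kept as written

print(len(plain), len(raw))  # 6 7
print('\t' in plain)         # True


def target_path(save_dir, index):
    """Build the numbered .jpg path inside save_dir (hypothetical helper)."""
    return ntpath.join(save_dir, '%s.jpg' % index)


print(target_path(raw, 0))   # e:\test\0.jpg
```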

Source code:

from urllib.request import urlopen
from urllib.request import urlretrieve
from bs4 import BeautifulSoup
import os
import re

save_dir = r'e:\test'  # raw string so "\t" is not read as a tab


def getlink(url):
    """Return the src of every <img> whose link matches the pattern."""
    html = urlopen(url)
    bs0bj = BeautifulSoup(html, "lxml")
    imglist = bs0bj.find_all("img", {"src": re.compile("http://imgs.*?live.*?jpg")})
    ulist = []
    for img in imglist:
        ulist.append(img['src'])  # extract the link and store it in the list
    return ulist


ur = "http://www.acfun.cn/a/ac3345210"
urllist = getlink(ur)  # get the image links
b = len(urllist)
os.makedirs(save_dir, exist_ok=True)  # make sure the target folder exists
name = 0
for url in urllist:
    # download the image and name the file with a number
    urlretrieve(url, os.path.join(save_dir, '%s.jpg' % name))
    name += 1
    print(int((name / b) * 100), '%')  # progress as a percentage

After the script runs, the images are saved in E:\test.
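
One failed request stops the whole loop in the script above. A possible variation, sketched here under assumptions, skips links that fail and keeps going. The function name download_all and its fetch parameter are hypothetical; fetch defaults to urlretrieve and exists only so the loop can be tried without network access.

```python
import os
from urllib.request import urlretrieve


def download_all(urls, save_dir, fetch=urlretrieve):
    """Download each URL to save_dir as <index>.jpg and return the saved paths."""
    os.makedirs(save_dir, exist_ok=True)  # create the folder if it is missing
    saved = []
    total = len(urls)
    for index, url in enumerate(urls):
        target = os.path.join(save_dir, '%s.jpg' % index)
        try:
            fetch(url, target)
        except OSError as err:
            # URLError and HTTPError are subclasses of OSError.
            print('skipped', url, err)
            continue
        saved.append(target)
        print(int((index + 1) / total * 100), '%')  # progress as a percentage
    return saved
```

With the names from the script above, the call would be download_all(urllist, r'e:\test'). A skipped link leaves a gap in the numbering, which makes it easy to see afterwards which downloads failed.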


Original article: https://www.cnblogs.com/wanglei0103/p/6844161.html