Python - Web Scraping for Beginners

# Scrape the images from a web page
import re              # regular expression library
import urllib.request  # URL handling library

def getHtml(url):
    page = urllib.request.urlopen(url)  # open the URL
    html = page.read()                  # read the page content like a text file (returns bytes)
    return html.decode("utf-8", errors="replace")

def getImg(html):
    reg = r'<img src="(.+?\.png)" alt'  # pattern to match; the dot before "png" is escaped
    imgre = re.compile(reg)             # compile into a regular expression object
    imglist = imgre.findall(html)       # find every match
    for x, imgurl in enumerate(imglist):
        print("imgurl:", imgurl)
        # Download each image in turn; the page uses relative paths, so prepend the site prefix
        urllib.request.urlretrieve("http://www.uestc.edu.cn/" + imgurl, '%d.png' % x)
    return imglist

html = getHtml("http://www.uestc.edu.cn/")
print(getImg(html))
#print(html)
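
The script above builds each download address by joining the site prefix and the matched path as plain strings. That works for paths such as images/logo.png, but it would likely produce a wrong address for a path that starts with a slash or for a src that is already a full URL. Below is a minimal sketch of one alternative, using urljoin from the standard library. The sample markup, the function name extract_png_urls, and the example.com address are made up for illustration and are not from the original page. Nothing is downloaded here, so it runs offline.

```python
import re
from urllib.parse import urljoin

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = '''
<img src="images/logo.png" alt="logo">
<img src="/static/banner.png" alt="banner">
<img src="http://example.com/pic.png" alt="external">
<img src="photo.jpg" alt="photo">
'''

# Same pattern as the script above, with the dot before "png" escaped.
IMG_PATTERN = re.compile(r'<img src="(.+?\.png)" alt')

def extract_png_urls(html, base_url):
    """Return an absolute URL for every .png image matched in html."""
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html)]

urls = extract_png_urls(SAMPLE_HTML, "http://www.uestc.edu.cn/")
for url in urls:
    print(url)
```

Each resulting URL could then be passed to urllib.request.urlretrieve in place of the concatenated string. The .jpg entry is skipped because the pattern only matches .png files.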

Reference link:

http://www.cnblogs.com/fnng/p/3576154.html

Original article: https://www.cnblogs.com/zhonghuasong/p/4885140.html