Scraping Web Page Images with Python


Today I wrote a small Python program that grabs the images from a web page.



ImgDownloader
import os
import time

import win32com.client
import win32file
import win32inet


class ImgDownloader:
    def __init__(self, url, dest_dir):
        self.__dir = dest_dir
        # Drive Internet Explorer through its COM automation interface.
        self.__ie = win32com.client.Dispatch('InternetExplorer.Application')
        self.__ie.Navigate(url)
        self.__wait__()

    def __wait__(self):
        # Poll until IE has finished loading (ReadyState 4 is READYSTATE_COMPLETE).
        while self.__ie.Busy or self.__ie.ReadyState != 4:
            time.sleep(0.1)

    def start(self):
        self.__wait__()
        imgs = self.__ie.Document.getElementsByTagName('img')

        for i in range(imgs.length):
            try:
                # Look up the IE cache entry for this image URL.
                cache_info = win32inet.GetUrlCacheEntryInfo(imgs[i].src)
                if cache_info:
                    path = cache_info['LocalFileName']
                    # Prefix the index so identical file names do not collide.
                    filename = ('[%d]' % i) + os.path.basename(path)
                    # The final True means: fail rather than overwrite an existing file.
                    win32file.CopyFile(path, os.path.join(self.__dir, filename), True)
            except Exception:
                # Skip images that are not cached or cannot be copied.
                pass

    def close(self):
        self.__ie.Quit()


if __name__ == '__main__':
    d = ImgDownloader('http://image.baidu.com/i?ct=201326592&cl=2&lm=-1&tn=baiduimage&pv=&word=boy&z=0', 'c:\\temp\\')
    d.start()
    d.close()

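How it works: Python launches the IE browser through its COM interface, opens the web page, and collects the URLs of all the images on it. It then calls the win32 API function GetUrlCacheEntryInfo to find the local cache file for each image and copies that file to the target directory.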
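The last step, copying each cached file into the target directory, can be sketched on its own without IE or pywin32. This is only a minimal sketch, not the program above: `copy_cached_images` and its `lookup` parameter are hypothetical names introduced here, and `shutil.copyfile` stands in for `win32file.CopyFile`.

```python
import os
import shutil


def copy_cached_images(urls, lookup, dest_dir):
    """Copy the cached file for each URL into dest_dir.

    `lookup` is a hypothetical callable (an assumption of this sketch): it
    takes a URL and returns the path of the local cache file, or None when
    the URL is not cached. Returns the destination paths that were written.
    """
    copied = []
    for i, url in enumerate(urls):
        path = lookup(url)
        if not path or not os.path.isfile(path):
            continue  # not cached, or the cache entry has no file: skip it
        # Prefix the index so files with the same cached name do not collide.
        filename = '[%d]%s' % (i, os.path.basename(path))
        dest = os.path.join(dest_dir, filename)
        if os.path.exists(dest):
            continue  # like CopyFile(..., True): never overwrite a file
        shutil.copyfile(path, dest)
        copied.append(dest)
    return copied
```

On Windows, `lookup` could presumably be a small wrapper that returns `win32inet.GetUrlCacheEntryInfo(url)['LocalFileName']`, as the program above does. Passing the lookup in as a parameter keeps the copy logic testable on any platform.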


Original article: https://www.cnblogs.com/flysun/p/1504278.html