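I wanted to give my girl a photo mosaic for her birthday, built out of pictures of snacks (or other good-looking food photos), so I did some digging = =.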
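First you need a program to build the mosaic. I used Foto-Mosaik-Edda (there are also websites that do this online, but I found this one more convenient, and I also found a version localized into Chinese at http://witmax.cn/foto-mosaik-edda.html). To build a mosaic you need an image database of at least a few thousand pictures, so a crawler is needed to collect them.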
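After reading up online I copied some code and scraped a little over 4,000 images for the keyword food from an overseas image site (Pexels). The Python code is below: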
import os
import re
import time

import requests

HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)'}


def get_url(url):
    """Fetch url and return the response, or None if the request fails."""
    try:
        # A timeout keeps one stalled request from hanging the whole run.
        r = requests.get(url, headers=HEADERS, timeout=30)
        r.raise_for_status()
        return r
    except requests.RequestException as e:
        print('Request failed:', e)
        return None


def get_photos(url, new_fpath):
    """Find every image on one search results page and download it."""
    result = get_url(url)
    if result is None:
        return
    result.encoding = result.apparent_encoding
    # Older pattern, for the h=350 thumbnails:
    # pattern = re.compile(r'src="https://images\.pexels\.com/photos/(\d+)/(.*?)\.(jpg|jpeg)\?auto=compress&cs=tinysrgb&h=350"', re.S)
    pattern = re.compile(
        r'src="https://images\.pexels\.com/photos/(\d+)/(.*?)\?auto=compress&cs=tinysrgb&h=750&w=1260"')
    # The real download links start with static, not images.

    items = re.findall(pattern, result.text)

    print('log!')
    for item in items:
        print(item)

    for item in items:
        # Swap images for static in the image link.
        photo_url = ('https://static.pexels.com/photos/' + item[0] + '/' + item[1]
                     + '?auto=compress&cs=tinysrgb&h=350')
        print('url: ' + photo_url)
        save(photo_url, item, new_fpath)
        time.sleep(1)


def makedir(new_fpath, i, key):
    """Create the folder for page i if it does not exist yet."""
    if not os.path.exists(new_fpath):
        os.makedirs(new_fpath)
        print('Created folder ' + key + '_page' + str(i))
    else:
        print('Folder already exists')


def save(photo_url, item, new_fpath):
    """Download one image and write it to new_fpath, skipping existing files."""
    final_fpath = os.path.join(new_fpath, item[0] + item[1])
    print('Saving to: ' + final_fpath)

    if os.path.exists(final_fpath):
        print('Image already exists')
        return

    print('Downloading image...')
    result = get_url(photo_url)
    if result is None:
        print('Download failed')
        return

    try:
        with open(final_fpath, 'wb') as f:
            f.write(result.content)
        print('Download succeeded')
    except OSError as e:
        print('Could not write the file:', e)


def main():
    key = input('Enter a search keyword (in English): ')

    url = 'https://www.pexels.com/search/' + key + '/'

    # num = int(input('Enter the total number of pages to download: '))  # starts from page 1 by default
    st = int(input('Enter the first page number: '))
    ed = int(input('Enter the last page number: '))

    fpath = 'C:/python/pic'
    for i in range(st, ed + 1):
        new_fpath = fpath + '/' + key + '/' + key + '_page' + str(i)
        makedir(new_fpath, i, key)
        new_url = url + '?page=' + str(i)
        get_photos(new_url, new_fpath)
        time.sleep(3)


if __name__ == '__main__':
    main()
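The part of the script that took me longest to get right was the regular expression, so here is a small offline sketch of just that step. It assumes the pattern is written with `\d+` for the photo id and `\?` for the literal question mark, which is what it needs in order to match. The `SAMPLE_HTML` string and the `extract_download_urls` helper are made up for illustration; the real page source may look different, and the site may have changed since I ran this.

```python
import re

# URL shape the scraper looks for: (\d+) is the photo id, (.*?) the file name.
PATTERN = re.compile(
    r'src="https://images\.pexels\.com/photos/(\d+)/(.*?)'
    r'\?auto=compress&cs=tinysrgb&h=750&w=1260"'
)


def extract_download_urls(html):
    """Return (file_name, url) pairs built from the image links found in html."""
    pairs = []
    for photo_id, name in PATTERN.findall(html):
        # Swap the images host for static, as the scraper does.
        url = ('https://static.pexels.com/photos/' + photo_id + '/' + name
               + '?auto=compress&cs=tinysrgb&h=350')
        pairs.append((photo_id + name, url))
    return pairs


# Hypothetical fragment of page source, invented for this example.
SAMPLE_HTML = (
    '<img src="https://images.pexels.com/photos/12345/pexels-photo-12345.jpeg'
    '?auto=compress&cs=tinysrgb&h=750&w=1260">'
    '<img src="https://images.pexels.com/photos/678/food-salad.jpg'
    '?auto=compress&cs=tinysrgb&h=750&w=1260">'
)

for file_name, url in extract_download_urls(SAMPLE_HTML):
    print(file_name, '->', url)
```

Testing the pattern against a saved copy of the page like this is a lot quicker than hitting the site again after every tweak.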
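I have to say, Python really is powerful and writing a crawler is a lot of fun. There is a real joy in digging through a page's source code, working out its structure, and then putting it to use~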