Learning by Imitation: Scraping the Douban Top 100


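import json
import re

import requests
from requests.exceptions import RequestException


def get(url):
    """Fetch a page and return its HTML text, or None on any failure."""
    try:
        # Send a browser-like User-Agent instead of the default requests one.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
        }
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse(html):
    """Yield one dict per book entry matched in the page HTML."""
    # re.S lets '.' match newlines, so one pattern can span a whole <li> block.
    # \s+ matches the whitespace between the href and title attributes.
    pattern = re.compile(
        r'<li.*?cover.*?href="(.*?)"\s+title="(.*?)">.*?more-meta.*?author">(.*?)</span>'
        r'.*?year">(.*?)</span>.*?publisher">(.*?)</span>.*?</li>',
        re.S,
    )
    for item in pattern.findall(html):
        yield {
            'url': item[0],
            'title': item[1],
            'name': item[2].strip(),
            'date': item[3].strip(),
            'publisher': item[4].strip(),
        }


def write_to_file(content):
    """Append one record to result.txt as a single JSON line."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    # The with block closes the file, so no explicit close() is needed.
    with open('result.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main():
    url = 'https://book.douban.com/'
    html = get(url)
    if html is None:
        # The request failed or returned a non-200 status; nothing to parse.
        return
    for item in parse(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    main()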
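
To see how a non-greedy pattern with re.S pulls the five fields out of one list item, it can help to run it offline against a small fragment. The sketch below is only an illustration: SAMPLE_HTML is a made-up fragment shaped to fit the pattern (it is not copied from the live Douban page), parse_sample is a hypothetical helper name, and the pattern assumes whitespace between the href and title attributes.

```python
import re

# Hypothetical fragment, written only to match the pattern's shape;
# it is NOT real Douban markup.
SAMPLE_HTML = '''
<li>
  <div class="cover">
    <a href="https://example.com/book/1" title="Sample Book"><img src="x.jpg"></a>
  </div>
  <div class="more-meta">
    <span class="author"> Jane Doe </span>
    <span class="year"> 2018-3 </span>
    <span class="publisher"> Example Press </span>
  </div>
</li>
'''

# Each (.*?) is non-greedy, so it stops at the first place the rest of the
# pattern can match. re.S lets '.' cross line breaks inside the <li> block.
PATTERN = re.compile(
    r'<li.*?cover.*?href="(.*?)"\s+title="(.*?)">.*?more-meta.*?author">(.*?)</span>'
    r'.*?year">(.*?)</span>.*?publisher">(.*?)</span>.*?</li>',
    re.S,
)


def parse_sample(html):
    """Return a list of dicts, one per <li> block that matches PATTERN."""
    return [
        {
            'url': url,
            'title': title,
            'name': name.strip(),
            'date': date.strip(),
            'publisher': publisher.strip(),
        }
        for url, title, name, date, publisher in PATTERN.findall(html)
    ]


books = parse_sample(SAMPLE_HTML)
print(books)
```

If the real page structure differs from this fragment, findall simply returns an empty list, which is a quick sign that the pattern needs adjusting.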

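If each record is stored as one JSON object per line, the results file can be loaded back with a few lines of code. This is a minimal sketch under that assumption: write_record and read_records are hypothetical helper names, the two records are sample data rather than scraped results, and a temporary directory stands in for the real result.txt location.

```python
import json
import os
import tempfile


def write_record(path, record):
    """Append one record as a single JSON line, keeping non-ASCII text as-is."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_records(path):
    """Read the file back, decoding one JSON object per non-empty line."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Sample data only; a temporary directory keeps the demo from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'result.txt')
    write_record(path, {'title': '活着', 'name': '余华'})
    write_record(path, {'title': 'Sample Book', 'name': 'Jane Doe'})
    records = read_records(path)

print(records)
```

One object per line keeps appending cheap and lets the file be processed line by line, without loading a single large JSON array.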

Original post: https://www.cnblogs.com/MisterZZL/p/9534307.html
