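Scraping Douban Movies
Site analysis:
1 Open https://movie.douban.com, choose 【排行榜】 (Rankings), then pick any genre; I chose sci-fi here
2 Keep scrolling through the page: there is no "next page" link, and more movies load as you scroll down, so we can conclude that the page uses AJAX requests to load its content asynchronously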
Inspecting the request:
1 Right-click > 【检查】 (Inspect) > 【Network】
2 Find the URL of the request that loads the movie list
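Once the request URL is found in the Network panel, it helps to split it into its query parameters to see what can be tuned. A minimal sketch using only the standard library; the URL is the one used in the code in this post, and the meaning of each parameter is inferred from its name rather than from any official documentation:

```python
from urllib.parse import urlsplit, parse_qs

# Request URL taken from the code in this post.
url = ("https://movie.douban.com/j/chart/top_list"
       "?type=17&interval_id=100%3A90&action=&start=20&limit=20")

parts = urlsplit(url)
# keep_blank_values=True keeps the empty "action" parameter in the result.
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)
for name, values in params.items():
    # parse_qs returns a list per name; each parameter appears once here.
    print(name, "=", values[0])
```

Note that parse_qs decodes the escaped %3A back to a colon, so interval_id reads as 100:90.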
Simple implementation
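from urllib import request
import json

# Browser User-Agent, sent with the request so it looks like it comes from a normal browser
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"}
# URL parameters: interval_id is the ranking range and can be changed; limit caps each request at 20 movies, i.e. the page size
url = "https://movie.douban.com/j/chart/top_list?type=17&interval_id=100%3A90&action=&start=20&limit=20"

# Attach the headers to the request (they were defined but never used before)
req = request.Request(url, headers=headers)
rsp = request.urlopen(req)
data = rsp.read().decode()

# Parse the JSON text into Python objects
data = json.loads(data)

print(data)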
Output: the script prints the parsed JSON, a list with one dict per movie
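Since the page loads more movies as you scroll, later batches can presumably be requested by changing start. The sketch below builds the URL for successive pages; build_page_url is a hypothetical helper, not part of the original code, and it assumes that start is the offset of the first movie in a batch, which is inferred from the parameter name and not confirmed by the post:

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/j/chart/top_list"


def build_page_url(page, movie_type=17, interval_id="100:90", limit=20):
    """Return the request URL for a 0-based page number (hypothetical helper)."""
    query = urlencode({
        "type": movie_type,
        "interval_id": interval_id,  # urlencode escapes ":" as %3A
        "action": "",
        "start": page * limit,       # assumed: offset of the first movie in the batch
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


# Print the URLs for the first three batches without sending any request.
for page in range(3):
    print(build_page_url(page))
```

With page=1 this reproduces the URL used in the simple implementation. When actually fetching several pages in a loop, pausing between requests with time.sleep is a sensible courtesy to the site.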
Code with a nicer output format
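from urllib import request
import json

# Same browser User-Agent as in the simple implementation
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"}
url = "https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=20&limit=20"

req = request.Request(url, headers=headers)
rsp = request.urlopen(req)
data = rsp.read().decode()

data = json.loads(data)

# Loop over the movies and print selected fields of each one
for item in data:
    print("Rank:", item['rank'], " ",
          "Title:", item['title'], " ",
          "Genres:", item['types'], " ",
          "Cast:", item['actors'], " ",
          "Country:", item['regions'], " ",
          "Score:", item['score'], " ",
          "Cover:", item['cover_url'], " ---------------")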
Output after the improvement
That's it. Printed this way, the result is much easier to read
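The output could be tidied a little further by joining list-valued fields into plain text. A rough sketch, assuming that fields such as types, actors and regions come back as lists of strings; join_field and format_movie are hypothetical helpers, and the sample record is a made-up placeholder, not real Douban data:

```python
def join_field(value):
    """Join list values with " / "; pass any other value through as text."""
    if isinstance(value, (list, tuple)):
        return " / ".join(str(v) for v in value)
    return str(value)


def format_movie(item):
    """Format one movie dict as a single readable line."""
    fields = [
        ("Rank", "rank"),
        ("Title", "title"),
        ("Genres", "types"),
        ("Cast", "actors"),
        ("Country", "regions"),
        ("Score", "score"),
        ("Cover", "cover_url"),
    ]
    # item.get() keeps the line printable even if a key is missing.
    return "  ".join(
        f"{label}: {join_field(item.get(key, ''))}" for label, key in fields
    )


# Made-up placeholder record that mirrors the keys used in this post.
sample = {
    "rank": 1,
    "title": "Example Movie",
    "types": ["Sci-Fi", "Action"],
    "actors": ["Actor A", "Actor B"],
    "regions": ["Country X"],
    "score": "9.0",
    "cover_url": "https://example.com/cover.jpg",
}

print(format_movie(sample))
```

In the script above, the loop body would then shrink to print(format_movie(item)).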