  • Crawling a web page's Ajax data with urllib|requests, using Douban Movies as an example

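      This post covers the urllib and requests libraries in Python 3. It focuses on what Ajax is and how to crawl a page's Ajax data, using Douban Movies as the example and fetching it once with urllib and once with requests.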
      
      1. What is Ajax?
      
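      “Ajax (Asynchronous JavaScript And XML) is a web development technique for building fast, interactive, dynamic web pages. XML here is a subset of SGML (Standard Generalized Markup Language). By exchanging small amounts of data with the server in the background, Ajax lets a page update asynchronously: part of the page can be refreshed without reloading the whole page. A traditional page that does not use Ajax must reload entirely whenever its content changes.” Despite the name, the data exchanged today is often JSON rather than XML, as the Douban responses below show.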
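      For a scraper, the practical consequence is that the data shown on the page arrives through a separate background request that can be called directly. As a minimal sketch, the Douban Ajax URL used in the examples below can be taken apart with the standard urllib.parse module; the parameter tag=%E5%8D%8E%E8%AF%AD turns out to be the percent-encoded UTF-8 form of 华语 ("Chinese-language"). The reverse direction, building the query string from plain values, is shown as well.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Ajax endpoint captured with Fiddler (copied from the examples in this post)
url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%8D%8E%E8%AF%AD&sort=recommend"

# Split off the query string and decode it into a dict of value lists
params = parse_qs(urlsplit(url).query)
print(params)

# Going the other way: urlencode percent-encodes non-ASCII values as UTF-8
query = urlencode({"type": "movie", "tag": "华语", "sort": "recommend"})
print(query)
```

      Building the URL with urlencode avoids hand-copying percent-encoded strings when you want a different tag.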

      2. Crawling Douban Movies' Ajax data with urllib

    import urllib.request
    from urllib import parse
    
    # Alternative: build a headers dict and pass it as Request(url, data=..., headers=headers)
    # headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
    #          "Referer":"https://movie.douban.com/explore"}
    
    # Ajax request URL found by capturing Douban's traffic with Fiddler (sent as a POST)
    url="https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%8D%8E%E8%AF%AD&sort=recommend"
    formdata={"page_limit":"20","page_start":"0"}  # 20 results, starting from offset 0
    data=parse.urlencode(formdata)  # URL-encode the form: "page_limit=20&page_start=0"
    request=urllib.request.Request(url,data=data.encode('utf-8'))  # passing data makes urlopen send a POST
    
    # Add headers one at a time with add_header()
    request.add_header("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36")
    
    response=urllib.request.urlopen(request).read()   # send the request and read the response bytes
    print(response.decode("utf-8"))

    Output:

    {"subjects":[{"rate":"9.0","cover_x":2810,"title":"我不是药神","url":"https://movie.douban.com/subject/26752088/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561305376.jpg","id":"26752088","cover_y":3937,"is_new":false},{"rate":"8.5","cover_x":5594,"title":"哪吒之魔童降世","url":"https://movie.douban.com/subject/26794435/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2563780504.jpg","id":"26794435","cover_y":8268,"is_new":false},{"rate":"7.9","cover_x":1786,"title":"流浪地球","url":"https://movie.douban.com/subject/26266893/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2545472803.jpg","id":"26266893","cover_y":2500,"is_new":false},{"rate":"4.7","cover_x":5906,"title":"诛仙 Ⅰ","url":"https://movie.douban.com/subject/25779217/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2567346094.jpg","id":"25779217","cover_y":8268,"is_new":false},{"rate":"8.3","cover_x":5906,"title":"少年的你","url":"https://movie.douban.com/subject/30166972/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2572166063.jpg","id":"30166972","cover_y":8268,"is_new":false},{"rate":"6.5","cover_x":679,"title":"西虹市首富","url":"https://movie.douban.com/subject/27605698/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2529206747.jpg","id":"27605698","cover_y":950,"is_new":false},{"rate":"7.8","cover_x":5906,"title":"我和我的祖国","url":"https://movie.douban.com/subject/32659890/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2567998580.jpg","id":"32659890","cover_y":8268,"is_new":false},{"rate":"7.1","cover_x":1080,"title":"一出好戏","url":"https://movie.douban.com/subject/26985127/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2529571873.jpg","id":"26985127","cover_y":1512,"is_new":false},{"rate":"6.9","cover_x":7142,"title":"飞驰人生","url":"https://movie.douban.com/subject/30163509/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2542973862.jpg","id":"30163509","cover_y":10000,"is_new":false},{"rate":"8.1","cover_x":1429,"title":"无名之辈","url":"https://movie.douban.com/subject/27110296/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2539661066.jpg","id":"27110296","cover_y":2000,"is_new":false},{"rate":"6.7","cover_x":1286,"title":"中国机长","url":"https://movie.douban.com/subject/30295905/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2568258113.jpg","id":"30295905","cover_y":1800,"is_new":false},{"rate":"8.1","cover_x":1000,"title":"无双","url":"https://movie.douban.com/subject/26425063/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2535260806.jpg","id":"26425063","cover_y":1400,"is_new":false},{"rate":"6.4","cover_x":960,"title":"疯狂的外星人","url":"https://movie.douban.com/subject/25986662/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2541901817.jpg","id":"25986662","cover_y":1359,"is_new":false},{"rate":"6.0","cover_x":1080,"title":"囧妈","url":"https://movie.douban.com/subject/30306570/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2581835383.jpg","id":"30306570","cover_y":1542,"is_new":false},{"rate":"7.9","cover_x":5315,"title":"白蛇:缘起","url":"https://movie.douban.com/subject/30331149/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2544313786.jpg","id":"30331149","cover_y":7441,"is_new":false},{"rate":"7.2","cover_x":2999,"title":"动物世界","url":"https://movie.douban.com/subject/26925317/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2525528688.jpg","id":"26925317","cover_y":4181,"is_new":false},{"rate":"7.0","cover_x":2048,"title":"邪不压正","url":"https://movie.douban.com/subject/26366496/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2526297221.jpg","id":"26366496","cover_y":2867,"is_new":false},{"rate":"7.4","cover_x":1000,"title":"半个喜剧","url":"https://movie.douban.com/subject/30269016/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2576482356.jpg","id":"30269016","cover_y":1500,"is_new":false},{"rate":"6.8","cover_x":2000,"title":"超时空同居","url":"https://movie.douban.com/subject/27133303/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2520331478.jpg","id":"27133303","cover_y":2800,"is_new":false},{"rate":"8.2","cover_x":2000,"title":"罗小黑战记","url":"https://movie.douban.com/subject/26709258/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2568288336.jpg","id":"26709258","cover_y":3208,"is_new":false}]}
    
    Process finished with exit code 0
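      The form fields page_start and page_limit appear to act as an offset and a page size (the request above asks for 20 movies starting at 0, and 20 come back), so further pages can presumably be requested by raising page_start in steps of page_limit. Below is a minimal urllib sketch under that assumption; page_forms and fetch_pages are hypothetical helper names, and the code also assumes the endpoint keeps accepting the same POST form data and returns an empty "subjects" list once the results run out.

```python
import json
import time
import urllib.request
from urllib import parse

URL = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%8D%8E%E8%AF%AD&sort=recommend"
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36")

def page_forms(pages, limit=20):
    """Hypothetical helper: form data for the first `pages` pages (page_start = 0, limit, 2*limit, ...)."""
    return [{"page_limit": str(limit), "page_start": str(i * limit)}
            for i in range(pages)]

def fetch_pages(pages=3, limit=20, delay=1.0):
    """Hypothetical helper: POST each page's form data and collect the "subjects" lists.

    Stops early on an empty page, assuming that marks the end of the results.
    """
    movies = []
    for form in page_forms(pages, limit):
        body = parse.urlencode(form).encode("utf-8")
        request = urllib.request.Request(URL, data=body, headers={"User-Agent": UA})
        with urllib.request.urlopen(request, timeout=10) as resp:
            subjects = json.loads(resp.read().decode("utf-8")).get("subjects", [])
        if not subjects:
            break
        movies.extend(subjects)
        time.sleep(delay)  # pause between requests instead of firing them back to back
    return movies
```

      Calling fetch_pages(pages=2) would then return up to 40 movie dicts with the same fields as the output above (title, rate, url and so on). The one-second delay is a politeness choice, not a documented Douban limit.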

     3. Crawling Douban Movies with requests

    import requests
    
    # Browser-like User-Agent plus the Referer of the page that issues the Ajax call
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
             "Referer":"https://movie.douban.com/explore"}
    
    url="https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%8D%8E%E8%AF%AD&sort=recommend"
    data={"page_limit":"20","page_start":"0"}  # same form fields as in the urllib version
    response=requests.post(url,headers=headers,data=data)  # requests form-encodes the dict itself
    response.encoding="utf-8"  # decode response.text as UTF-8
    print(response.status_code)  # 200 means the request succeeded
    print(response.text)

    Output:

    200
    {"subjects":[{"rate":"9.0","cover_x":2810,"title":"我不是药神","url":"https://movie.douban.com/subject/26752088/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561305376.jpg","id":"26752088","cover_y":3937,"is_new":false},{"rate":"8.5","cover_x":5594,"title":"哪吒之魔童降世","url":"https://movie.douban.com/subject/26794435/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2563780504.jpg","id":"26794435","cover_y":8268,"is_new":false},{"rate":"7.9","cover_x":1786,"title":"流浪地球","url":"https://movie.douban.com/subject/26266893/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2545472803.jpg","id":"26266893","cover_y":2500,"is_new":false},{"rate":"4.7","cover_x":5906,"title":"诛仙 Ⅰ","url":"https://movie.douban.com/subject/25779217/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2567346094.jpg","id":"25779217","cover_y":8268,"is_new":false},{"rate":"8.3","cover_x":5906,"title":"少年的你","url":"https://movie.douban.com/subject/30166972/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2572166063.jpg","id":"30166972","cover_y":8268,"is_new":false},{"rate":"6.5","cover_x":679,"title":"西虹市首富","url":"https://movie.douban.com/subject/27605698/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2529206747.jpg","id":"27605698","cover_y":950,"is_new":false},{"rate":"7.8","cover_x":5906,"title":"我和我的祖国","url":"https://movie.douban.com/subject/32659890/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2567998580.jpg","id":"32659890","cover_y":8268,"is_new":false},{"rate":"7.1","cover_x":1080,"title":"一出好戏","url":"https://movie.douban.com/subject/26985127/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2529571873.jpg","id":"26985127","cover_y":1512,"is_new":false},{"rate":"6.9","cover_x":7142,"title":"飞驰人生","url":"https://movie.douban.com/subject/30163509/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2542973862.jpg","id":"30163509","cover_y":10000,"is_new":false},{"rate":"8.1","cover_x":1429,"title":"无名之辈","url":"https://movie.douban.com/subject/27110296/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2539661066.jpg","id":"27110296","cover_y":2000,"is_new":false},{"rate":"6.7","cover_x":1286,"title":"中国机长","url":"https://movie.douban.com/subject/30295905/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2568258113.jpg","id":"30295905","cover_y":1800,"is_new":false},{"rate":"8.1","cover_x":1000,"title":"无双","url":"https://movie.douban.com/subject/26425063/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2535260806.jpg","id":"26425063","cover_y":1400,"is_new":false},{"rate":"6.4","cover_x":960,"title":"疯狂的外星人","url":"https://movie.douban.com/subject/25986662/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2541901817.jpg","id":"25986662","cover_y":1359,"is_new":false},{"rate":"6.0","cover_x":1080,"title":"囧妈","url":"https://movie.douban.com/subject/30306570/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2581835383.jpg","id":"30306570","cover_y":1542,"is_new":false},{"rate":"7.9","cover_x":5315,"title":"白蛇:缘起","url":"https://movie.douban.com/subject/30331149/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2544313786.jpg","id":"30331149","cover_y":7441,"is_new":false},{"rate":"7.2","cover_x":2999,"title":"动物世界","url":"https://movie.douban.com/subject/26925317/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2525528688.jpg","id":"26925317","cover_y":4181,"is_new":false},{"rate":"7.0","cover_x":2048,"title":"邪不压正","url":"https://movie.douban.com/subject/26366496/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2526297221.jpg","id":"26366496","cover_y":2867,"is_new":false},{"rate":"7.4","cover_x":1000,"title":"半个喜剧","url":"https://movie.douban.com/subject/30269016/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2576482356.jpg","id":"30269016","cover_y":1500,"is_new":false},{"rate":"6.8","cover_x":2000,"title":"超时空同居","url":"https://movie.douban.com/subject/27133303/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2520331478.jpg","id":"27133303","cover_y":2800,"is_new":false},{"rate":"8.2","cover_x":2000,"title":"罗小黑战记","url":"https://movie.douban.com/subject/26709258/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2568288336.jpg","id":"26709258","cover_y":3208,"is_new":false}]}
    
    Process finished with exit code 0
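      Both versions return the same JSON body, so the movie fields can be pulled out with the standard json module instead of an HTML parser (with requests, response.json() does the decoding in one call). A minimal sketch, using a two-entry excerpt of the output above, trimmed to the fields used, as sample input; extract_movies is a hypothetical helper name, and in practice you would pass it response.text or response.decode("utf-8").

```python
import json

def extract_movies(body):
    """Hypothetical helper: return (title, rate, url) tuples from a search_subjects JSON body."""
    data = json.loads(body)
    # Each entry in "subjects" is one movie card; "rate" is kept as the string Douban sends
    return [(s["title"], s["rate"], s["url"]) for s in data.get("subjects", [])]

# Two entries copied from the response shown above, trimmed to the fields used here
sample = ('{"subjects":[{"rate":"9.0","title":"我不是药神",'
          '"url":"https://movie.douban.com/subject/26752088/"},'
          '{"rate":"8.5","title":"哪吒之魔童降世",'
          '"url":"https://movie.douban.com/subject/26794435/"}]}')

for title, rate, url in extract_movies(sample):
    print(title, rate, url)
```

      Keeping "rate" as a string avoids guessing how unrated films are encoded; convert with float() once you have checked the values you actually receive.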
  • Original article: https://www.cnblogs.com/my-global/p/12441205.html