  • Web crawler (Ajax): Douban dynamic pages

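    Tool: Python 3

    Background: Ajax is a technique for building fast, dynamic web pages. It lets part of a page be updated without reloading the whole page.

    Goal: scrape a Douban page whose content is loaded through Ajax.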

    import urllib.parse
    import urllib.request

    # The URL comes from the captured GET request (packet capture), not from the browser address bar
    url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start=80"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
    }
    # Form data taken from the WebForms tab in Fiddler
    formdata = {
        "page_limit": "20",
        "page_start": "80",
        "sort": "recommend",
        "tag": "热门",
        "type": "movie",
    }
    data = urllib.parse.urlencode(formdata)
    data = bytes(data, "utf8")
    # Passing data= makes urllib send the request as a POST
    request = urllib.request.Request(url, data=data, headers=headers)
    response = urllib.request.urlopen(request).read()
    # response = response.decode("utf-8")
    with open("douban.json", "w") as f:
        f.write(str(response))

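    After running the code above, pasting the saved content into json.cn to format it produces an error.

    This means the file is not valid JSON: read() returns a bytes object, and str() on bytes writes its repr (b'...') rather than the text itself. Try decoding the response before writing it: response = response.decode("utf-8")

    With that change, the file contains correctly formatted JSON.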
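
    The difference between the two versions can be reproduced without any network access. The snippet below is a minimal sketch: `raw` is a hypothetical stand-in for the bytes returned by `urlopen(...).read()`, and `is_valid_json` is an illustrative helper, not part of the original script.

```python
import json

# Hypothetical stand-in for the bytes returned by urlopen(...).read()
raw = '{"title": "热门"}'.encode("utf-8")

as_repr = str(raw)             # repr of the bytes object, starts with b'
as_text = raw.decode("utf-8")  # the actual JSON text


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(as_repr))  # False
print(is_valid_json(as_text))  # True
```

    Writing `as_repr` to a file is what the first version of the script does, which is consistent with json.cn rejecting it.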

    The URL already contains every field in formdata, so try removing formdata:
    import urllib.request
    
    url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start=80"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
    }
    # formdata ={
    #     "page_limit": "20",
    #     "page_start": "80",
    #     "sort": "recommend",
    #     "tag"    : "热门",
    #     "type": "movie"
    # }
    # data = urllib.parse.urlencode(formdata)
    # data = bytes(data, "utf8")
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request).read()
    response = response.decode("utf-8")
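    # response is already a str after decoding; write it as UTF-8 regardless of the platform default
    with open("douban.json", "w", encoding="utf-8") as f:
        f.write(response)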

    The result is the same as before!
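
    Both observations can be checked offline. The sketch below sends no request: it only compares the query string of the URL with formdata, and shows which HTTP method urllib would pick with and without `data`. The names `url_params`, `with_body` and `without_body` are illustrative and do not appear in the original script.

```python
import urllib.parse
import urllib.request

url = ("https://movie.douban.com/j/search_subjects"
       "?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend"
       "&page_limit=20&page_start=80")
formdata = {
    "page_limit": "20",
    "page_start": "80",
    "sort": "recommend",
    "tag": "热门",
    "type": "movie",
}

# Parse the query string; parse_qs returns a list per key, so take the first value
query = urllib.parse.urlsplit(url).query
url_params = {k: v[0] for k, v in urllib.parse.parse_qs(query).items()}
print(url_params == formdata)  # True

# urlencode percent-encodes the Chinese tag the same way the URL does
encoded = urllib.parse.urlencode(formdata)
print(encoded)

# Building a Request does not contact the server; it only prepares the call
with_body = urllib.request.Request(url, data=encoded.encode("utf-8"))
without_body = urllib.request.Request(url)
print(with_body.get_method())     # POST
print(without_body.get_method())  # GET
```

    So the first script sent the same parameters twice, once in the query string and once in a POST body, while the second sends a plain GET. That the two gave the same result here is the author's observation; other endpoints may treat the two methods differently.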

  • Original article: https://www.cnblogs.com/gaoquanquan/p/9102307.html