  • Ajax in Practice: Scraping Weibo

    Reposted from: 静觅 » [Python3网络爬虫开发实战] 6.3-Ajax结果提取 (Python 3 Web Crawler Development in Practice, section 6.3: Extracting Ajax Results)

     Notes on a few things the code in that article does well:

    from urllib.parse import urlencode

    import requests

    base_url = 'https://m.weibo.cn/api/container/getIndex?'

    headers = {
        'Host': 'm.weibo.cn',
        'Referer': 'https://m.weibo.cn/u/2830678474',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
    }


    def get_page(page):
        params = {
            'type': 'uid',
            'value': '2830678474',
            'containerid': '1076032830678474',
            'page': page
        }

        # The URL is split into a fixed path and a set of parameters;
        # urlencode turns the parameters into the query string
        url = base_url + urlencode(params)
        try:
            response = requests.get(url, headers=headers)
            # Check the status code so that only a normal (200) response is processed
            if response.status_code == 200:
                # The body is JSON, so call .json() directly
                # instead of json.loads(response.content)
                return response.json()
        except requests.ConnectionError as e:
            print('Error', e.args)
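
    The URL construction noted in the comments can be tried without touching the network. The following is a minimal sketch using only the standard library; `build_url` is a hypothetical helper name introduced here for illustration, and the parameter values are copied from the listing above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(page):
    """Hypothetical helper: append URL-encoded query parameters to the fixed path."""
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page,  # non-string values are converted to strings by urlencode
    }
    return BASE_URL + urlencode(params)


url = build_url(2)
print(url)

# Round trip: parsing the query string recovers the parameters as lists of strings
print(parse_qs(urlsplit(url).query))
```

    As an alternative, `requests.get` accepts a `params` argument and performs this encoding itself, so the manual concatenation is a matter of preference.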

    My own code:

    import requests

    headers = {
        "Referer": "https://m.weibo.cn/u/2830678474?sudaref=cuiqingcai.com&display=0&retcode=6102",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "X-XSRF-TOKEN": "609539"
    }

    base_url = "https://m.weibo.cn/api/container/getIndex?sudaref=cuiqingcai.com&display=0&retcode=6102&type=uid&value=2830678474&containerid=1076032830678474"
    url = base_url
    while True:
        response = requests.get(url, headers=headers)
        try:
            data = response.json()["data"]
        except (ValueError, KeyError, TypeError):
            # Stop if the body is not JSON or has no "data" field
            break
        # Print the text of each post on this page; cards without "mblog" are skipped
        for card in data.get("cards", []):
            if "mblog" in card:
                print(card["mblog"]["text"])
        # since_id is the cursor for the next page; stop when it is missing
        since_id = data.get("cardlistInfo", {}).get("since_id")
        if not since_id:
            break
        url = base_url + "&since_id=" + str(since_id)
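
    The loop above follows a cursor: each response carries a `since_id` that is sent with the next request, and a response without one is treated as the last page. The sketch below isolates that pattern in a generator so it can be exercised offline. `iter_cards`, `fetch`, and the `PAGES` data are all hypothetical stand-ins invented for illustration; only the field names (`data`, `cardlistInfo`, `since_id`, `cards`, `mblog`, `text`) come from the code above.

```python
def iter_cards(fetch):
    """Yield cards page by page, following the since_id cursor.

    `fetch` is a hypothetical callable: it takes a since_id (None for the
    first page) and returns the decoded JSON dict of one response.
    """
    since_id = None
    while True:
        data = fetch(since_id).get("data", {})
        yield from data.get("cards", [])
        since_id = data.get("cardlistInfo", {}).get("since_id")
        if not since_id:
            # No cursor in the response: assume this was the last page
            break


# Made-up two-page response set standing in for the real API
PAGES = {
    None: {"data": {"cardlistInfo": {"since_id": 42},
                    "cards": [{"mblog": {"text": "first"}}, {"card_type": 11}]}},
    42: {"data": {"cardlistInfo": {},
                  "cards": [{"mblog": {"text": "second"}}]}},
}

texts = [card["mblog"]["text"] for card in iter_cards(PAGES.get) if "mblog" in card]
print(texts)
```

    Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so in real use `fetch` could wrap `requests.get` with the headers shown earlier.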

    Sample output (partial):

    每当我颓废的时候,看看这个视频,我就浑身充满了斗志!为了我和我老婆的小米之家!我可以!我能行!加油! <a data-url="http://t.cn/A6hrPmIS" href="https://m.weibo.cn/p/index?containerid=2304444475185156522026&url_type=39&object_type=video&pos=1&luicode=10000011&lfid=1076032830678474" data-hide=""><span class='url-icon'><img style=' 1rem;height: 1rem' src='https://h5.sinaimg.cn/upload/2015/09/25/3/timeline_card_small_video_default.png'></span><span class="surl-text">崔庆才丨静觅的微博视频</span></a> 
    <span class="url-icon"><img alt=[doge] src="//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-861403219c.png" style="1em; height:1em;" /></span> 
    转发微博
    今天我和我老婆都是健康饮食的好仔仔。<span class="url-icon"><img alt=[馋嘴] src="//h5.sinaimg.cn/m/emoticon/icon/default/d_chanzui-01ee2388fd.png" style="1em; height:1em;" /></span> 
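
    As the sample shows, the `text` field still contains HTML markup for links and emoticon images. One possible way to reduce it to plain text with only the standard library is sketched below. `strip_tags` and `_TextExtractor` are hypothetical names, and substituting the `alt` text for emoticon images is a design choice made here, not something the original code does.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, plus the alt text of <img> tags (e.g. emoticons)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    """Hypothetical helper: return html_text with tags removed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


sample = '转发微博 <span class="url-icon"><img alt="[doge]" src="x.png" /></span> '
print(strip_tags(sample))
```

    In the scraping loop this would be applied to each `mblog` text before printing, for example `print(strip_tags(card["mblog"]["text"]))`.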
  • Original post: https://www.cnblogs.com/waws1314/p/12501707.html