zoukankan      html  css  js  c++  java
  • day_5:Ajax数据爬取

    浏览器调试模式下Network选项XHR筛选出Ajax请求(Request Headers的X-Requested-With:XMLHttpRequest)

    #  个人微博爬取,存入csv(字段:时间、评论、转发、点赞、内容)
    import requests
    import json
    import csv
    
    def get_html(path, page):
        params = {
            'type': 'uid',
            'value': 2830678474,
            'containerid': 1076032830678474,
            'page': page
        }
        headers = {
            'Referer': 'https://m.weibo.cn/u/2830678474',
            'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'
        }
        try:
            r = requests.get(url=path, params=params, headers=headers)
            r.raise_for_status()
            return r.text
        except:
            print('status_code is not 200')
            return None
    
    def detail_html(txt):
        data = txt.get('data')
        page = data.get('cardlistInfo').get('page')
        cards = data.get('cards')
        with open('weibo.csv', 'a', encoding='utf-8') as csvfile:
            fieldnames = ['created_at', 'attitudes_count', 'comments_count', 'reposts_count', 'content']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            for card in cards:
                if card.get('card_type') != 9:  # 去除广告
                    continue
                mblog = card.get('mblog')
                data = {
                    'created_at' : mblog.get('created_at'),  # 创建时间
                    'attitudes_count': mblog.get('attitudes_count'),  # 点赞数量
                    'comments_count': mblog.get('comments_count'),  # 评论数量
                    'reposts_count': mblog.get('reposts_count'),  # 转发数量
                    'content': mblog.get('text')  # 文本内容
                }
                writer.writerow(data)
        return page
    
    if __name__ == '__main__':
        path = 'https://m.weibo.cn/api/container/getIndex'
        page = 1
        with open('weibo.csv', 'w', encoding='utf-8') as csvfile:
            fieldnames = ['created_at', 'attitudes_count', 'comments_count', 'reposts_count', 'content']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
        while page:
            txt = json.loads(get_html(path, page))
            if txt:
                page = detail_html(txt)
                print(page)
  • 相关阅读:
    十七:CSS之CSS继承和层叠
    十六:CSS之CSS选择器之后代选择器、伪类选择器
    十五:CSS之CSS选择器之群组选择器、全局选择器
    十四:CSS之CSS选择器之标签选择器、类选择器、ID选择器
    Android开发技巧——ViewPager加View情况封装PagerAdapter的实现类
    JAVA知识笔记
    机器学习笔记
    设计模式学习
    Android小知识汇总
    判断GPS是否开启&转到设置GPS界面
  • 原文地址:https://www.cnblogs.com/jp-mao/p/10032876.html
Copyright © 2011-2022 走看看