  • Scraping second-level (reply) comments on Weibo

     # Module-level import this method relies on.
     from lxml import html

     def wb_child_comment(self, req):
         # req: a requests.Session-style object whose get() returns a response with .json()
         more_xpath = "//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data"
         # Template for follow-up pages; {} receives the action-data query string.
         main_url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&{}&from=singleWeiBo"
         # First page of replies under one first-level comment.
         url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big&root_comment_id=4213888171751114&is_child_comment=ture&id=4095051414397198&from=singleWeiBo"
         try:
             chtml = req.get(url).json()["data"]["html"]
             with open("weibocomment3.html", "w", encoding="utf-8") as fs:
                 fs.write(chtml)
             more = html.fromstring(chtml).xpath(more_xpath)
             # Follow the "more" link until a page no longer contains one.
             # Checking more[0] here avoids looping forever on an empty action-data value.
             while more and more[0]:
                 chtml = req.get(main_url.format(more[0])).json()["data"]["html"]
                 # As in the original, each later page overwrites this file.
                 with open("weibocomment4.html", "w", encoding="utf-8") as fs:
                     fs.write(chtml)
                 more = html.fromstring(chtml).xpath(more_xpath)
             print("no more")
         except Exception as e:
             print("get child comment error:", e)
    

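    Approach:

    1. The first URL to request is

    https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big&root_comment_id=4215074627189144&is_child_comment=ture&id=4095051414397198&from=singleWeiBo
    Parameters:

    https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big& : this prefix is fixed.


    root_comment_id: the id of the first-level comment.

    is_child_comment=ture: fixed (note the spelling "ture", as it appears in the request URL).

    id=4095051414397198: I don't yet know what this id is for; if anyone knows, please tell me.

    from=singleWeiBo: fixed and required; it is used again below.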

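    2. Loop for as long as there is a "more" link

    XPath for the "more" button:

    hava_more_node = croot.xpath("//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data")
    This yields a partial URL (the action-data query string). Build the full URL from it, then append the essential from=singleWeiBo parameter at the end. Without that parameter, the response is the list of first-level comments instead.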
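
    The two steps above can be sketched without any network access as follows. This is only a rough illustration, not the original code: the names first_page_url, more_page_url, walk_reply_pages, fetch_page and max_pages are hypothetical, and the parameter id_param is deliberately named neutrally because its meaning is not established in this post.

```python
from urllib.parse import urlencode

BASE_URL = "https://weibo.com/aj/v6/comment/big"


def first_page_url(root_comment_id, id_param):
    """Build the step-1 URL from the parameters described above."""
    params = [
        ("ajwvr", "6"),
        ("more_comment", "big"),
        ("root_comment_id", str(root_comment_id)),
        ("is_child_comment", "ture"),  # spelled as in the observed URL
        ("id", str(id_param)),         # purpose unknown per the post
        ("from", "singleWeiBo"),       # required
    ]
    return BASE_URL + "?" + urlencode(params)


def more_page_url(action_data):
    """Build a step-2 URL from the action-data of the "more" link.

    from=singleWeiBo goes last; per the post, leaving it out returns
    the first-level comment list instead of the replies.
    """
    return "{}?ajwvr=6&{}&from=singleWeiBo".format(BASE_URL, action_data)


def walk_reply_pages(first_url, fetch_page, max_pages=50):
    """Yield the HTML fragment of each reply page in order.

    fetch_page(url) is supplied by the caller (hypothetical helper) and
    must return a pair (html_fragment, action_data_or_None).
    max_pages is an added safety cap so the loop cannot run forever.
    """
    url = first_url
    for _ in range(max_pages):
        fragment, action_data = fetch_page(url)
        yield fragment
        if not action_data:
            break  # no "more" link on this page
        url = more_page_url(action_data)
```

    With requests and lxml, fetch_page could wrap req.get(url).json()["data"]["html"] and apply the XPath shown above to pull out the action-data value, returning None when the XPath matches nothing.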

    
    
  • Original article: https://www.cnblogs.com/c-x-a/p/8526753.html