
Crawling second-level comments on Weibo

from lxml import html  # provides html.fromstring


def wb_child_comment(self, req):
    # req: session-like object with a .get() method (e.g. requests.Session)
    main_url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&{}&from=singleWeiBo"
    more_xpath = "//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data"
    # First page of child comments under one first-level comment
    url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big&root_comment_id=4213888171751114&is_child_comment=tur&id=4095051414397198&from=singleWeiBo"
    try:
        chtml = req.get(url).json()["data"]["html"]
        with open("weibocomment3.html", "w", encoding="utf-8") as fs:
            fs.write(chtml)
        hava_more_node = html.fromstring(chtml).xpath(more_xpath)
        # Follow the "more" link until a page no longer has one;
        # an empty action-data value also ends the loop instead of spinning forever
        while hava_more_node and hava_more_node[0]:
            next_c_url = main_url.format(hava_more_node[0])
            chtml = req.get(next_c_url).json()["data"]["html"]
            # Each page overwrites the previous dump in this file
            with open("weibocomment4.html", "w", encoding="utf-8") as fs:
                fs.write(chtml)
            hava_more_node = html.fromstring(chtml).xpath(more_xpath)
        print("no more")
    except Exception as e:
        print("get child comment error:", e)

Approach:

1. The first link to request is

https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big&root_comment_id=4215074627189144&is_child_comment=ture&id=4095051414397198&from=singleWeiBo
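Parameter notes:

https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big& : this leading part is fixed.

root_comment_id: the id of the first-level comment.

is_child_comment=ture: fixed.

id=4095051414397198: I don't yet know what this id is for. If anyone knows, please tell me.

from=singleWeiBo: fixed. It must be included, and it is used again later.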
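The parameter list above can be assembled programmatically rather than by hand-editing the URL. The following is a minimal sketch, not the author's code: the function name build_first_url and the argument name id_value are made up for illustration, and the value "ture" is copied verbatim from the URL in this post.

```python
from urllib.parse import urlencode

BASE = "https://weibo.com/aj/v6/comment/big"


def build_first_url(root_comment_id, id_value):
    """Build the first child-comment URL (hypothetical helper)."""
    params = [
        ("ajwvr", "6"),                        # fixed
        ("more_comment", "big"),               # fixed
        ("root_comment_id", root_comment_id),  # id of the first-level comment
        ("is_child_comment", "ture"),          # fixed; spelled as in the post's URL
        ("id", id_value),                      # purpose unknown to the author
        ("from", "singleWeiBo"),               # fixed and required
    ]
    # A list of tuples keeps the parameters in the same order as the post's URL
    return BASE + "?" + urlencode(params)


print(build_first_url("4215074627189144", "4095051414397198"))
```

Called with the two ids from the example link, this should reproduce that link character for character.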

2. Loop, checking each time whether there are more comments

XPath for the "more" button:

hava_more_node = croot.xpath("//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data")
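This returns a URL fragment. Join it onto the base URL to form the full URL, then append the important parameter from=singleWeiBo. Without that parameter, the response is the list of first-level comments instead.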
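The loop in step 2 can be separated from the network and parsing details, which makes it easy to try out without touching Weibo. This is only a sketch under assumptions: fetch_html and find_more are hypothetical callables supplied by the caller, and the max_pages cap is an added safety guard that the original code does not have.

```python
MAIN_URL = "https://weibo.com/aj/v6/comment/big?ajwvr=6&{}&from=singleWeiBo"


def next_page_url(action_data):
    """Turn an action-data value into the URL of the next child-comment page."""
    # from=singleWeiBo is part of the template, so it is always appended
    return MAIN_URL.format(action_data)


def iter_child_comment_pages(first_url, fetch_html, find_more, max_pages=100):
    """Yield the HTML of each page, following the "more" link until it is gone.

    fetch_html(url) -> str and find_more(html) -> list of action-data strings
    are hypothetical callables, not part of any library.
    """
    url = first_url
    for _ in range(max_pages):  # hard cap so a repeating link cannot loop forever
        page = fetch_html(url)
        yield page
        more = find_more(page)
        if not more or not more[0]:
            return  # no "more" button on this page: done
        url = next_page_url(more[0])
```

In real use, fetch_html would wrap something like req.get(url).json()["data"]["html"], and find_more would run the XPath query shown above on the parsed page.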


Original article: https://www.cnblogs.com/c-x-a/p/8526753.html