Python crawler --- Jianshu comments

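Some websites load part of their data through JavaScript, so that data is missing from the HTML you get when you fetch the page directly.

Find the URL the page uses to load its comments, then request that URL yourself to get the comment data.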

#coding=utf-8
import json

import requests


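def requests_view(response):
    """Debug helper: save the response body to tmp.html and open it in a browser."""
    import webbrowser
    # Insert a <base> tag so relative links and assets resolve against the original URL.
    base_tag = ('<head><base href="%s">' % response.url).encode('utf-8')
    content = response.content.replace(b"<head>", base_tag)
    # The with-block closes the file even if the write fails.
    with open('tmp.html', 'wb') as tmp_html:
        tmp_html.write(content)
    webbrowser.open_new_tab("tmp.html")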

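# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
# Comments endpoint for the article; the response body is JSON.
url = "https://www.jianshu.com/notes/26504955/comments?comment_id=&author_only=false&since_id=0&max_id=1586510606000&order_by=likes_count&page=1"
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
comments = json.loads(response.content)

# 'comment_exist' is true when the article has comments.
if comments['comment_exist']:
    for item in comments['comments']:
        # Print each commenter's nickname and the rendered comment body.
        print(item['user']['nickname'], item['compiled_content'])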
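
The request and the parsing can also be split into two small helpers, which makes it easier to fetch further pages. The following is only a sketch: the function names build_comments_url and extract_comments are hypothetical, and it assumes that the page query parameter selects the result page and that the JSON keeps the same fields used above (comment_exist, comments, user.nickname, compiled_content).

```python
from urllib.parse import urlencode

# Endpoint pattern taken from the request URL used in this post.
BASE_URL = "https://www.jianshu.com/notes/{note_id}/comments"


def build_comments_url(note_id, page=1, max_id=1586510606000):
    """Build the comments URL for one page (hypothetical helper).

    Assumption: 'page' selects the result page; the other parameters
    are copied unchanged from the original request.
    """
    params = {
        "comment_id": "",
        "author_only": "false",
        "since_id": 0,
        "max_id": max_id,
        "order_by": "likes_count",
        "page": page,
    }
    return BASE_URL.format(note_id=note_id) + "?" + urlencode(params)


def extract_comments(payload):
    """Return (nickname, compiled_content) pairs from a parsed JSON payload."""
    if not payload.get("comment_exist"):
        return []
    return [
        (item["user"]["nickname"], item["compiled_content"])
        for item in payload.get("comments", [])
    ]
```

With these in place, requests.get(build_comments_url(26504955, page=2), headers=headers) would request the second page, and extract_comments(response.json()) would turn the result into a list of tuples. Whether later pages exist depends on the article.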

Original article: https://www.cnblogs.com/brady-wang/p/8945439.html