  • Scraping all images under a Zhihu question with Python

    from zhihu_oauth import ZhihuClient
    from zhihu_oauth.exception import NeedCaptchaException
    
    client = ZhihuClient()
    
    try:
        client.login('email_or_phone', 'password')
        print(u"Login succeeded!")
    except NeedCaptchaException:
        # Save the captcha image, prompt for its text, then log in again
        with open('a.gif', 'wb') as f:
            f.write(client.get_captcha())
        captcha = input('please input captcha:')
        client.login('email_or_phone', 'password', captcha)
        print(u"Login succeeded!")
    client.save_token('token.pkl')
    Obtaining the token
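
Once token.pkl exists, there is no need to log in again on every run. A minimal sketch of that reuse logic, relying only on the load_token, login, and save_token methods that appear in the listings here; the helper name ensure_login and its parameters are hypothetical and not part of zhihu_oauth:

```python
import os


def ensure_login(client, token_path, username, password):
    """Reuse a saved token when present; otherwise log in and save one.

    `client` is expected to be a zhihu_oauth ZhihuClient (or any object
    with the same load_token / login / save_token methods).
    Returns True if the token file was reused, False if a fresh login ran.
    """
    if os.path.isfile(token_path):
        client.load_token(token_path)
        return True
    # No token yet: log in. A NeedCaptchaException raised here is left
    # to the caller, which can handle it as in the listing above.
    # This sketch does not inspect login's return value; if your client
    # reports failure that way, check it before saving the token.
    client.login(username, password)
    client.save_token(token_path)
    return False
```

Calling ensure_login(ZhihuClient(), 'token.pkl', 'email_or_phone', 'password') at the top of a script would then cover both the first run and later runs.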
    from __future__ import print_function  # use the Python 3 print function
    from zhihu_oauth import ZhihuClient
    import re
    import os
    import urllib.request
    
    client = ZhihuClient()
    # Log in
    client.load_token('token.pkl')  # load the saved token file
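    # Question ID, i.e. the number in the question URL. The URL originally cited here,
    # https://www.zhihu.com/question/24400664 ("What is it like to be good-looking?"),
    # carries a different ID than the value below.
    question_id = 46508954  # renamed from `id` to avoid shadowing the built-in
    question = client.question(question_id)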
    print(u"Question:", question.title)
    print(u"Number of answers:", question.answer_count)
    path = question.title + u" (images)"  # output directory named after the question
    os.makedirs(path, exist_ok=True)  # do not fail if the directory already exists
    index = 1  # running image number
    for answer in question.answers:
        content = answer.content  # HTML content of the answer
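        # Group 1 is the image URL, group 2 the file extension; dots are escaped so they match literally
        re_compile = re.compile(r'<img src="(https://picd\.zhimg\.com/.*?\.(jpg|png))".*?>')
        img_lists = re.findall(re_compile, content)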
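        if img_lists:
            for img in img_lists:
                img_url, ext = img  # image URL and its extension (jpg or png)
                # Keep the real extension instead of saving every file as .jpg
                urllib.request.urlretrieve(img_url, os.path.join(path, u"%d.%s" % (index, ext)))
                print(u"Saved image %d" % index)
                index += 1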
    Loading the token and reading the data
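
The extraction step can be separated from the download so that it is easy to check on its own. A minimal sketch, assuming the answer HTML contains <img ... src="..."> tags pointing at jpg or png files; the function name extract_image_urls and its optional host filter are illustrative, and a regular expression is only a rough stand-in for a real HTML parser:

```python
import re

# Matches an <img> tag whose src attribute ends in .jpg or .png.
# Group 1 is the full URL, group 2 the file extension.
IMG_RE = re.compile(
    r'<img[^>]*?\ssrc="(https?://[^"]+?\.(jpg|png))"',
    re.IGNORECASE,
)


def extract_image_urls(html, host=None):
    """Return (url, extension) pairs found in `html`, in order, without duplicates.

    If `host` is given (e.g. "picd.zhimg.com"), only https URLs on that
    host are kept.
    """
    seen = set()
    results = []
    for url, ext in IMG_RE.findall(html):
        if host is not None and not url.startswith("https://%s/" % host):
            continue
        if url in seen:
            continue
        seen.add(url)
        results.append((url, ext.lower()))
    return results
```

Passing answer.content to this function yields the same kind of (url, extension) pairs the loop above iterates over, with repeated URLs within one answer dropped.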
  • Original article: https://www.cnblogs.com/wuyujie/p/9441927.html