  • Scraping Bilibili Column Images with Python

    When olinr learned how to write a web crawler...
    Hehehe

    import urllib.request as urqt
    import urllib.parse as urps
    import sys
    import os
    import re
    import shutil
    tot = 0  # number of images saved so far
    def gethtml(url):
        # Fetch a page and return its HTML, sending a browser-like User-Agent header.
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"}
        res = urqt.Request(url, headers=header)
        html = urqt.urlopen(res).read().decode("utf-8")
        return html
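    def GetIntoPlace(string):
        # Author's local output directory; change it to one that exists on your machine.
        os.chdir(r"D:信息python一些成品站专栏图片爬虫")
        have = os.listdir()
        # Start from an empty folder named after the keyword, deleting any old one.
        if string in have:
            shutil.rmtree(string)
        os.mkdir(string)
        os.chdir(string)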
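    def getpng(url):
        global tot, num
        try:
            res = urqt.urlopen(url).read()
        except Exception:
            # Skip images that fail to download instead of aborting the run.
            return
        tot += 1
        # Files are named 1.jpg, 2.jpg, ... in download order.
        with open(str(tot) + '.jpg', 'wb') as f:
            f.write(res)
        print("Downloading image " + str(tot))
        if tot == num:
            # Stop as soon as the requested number of images has been saved.
            sys.exit()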
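    def getans(html):
        # Find every lazily loaded .jpg image in an article page (the dot is escaped).
        key = re.compile(r'img data-src="//.+?\.jpg')
        have = re.findall(key, html)
        for per in have:
            # Drop the 14-character prefix 'img data-src="' to keep the //host/path part.
            per = "http:" + per[14:]
            getpng(per)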
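    def work(html):
        # key1 finds the article links on a search result page; key2 pulls out each URL.
        key1 = re.compile('a title.+? href=".+?"')
        key2 = re.compile('//.+?"')
        have1 = re.findall(key1, html)
        for i in have1:
            # The key2 match ends with the closing quote, so strip it from the URL.
            now = "http:" + re.findall(key2, i)[0].rstrip('"')
            getans(gethtml(now))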
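    now = input("Enter the keyword to search for: ")
    num = int(input("Enter the number of images to download: "))
    frm = int(input("Enter the starting page number: "))
    GetIntoPlace(now)
    # URL-encode the keyword so non-ASCII text is safe in the query string.
    now = urps.quote(now, encoding="utf-8")
    # Walk the search result pages until getpng has saved num images.
    while tot < num:
        url = "https://search.bilibili.com/article?keyword=" + now + "&page=" + str(frm)
        work(gethtml(url))
        frm += 1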
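
The image-extraction step in getans can also be tried offline on a snippet of HTML. The following is a minimal sketch, not the author's code: the names extract_image_urls and IMG_PATTERN, the https default, and the sample markup are all assumptions made for illustration. It uses a capture group instead of slicing off a fixed-length prefix, escapes the dot before jpg, and skips duplicate URLs.

```python
import re

# Hypothetical helper, not part of the original script.
# The capture group keeps only the protocol-relative URL inside data-src;
# [^"] stops the match at the closing quote, and \. matches a literal dot.
IMG_PATTERN = re.compile(r'img data-src="(//[^"]+?\.jpg)')


def extract_image_urls(html, scheme="https"):
    """Return absolute .jpg URLs found in html, in order, without duplicates."""
    seen = set()
    urls = []
    for path in IMG_PATTERN.findall(html):
        url = scheme + ":" + path
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sample markup: two distinct images, one duplicate, one non-jpg name.
sample = (
    '<img data-src="//example.com/a/1.jpg" width="10">'
    '<img data-src="//example.com/a/2.jpg">'
    '<img data-src="//example.com/a/1.jpg">'
    '<img data-src="//example.com/a/3xjpg">'
)
print(extract_image_urls(sample))
```

Each returned URL could then be passed to getpng in place of the sliced string.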
    
  • Original article: https://www.cnblogs.com/olinr/p/13678842.html