  • Python crawler: downloading the novel 《悟空看私聊》 from the 笔趣看 site

    I'm someone who loves reading novels, haha.

    # -*- coding:UTF-8 -*-
    '''
    Downloads the novel 《悟空看私聊》 from the 笔趣看 site (biqukan.com).
    '''
    import sys

    import requests
    import urllib3
    from bs4 import BeautifulSoup

    # Silence urllib3 warnings.
    urllib3.disable_warnings()
    
    class downloader(object):
        def __init__(self):
            self.server = 'http://www.biqukan.com/'
            self.target = 'http://www.biqukan.com/37_37039/'
            self.names = []            # chapter titles
            self.urls = []             # chapter links
            self.nums = 0              # number of chapters
    
        def get_download_url(self):
            """Collect the chapter titles and download links from the index page."""
            req = requests.get(url = self.target)
            html = req.text
            div_bf = BeautifulSoup(html, "html.parser")
            div = div_bf.find_all('div', class_ = 'listmain')
            # Name the parser explicitly; otherwise bs4 picks one itself and warns.
            a_bf = BeautifulSoup(str(div[0]), "html.parser")
            a = a_bf.find_all('a')
            # Drop the first 12 links (unneeded chapters) and count the rest.
            self.nums = len(a[12:])
            for each in a[12:]:
                self.names.append(each.string)
                self.urls.append(self.server + each.get('href'))
            print(self.names)
            print(self.urls)
    
        def get_contents(self, target):
            """Fetch the text of one chapter.

            target - download link (string)
            Returns the chapter text (string).
            """
            req = requests.get(url = target)
            aa = req.content
            bf = BeautifulSoup(aa, "html.parser")
            texts = bf.find_all('div', id = 'content')
            # Each run of eight non-breaking spaces (\xa0) becomes a blank line.
            a = texts[0].text.replace('\xa0'*8, '\n\n')
            print(a)
            return a
    
        def writer(self, name, path, text):
            """Append one scraped chapter to the output file.

            name - chapter title (string)
            path - file name the novel is saved under, in the current path (string)
            text - chapter text (string)
            """
            with open(path, 'a', encoding='utf-8') as f:
                f.write(name + '\n')
                f.writelines(text)
                f.write('\n\n')
    
    if __name__ == "__main__":
        dl = downloader()
        dl.get_download_url()
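        print('《悟空看私聊》 download started:')
        for i in range(dl.nums):
            dl.writer(dl.names[i], 'D://悟空看私聊.txt', dl.get_contents(dl.urls[i]))
            # '\r' moves back to the start of the line so the next progress figure overwrites this one.
            sys.stdout.write("  Downloaded: %.5f%%" %  float(i/dl.nums*100) + '\r')
            sys.stdout.flush()
        print('《悟空看私聊》 download finished')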
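
The two string-handling steps in the script, building a chapter URL from `self.server` plus an `href` and turning the `\xa0` indentation into blank lines, can be tried without touching the network. What follows is only a minimal sketch under stated assumptions: the helper names `build_chapter_url` and `clean_chapter_text` are hypothetical, the sample `href` and text are made up for illustration, and `urllib.parse.urljoin` is swapped in for the plain string concatenation the script uses.

```python
from urllib.parse import urljoin

SERVER = 'http://www.biqukan.com/'  # same value as self.server in the script


def build_chapter_url(server, href):
    """Hypothetical helper: resolve an href from the index page against the site root.

    urljoin accepts both '/37_37039/1.html' and '37_37039/1.html' and does not
    produce a doubled slash, which plain server + href can.
    """
    return urljoin(server, href)


def clean_chapter_text(raw):
    """Hypothetical helper: replace each run of eight non-breaking spaces
    (the paragraph indent the script assumes) with a blank line."""
    return raw.replace('\xa0' * 8, '\n\n')


if __name__ == '__main__':
    # Made-up sample values, not taken from the real site.
    print(build_chapter_url(SERVER, '/37_37039/1.html'))
    sample = '\xa0' * 8 + 'First paragraph.' + '\xa0' * 8 + 'Second paragraph.'
    print(repr(clean_chapter_text(sample)))
```

Keeping these steps as small pure functions makes it easy to check them in isolation before running the full download loop.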

  • Original article: https://www.cnblogs.com/lixy-88428977/p/9366878.html