  • Python crawler: downloading the novel 《悟空看私聊》 from the 《笔趣看》 site

    I'm someone who loves reading novels, haha.

    # -*- coding:UTF-8 -*-
    '''
    Description: download the novel 《悟空看私聊》 from the 《笔趣看》 site
    '''
    from bs4 import BeautifulSoup
    import requests
    import urllib3
    import sys
    urllib3.disable_warnings()
    
    class downloader(object):
        def __init__(self):
            self.server = 'http://www.biqukan.com/'
            self.target = 'http://www.biqukan.com/37_37039/'
            self.names = []            # chapter titles
            self.urls = []             # chapter links
            self.nums = 0              # number of chapters
    
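        def get_download_url(self):
            """Collect the chapter titles and links from the index page."""
            req = requests.get(url = self.target)
            # Pass the raw bytes so BeautifulSoup detects the page encoding, as get_contents does
            div_bf = BeautifulSoup(req.content, "html.parser")
            div = div_bf.find_all('div', class_ = 'listmain')
            # Search the first matching <div> directly instead of re-parsing its string form
            a = div[0].find_all('a')
            chapters = a[12:]          # drop the first 12 unneeded links
            self.nums = len(chapters)  # count the remaining chapters
            for each in chapters:
                self.names.append(each.string)
                self.urls.append(self.server + each.get('href'))
            print(self.names)
            print(self.urls)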
    
        def get_contents(self, target):
            """
            Fetch the text of one chapter.
                target - chapter link (string)
                returns - chapter text (string)
            """
            req = requests.get(url = target)
            bf = BeautifulSoup(req.content, "html.parser")
            texts = bf.find_all('div', id = 'content')
            # Replace each run of 8 non-breaking spaces with a blank line
            a = texts[0].text.replace('\xa0'*8, '\n\n')
            print(a)
            return a
    
        def writer(self, name, path, text):
            """
            Append one scraped chapter to the output file.
                name - chapter title (string)
                path - file name the novel is saved under, in the current path (string)
                text - chapter text (string)
            """
            with open(path, 'a', encoding='utf-8') as f:
                f.write(name + '\n')
                f.write(text)
                f.write('\n\n')
    
    if __name__ == "__main__":
        dl = downloader()
        dl.get_download_url()
        print('Starting download of 《悟空看私聊》:')
        for i in range(dl.nums):
            dl.writer(dl.names[i], 'D://悟空看私聊.txt', dl.get_contents(dl.urls[i]))
            # i + 1 chapters are finished at this point, so the last line reads 100%
            sys.stdout.write("  Downloaded: %.5f%%" % ((i + 1) / dl.nums * 100) + '\n')
            sys.stdout.flush()
        print('《悟空看私聊》 download complete')
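
The script builds each chapter link with `self.server + each.get('href')`, which leaves a doubled slash whenever the href already starts with `/`. One possible alternative, sketched here under the assumption that the hrefs are either site-absolute or relative to the index page, is to resolve them with the standard library's `urljoin`. The helper name `build_chapter_urls` and the sample hrefs are hypothetical and not taken from the original script.

```python
from urllib.parse import urljoin


def build_chapter_urls(index_url, hrefs):
    """Resolve every href against the index page URL."""
    return [urljoin(index_url, href) for href in hrefs]


if __name__ == "__main__":
    # Made-up hrefs for illustration; the real values come from the index page.
    sample_hrefs = ["/37_37039/1.html", "2.html"]
    for url in build_chapter_urls("http://www.biqukan.com/37_37039/", sample_hrefs):
        print(url)
```

Inside `get_download_url`, this would amount to replacing the string concatenation with `urljoin(self.target, each.get('href'))`.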

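The text cleanup in `get_contents` can be tried without any network access. The sketch below assumes, as the script does, that every paragraph in the page starts with eight non-breaking spaces. The function name `clean_chapter` and the extra `strip()` call are illustrative additions, not part of the original code.

```python
NBSP_INDENT = "\xa0" * 8  # eight non-breaking spaces, the indent get_contents looks for


def clean_chapter(raw_text):
    """Turn each 8-NBSP indent into a blank line and trim the ends."""
    # strip() is an added step: it removes the blank line left before the first paragraph.
    return raw_text.replace(NBSP_INDENT, "\n\n").strip()


if __name__ == "__main__":
    # Made-up sample text standing in for a scraped chapter.
    sample = NBSP_INDENT + "First paragraph." + NBSP_INDENT + "Second paragraph."
    print(clean_chapter(sample))
```

Keeping the cleanup in its own function makes it easy to check against saved page text before running the full download.
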
  • Original article: https://www.cnblogs.com/lixy-88428977/p/9366878.html