Time is tight, so I'm jotting this down first; I'll polish and improve it later.
First, a disclaimer: I wrote this purely for practice. I don't read novels anymore; my nearsightedness has gotten so bad that I don't even use my phone.
The goal of this novel downloader was to solve a real gap: back then, very few sites let you download the latest novels, while plenty let you read them online. So I wrote this tool to scrape novels from the web. The code targets one specific site, but that site has been around a long time and has a very complete catalog, so it should cover the vast majority of needs. I won't name the site here; you'll see it in the code. I'd rather avoid any legal trouble.
Since this is a tool that scrapes web pages and reads HTML, it needs an HTML-parsing framework. I discovered pyquery, and since I consider my jQuery fairly solid (I've written my own jQuery plugins, can handle most browser-compatibility issues, have used essentially all of jQuery UI, have customized many jQuery UI plugins, and can even patch official bugs), I certainly wasn't going to pass up a gem like pyquery. To install Python packages I use easy_install.
I originally used pip, but found it less reliable than easy_install: when installing pyquery, pip errored out while resolving the dependencies, whereas easy_install installed it successfully. For how to set up easy_install and pip, see: http://blog.csdn.net/qq413041153/article/details/8950247
Once easy_install is set up, just type this at the cmd prompt:
easy_install pyquery
In my case, since I had already installed it, easy_install simply reported that pyquery 1.2.4 was already active in easy-install.pth.
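Before the full program, here is a minimal sketch of what pyquery's jQuery-style syntax looks like; the HTML snippet and selectors below are made up for illustration, not taken from the real site:

# -*- coding:utf-8 -*-
# Minimal pyquery demo: jQuery-style selectors in Python.
# The HTML snippet here is a made-up example.
from pyquery import PyQuery as pq

doc = pq("<div class='book'><h1>Demo Novel</h1><a href='1.html'>Chapter 1</a></div>")
print doc("div.book > h1").text()   # -> Demo Novel
print doc("a").attr("href")         # -> 1.html

If you know jQuery, selectors, .text(), .attr(), .find(), and .next() all behave the way you'd expect, which is exactly why I picked this library.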
Now, straight to the code:
# -*- coding:gbk -*-
'''
file desc: novel downloader
author:    kingviker
email:     kingviker@163.com, kingviker88@gmail.com
date:      2013-05-21
depends:   python 2.7.4, pyquery
'''
import os, codecs
from pyquery import PyQuery as pq

saveMode = "singleFile"  # "singleFile" or "singleChapter"
# the novel's main page
url = "http://www.dushuge.net/html/14/14712/"
# where the downloaded novels will be saved
baseSavePath = "E:/enovel/"

# use pyquery to grab the novel's main page
html_pq = pq(url=url)
# use jQuery-style selectors to get the novel's name
novelName = html_pq("div.book_news_style_text2 > h1").text()
print novelName
# create the novel's folder if it does not exist yet
if not os.path.exists(baseSavePath + novelName):
    os.mkdir(baseSavePath + novelName)

# pieceList alternates piece (volume) titles and chapter lists
pieceList = []
chapterList = []

# find the first piece title of the novel
piece = pq(html_pq("div.book_article_texttable").find(".book_article_texttitle")[0])
# record the current piece's title
pieceList.append(piece.text())
print "piece text:", piece.text()

# walk the sibling divs to collect piece titles and chapter links
nextPiece = False
while not nextPiece:
    chapterDiv = piece.next()
    # print "chapter div length:", chapterDiv.length
    piece = chapterDiv
    if chapterDiv.length == 0:
        # no more siblings: store the last chapter list and stop
        pieceList.append(chapterList[:])
        del chapterList[:]
        nextPiece = True
    elif chapterDiv.attr("class") == "book_article_texttitle":
        # a new piece title: store the finished chapter list, start a new one
        pieceList.append(chapterList[:])
        del chapterList[:]
        pieceList.append(piece.text())
    else:
        # a chapter block: collect each link's name and href
        for urlA in chapterDiv.find("a"):
            chapterList.append([pq(urlA).text(), pq(urlA).attr("href")])

print "download list collected", len(pieceList)

# based on pieceList, fetch each chapter page and save its content
if saveMode == "singleFile":
    if os.path.exists(baseSavePath + novelName + ".txt"):
        os.remove(baseSavePath + novelName + ".txt")
    # codecs.open writes the file as utf-8; "wb+" creates or truncates it
    novelFile = codecs.open(baseSavePath + novelName + ".txt", "wb+", "utf-8")
    # two nested loops walk pieceList: even indexes are titles,
    # odd indexes are the matching chapter lists
    for pieceNum in range(0, len(pieceList), 2):
        piece = pieceList[pieceNum]
        print "downloading", pieceList[pieceNum]
        chapterList = pieceList[pieceNum + 1]
        for chapter in chapterList:
            print "downloading", chapter[0], "url:", chapter[1]
            chapterPage = pq(url=url + chapter[1])
            chapterContent = piece + " " + chapter[0] + "\r\n"
            chapterContent += chapterPage("#booktext").html().replace("<br />", "\r\n")
            print "content length:", len(chapterContent)
            novelFile.write(chapterContent + "\r\n" + "\r\n")
    novelFile.close()
else:
    # same as above, but each chapter goes into its own file
    for pieceNum in range(0, len(pieceList), 2):
        piece = pieceList[pieceNum]
        print "downloading", pieceList[pieceNum]
        chapterList = pieceList[pieceNum + 1]
        for chapter in chapterList:
            print "downloading", chapter[0], "url:", chapter[1]
            novelFile = codecs.open(baseSavePath + novelName + os.sep + piece + chapter[0] + ".txt", "wb", "utf-8")
            chapterPage = pq(url=url + chapter[1])
            chapterContent = piece + " " + chapter[0] + "\r\n"
            chapterContent += chapterPage("#booktext").html().replace("<br />", "\r\n")
            print "content length:", len(chapterContent)
            novelFile.write(chapterContent + "\r\n" + "\r\n")
            novelFile.close()

print "download finished"
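For clarity, the scan loop above builds pieceList as a flat list where even indexes hold piece titles and odd indexes hold the matching chapter lists, which is why the save loop steps through it two at a time. A made-up illustration of the shape (titles and hrefs are invented):

# Made-up illustration of the pieceList layout built by the scan loop:
pieceList = [
    u"Volume 1",                           # index 0: piece title
    [[u"Chapter 1", "1001.html"],          # index 1: its chapters
     [u"Chapter 2", "1002.html"]],
    u"Volume 2",                           # index 2: next piece title
    [[u"Chapter 3", "1003.html"]],         # index 3: its chapters
]
# range(0, len(pieceList), 2) visits the titles; pieceList[i+1] is the chapter list.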
To download a different novel, just change the novel's main-page URL in the code. The files are saved under E:/enovel/, and you can choose to save one file per chapter or everything in a single file.
I didn't wrap any of this into functions, because I'm lazy.
If you spot problems or mistakes, criticism and corrections are welcome.
Addendum:
The code uses codecs; there's an article that can help you get familiar with codecs: [link]
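For a quick feel of what codecs does in the downloader, here is a minimal sketch; the file name "demo.txt" is just an example:

# -*- coding:utf-8 -*-
# Minimal codecs demo: write and read back a utf-8 text file.
import codecs

f = codecs.open("demo.txt", "wb+", "utf-8")
f.write(u"chapter title\r\nchapter body\r\n")
f.close()

f = codecs.open("demo.txt", "rb", "utf-8")
print f.read()   # a unicode string, decoded from utf-8
f.close()

The point is that codecs.open handles the encoding for you: you write and read unicode strings, and the utf-8 encoding/decoding happens transparently, which is why the downloader can dump chapter text straight to disk.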
To wrap up, a programmer joke to share:
Grinding away on Tianya, how could you not catch a few bricks? Eat three hundred bricks a day, and you'd gladly stay a Tianya regular forever~