  • Software engineering blog archiving tool (for personal use)

    # -*- coding: utf-8 -*-
    #@Time :2021/6/21 16:51
    #@Author :Xxg
    #@Site :
    #@File :作业归档完善版.py
    #@Software :PyCharm
    import random
    import requests
    import pymysql
    from lxml import etree
    import docx
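    headers = {
        "User-Agent": ""  # left blank in the original post; supply your own value
    }
    url = ''  # left blank in the original post; the listing page to archive
    
    response = requests.get(url, headers=headers)  # fetch the listing page
    html = etree.HTML(response.text)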
    # print(html)
    date = html.xpath('//div[@class="dayTitle"]/a/text()')
    name = html.xpath('//div[@class="postTitle"]/a/span/text()')
    # zhaiyao: summary text of each post
    zhaiyao = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/text()')
    # yueduquanwen: "read more" links to the full posts
    yueduquanwen = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/a/@href')
    for i in range(len(yueduquanwen)):
        url1 = yueduquanwen[i]
        # url1 = "https://www.cnblogs.com/sakura-xxg/category/1990334.html"
        response1 = requests.get(url1, headers=headers)  # fetch the full post
        html_son = etree.HTML(response1.text)
        title = html_son.xpath('//div[@class="post"]/h1[@class="postTitle"]/a/span/text()')
        print(title)
        content = html_son.xpath('//div[@class="blogpost-body blogpost-body-html"]/p/text()')
        print(content)
        date = html_son.xpath('//div[@class="postDesc"]/span[@id="post-date"]/text()')
        print(date)
        # Create the docx object
        file = docx.Document()
        # xpath() returns a list, but add_paragraph() expects a string
        file.add_paragraph("".join(date))
        for j in range(len(content)):
            file.add_paragraph(content[j])
        # The backslash must be escaped; "D:\" would swallow the closing quote
        file.save("D:\\" + title[0] + ".docx")
        # for j in range(len(content)):
        #   file.add_paragraph(content[j])
        # date_son = html.xpath('//div[@class="dayTitle"]/a/text()')
        # name_son = html.xpath('//div[@class="postTitle"]/a/span/text()')
        # zhaiyao_son = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/text()')
        # print(date_son)
        # print(zhaiyao_son)
    print(yueduquanwen)
    # print(date[0])
    # print(name[0].replace(" ","").replace("\n",""))
    # print(zhaiyao[0].replace("\n",""))
    # print(zhaiyao[0])
    
    # Save as a Word document
    # for n in range(len(date)):
    #     file = docx.Document()
    #     file.add_paragraph(date[n])
    #     file.add_paragraph(zhaiyao[2*n].replace("\n",""))
    #     # file.save("F:\\word\\"+name[n].replace(" ","").replace("\n","")+".docx")
    #     print(date[n])
    #     print(zhaiyao[2*n])
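
The script builds the output path directly from `title[0]`, so a post title containing characters such as `:` or `?`, or stray whitespace and line breaks, would likely make `file.save` fail on Windows. Below is a minimal sketch of one way to guard against that. The helper name `safe_filename` and its `fallback` and `extension` parameters are hypothetical and not part of the original script.

```python
import re

# Characters Windows does not allow in file names, plus control characters.
_ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, fallback="untitled", extension=".docx"):
    """Turn a scraped post title into a file name (hypothetical helper)."""
    # Drop the leading/trailing whitespace that scraped text often carries.
    cleaned = title.strip()
    # Replace every illegal character with an underscore.
    cleaned = _ILLEGAL_CHARS.sub("_", cleaned)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or fallback) + extension
```

With such a helper, the save line could read `file.save("D:\\" + safe_filename(title[0]))`.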
  • Original post: https://www.cnblogs.com/sakura-xxg/p/14915406.html