  • 1209 Poet Information & Poem Annotations & Verse Display Beautification

    Today's progress

    Poet information

    Poem annotations

    Scraping the poem annotations

    import requests
    from bs4 import BeautifulSoup

    # Browser-like request headers
    headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
    base_url = 'https://www.xungushici.com'
    pom_list = []
    k = 1
    for i in range(1, 1000):
        url = base_url + '/shicis/cd-yuan-p-' + str(i)
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.content.decode('utf-8'), 'html.parser')

        hed = soup.find('div', class_='col col-sm-12 col-lg-9')
        if hed is None:
            # No poem list on this page, so skip it
            continue
        cards = hed.find_all('div', class_="card mt-3")
        # print(len(cards))

        for card in cards:
            # 1.1 Get the link and title of each poem on the list page
            link = card.find('h4', class_='card-title').a
            real_href = base_url + link['href']
            title = link.text
            print(title)
            # 2.1 Fetch the poem's detail page
            r2 = requests.get(real_href, headers=headers)
            soup2 = BeautifulSoup(r2.content.decode('utf-8'), 'html.parser')
            zhu = ""
            # find() returns None (never []) when nothing matches
            card_div = soup2.find('div', class_='card mt-3')
            card_body = card_div.find('div', class_='card-body') if card_div is not None else None
            if card_body is not None:
                in_notes = False
                for p in card_body.find_all('p'):
                    strong = p.find('strong')
                    if strong is not None and strong.text == '注释':
                        # The "注释" (annotations) heading: collect the paragraphs after it
                        in_notes = True
                        continue
                    if in_notes:
                        zhu = zhu + str(p)
            # Poems without annotations are stored with an empty string
            pom_list.append({'title': title, 'zhu': zhu})
            print(k)
            k = k + 1

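    import xlwt

    xl = xlwt.Workbook()
    # Add a worksheet through the workbook's add_sheet method
    sheet1 = xl.add_sheet('sheet1', cell_overwrite_ok=True)

    # Header row: title in column 0, annotations in column 12
    sheet1.write(0, 0, "title")
    sheet1.write(0, 12, 'zhu')

    for i in range(0, len(pom_list)):
        sheet1.write(i + 1, 0, pom_list[i]['title'])
        sheet1.write(i + 1, 12, pom_list[i]['zhu'])
    # xlwt writes the legacy .xls format, so save with the matching extension
    xl.save("yuan.xls")
    # print(pom_list)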
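
The annotation-extraction step of the scraper can be tried on its own against a small HTML fragment. This is a minimal sketch, assuming the detail page has the structure the scraper expects: a `card-body` div in which a `<p><strong>注释</strong></p>` paragraph precedes the annotation paragraphs. The function name `extract_notes` and the sample HTML are hypothetical and are not taken from the site.

```python
from bs4 import BeautifulSoup


def extract_notes(html):
    """Return the HTML of every <p> that follows the '注释' heading paragraph."""
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('div', class_='card-body')
    if body is None:
        # No card body at all: treat the poem as having no annotations
        return ""
    notes = []
    in_notes = False
    for p in body.find_all('p'):
        strong = p.find('strong')
        if strong is not None and strong.text == '注释':
            # Heading found: start collecting from the next paragraph
            in_notes = True
            continue
        if in_notes:
            notes.append(str(p))
    return "".join(notes)


# Hypothetical sample, shaped like the structure assumed above
sample = (
    '<div class="card-body">'
    '<p>poem text</p>'
    '<p><strong>注释</strong></p>'
    '<p>note one</p><p>note two</p>'
    '</div>'
)
print(extract_notes(sample))
```

Keeping this step in a separate function makes it possible to check the parsing logic without sending any requests to the site.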

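    Results

    Front-end page display

    Verse beautification

    Verses are split at full stops (。) for display; seven-character ancient-style poems (七言古诗) are broken into separate lines at commas.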
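
The splitting rule described above could be sketched roughly as follows. This is an illustration only, not the code behind the page: the function name `split_verse` and the `seven_char` flag are hypothetical, the poem text is assumed to use the full-width punctuation marks `。` and `,`, and breaking seven-character poems at both marks is one interpretation of the rule.

```python
import re


def split_verse(text, seven_char=False):
    """Split a poem into display lines, keeping each punctuation mark with its line."""
    # Assumption: break after full stops; for seven-character poems also after commas
    marks = '。,' if seven_char else '。'
    # A run of non-break characters followed by an optional break mark
    pattern = '[^' + marks + ']+[' + marks + ']?'
    return [part.strip() for part in re.findall(pattern, text) if part.strip()]


five = '床前明月光,疑是地上霜。举头望明月,低头思故乡。'
seven = '朝辞白帝彩云间,千里江陵一日还。两岸猿声啼不住,轻舟已过万重山。'

for line in split_verse(five):
    print(line)
for line in split_verse(seven, seven_char=True):
    print(line)
```

Each returned string can then be rendered as its own line on the front-end page.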

     

  • Original article: https://www.cnblogs.com/xiaofengzai/p/15669229.html