  • Scraping the chapter titles and text of Romance of the Three Kingdoms (三国演义)

    import requests
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent with every request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4315.5 Safari/537.36'
    }
    
    # Fetch the table-of-contents page
    url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
    response = requests.get(url=url, headers=headers)
    # print(response.encoding)  # inspect the encoding of the returned data
    response.encoding = 'utf-8'  # set the charset explicitly to avoid garbled text
    page_text = response.text
    
    # Each <li> under .book-mulu links to one chapter
    soup = BeautifulSoup(page_text, 'lxml')
    li_list = soup.select('.book-mulu > ul > li')
    # The with-block closes the output file even if a request fails midway
    with open('./sanguoyanyi3.txt', 'w', encoding='utf-8') as fp:
        for li in li_list:
            title = li.a.string
            detail_url = 'https://www.shicimingju.com' + li.a['href']
            response_detail = requests.get(url=detail_url, headers=headers)
            response_detail.encoding = 'utf-8'
            detail_text = response_detail.text
            detail_soup = BeautifulSoup(detail_text, 'lxml')
            content = detail_soup.find('div', class_='chapter_content').text
            fp.write(title + ':' + content + '\n')
            print(title, 'downloaded!')
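
The script builds each chapter URL by prepending the site root to the `href`, which only works if every link is root-relative. A minimal sketch of a more tolerant alternative uses the standard library's `urljoin`. The helper name `build_chapter_url` and the sample `href` values are hypothetical and are not taken from the site's actual markup.

```python
from urllib.parse import urljoin

# The index page from the script above; hrefs are resolved against it.
INDEX_URL = 'https://www.shicimingju.com/book/sanguoyanyi.html'


def build_chapter_url(href, index_url=INDEX_URL):
    """Resolve a chapter href (hypothetical helper) against the index page URL."""
    return urljoin(index_url, href)


# Made-up sample hrefs covering the three forms a link can take.
sample_hrefs = [
    '/book/sanguoyanyi/1.html',                             # root-relative
    'sanguoyanyi/2.html',                                   # relative to the index page
    'https://www.shicimingju.com/book/sanguoyanyi/3.html',  # already absolute
]

chapter_urls = [build_chapter_url(href) for href in sample_hrefs]
for chapter_url in chapter_urls:
    print(chapter_url)
```

In the loop above, `detail_url = build_chapter_url(li.a['href'])` could stand in for the string concatenation if the site's links turn out to be inconsistent.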
    
  • Original article: https://www.cnblogs.com/niucunguo/p/14408090.html