  • Python: scraping the volume titles, chapters, and chapter names of 盗墓笔记 (Daomu Biji)

    import json

    import requests
    from bs4 import BeautifulSoup

    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    headers = {'User-Agent': user_agent}

    r = requests.get("http://seputu.com/", headers=headers)

    # Pass the raw bytes: from_encoding is ignored when the input is already a str
    soup = BeautifulSoup(r.content, 'html.parser', from_encoding='utf-8')

    content = []
    for mulu in soup.find_all(class_="mulu"):
        h2 = mulu.find('h2')
        if h2 is not None:
            h2_title = h2.string  # volume title
            chapters = []  # renamed from `list` to avoid shadowing the built-in
            # Collect the URL and chapter name from every <a> tag in the box
            for a in mulu.find(class_='box').find_all('a'):
                href = a.get('href')
                box_title = a.get('title')
                chapters.append({'href': href, 'box_title': box_title})
            content.append({'title': h2_title, 'content': chapters})

    # json.dump writes str, so Python 3 needs text mode ('w'), not 'wb'
    with open('qiye.json', 'w', encoding='utf-8') as fp:
        # ensure_ascii=False keeps Chinese titles readable instead of \u escapes
        json.dump(content, fp=fp, indent=4, ensure_ascii=False)
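
To check the result, the JSON file can be read back and summarized. The following is a minimal sketch, not part of the original post: the `summarize` helper and the sample data are hypothetical, and the sample only mimics the shape the scraper writes (a list of `{'title', 'content'}` entries, each chapter holding `href` and `box_title`).

```python
import json
import os
import tempfile


def summarize(path):
    """Return (volume title, chapter count) pairs from a file shaped like qiye.json."""
    with open(path, encoding='utf-8') as fp:
        volumes = json.load(fp)
    return [(v['title'], len(v['content'])) for v in volumes]


# Hypothetical sample in the same shape the scraper writes; not real site data.
sample = [
    {'title': 'Volume A', 'content': [
        {'href': 'http://example.com/1.html', 'box_title': 'Chapter 1'},
        {'href': 'http://example.com/2.html', 'box_title': 'Chapter 2'},
    ]},
    {'title': 'Volume B', 'content': []},
]

# Write the sample to a temporary file so the sketch runs without network access.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample.json')
with open(sample_path, 'w', encoding='utf-8') as fp:
    json.dump(sample, fp, ensure_ascii=False, indent=4)

summary = summarize(sample_path)
for title, count in summary:
    print(title, count)
```

Against a real run, calling `summarize('qiye.json')` would list each volume with the number of chapter links found, which is a quick way to spot a volume whose `box` element yielded nothing.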

  • Original article: https://www.cnblogs.com/paulversion/p/8336509.html