  • Scraping Daomu Biji (盗墓笔记) from the book "Python爬虫开发与项目实践": the json.dump error

    # encoding: utf-8
    # Changing the file mode from 'wb' to 'w' avoids the error. Environment: Python 3.6.
    import requests
    import json
    from bs4 import BeautifulSoup
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    r = requests.get('http://seputu.com/', headers=headers)
    r.encoding = 'utf-8'
    # print(r.text)
    soup = BeautifulSoup(r.text, 'html.parser')
    content = []
    for mulu in soup.find_all(class_='mulu'):
        h2 = mulu.find("h2")
        # print(h2)
        if h2 is not None:
            h2_title = h2.string
            chapters = []  # chapter links listed under this heading
            for a in mulu.find(class_='box').find_all('a'):
                href = a.get('href')
                box_title = a.get('title')
                # href = bytes(href, encoding='utf-8')
                # box_title = bytes(box_title, encoding='utf-8')
                print(href, box_title)
                chapters.append({'href': href, 'box_title': box_title})
            content.append({'title': h2_title, 'content': chapters})
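    # json.dump writes str, so in Python 3 the file must be opened in text mode ('w'), not 'wb'.
    # ensure_ascii=False keeps the Chinese titles readable instead of \uXXXX escapes.
    with open('../json_file/mulu.json', 'w', encoding='utf-8') as fp:
        json.dump(content, fp=fp, indent=4, ensure_ascii=False)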
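
The error can be reproduced without touching the network. The sketch below is only an illustration: the sample data, the helper names (dump_binary, dump_text, dump_bytes, load) and the temporary file names are made up for this example and do not come from the book. It shows that json.dump raises TypeError on a file opened with 'wb' in Python 3, that text mode works, and that 'wb' can still be used if the data is serialized with json.dumps and encoded first.

```python
import json
import os
import tempfile

# Hypothetical sample data shaped like the scraper's output.
content = [{
    'title': '盗墓笔记',
    'content': [{'href': 'http://example.com/1.html', 'box_title': '第一章'}],
}]


def dump_binary(data, path):
    """Reproduce the failure: json.dump writes str, but a 'wb' file expects bytes."""
    try:
        with open(path, 'wb') as fp:
            json.dump(data, fp, indent=4)
    except TypeError as exc:
        return exc
    return None


def dump_text(data, path):
    """Text mode with an explicit encoding accepts the str chunks json.dump writes."""
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(data, fp, indent=4, ensure_ascii=False)


def dump_bytes(data, path):
    """Alternative that keeps 'wb': serialize to str first, then encode to bytes."""
    with open(path, 'wb') as fp:
        fp.write(json.dumps(data, indent=4, ensure_ascii=False).encode('utf-8'))


def load(path):
    """Read a JSON file back so the round trip can be checked."""
    with open(path, encoding='utf-8') as fp:
        return json.load(fp)


workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'text_mode.json')
bytes_path = os.path.join(workdir, 'bytes_mode.json')

error = dump_binary(content, os.path.join(workdir, 'broken.json'))
dump_text(content, text_path)
dump_bytes(content, bytes_path)

print('wb + json.dump raised:', type(error).__name__)
print('text mode round trip ok:', load(text_path) == content)
print('encoded bytes round trip ok:', load(bytes_path) == content)
```

For the scraper above, the text-mode variant is probably the simplest choice, since it changes only the open() call.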
    

      

  • Original article: https://www.cnblogs.com/Ychao/p/9210341.html