  • Reading and writing CSV and JSON files

    1. CSV file storage

    1.1 Writing

    A simple example:

    import csv
    
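    # 'w' creates/overwrites the file; newline='' lets the csv module control line endings
    with open('data.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)                 # create a writer object from the file handle
        writer.writerow(['id', 'name', 'age'])       # writerow() writes one row per call
        writer.writerow(['1', 'rose', '18'])
        writer.writerow(['2', 'john', '19'])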
    

    Opened as plain text, the file looks like this (the default delimiter is a comma):

    id,name,age
    1,rose,18
    2,john,19
    

    To change the default delimiter, pass the delimiter argument:

    writer = csv.writer(csvfile, delimiter=' ')   # use a space as the delimiter
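A minimal sketch of what the delimiter changes. It writes to an in-memory io.StringIO buffer instead of a real file, and the sample value 'john smith' is made up. A field that itself contains the delimiter is quoted automatically, because the default quoting mode is csv.QUOTE_MINIMAL. Reading with the same delimiter restores the original fields:

```python
import csv
import io

buf = io.StringIO()                        # in-memory stand-in for a file
writer = csv.writer(buf, delimiter=' ')    # space-separated output
writer.writerow(['id', 'name', 'age'])
writer.writerow(['2', 'john smith', '19'])  # this field contains the delimiter

text = buf.getvalue()
print(repr(text))  # 'id name age\r\n2 "john smith" 19\r\n'

# Read it back with the same delimiter; the quoted field stays in one piece
rows = list(csv.reader(io.StringIO(text), delimiter=' '))
print(rows)  # [['id', 'name', 'age'], ['2', 'john smith', '19']]
```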
    

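    To write several rows at once, call writerows() instead of writerow():

    # the argument is a two-dimensional list: one inner list per row
    writer.writerows([['1', 'rose', '18'], ['2', 'john', '19']])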
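Passing a two-dimensional list to writerow() by mistake does not raise an error, so the bug is easy to miss. This sketch uses io.StringIO buffers as stand-ins for files and compares the two calls. writerows() writes one line per inner list. writerow() treats the whole list as a single row and turns each inner list into a field with str():

```python
import csv
import io

rows = [['1', 'rose', '18'], ['2', 'john', '19']]

good = io.StringIO()
csv.writer(good).writerows(rows)   # one CSV line per inner list

bad = io.StringIO()
csv.writer(bad).writerow(rows)     # the whole list becomes ONE row

print(repr(good.getvalue()))  # '1,rose,18\r\n2,john,19\r\n'

# Parse the writerow() output back to see what was actually stored
bad_rows = list(csv.reader(io.StringIO(bad.getvalue())))
print(bad_rows)  # [["['1', 'rose', '18']", "['2', 'john', '19']"]]
```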
    

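    To avoid blank lines between rows (a problem typically seen on Windows), open the file with newline='', as the csv module documentation recommends:

    with open("test.csv", "a", newline='') as csvfile:
        writer = csv.writer(csvfile)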
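The cause can be seen without touching the disk. In this sketch an io.StringIO buffer stands in for the file. The writer ends every row with '\r\n', which is the line terminator of the default excel dialect. On Windows, a file opened in text mode without newline='' translates each '\n' it receives into '\r\n'. Each row then ends in '\r\r\n', and spreadsheet programs display that as an extra empty row.

```python
import csv
import io

buf = io.StringIO()          # in-memory stand-in for a file
writer = csv.writer(buf)
writer.writerow(['id', 'name', 'age'])
writer.writerow(['1', 'rose', '18'])

print(repr(buf.getvalue()))            # 'id,name,age\r\n1,rose,18\r\n'
print(repr(csv.excel.lineterminator))  # '\r\n', the default dialect's row terminator
```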
    

    If the data source is a dictionary, use csv.DictWriter:

    import csv
    
    with open('data1.csv', 'w', newline='') as csvfile:
        fieldnames = ['id', 'name', 'age']                        # column names (header)
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)  # writer that maps dict keys to columns
        writer.writeheader()                                      # write the header row
        writer.writerow({'id': '1', 'name': 'rose', 'age': 18})   # write one data row
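DictWriter also decides what happens when a dict does not match fieldnames exactly. Missing keys are filled with restval, which defaults to ''. Extra keys raise ValueError unless extrasaction='ignore' is passed. A sketch with an in-memory buffer; the extra 'city' key is made up for illustration:

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for a file
writer = csv.DictWriter(
    buf,
    fieldnames=['id', 'name', 'age'],
    restval='',             # value written for keys missing from a row (the default)
    extrasaction='ignore',  # drop keys not in fieldnames instead of raising ValueError
)
writer.writeheader()
writer.writerow({'id': '1', 'name': 'rose'})                               # no 'age' key
writer.writerow({'id': '2', 'name': 'john', 'age': 19, 'city': 'Toronto'})  # extra 'city' key

print(repr(buf.getvalue()))  # 'id,name,age\r\n1,rose,\r\n2,john,19\r\n'
```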
    

    To avoid encoding problems (for example with Chinese text), specify the encoding in open():

    open('data.csv', 'a', newline='', encoding='utf-8')
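A related variant is commonly used when the CSV will be opened in Excel. The 'utf-8-sig' codec writes a UTF-8 byte order mark (BOM) at the start of the file, which Excel is generally reported to rely on to recognize UTF-8. The same codec strips the BOM again when reading. A sketch using a temporary file; the name data_cn.csv and the sample row are only illustrative:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data_cn.csv')  # temporary file for the demo

with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'name'])
    writer.writerow(['1', '玫瑰'])

with open(path, 'rb') as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf', the UTF-8 BOM

# Reading with the same codec removes the BOM, so the first header is 'id', not '\ufeffid'
with open(path, newline='', encoding='utf-8-sig') as f:
    rows = list(csv.reader(f))
print(rows)  # [['id', 'name'], ['1', '玫瑰']]
```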
    

    pandas offers an alternative: the to_csv() method of a DataFrame writes its data to a CSV file.
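A minimal sketch of to_csv(), assuming pandas is installed. The sample DataFrame and the file name data_pandas.csv are made up, and the file goes in a temporary directory. index=False keeps the DataFrame's row index (0, 1, ...) out of the file. read_csv() is used here only to check the result:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['rose', 'john'], 'age': [18, 19]})

path = os.path.join(tempfile.mkdtemp(), 'data_pandas.csv')  # temporary file for the demo
df.to_csv(path, index=False, encoding='utf-8')  # index=False: no extra index column

back = pd.read_csv(path)  # read it back to check what was written
print(back)
```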

    1.2 Reading

    import csv
    
    with open('data1.csv', 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)     # iterate over the file row by row
        for row in reader:
            print(row)                   # each row is a list of strings
    

    Output:

    ['id', 'name', 'age']
    ['1', 'rose', '18']
    
    

    Tip: if the file contains Chinese text, pass its encoding to open() as well, e.g. encoding='utf-8'.
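csv.DictReader is useful when rows should be accessed by column name. It takes the first row as field names and yields one dict per data row. A sketch with an in-memory string standing in for data1.csv. Every value comes back as a string, so numbers need an explicit conversion:

```python
import csv
import io

# In-memory stand-in for data1.csv
text = 'id,name,age\r\n1,rose,18\r\n'

reader = csv.DictReader(io.StringIO(text, newline=''))  # first row supplies the keys
rows = list(reader)
print(rows)  # [{'id': '1', 'name': 'rose', 'age': '18'}]  (plain dicts on Python 3.8+)

# Values are strings; convert types where needed
ages = [int(row['age']) for row in rows]
print(ages)  # [18]
```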


    Reading with the read_csv() function of pandas:

    import pandas as pd
    
    df = pd.read_csv('data.csv')
    print(df)
    
    

    Output:

       id  name  age
    0   1  rose   18
    1   2  john   19
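read_csv() also accepts a file-like object, plus options that control what gets loaded. A sketch, assuming pandas is installed, with an in-memory string in place of data.csv. usecols limits the columns. dtype sets a column's type explicitly instead of leaving it to inference:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv
text = 'id,name,age\n1,rose,18\n2,john,19\n'

df = pd.read_csv(
    io.StringIO(text),
    usecols=['name', 'age'],   # load only these columns
    dtype={'age': 'int64'},    # fix the column type instead of letting pandas infer it
)
print(df)
```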
    
    

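    1.3 Avoiding a duplicate header

    When rows are appended to the same file across several runs, write the header only while the file is still empty:

    import csv
    
    keyword, miaoshu = "6968", "printer"    # hypothetical example values; in the original they come from scraped data
    
    # newline='' prevents a blank line after every row
    with open("test.csv", "a", newline='') as csvfile:     # append mode keeps existing rows
        writer = csv.writer(csvfile)
        # open the same file for reading and use csv.reader to check whether it already has rows
        with open("test.csv", "r", newline='') as f:
            reader = csv.reader(f)
            if not [row for row in reader]:
                writer.writerow(["model", "category"])   # header, only for an empty file
        writer.writerow([keyword, miaoshu])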
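Here is a shorter alternative, swapped in and not the author's approach. It uses os.path.getsize() to decide whether a header is needed before opening the file, instead of reading every row back. append_row is a hypothetical helper name. The temporary path and the sample rows exist only for the demonstration:

```python
import csv
import os
import tempfile


def append_row(path, header, row):
    """Append one row to a CSV file; write the header only if the file is new or empty.

    append_row is a hypothetical helper, not part of the csv module.
    """
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerow(row)


# Usage: two calls, but the header is written only once
demo_path = os.path.join(tempfile.mkdtemp(), 'test.csv')
append_row(demo_path, ['model', 'category'], ['6968', 'printer'])
append_row(demo_path, ['model', 'category'], ['6978', 'printer'])

with open(demo_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)  # [['model', 'category'], ['6968', 'printer'], ['6978', 'printer']]
```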
    
    

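    Example

    Scrape the reviews of this product and save them to a CSV file (the script fetches pages 1 to 10 of the review API, 10 reviews per page): https://www.bestbuy.ca/en-ca/product/hp-hp-officejet-pro-6968-all-in-one-inkjet-printer-with-fax-6968/10441056/review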

    import csv
    import time
    
    import requests
    
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) "
                      "Version/11.0 Mobile/15A372 Safari/604.1",
        "Referer": "https://www.bestbuy.ca/en-ca/product/hp-hp-officejet-pro-6968-all-in-one-inkjet-printer-with-fax-"
                   "6968/10441056/review"
    }
    
    # separate output file, so the header check is not fooled by data.csv from section 1.1
    OUTPUT = 'reviews.csv'
    
    
    def get_content(url):
        """Fetch one page of reviews and return the decoded JSON."""
        res = requests.get(url=url, headers=headers)
        res.raise_for_status()      # fail on HTTP errors instead of parsing an error page
        return res.json()
    
    
    def parse_res(res):
        """Extract the needed fields from each review and save them."""
        for i in res["reviews"]:
            csv_data = {            # a fresh dict per review
                "title": i["title"],
                "comment": i["comment"],
                "publish": i["reviewerName"],
                "publish_time": i["submissionTime"],
            }
            print(csv_data)
            save_data(csv_data)
    
    
    def save_data(csv_data):
        """Append one review, writing the header only if the file is still empty."""
        fieldnames = ['title', 'comment', 'publish', 'publish_time']
        with open(OUTPUT, 'a+', newline='', encoding='utf-8') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)    # DictWriter writes dicts as rows
            # open the file again for reading to check whether it already has rows
            with open(OUTPUT, 'r', newline='', encoding='utf-8') as f:
                if not list(csv.reader(f)):
                    writer.writeheader()
            writer.writerow(csv_data)
    
    
    if __name__ == '__main__':
        for i in range(1, 11):      # pages 1 to 10
            # parentheses let the URL continue onto the next line
            url = ('https://www.bestbuy.ca/api/v2/json/reviews/10441056?source=all&lang=en-CA&pageSize=10&page=%s'
                   '&sortBy=date&sortDir=desc' % i)
            res = get_content(url)
            time.sleep(2)           # pause between requests
            parse_res(res)
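Opening the file twice for every review works, but it gets slow for larger jobs. The sketch below is an alternative, not the original author's code. It collects the rows of all pages first and writes them with a single DictWriter.writerows() call. It runs offline on made-up payloads that only mimic the keys the script reads from res["reviews"]; the real API response may contain more fields or differ. The names reviews_to_rows and save_rows are hypothetical.

```python
import csv
import os
import tempfile

FIELDNAMES = ['title', 'comment', 'publish', 'publish_time']


def reviews_to_rows(payload):
    """Map the 'reviews' list of one decoded API response to CSV row dicts."""
    return [
        {
            'title': r['title'],
            'comment': r['comment'],
            'publish': r['reviewerName'],
            'publish_time': r['submissionTime'],
        }
        for r in payload['reviews']
    ]


def save_rows(path, rows):
    """Write the header and all rows in one pass."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Made-up payloads standing in for two pages of get_content() results
pages = [
    {'reviews': [{'title': 'Good', 'comment': 'Prints fast, "quiet" too',
                  'reviewerName': 'A', 'submissionTime': '2019-08-01'}]},
    {'reviews': [{'title': 'OK', 'comment': 'Setup, then fine',
                  'reviewerName': 'B', 'submissionTime': '2019-08-02'}]},
]

all_rows = []
for payload in pages:
    all_rows.extend(reviews_to_rows(payload))

out_path = os.path.join(tempfile.mkdtemp(), 'reviews.csv')  # temporary file for the demo
save_rows(out_path, all_rows)

# Commas and quotes inside comments survive the round trip thanks to csv quoting
with open(out_path, newline='', encoding='utf-8') as f:
    loaded = list(csv.DictReader(f))
print(len(loaded))  # 2
```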
    
    

    Reference: https://blog.csdn.net/qq_41817302/article/details/88680886

    2. JSON file storage

    2.1 Reading JSON

    import json
    
    s = '''
        [{
            "name": "rose",
            "gender": "female",
            "age": "18"
        }]
    '''
    
    data = json.loads(s)
    print(data)
    print(type(data))
    
    

    Output:

    [{'name': 'rose', 'gender': 'female', 'age': '18'}]
    <class 'list'>      # the outermost JSON value is an array, so loads() returns a list
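json.loads() maps each JSON type to a Python type:

- object → dict
- array → list
- string → str
- number → int or float
- true/false → True/False
- null → None

A quick check with a made-up document. json.dumps() turns the values back into their JSON spellings:

```python
import json

doc = '{"name": "rose", "age": 18, "height": 1.65, "tags": ["a", "b"], "vip": true, "note": null}'
data = json.loads(doc)
print(data)
# {'name': 'rose', 'age': 18, 'height': 1.65, 'tags': ['a', 'b'], 'vip': True, 'note': None}

# dumps() restores true/null; with the default separators the text matches doc exactly
print(json.dumps(data))
```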
    
    

    Reading a JSON file:

    import json
    
    with open('data.json', 'r', encoding='utf-8') as f:
        data = json.load(f)     # json.load() parses directly from a file object
        print(data)
    
    

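    2.2 Writing JSON

    import json
    
    data = [{
        "name": "rose",
        "gender": "female",
        "age": "18"
    }]
    
    
    # 'w' overwrites the file, so it always holds one valid JSON document
    with open('data.json', 'w') as f:
        json.dump(data, f)      # same as f.write(json.dumps(data))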
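Append mode ('a') is a trap with this pattern. Running the write twice puts two complete JSON documents in the same file, and json.loads() then rejects the text with a JSONDecodeError ("Extra data"). The sketch simulates two appends by concatenating two dumps() results. The usual fix is to open the file with 'w' and rewrite the whole list each time:

```python
import json

data = [{"name": "rose", "gender": "female", "age": "18"}]

# What the file would hold after an append-mode write ran twice
text = json.dumps(data) + json.dumps(data)

try:
    json.loads(text)
    error = None
except json.JSONDecodeError as exc:
    error = exc.msg   # the unformatted error message
print(error)  # Extra data
```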
    
    

    Indent each nesting level by 2 spaces to make the structure easier to read:

    with open('data.json', 'w') as f:
        json.dump(data, f, indent=2)
    
    

    Output:

    [
      {
        "name": "rose",
        "gender": "female",
        "age": "18"
      }
    ]
    
    

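    If the data contains Chinese (or other non-ASCII) characters, pass ensure_ascii=False; otherwise they are written as \uXXXX escape sequences. Also open the file with encoding='utf-8' so those characters can be stored:

    with open('data.json', 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)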
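A quick in-memory comparison of the two settings, using a made-up Chinese name as sample data. Both strings decode back to the same Python data; only the textual form differs:

```python
import json

data = [{"name": "玫瑰", "gender": "female"}]

escaped = json.dumps(data)                       # default ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep the characters as they are

print(escaped)   # the Chinese characters appear as \uXXXX escapes
print(readable)  # [{"name": "玫瑰", "gender": "female"}]

# Both forms decode to the same Python data
assert json.loads(escaped) == json.loads(readable) == data
```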
    
    
  • Original article: https://www.cnblogs.com/midworld/p/11380242.html