  • Python Pandas read_csv error

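    To deduplicate the text (comparing the previously collected comments with one another and deleting the duplicates), I wrote the following code: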

    # -*- coding: utf-8 -*-
    import pandas as pd

    inputfile = 'e:/data/H_KJ300F-JAC2101W.txt'  # comment file
    outputfile = 'e:/data/H_KJ300F-JAC2101W_process_1.txt'  # where the processed comments are saved
    # header=None: the file has no header row, so column 0 is meant to hold one comment per line
    data = pd.read_csv(inputfile, encoding='utf-8', header=None)
    l1 = len(data)
    # keep only the first occurrence of each comment
    data = pd.DataFrame(data[0].unique())
    l2 = len(data)
    data.to_csv(outputfile, index=False, header=False, encoding='utf-8')
    print(u'Removed %s comments.' % (l1 - l2))

    It fails with:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
        data = parser.read()
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
        ret = self._engine.read(nrows)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
        data = self._reader.read(nrows)
      File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
      File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
      File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
      File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
      File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 360, saw 2
    >>> data =pd.read_csv(inputfile,encoding ='utf-8',header = None)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
        data = parser.read()
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
        ret = self._engine.read(nrows)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
        data = self._reader.read(nrows)
      File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
      File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
      File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
      File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
      File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 361, saw 2
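The same error can be reproduced without the original comment file. This is a minimal sketch: the three-line `sample` is hypothetical, `io.StringIO` stands in for the file, and it assumes a pandas version (0.20 or later) that exposes the error as `pandas.errors.ParserError`, the newer name for the `CParserError` seen in the traceback.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Hypothetical sample: one comment per line, no header row.
# The third comment contains a half-width (ASCII) comma.
sample = "good product\nfast delivery\nnice, but noisy\n"

try:
    # Same call as in the script: default sep="," and header=None.
    pd.read_csv(io.StringIO(sample), header=None)
    error_message = None
except ParserError as exc:
    error_message = str(exc)

print(error_message)
```

The first line yields one field, so the parser expects one field per row and gives up when the third line splits into two.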

    Fix: replace every half-width (ASCII) comma "," in the whole file with the full-width comma "，".
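The replacement can also be scripted instead of done by hand. A minimal sketch, assuming the whole file fits in memory; `to_fullwidth_commas` is a hypothetical helper, and the in-memory `sample` again stands in for the real comment file. Note that this permanently alters the comment text, which is acceptable when the only goal is deduplication.

```python
import io

import pandas as pd

FULLWIDTH_COMMA = "\uff0c"  # "，", the full-width comma used in Chinese text


def to_fullwidth_commas(text: str) -> str:
    """Replace every half-width comma so read_csv sees one field per line."""
    return text.replace(",", FULLWIDTH_COMMA)


# Hypothetical sample standing in for the contents of the comment file.
sample = "good product\nfast delivery\nnice, but noisy\n"
cleaned = to_fullwidth_commas(sample)

# With no ASCII commas left, the default sep="," never splits a line.
data = pd.read_csv(io.StringIO(cleaned), header=None)
print(data.shape)
```

For the real file, read it with `open(inputfile, encoding='utf-8')`, pass the text through the helper, and write it back (or wrap it in `io.StringIO`) before calling `read_csv`.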

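    Cause: when no separator is specified, read_csv uses "," as the delimiter by default. A comment that contains an ASCII comma is therefore split into two fields, while the earlier lines had only one, so the C parser stops with "Expected 1 fields in line 360, saw 2".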
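Because the only problem is the default delimiter, another option (not the fix used in this post) is to skip CSV parsing entirely and treat each line as one comment. A sketch under that assumption: `dedupe_comments` is a hypothetical helper that keeps the first occurrence of each non-blank line in order, which is what `data[0].unique()` does in the script, while leaving commas in the text untouched.

```python
import pandas as pd


def dedupe_comments(lines):
    """Return (unique comments in first-seen order, number of duplicates removed)."""
    # Strip line endings and skip blank lines, one comment per line.
    comments = pd.Series([line.rstrip("\r\n") for line in lines if line.strip()])
    unique = comments.drop_duplicates()  # keeps the first occurrence by default
    return unique.tolist(), len(comments) - len(unique)


# Hypothetical sample standing in for the lines of the comment file.
lines = ["nice, but noisy\n", "good product\n", "nice, but noisy\n"]
kept, removed = dedupe_comments(lines)
print(kept, removed)
```

With the real file, `with open(inputfile, encoding='utf-8') as f: kept, removed = dedupe_comments(f)` works the same way. Writing `kept` back one line at a time, rather than with `to_csv`, also avoids `to_csv` wrapping comments that contain commas in quotes.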

  • Original article: https://www.cnblogs.com/a1397240667/p/6812807.html