  • Python Pandas read_csv error

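    To deduplicate the text (compare the previously collected records against one another and delete the repeats), I wrote the following code.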

    #-*- coding: utf-8 -*-
    import pandas as pd

    inputfile = 'e:/data/H_KJ300F-JAC2101W.txt' # comments file
    outputfile = 'e:/data/H_KJ300F-JAC2101W_process_1.txt' # path where the processed comments are saved
    data = pd.read_csv(inputfile, encoding = 'utf-8', header = None)
    l1 = len(data)
    data = pd.DataFrame(data[0].unique())
    l2 = len(data)
    data.to_csv(outputfile, index = False, header = False, encoding = 'utf-8')
    print(u'Removed %s comments.' % (l1 - l2))

    Error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
        data = parser.read()
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
        ret = self._engine.read(nrows)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
        data = self._reader.read(nrows)
      File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
      File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
      File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
      File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
      File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 360, saw 2
    >>> data = pd.read_csv(inputfile, encoding='utf-8', header=None)
    Traceback (most recent call last):
      ...
    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 361, saw 2

    Solution: replace every half-width comma "," in the file with a full-width comma ",".
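    The replacement can also be done in Python instead of by hand. The following is only a minimal sketch: the helper name `widen_commas` and the sample comments are made up for illustration, and an in-memory string stands in for the real comments file.

```python
import io
import pandas as pd

def widen_commas(text):
    """Replace each half-width comma (U+002C) with a full-width comma (U+FF0C)."""
    return text.replace(",", "\uff0c")

# Hypothetical sample: three comments, one containing a half-width comma,
# and one exact duplicate.
sample = "good product\nquiet, works well\ngood product\n"

cleaned = widen_commas(sample)

# After the replacement no line contains the default delimiter,
# so every line is parsed as a single field.
data = pd.read_csv(io.StringIO(cleaned), header=None)
l1 = len(data)
deduped = pd.DataFrame(data[0].unique())
l2 = len(deduped)
print("Removed %s comments." % (l1 - l2))
```

    Note that this changes the comment text itself, since every comma in the saved output becomes a full-width one.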

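    Cause: when no separator is specified, read_csv uses the half-width comma "," as the default delimiter, so any comment that contains one is split into two fields. That is why the parser expected 1 field and saw 2.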
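    The cause can be reproduced on a tiny input, and one possible alternative is to skip CSV parsing entirely and treat each line as one comment. This is a sketch under assumptions, not the approach used in the post: the sample text is invented, and `pd.errors.ParserError` is the exception name in recent pandas releases (the traceback above shows the older `CParserError`).

```python
import io
import pandas as pd

# Hypothetical sample: the second line contains a half-width comma.
sample = "good product\nquiet, works well\ngood product\n"

# With the default sep=',', line 1 has one field but line 2 has two,
# so the parser rejects the input.
try:
    pd.read_csv(io.StringIO(sample), header=None)
    failed = False
except pd.errors.ParserError as exc:
    failed = True
    print(exc)

# Alternative: read the text line by line so commas are never treated
# as delimiters, then drop duplicates while keeping the original order.
lines = [line.rstrip("\n") for line in io.StringIO(sample) if line.strip()]
deduped = pd.Series(lines).drop_duplicates()
print("Removed %s comments." % (len(lines) - len(deduped)))
```

    Reading line by line leaves the comment text untouched, at the cost of not using read_csv at all.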

  • Original article: https://www.cnblogs.com/a1397240667/p/6812807.html