  • Python: locating symbols other than digits and letters in a CSV file

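    During data cleaning you sometimes want not only to remove dirty data but also to find out where it is, for example to locate the cells in a CSV file that contain anything other than digits and letters. The string methods isdigit(), isalpha(), and isalnum() cannot recognize floating-point numbers: the decimal point makes all three return False, so every float would be reported as containing a special symbol.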
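    The limitation is easy to check in an interpreter. The sketch below is only an illustration: the sample strings and the helper name is_number are made up for it and are not part of the original script.

```python
def is_number(text):
    """Return True if text parses as a float, e.g. '3.14', '-2' or '1e5'.

    Note that float() also accepts strings such as 'nan' and 'inf'.
    """
    try:
        float(text)
        return True
    except ValueError:
        return False


# Hypothetical sample cells, chosen only to show the difference.
samples = ["123", "3.14", "ABC", "A1", "#N/A"]

for s in samples:
    # '3.14' fails all three str methods because '.' is neither a digit
    # nor a letter, yet float() parses it without trouble.
    print(s, s.isdigit(), s.isalpha(), s.isalnum(), is_number(s))
```

    A float() check handles numeric cells but rejects letters, while a regular expression can accept digits, letters, and the decimal point in one test, which is why the script uses one.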

    The goal is to locate the special symbols in the sample data (the sample itself is not reproduced here).

    The code is as follows:

    # -*- coding: utf-8 -*-
    """
    Created on Tue Dec  6 14:37:12 2016

    @author: user
    """

    import csv
    import re

    # Header row of the sample file: FIELD_000 ... FIELD_326
    HEADER = ['FIELD_%03d' % i for i in range(327)]

    # Method 1: use float() to separate numeric cells from non-numeric ones
    # (invert the condition to print the numeric cells instead).
    '''
    def is_a_num(string):
        try:
            float(string)
            return True
        except ValueError:
            return False

    with open('D:/工作文件夹/Pyhton/20081003.csv', encoding='utf-8', newline='') as f:
        rows = 0
        for row in csv.reader(f):
            if row != HEADER:
                rows += 1
                # enumerate yields the 1-based column index of every cell
                for columns, Factor in enumerate(row, start=1):
                    if Factor != '' and not is_a_num(Factor):
                        print(rows, columns, Factor)
    '''

    # Method 2: flag every non-empty cell that contains a character other
    # than a dot, a digit, or an uppercase letter.
    with open('D:/工作文件夹/Pyhton/20081003.csv', encoding='utf-8', newline='') as f:
        rows = 0
        for row in csv.reader(f):
            if row != HEADER:
                rows += 1
                # enumerate yields the 1-based column index of every cell
                for columns, Factor in enumerate(row, start=1):
                    if Factor != '' and re.match("[.0-9A-Z]+$", Factor) is None:
                        print(rows, columns, Factor)
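    To try the regular-expression approach without the original file, it can be wrapped in a function and run on an in-memory CSV. This is a minimal sketch under stated assumptions: the function name find_special_cells, the has_header parameter, and the sample data are all hypothetical; only the character class is taken from the script above.

```python
import csv
import io
import re

# Same character class as the script above: dots, digits, uppercase letters.
ALLOWED = re.compile(r"[.0-9A-Z]+$")


def find_special_cells(csv_text, has_header=True):
    """Return (row, column, value) for each non-empty cell that fails ALLOWED.

    Rows and columns are 1-based; the header row is not counted.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    if has_header:
        next(reader, None)  # skip the header row
    hits = []
    for row_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            if cell != "" and ALLOWED.match(cell) is None:
                hits.append((row_no, col_no, cell))
    return hits


# Hypothetical sample data, made up for this sketch.
SAMPLE = (
    "FIELD_000,FIELD_001,FIELD_002\n"
    "1.5,ABC,#N/A\n"
    "20,,3.14\n"
    "X1,?,7\n"
)

for hit in find_special_cells(SAMPLE):
    print(hit)
```

    Returning a list of tuples instead of printing makes it possible to write the positions to a report or to feed them into a later cleaning step.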

    Here, re.match is the regular-expression matching function from the re module:

    Its signature is: re.match(pattern, string, flags=0)

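    The first argument is the regular expression, here "[.0-9A-Z]+$", which matches one or more of the characters listed inside []. Because re.match starts at the beginning of the string and $ anchors the end, a cell matches only if it consists entirely of dots, digits, and uppercase letters (lowercase letters are not in this class). On success a Match object is returned, otherwise None;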

    The second argument is the string to be matched;

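    The third argument is the flags, which control how the regular expression is matched, for example whether matching is case-sensitive or whether multi-line matching is used.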
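    The three arguments described above can be seen at work in a short experiment. The input strings are made up for illustration; re.IGNORECASE is one of the standard flags of the re module.

```python
import re

PATTERN = "[.0-9A-Z]+$"

# A Match object: every character is a dot, a digit, or an uppercase letter.
print(re.match(PATTERN, "3.14"))

# None: '#' is not in the character class, so the match cannot reach the end.
print(re.match(PATTERN, "12#4"))

# None: lowercase letters are not covered by A-Z.
print(re.match(PATTERN, "abc"))

# A Match object: the IGNORECASE flag lets A-Z match a-z as well.
print(re.match(PATTERN, "abc", re.IGNORECASE))

# re.fullmatch (Python 3.4+) requires the whole string to match,
# so the trailing $ is not needed.
print(re.fullmatch("[.0-9A-Z]+", "3.14"))
```

    If lowercase letters should count as ordinary characters, either pass re.IGNORECASE or extend the class to "[.0-9A-Za-z]+$".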

  • Original article: https://www.cnblogs.com/matrixworld/p/6141458.html