  • Chapter 5: Processing data

    The coach is a good friend of yours. He has recorded the running times of four athletes, nine times each, in four files.

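    The four files are james.txt, julie.txt, mikey.txt and sarah.txt. Each contains one line of comma-separated times:

    james.txt:  2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22

    julie.txt:  2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21

    mikey.txt:  2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38

    sarah.txt:  2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55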

    Write a script that reads each of the four files into a list and prints the lists on screen.

    vim chapter5-1.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
    with open('james.txt') as jaf:          # open the file
            data = jaf.readline()           # read the first line
            james = data.strip().split(',') # strip whitespace, split on ',' into a list, assign to james
    with open('julie.txt') as juf:
            data = juf.readline()
            julie = data.strip().split(',')
    with open('mikey.txt') as mif:
            data = mif.readline()
            mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
            data = saf.readline()
            sarah = data.strip().split(',')
    print (james)
    print (julie)
    print (mikey)
    print (sarah)
    

    Run it and check the output:

    [root@VPN chapter5]# python chapter5-1.py
    ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
    ['2:22', '3.01', '3:01', '3.02', '3:02', '3.02', '3:22', '2.49', '2:38']
    ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']

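    Next, we want to sort the times in ascending order.

    Python offers two ways to sort:

    1. In-place sorting: the list's sort() method rearranges the original list.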

    >>> data = [6,3,1,2,4,5]
    >>> data.sort()
    >>> data
    [1, 2, 3, 4, 5, 6]

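    2. Copied sorting: the built-in sorted() function returns a new sorted list and leaves the original unchanged.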

    >>> data
    [1, 2, 3, 4, 5, 6]
    >>> data = [6,3,1,2,4,5]
    >>> data
    [6, 3, 1, 2, 4, 5]
    >>> data2 = sorted(data)
    >>> data
    [6, 3, 1, 2, 4, 5]
    >>> data2
    [1, 2, 3, 4, 5, 6]
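    One difference worth knowing: sort() is a list method that returns None, so writing data = data.sort() throws the list away, whereas sorted() accepts any iterable and always returns a new list. A quick sketch:

```python
data = [6, 3, 1, 2, 4, 5]

result = data.sort()   # sorts data in place; the method itself returns None
print(result)          # None
print(data)            # [1, 2, 3, 4, 5, 6]

# sorted() accepts any iterable and always returns a new list
print(sorted((3, 1, 2)))                 # [1, 2, 3]
print(sorted("cab"))                     # ['a', 'b', 'c']
print(sorted([3, 6, 1], reverse=True))   # [6, 3, 1]
```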

    Modify the code above to print the sorted lists:

    vim chapter5-2.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
    with open('james.txt') as jaf:          # open the file
            data = jaf.readline()           # read the first line
            james = data.strip().split(',') # strip whitespace, split on ',' into a list, assign to james
    with open('julie.txt') as juf:
            data = juf.readline()
            julie = data.strip().split(',')
    with open('mikey.txt') as mif:
            data = mif.readline()
            mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
            data = saf.readline()
            sarah = data.strip().split(',')
    '''
    print (james)
    print (julie)
    print (mikey)
    print (sarah)
    '''
    print (sorted(james))
    print (sorted(julie))
    print (sorted(mikey))
    print (sorted(sarah))
    

    Run it:

    [root@VPN chapter5]# python chapter5-2.py
    ['2-22', '2-34', '2.34', '2.45', '2:01', '2:01', '3.01', '3:10', '3:21']
    ['2-23', '2.11', '2.59', '2:11', '2:23', '3-10', '3-21', '3.21', '3:10']
    ['2.49', '2:22', '2:38', '3.01', '3.02', '3.02', '3:01', '3:02', '3:22']
    ['2-25', '2-55', '2.18', '2.58', '2:39', '2:54', '2:55', '2:55', '2:58']

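    Look at the fourth line: 2-25 is sorted ahead of 2.18, which is not what we want.

    The cause is that the data format is inconsistent, so the separators need to be made uniform.

    The separators alone do not fully explain it, though. After splitting, every time is kept as a string, and Python sorts strings character by character. In that ordering a hyphen (-) comes before a dot (.), which comes before a colon (:), so the inconsistent separators in the coach's data make the sort fail.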
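    The ordering can be checked directly with ord(), which returns a character's code point. A small sketch using three of Sarah's times:

```python
# Python compares strings character by character, using code points.
for ch in "-.:":
    print(repr(ch), ord(ch))   # '-' 45, then '.' 46, then ':' 58

times = ["2:39", "2.18", "2-25"]
# The first characters are equal ('2'), so the separator decides the order.
print(sorted(times))           # ['2-25', '2.18', '2:39']
```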

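    Next, create a function named sanitize(). It takes one time string from an athlete's list as input and converts any hyphen or colon in it to a dot; a string that already contains a dot is returned unchanged. Then use the function to clean every string in each list.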

    vim chapter5-3.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
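    def sanitize(time_string):              # define the function
            if '-' in time_string:
                    splitter = '-'
            elif ':' in time_string:
                    splitter = ':'          # check whether the string contains '-' or ':'
            else:
                    return(time_string)     # no '-' or ':', so return it unchanged
            (mins,secs) = time_string.split(splitter)       # split the string into minutes and seconds
            return(mins + '.' + secs)
    with open('james.txt') as jaf:          # open the file
            data = jaf.readline()           # read the first line
            james = data.strip().split(',') # strip whitespace, split on ',' into a list, assign to james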
    with open('julie.txt') as juf:
            data = juf.readline()
            julie = data.strip().split(',')
    with open('mikey.txt') as mif:
            data = mif.readline()
            mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
            data = saf.readline()
            sarah = data.strip().split(',')
    clean_james = []
    clean_julie = []
    clean_mikey = []
    clean_sarah = []        # new lists to hold the sanitized times
    
    for each_t in james:
            clean_james.append(sanitize(each_t))
    for each_t in julie:
            clean_julie.append(sanitize(each_t))
    for each_t in mikey:
            clean_mikey.append(sanitize(each_t))
    for each_t in sarah:
            clean_sarah.append(sanitize(each_t))
    
    
    print (sorted(clean_james))
    print (sorted(clean_julie))
    print (sorted(clean_mikey))
    print (sorted(clean_sarah))
    

    Run it:

    [root@VPN chapter5]# python chapter5-3.py
    ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
    ['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
    ['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
    ['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']

    Now every list is sorted correctly.
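    This string sort works only because every time in the data has a one-digit minute and a two-digit second field. A time such as 10.05 would sort before 2.34, because the comparison starts with the characters '1' and '2'. A sketch of a more robust option, assuming sanitized times of the form m.ss, sorts on a numeric key instead; to_seconds() is a hypothetical helper, not part of the original code:

```python
def to_seconds(clean_time):
    """Convert a sanitized 'm.ss' string (e.g. '2.34') to total seconds.

    Assumption: the part after the dot is whole seconds, as in the coach's data.
    """
    mins, secs = clean_time.split('.')
    return int(mins) * 60 + int(secs)

times = ['10.05', '2.34', '3.01']
print(sorted(times))                  # ['10.05', '2.34', '3.01']  (string order)
print(sorted(times, key=to_seconds))  # ['2.34', '3.01', '10.05']  (numeric order)
```

    The key function leaves the stored strings untouched; only the comparison uses the numeric value.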

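    However, this approach creates many extra lists and loops, and the code is repetitive. Python provides a tool for transforming one list into another: the list comprehension.

    Here is an example: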

    >>> mins = [1,2,3]
    >>> secs = [m*60 for m in mins]
    >>> secs
    [60, 120, 180]

    This converts minutes to seconds.

    >>> lower = ["I","am","liuyueming"]
    >>> upper = [s.upper() for s in lower]
    >>> upper
    ['I', 'AM', 'LIUYUEMING']

    This converts every string to upper case.
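    A list comprehension is shorthand for the create-an-empty-list, loop, append pattern used in chapter5-3.py. The sketch below builds the same list both ways and also shows the optional if clause, which keeps only the items that pass a test:

```python
mins = [1, 2, 3]

# Explicit loop: start with an empty list, append one transformed item per pass
secs_loop = []
for m in mins:
    secs_loop.append(m * 60)

# List comprehension: the same transformation as a single expression
secs_comp = [m * 60 for m in mins]

# Optional filter clause: only minutes greater than 1 are converted
over_one = [m * 60 for m in mins if m > 1]

print(secs_loop == secs_comp)   # True
print(secs_comp)                # [60, 120, 180]
print(over_one)                 # [120, 180]
```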

    Now rewrite the code to use list comprehensions:

    vim chapter5-4.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
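    def sanitize(time_string):              # define the function
            if '-' in time_string:
                    splitter = '-'
            elif ':' in time_string:
                    splitter = ':'          # check whether the string contains '-' or ':'
            else:
                    return(time_string)     # no '-' or ':', so return it unchanged
            (mins,secs) = time_string.split(splitter)       # split the string into minutes and seconds
            return(mins + '.' + secs)
    with open('james.txt') as jaf:          # open the file
            data = jaf.readline()           # read the first line
            james = data.strip().split(',') # strip whitespace, split on ',' into a list, assign to james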
    with open('julie.txt') as juf:
            data = juf.readline()
            julie = data.strip().split(',')
    with open('mikey.txt') as mif:
            data = mif.readline()
            mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
            data = saf.readline()
            sarah = data.strip().split(',')
    
    clean_james = [sanitize(t) for t in james]
    clean_julie = [sanitize(t) for t in julie]
    clean_mikey = [sanitize(t) for t in mikey]
    clean_sarah = [sanitize(t) for t in sarah]      # build the new lists with list comprehensions
    
    print (sorted(clean_james))
    print (sorted(clean_julie))
    print (sorted(clean_mikey))
    print (sorted(clean_sarah))
    

    The result is the same, but the code is considerably shorter.

    What the coach actually wants, though, is to remove duplicate times and then take each athlete's three fastest times.

    vim chapter5-5.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
    def sanitize(time_string):              # define the function
            if '-' in time_string:
                    splitter = '-'
            elif ':' in time_string:
                    splitter = ':'          # check whether the string contains '-' or ':'
            else:
                    return(time_string)     # no '-' or ':', so return it unchanged
            (mins,secs) = time_string.split(splitter)       # split the string into minutes and seconds
            return(mins + '.' + secs)
    with open('james.txt') as jaf:          # open the file
            data = jaf.readline()           # read the first line
            james = data.strip().split(',') # strip whitespace, split on ',' into a list, assign to james
    with open('julie.txt') as juf:
            data = juf.readline()
            julie = data.strip().split(',')
    with open('mikey.txt') as mif:
            data = mif.readline()
            mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
            data = saf.readline()
            sarah = data.strip().split(',')
    
    clean_james = sorted([sanitize(t) for t in james])
    clean_julie = sorted([sanitize(t) for t in julie])
    clean_mikey = sorted([sanitize(t) for t in mikey])
    clean_sarah = sorted([sanitize(t) for t in sarah])      # sanitize with list comprehensions, then sort
    
    unique_james = []                               # new list to hold the de-duplicated times
    for each_t in clean_james:
            if each_t not in unique_james:
                    unique_james.append(each_t)     # append only if not already in the list
    print (unique_james[0:3])                       # print the top three
    
    unique_julie = []
    for each_t in clean_julie:
            if each_t not in unique_julie:
                    unique_julie.append(each_t)
    print (unique_julie[0:3])
    
    unique_mikey = []
    for each_t in clean_mikey:
            if each_t not in unique_mikey:
                    unique_mikey.append(each_t)
    print (unique_mikey[0:3])
    
    unique_sarah = []
    for each_t in clean_sarah:
            if each_t not in unique_sarah:
                    unique_sarah.append(each_t)
    print (unique_sarah[0:3])
    

    Run it:

    [root@VPN chapter5]# python chapter5-5.py
    ['2.01', '2.22', '2.34']
    ['2.11', '2.23', '2.59']
    ['2.22', '2.38', '2.49']
    ['2.18', '2.25', '2.39']

    The duplicates have been removed from the sorted data, and only the top three times are printed.
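    The four de-duplication loops in chapter5-5.py differ only in their variable names. As a sketch, they could be folded into one function; unique_top3() is a hypothetical name used for illustration, and the sample list is James's sanitized, sorted data from the chapter5-3.py output:

```python
def unique_top3(sorted_times):
    """Hypothetical helper: drop duplicates from an already-sorted list
    and return its first three entries, like each loop in chapter5-5.py."""
    unique = []
    for t in sorted_times:
        if t not in unique:      # keep only the first occurrence of each time
            unique.append(t)
    return unique[0:3]

clean_james = ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
print(unique_top3(clean_james))   # ['2.01', '2.22', '2.34']
```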

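    This approach builds a new list just to hold the de-duplicated data. Is there a way to remove duplicates directly?

    Python's built-in set type does exactly this: a set cannot hold duplicate values, so converting a list with set() removes them. A set has no defined order, which is why the result is passed through sorted() before taking the first three items.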

    vim chapter5-6.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
    def sanitize(time_string):              # define the function
            if '-' in time_string:
                    splitter = '-'
            elif ':' in time_string:
                    splitter = ':'          # check whether the string contains '-' or ':'
            else:
                    return(time_string)     # no '-' or ':', so return it unchanged
            (mins,secs) = time_string.split(splitter)       # split the string into minutes and seconds
            return(mins + '.' + secs)
    with open('james.txt') as jaf:          # open the file
            data = jaf.readline()           # read the first line
            james = data.strip().split(',') # strip whitespace, split on ',' into a list, assign to james
    with open('julie.txt') as juf:
            data = juf.readline()
            julie = data.strip().split(',')
    with open('mikey.txt') as mif:
            data = mif.readline()
            mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
            data = saf.readline()
            sarah = data.strip().split(',')
    
    print (sorted(set([sanitize(t) for t in james]))[0:3])          # print the top three
    print (sorted(set([sanitize(t) for t in julie]))[0:3])          # print the top three
    print (sorted(set([sanitize(t) for t in mikey]))[0:3])          # print the top three
    print (sorted(set([sanitize(t) for t in sarah]))[0:3])          # print the top three

    [root@VPN chapter5]# python chapter5-6.py
    ['2.01', '2.22', '2.34']
    ['2.11', '2.23', '2.59']
    ['2.22', '2.38', '2.49']
    ['2.18', '2.25', '2.39']

    The output is the same, and the code is shorter again.
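    If only the first few values are needed, the standard library's heapq.nsmallest() is an alternative to sorting the whole set and slicing; it returns the n smallest items already in ascending order. A sketch, using the sanitized values from sarah.txt shown earlier (this swaps in a different technique from the original program):

```python
import heapq

clean_sarah = ['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']

unique = set(clean_sarah)                # duplicates removed, order not guaranteed
top3_sorted = sorted(unique)[0:3]        # the approach used in chapter5-6.py
top3_heap = heapq.nsmallest(3, unique)   # the 3 smallest items, in ascending order

print(top3_sorted)   # ['2.18', '2.25', '2.39']
print(top3_heap)     # ['2.18', '2.25', '2.39']
```

    For nine values the difference is negligible; nsmallest() mainly pays off when the input is large and n is small.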

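    Next, replace the four repeated with blocks with a single function, get_coach_data(), which also catches IOError when a file cannot be opened: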

    vim chapter5-7.py

    #!/usr/bin/python
    # -*- coding:utf-8 -*-
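    def sanitize(time_string):              # define the function
            if '-' in time_string:
                    splitter = '-'
            elif ':' in time_string:
                    splitter = ':'          # check whether the string contains '-' or ':'
            else:
                    return(time_string)     # no '-' or ':', so return it unchanged
            (mins,secs) = time_string.split(splitter)       # split the string into minutes and seconds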
            return(mins + '.' + secs)
    def get_coach_data(filename):
            try:
                    with open(filename) as f:
                            data = f.readline()
                    return(data.strip().split(','))
            except IOError as ioerr:
                    print('File error:' + str(ioerr))
                    return(None)
    
    james = get_coach_data('james.txt')
    julie = get_coach_data('julie.txt')
    mikey = get_coach_data('mikey.txt')
    sarah = get_coach_data('sarah.txt')
    
    print (sorted(set([sanitize(t) for t in james]))[0:3])          # print the top three
    print (sorted(set([sanitize(t) for t in julie]))[0:3])          # print the top three
    print (sorted(set([sanitize(t) for t in mikey]))[0:3])          # print the top three
    print (sorted(set([sanitize(t) for t in sarah]))[0:3])          # print the top three
    

    Run it. The output is unchanged, and the program is shorter still:

    [root@VPN chapter5]# python chapter5-7.py
    ['2.01', '2.22', '2.34']
    ['2.11', '2.23', '2.59']
    ['2.22', '2.38', '2.49']
    ['2.18', '2.25', '2.39']
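    One caveat: get_coach_data() returns None when a file cannot be opened, but the print lines then pass that None straight into a list comprehension, which raises TypeError ('NoneType' object is not iterable). A minimal sketch of one way to guard against this, assuming the same four file names; top3() is a hypothetical helper, not part of the original program:

```python
def sanitize(time_string):
    """Normalize '2-34' or '2:34' to '2.34'; other strings are returned unchanged."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

def top3(filename):
    """Return the three fastest unique times, or None if the file could not be read."""
    times = get_coach_data(filename)
    if times is None:            # get_coach_data() has already reported the error
        return None
    return sorted(set(sanitize(t) for t in times))[0:3]

for name in ['james.txt', 'julie.txt', 'mikey.txt', 'sarah.txt']:
    result = top3(name)
    if result is not None:       # skip athletes whose file could not be read
        print(result)
```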

  • Original article: https://www.cnblogs.com/minseo/p/6754547.html