  • Analyzing SMS data with Python

    Sample of the raw data:

    来电,2017/1/5 上午11:55,95599,【中国农业银行】您尾号9672的农行账户于01051154分完成一笔支付宝交易,金额为-18.00,余额3905.35。,
    来电,2017/1/5 下午12:10,95599,【中国农业银行】您尾号9672的农行账户于01051210分完成一笔现支交易,金额为-200.00,余额3705.35。,
    来电,2017/1/5 下午12:35,95599,【中国农业银行】您尾号9672的农行账户于01051235分完成一笔支付宝交易,金额为-50.00,余额3650.35。,
    来电,2017/1/5 下午1:47,95599,【中国农业银行】您尾号9672的农行账户于01051347分完成一笔支付宝浙交易,金额为-199.00,余额3451.35。,
    来电,2017/1/5 下午2:45,95599,【中国农业银行】您尾号9672的农行账户于01051445分完成一笔消费交易,金额为-199.00,余额3252.35。,
    来电,2017/1/5 下午4:21,95599,【中国农业银行】您尾号9672的农行账户于01051621分完成一笔支付宝浙交易,金额为-329.00,余额2923.35。,
    来电,2017/1/5 下午5:56,95599,【中国农业银行】您尾号9672的农行账户于01051756分完成一笔支付宝交易,金额为-20.00,余额2903.35。,
    来电,2017/1/9 上午10:33,106906615500,【京东】还剩最后两天!PLUS会员新年特权,开通立送2000京豆,独享全品类神券,确定要错过? dc.jd.com/auVjQQ 回TD退订,
    来电,2017/1/10 下午1:10,106980005618000055,【京东】我是京东配送员:韩富韩,您的订单正在配送途中,请准备收货,联系电话:15005125027。,
    来电,2017/1/10 下午3:13,106906615500,【京东】等着放假,忘了您的PLUS账户中还有超过2000待返京豆?现在开通PLUS正式用户即可到账,还可享受高于普通用户10倍的购物回馈,随时京豆拿到手软。另有全年360元运费补贴、专享商品、专属客服等权益。戳 dc.jd.com/XhuKQQ 开通。回TD退订,

    (Data source: SMS messages exported from a phone in CSV format)

    Goals

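    Stage 1: parse the SMS alerts from Agricultural Bank of China (中国农业银行) and plot the account balance over time.
    Stage 2: find a way to show what each payment was for and how much was spent.
    Stage 3: make the text handling more flexible, so that it relies on natural-language understanding of each message rather than on fixed-position slicing.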
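
    For stages 2 and 3, one possible direction is to replace fixed-position slicing with a regular expression. The sketch below is only an illustration written against the sample messages shown above: the pattern and the function names parse_bank_message and total_by_kind are hypothetical, and the pattern has not been checked against other message layouts the bank may send.

```python
import re
from collections import defaultdict

# Hypothetical pattern, written against the sample messages shown above.
# It accepts both ASCII and full-width commas between the fields.
PATTERN = re.compile(
    r"完成一笔(?P<kind>.+?)交易[,,]"
    r"金额为(?P<amount>-?\d+(?:\.\d+)?)[,,]"
    r"余额(?P<balance>-?\d+(?:\.\d+)?)"
)


def parse_bank_message(body):
    """Return (kind, amount, balance) for a matching message, else None."""
    match = PATTERN.search(body)
    if match is None:
        return None
    return match["kind"], float(match["amount"]), float(match["balance"])


def total_by_kind(bodies):
    """Sum the transaction amounts per transaction kind."""
    totals = defaultdict(float)
    for body in bodies:
        parsed = parse_bank_message(body)
        if parsed is not None:
            kind, amount, _balance = parsed
            totals[kind] += amount
    return dict(totals)


samples = [
    "【中国农业银行】您尾号9672的农行账户于01051154分完成一笔支付宝交易,金额为-18.00,余额3905.35。",
    "【中国农业银行】您尾号9672的农行账户于01051210分完成一笔现支交易,金额为-200.00,余额3705.35。",
    "【中国农业银行】您尾号9672的农行账户于01051235分完成一笔支付宝交易,金额为-50.00,余额3650.35。",
]
print(parse_bank_message(samples[0]))  # ('支付宝', -18.0, 3905.35)
print(total_by_kind(samples))          # {'支付宝': -68.0, '现支': -200.0}
```

    Messages that do not match the pattern (for example the 京东 promotions) return None and are skipped, so the sender filter and the text parsing stay independent of each other.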

    The code for stage 1 follows. Questions and suggestions are welcome!

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Sun Jul 22 22:13:20 2018
    
    @author: mrzhang
    """
    
    import csv
    import os
    import matplotlib.pyplot as plt
    
    
    class DealMessage:
    
        def __init__(self):
            self.home_path = os.getcwd()  # current working directory
            self.filename = os.path.join(self.home_path, "message.csv")
    
        def get_cvs_list(self):
            ''' read all rows from the csv file '''
            # the encoding is assumed to be UTF-8; change it if the export differs
            with open(self.filename, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                list_read = list(reader)
            return list_read
    
        def get_yinghang_message_list(self):
            ''' keep only the bank messages (sender 95599) and split each one into fields '''
            total_list = self.get_cvs_list()
            money_list = []
            for each_line in total_list:
                # columns: direction, timestamp, sender, message body
                if len(each_line) > 3 and each_line[2] == '95599':
                    timestamp, body = each_line[1], each_line[3]
                    # skip the fixed-width prefix of the body, then split the rest into fields
                    each_line_list = body[37:].split(',')
                    each_line_list.insert(0, timestamp)
                    money_list.append(each_line_list)  # add to a new list
            return money_list
    
        def get_type_by_parameter(self, num):
            ''' messages come in two layouts; keep those whose field count equals num '''
            money_list = self.get_yinghang_message_list()
            first_list = []
            for each in money_list:
                if len(each) == num:
                    first_list.append(each)
            return first_list
    
        def deal_time_form(self, messages):
            ''' prepend a 24-hour timestamp like 2017/1/5/13/47 to each message '''
            for each in messages:
                date, time = each[0].split()
                # the first two characters are the 上午/下午 marker, the rest is h:mm
                period, clock = time[:2], time[2:]
                shi, feng = clock.split(":")
                shi = int(shi) % 12  # 12 o'clock becomes 0 before the afternoon offset
                if period == "下午":  # convert to 24-hour form
                    shi += 12
                final_time = date + "/" + str(shi) + "/" + feng
                each.insert(0, final_time)
    
        def choose_message_by_time(self, is_before_0223):
            ''' reduce the difference between the two message layouts; handle time and money together '''
            if is_before_0223:
                num = 4
                remove_num = 2
            else:
                num = 3
                remove_num = 5
            messages = self.get_type_by_parameter(num)
            self.deal_time_form(messages)
            for each in messages:
                # the balance is in the last field: drop the label prefix and the trailing "。"
                money = each[-1][remove_num:][0:-1]
                each.insert(1, money)
            return messages
    
        def get_x_y(self):
            ''' get time labels and balances as two lists of equal length '''
            messages = self.choose_message_by_time(True) + self.choose_message_by_time(False)
            time_list = []
            money_list = []
            for each in messages:
                time_list.append(each[0])
                money_list.append(float(each[1]))
            return time_list, money_list
    
        def draw_picture(self):
            ''' draw a line chart of the balance over time '''
            x, y = self.get_x_y()
            plt.figure(figsize=(16, 4))  # create the figure
            plt.plot(range(len(y)), y, 'r')  # balance in message order, red line
            step = max(1, len(x) // 10)  # label about ten points so the axis stays readable
            ticks = list(range(0, len(x), step))
            plt.xticks(ticks, [x[i] for i in ticks], rotation=30)
            plt.xlabel("Time")
            plt.ylabel("Money")
            plt.title("money")
            plt.grid(True)
            plt.tight_layout()
    
            plt.savefig("line.jpg")  # save first: after show() the figure is empty
            plt.show()  # show picture
    
    m = DealMessage() # get a class object
    m.draw_picture() # draw picture

    Running the program:
    (figure: resulting chart)
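
    The timestamps in the export use a 12-hour clock with a 上午/下午 marker. As a possible refinement, they could be turned into datetime objects, which matplotlib can place on a real time axis. The sketch below is a minimal illustration: parse_sms_time is a hypothetical helper, and it assumes that 上午12:xx means just after midnight and 下午12:xx means just after noon.

```python
from datetime import datetime


def parse_sms_time(text):
    """Convert a string such as '2017/1/5 下午1:47' into a datetime."""
    date_part, time_part = text.split()
    # The first two characters are the 上午/下午 marker, the rest is h:mm.
    period, clock = time_part[:2], time_part[2:]
    hour, minute = (int(piece) for piece in clock.split(":"))
    hour %= 12  # 12 o'clock becomes 0 before the afternoon offset is added
    if period == "下午":
        hour += 12
    year, month, day = (int(piece) for piece in date_part.split("/"))
    return datetime(year, month, day, hour, minute)


print(parse_sms_time("2017/1/5 上午11:55"))  # 2017-01-05 11:55:00
print(parse_sms_time("2017/1/5 下午12:10"))  # 2017-01-05 12:10:00
print(parse_sms_time("2017/1/5 下午1:47"))   # 2017-01-05 13:47:00
```

    A list of such values can be passed as the x argument of plt.plot together with the balances, so that gaps between messages show up as real distances on the time axis.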

    Feel free to repost; discussion is welcome!

  • Original article: https://www.cnblogs.com/zhangnianlei/p/12239274.html