  • Course project: data cleaning (removing duplicate rows, missing-value analysis)

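    Today I added the data-cleaning part and revised the original structure to fit it.

    The "view created tables" section now also has a "view cleaned tables" entry.

    There are no cleaned tables yet; they will appear once the data-cleaning part is finished. Cleaned tables support the same operations as the raw data tables, so this part reuses the existing code and relies on the table name and database name to tell raw tables and cleaned tables apart.

    Today's main work was the duplicate-removal and missing-value statistics features of data cleaning.

    First, the tables that can be cleaned are listed.

    Clicking "Start cleaning" first displays the raw data.

    Next, the duplicate rows are displayed; by this point the duplicates have already been removed from the data.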
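
The split between "rows that were duplicated" and "data with duplicates removed" could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function name `split_duplicates` and the sample columns are assumptions made up for the example.

```python
import pandas as pd


def split_duplicates(df: pd.DataFrame):
    """Return (one copy of each row that occurs more than once, de-duplicated frame)."""
    # True for every row that has at least one identical twin
    dup_mask = df.duplicated(keep=False)
    # Keep a single copy of each duplicated row for display
    duplicated_rows = df[dup_mask].drop_duplicates()
    # Keep the first occurrence of every distinct row
    deduplicated = df.drop_duplicates(keep="first")
    return duplicated_rows, deduplicated


# Hypothetical sample data, not taken from the project
sample = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "name": ["a", "a", "b", "c", "c", "c"],
})
duplicated_rows, deduplicated = split_duplicates(sample)
print(duplicated_rows)
print(deduplicated)
```

Using `duplicated(keep=False)` marks all copies of a repeated row, so the displayed table contains only rows that really occurred more than once.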

    Then the missing-value statistics are displayed.
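
A per-column missing-value count like this can also be computed without explicit loops. The sketch below is only one possible approach: the helper name `count_missing` and the sample data are hypothetical, and it assumes that both empty strings and NaN/None should count as missing. The `keys`/`num` field names follow the route code in this post.

```python
import pandas as pd


def count_missing(df: pd.DataFrame):
    """Count missing values per column, treating '' and NaN/None as missing."""
    # Boolean frame: True where a cell is NaN/None or an empty string
    missing = df.isna() | df.eq("")
    # Column-wise sum of True values gives the count per column
    return [{"keys": col, "num": int(n)} for col, n in missing.sum().items()]


# Hypothetical sample data, not taken from the project
sample = pd.DataFrame({
    "name": ["a", "", None],
    "age": [1.0, None, 3.0],
})
stats = count_missing(sample)
print(stats)
```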

    Clicking "View missing values" shows the rows that contain missing values.
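
Selecting the rows that contain at least one missing value can be sketched with a boolean mask. As before, this is an illustrative assumption rather than the project's code: `rows_with_missing` and the sample data are made up, and empty strings as well as NaN/None are treated as missing.

```python
import pandas as pd


def rows_with_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that have at least one missing ('' or NaN/None) cell."""
    missing = df.isna() | df.eq("")
    # any(axis=1) is True for a row as soon as one of its cells is missing
    return df[missing.any(axis=1)]


# Hypothetical sample data, not taken from the project
sample = pd.DataFrame({
    "name": ["a", "", "c"],
    "age": [1.0, 2.0, None],
})
incomplete = rows_with_missing(sample)
print(incomplete)
```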

    There is a lot of HTML involved, so only the data-cleaning code is pasted here.

    # Data-cleaning routes
    # Duplicate rows
    @app.route('/data_clean_scame')
    def data_clean_scame():
        table_name = request.values.get("table_name")
        database_name = request.values.get("database_name")
        # data_came: the duplicated rows; data_remove_came: the de-duplicated data (both pandas DataFrames)
        data_came, data_remove_came = dataclean.data_clean_came(table_name, database_name)
        global data_clean
        data_clean = data_remove_came  # keep the de-duplicated data for the missing-value steps
        print("Data after removing duplicates")
        print(data_clean.head())
        num_0 = data_came.shape[0]  # number of rows
        num_1 = data_came.shape[1]  # number of columns (keys)
        data_list = []  # list of JSON objects, one per row
        for i in range(num_0):
            json_data = {}
            for j in range(num_1):
                json_data[data_came.keys()[j]] = data_came.values[i][j]
            data_list.append(json_data)
        return jsonify({"code": 0, "msg": "", "count": num_0, "data": data_list})
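    # Missing values
    @app.route('/data_clean_nan')
    def data_clean_nan():
        # Uses the global data_clean set by /data_clean_scame, so that route must be called first
        num_0 = data_clean.shape[0]
        num_1 = data_clean.shape[1]
        data_nan = []
        for i in range(num_1):
            flag = 0  # number of missing values in column i
            for j in range(num_0):
                # Only empty strings are counted as missing here; NaN/None are not detected
                if data_clean.values[j][i] == '':
                    flag = flag + 1
            data_nan.append({"keys": data_clean.keys()[i], "num": flag})
        return jsonify({"code": 0, "msg": "", "count": num_1, "data": data_nan})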
    # View the rows that contain missing values
    @app.route('/get_data_clean_nan')
    def get_data_clean_nan():
        num_0 = data_clean.shape[0]
        num_1 = data_clean.shape[1]
        data = []
        flag = 0  # number of rows with at least one missing value
        for i in range(num_0):
            json_list = {}
            for j in range(num_1):
                if data_clean.values[i][j] == '':
                    # Copy the whole row once, as soon as one empty field is found
                    for k in range(num_1):
                        json_list[data_clean.keys()[k]] = data_clean.values[i][k]
                    data.append(json_list)
                    flag = flag + 1
                    break
        return jsonify({"code": 0, "msg": "", "count": flag, "data": data})
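    # Data-cleaning helper (called above as dataclean.data_clean_came)
    def data_clean_came(table_name, database_name):
        conn, cursor = pymysql_conn(database_name)
        # table_name is concatenated into the SQL, so it must come from a trusted source
        qu_sql = "SELECT * FROM " + table_name
        df = pd.read_sql_query(qu_sql, conn)
        close_conn_mysql(conn, cursor)
        data1 = df.drop_duplicates(keep=False)    # rows that occur exactly once
        data2 = df.drop_duplicates(keep='first')  # one copy of every distinct row
        # Rows in data2 but not in data1: one copy of each row that had duplicates
        # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
        data_came = pd.concat([data2, data1]).drop_duplicates(keep=False)
        data_remove_came = data2  # the de-duplicated data
        return data_came, data_remove_came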
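
The nested loops in the routes above build one dict per row by hand. A possible shorter alternative is `DataFrame.to_dict(orient="records")`, which in recent pandas versions returns plain Python values. The helper name `df_to_records` and the sample data below are hypothetical; replacing NaN with `None` is an added assumption so that the result serializes to valid JSON `null`.

```python
import json

import pandas as pd


def df_to_records(df: pd.DataFrame):
    """Convert a DataFrame to a list of row dicts, mapping NaN/None to None."""
    records = df.to_dict(orient="records")
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in records
    ]


# Hypothetical sample data, not taken from the project
sample = pd.DataFrame({
    "name": ["a", ""],
    "age": [1.0, None],
})
records = df_to_records(sample)
payload = json.dumps({"code": 0, "msg": "", "count": len(records), "data": records})
print(payload)
```

The resulting list could be passed to `jsonify` in place of the hand-built `data_list`, which would keep the response shape used in this post.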
  • Original article: https://www.cnblogs.com/fengchuiguobanxia/p/15665178.html