  • Cleaning up database data with Django

    Database, data cleaning

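    Problem description: in a system I work on, history and various other causes have left one database table with duplicated and incomplete records.
    Solution: first, the data volume is fairly large, so cleaning it by hand is out of the question.
    Next, I considered writing a SQL script that updates the rows according to agreed rules, using a cursor to iterate over the records in the table. But SQL is a structured query language, not a procedural one, so although this is possible, it is not as convenient as a procedural language such as C or Python.
    In the end, I decided to add a new function to my project and call it from the browser's address bar.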

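    PS: Django's ORM is very convenient, but I still find it awkward to use. For some filter conditions and syntax rules I have to look up examples online before I know how to write them.
    Fortunately, Django also provides a way to execute SQL statements directly, and running raw SQL is what I did throughout this cleanup.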
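    As a minimal sketch of what running raw SQL with bound parameters looks like, the snippet below uses the standard library's sqlite3 module as a stand-in for Django's connection.cursor(); both follow the Python DB-API cursor interface. The table rows are invented for illustration, and sqlite3 uses `?` placeholders where Django's cursor uses `%s`.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are made up.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "create table dc_data (id integer primary key, lableName text, "
    "languageSection text, value text, project_id integer)"
)
cursor.executemany(
    "insert into dc_data (lableName, languageSection, value, project_id) "
    "values (?, ?, ?, ?)",
    [("words_all", "all", "100", 1),
     ("words_all", "fr", "100", 1),
     ("words_all", "de", "100", 2)],
)

# Parameters are passed separately from the SQL text, never concatenated in.
cursor.execute(
    "select id, languageSection, value from dc_data "
    "where lableName = ? and project_id = ? order by id",
    ["words_all", 1],
)
rows = cursor.fetchall()
```

    Passing the values as a separate list lets the database driver handle quoting, which is the same reason the view in this post passes lists such as [lableName[0]] to cursor.execute.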


    The code is as follows:
       
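    # Imports for the Python 2 / old Django APIs used below.
    from datetime import datetime

    from django.db import connection, transaction
    from django.shortcuts import render_to_response
    from django.template import Context, RequestContext


    def datetimestr():
        # Timestamp prefix for every log line.
        return datetime.now().strftime('%Y%m%d-%H%M%S')+'>>>'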
    
    def update_dcData(req):
        
        # Trailing slash added so the log file lands inside the l10n_reports directory.
        log_path='apps/dc/l10n_reports/'
        update_dcData_log=open(log_path+'updateDcdataLog.log','w+')
    
        # Projects whose dc_data row count for this lableName differs from their number of target languages.
        sql_getProjectIDs='select a.project_id from' \
                           ' (select count(*) num,project_id from dc_data where lableName=%s group by project_id) a,' \
                           ' (select count(*) num ,project_id from management_project_target_lang group by project_id) b' \
                           ' where a.project_id=b.project_id and a.num!=b.num order by project_id'
        
        sql_getAllProjectIDs='select project_id from management_project'
        
        # Target language names of one project, excluding the language with id 1.
        sql_getLanguageSections='select b.name from management_project_target_lang a,management_l10nlanguage b' \
                                ' where a.project_id=%s and a.l10nlanguage_id=b.id and b.id!="1"'
                                
        sql_getRecords='select id, languageSection,value from dc_data where lableName =%s and project_id=%s and important="1"'
        
        # Duplicate an existing row; the new row gets a fresh id.
        sql_addRecordByID='insert into dc_data(lableName,languageSection,type,value,project_id,task_id,' \
                    'important,unit,settlement,workload) ' \
                    'select lableName,languageSection,type,value,project_id,task_id,important,unit,settlement,workload ' \
                    'from dc_data where id=%s'
        sql_updateLgs='update dc_data set languageSection=%s where id=%s'
        
        sql_getLableNames='select lableName from dc_data where lableName like "%%_all" group by lableName'
        
        # Record every SQL statement in the log before running anything.
        update_dcData_log.write(datetimestr()+'sql_getLableNames'+'>>>'+sql_getLableNames+'\n')
        update_dcData_log.write(datetimestr()+'sql_getRecords'+'>>>'+sql_getRecords+'\n')
        update_dcData_log.write(datetimestr()+'sql_getProjectIDs'+'>>>'+sql_getProjectIDs+'\n')
        update_dcData_log.write(datetimestr()+'sql_getLanguageSections'+'>>>'+sql_getLanguageSections+'\n')
        update_dcData_log.write(datetimestr()+'sql_addRecordByID'+'>>>'+sql_addRecordByID+'\n')
        update_dcData_log.write(datetimestr()+'sql_updateLgs'+'>>>'+sql_updateLgs+'\n')
        context=Context({'msg':'Success'})
        resp=render_to_response("report/clean_data.html", context, 
                                  context_instance=RequestContext(req))    
        cursor=connection.cursor()
        try:
            cursor.execute(sql_getLableNames)
            lableNames=cursor.fetchall()
        except Exception as e:
            update_dcData_log.write(datetimestr()+'execute sql_getLableNames error '+str(e)+'\n')
            context=Context({'msg':'Error'})
            return render_to_response("report/clean_data.html", context, 
                                  context_instance=RequestContext(req)) 
        for lableName in lableNames:
            try:
                cursor.execute(sql_getProjectIDs,[lableName[0]])
                projectIDs=cursor.fetchall()
            except Exception as e:
                update_dcData_log.write(datetimestr()+'execute sql_getProjectIDs error '+str(e)+'\n')
                context=Context({'msg':'Error'})
                return render_to_response("report/clean_data.html", context, 
                                  context_instance=RequestContext(req)) 
            for pid in projectIDs:
                try:
                    cursor.execute(sql_getRecords,[lableName[0],str(pid[0])])
                    records=cursor.fetchall()
                    cursor.execute(sql_getLanguageSections,[str(pid[0])])
                    languageSections=cursor.fetchall()
                except Exception as e:
                    update_dcData_log.write(datetimestr()+'execute sql_getRecords or sql_getLanguageSections error '+str(e)+'\n')
                    context=Context({'msg':'Error'})
                    return render_to_response("report/clean_data.html", context, 
                                  context_instance=RequestContext(req)) 
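                values,lgs=[],[]
                if not records:
                    # No important="1" row to copy from; log it and move on to the next project.
                    update_dcData_log.write(datetimestr()+'no base record found,the lableName and projectID were '+str(lableName[0])+'-'+str(pid[0])+'\n')
                    continue
                baseValue=str(records[0][2])
                baseID=str(records[0][0])
                for item in records:
                    lgs.append(str(item[1]))
                    values.append(str(item[2]))
                    if baseValue!=str(item[2]):
                        # Values disagree, so there is no single value that is safe to copy.
                        baseValue='false'
                targetLgs=[str(item[0]) for item in languageSections]
                if len(lgs)<1 or len(targetLgs)<1:
                    # Assignment, not comparison: mark the group for manual handling.
                    baseValue='false'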
                if 'all' not in lgs:
                    try:
                        cursor.execute(sql_addRecordByID,[baseID])
                        cursor.execute(sql_updateLgs,['all',baseID])
                        transaction.commit_unless_managed()
                    except Exception as e:
                        update_dcData_log.write(datetimestr()+'execute sql_addRecordByID or sql_updateLgs error (all)'+str(e)+'\n')
                        context=Context({'msg':'Error'})
                        return render_to_response("report/clean_data.html", context, 
                                      context_instance=RequestContext(req)) 
                    # Reached only when the insert and the update both succeeded.
                    update_dcData_log.write(datetimestr()+"all record was added to dc_data,the lableName and projectID were "+str(lableName[0])+'-'+str(pid[0])+'\n')
                
                if baseValue=='false':
                    update_dcData_log.write(datetimestr()+"please update this record manually,the lableName and projectID were "+str(lableName[0])+'-'+str(pid[0])+'\n')
                else:
                    if len(lgs)>len(targetLgs):
                        update_dcData_log.write(datetimestr()+"the number of languageSections is larger than the number of target languages,the lableName and projectID were "+str(lableName[0])+'-'+str(pid[0])+'\n')
                    else:
                        for lg in targetLgs:
                            if lg not in lgs:
                                try:
                                    cursor.execute(sql_addRecordByID,[baseID])
                                    cursor.execute(sql_updateLgs,[lg,baseID])
                                    transaction.commit_unless_managed()
                                except Exception as e:
                                    update_dcData_log.write(datetimestr()+'execute sql_addRecordByID or sql_updateLgs error (lg)'+str(e)+'\n')
                                    context=Context({'msg':'Error'})
                                    return render_to_response("report/clean_data.html", context, 
                                      context_instance=RequestContext(req)) 
                                    
                                update_dcData_log.write(datetimestr()+lg+" record was added to dc_data,the lableName and projectID were "+str(lableName[0])+'-'+str(pid[0])+'\n')
                            
                        
        update_dcData_log.close() 
                
                
        return  resp   
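
    The decision the view makes for each (lableName, project) group can be separated from the database calls. The sketch below is a simplified, database-free restatement of that logic, not the original code: the function name plan_group and its return shape are hypothetical, and it treats both "values disagree" and "more sections than target languages" as cases needing manual review, whereas the view only logs them.

```python
def plan_group(records, target_langs):
    """Return (sections_to_add, needs_manual_review) for one group.

    records: list of (id, languageSection, value) tuples.
    target_langs: language section names the project should have.
    """
    if not records:
        # Nothing to copy from, so a person has to look at this group.
        return [], True

    existing = [str(r[1]) for r in records]
    values = set(str(r[2]) for r in records)

    to_add = []
    if "all" not in existing:
        # The view adds the "all" row regardless of the later checks.
        to_add.append("all")

    if len(values) > 1 or not target_langs:
        # Conflicting values (or no target languages): no safe base value.
        return to_add, True
    if len(existing) > len(target_langs):
        # More sections than target languages: the view only logs this.
        return to_add, True

    # Copy the shared value to every target language that is still missing.
    to_add.extend(lg for lg in target_langs if lg not in existing)
    return to_add, False
```

    For example, plan_group([(1, 'fr', '100')], ['fr', 'de']) asks for 'all' and 'de' to be added, while a group whose rows carry different values is flagged for manual review.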

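    The first query in the view, sql_getProjectIDs, selects projects whose number of dc_data rows for a label differs from their number of target languages. A small sketch of that comparison, run against an in-memory sqlite3 database with invented rows and cut-down table definitions (only the columns the query touches), might look like this. It uses `?` placeholders because of sqlite3, and qualifies the ORDER BY column for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Cut-down versions of the two tables; the rows are made up.
cur.execute("create table dc_data "
            "(id integer primary key, lableName text, project_id integer)")
cur.execute("create table management_project_target_lang "
            "(id integer primary key, project_id integer, l10nlanguage_id integer)")
cur.executemany(
    "insert into dc_data (lableName, project_id) values (?, ?)",
    [("words_all", 1), ("words_all", 1), ("words_all", 2)],
)
cur.executemany(
    "insert into management_project_target_lang (project_id, l10nlanguage_id) "
    "values (?, ?)",
    [(1, 2), (1, 3), (2, 2), (2, 3)],
)

# Project 1 has 2 rows and 2 target languages; project 2 has 1 row and 2.
cur.execute(
    "select a.project_id from"
    " (select count(*) num, project_id from dc_data"
    "  where lableName = ? group by project_id) a,"
    " (select count(*) num, project_id from management_project_target_lang"
    "  group by project_id) b"
    " where a.project_id = b.project_id and a.num != b.num"
    " order by a.project_id",
    ["words_all"],
)
mismatched = [row[0] for row in cur.fetchall()]
```

    Only project 2 comes back, because it is the only one whose row count and target language count disagree.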


  • Original article: https://www.cnblogs.com/mengfanrong/p/5046017.html