新系统上线后,需要导入历史数据,但是旧数据格式,数据缺失,数据错误,奇异值,属性归类与新系统有很大的gap。因此我们需要建立一套数据动态清洗规则给Salesforce系统,通过这些规则自动清洗导入数据,清洗规则可以让function自己配置。而不需要IT负责
下面将详细举一个例子如何在salesforce中做数据处理。数据清洗需要分成5个步骤
1,建立2个关联数据的Object的和 一个数据清洗后台设置的Object的
2,数据导入页面csv
3,定义每个字段的范围、属性,如果是错误的则自动重新分配,或者修改成临近值
4,数据清洗合并。
5,导出错误数据到Excel
第一步,新建立两个关联的Recruit 和 Recruit Department, 并且建立一个清洗规则的Object,当导入数据后我们可以读取设置的清洗规则,并对导入的数据进行清洗
第二步,对于清洗规则,我们只能有一条规则被激活,因此我们在插入新规则和更改旧规则的时候,我们需要添加一个tirgger针对Data_Washing_Setting,保证规则的唯一性。
1 trigger IsActiveChecking on Data_Washing_Setting__c (before insert,before update) { 2 3 List<Data_Washing_Setting__c> ListOldData =[select Id from Data_Washing_Setting__c 4 where Active_this_Rule__c = true]; 5 List<Data_Washing_Setting__c> ListNewData =trigger.new; 6 7 //system.debug('ListNewData:'+ListNewData.size()); 8 integer itemNum = 0; 9 if(trigger.isInsert) 10 { 11 if(trigger.isBefore) 12 { 13 for(Data_Washing_Setting__c dws : trigger.new) 14 { 15 if(dws.Active_this_Rule__c) 16 { 17 itemNum++; 18 } 19 } 20 itemNum +=ListOldData.size(); 21 22 if(itemNum>1) 23 { 24 for(Data_Washing_Setting__c dws : trigger.new){ 25 dws.adderror('only one record can be actived! pls check your history data and try again.'); 26 } 27 } 28 } 29 } 30 else if(trigger.isUpdate) 31 { 32 if(trigger.isBefore) 33 { 34 // 去掉更新的数据 35 for(Data_Washing_Setting__c dws : trigger.new) 36 { 37 for(integer i=0;i<ListOldData.size();i++){ 38 if(dws.Id== ListOldData[i].Id) 39 { 40 ListOldData.remove(i); 41 } 42 } 43 if(dws.Active_this_Rule__c) 44 { 45 itemNum++; 46 } 47 } 48 itemNum +=ListOldData.size(); 49 if(itemNum>1) 50 { 51 for(Data_Washing_Setting__c dws : trigger.new){ 52 dws.adderror('only one record can be actived! pls check your history data and try again.'); 53 } 54 } 55 56 } 57 } 58 }
第三步,我们需要建立导入页面,并添加相应的验证按钮
VF的代码
1 <apex:page controller="BatchInsertByCsvController"> 2 <apex:form > 3 <apex:sectionHeader title="Upload Recruit Data"/> 4 <apex:pageMessages /> 5 <apex:pageblock > 6 <center> 7 <apex:inputFile value="{!contentFile}" fileName="{!fileName}" /> 8 <apex:commandButton action="{!LoadData}" value="Batch Insert"/> 9 <apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/> 10 <apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/> 11 12 </center> 13 </apex:pageblock> 14 <apex:pageBlock title="Import Data"> 15 <apex:pageblocktable value="{!RecruitList}" var="ReList"> 16 <apex:column value="{!ReList.Name}" /> 17 <apex:column value="{!ReList.Position_Name__c}" /> 18 <apex:column value="{!ReList.Recruit_Department__c}" /> 19 <apex:column value="{!ReList.Recruit_Type__c}" /> 20 <apex:column value="{!ReList.Recruit_Number__c}" /> 21 </apex:pageblocktable> 22 </apex:pageBlock> 23 <apex:pageBlock title="Blank Data"> 24 <apex:pageblocktable value="{!BlankList}" var="BList"> 25 <apex:column value="{!BList.Name}" /> 26 <apex:column value="{!BList.Position_Name__c}" /> 27 <apex:column value="{!BList.Recruit_Department__c}" /> 28 <apex:column value="{!BList.Recruit_Type__c}" /> 29 <apex:column value="{!BList.Recruit_Number__c}" /> 30 </apex:pageblocktable> 31 </apex:pageBlock> 32 </apex:form> 33 </apex:page>
后台APEX 导入代码
1 public class BatchInsertByCsvController { 2 3 public string fileName{get;set;} 4 //Blob:二进制对象类型。通过inputFile选中后的文件在后台获取的时候是一个Blob类型, 5 public Blob contentFile{get;set;} 6 public String[] filelines = new String[]{}; 7 public List<Recruit__c> RecruitList{get;set;} 8 public List<Recruit__c> BlankList{get;set;} 9 public List<Recruit__c> invaildList{get;set;} 10 //初始化 11 public PageReference LoadData() 12 { 13 try{ 14 filename = bitToString(contentFile,'ISO-8859-1'); 15 filelines = fileName.split(' '); 16 // ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, 'import account:'+filelines.size()); 17 // ApexPages.addMessage(msgs); 18 RecruitList = new List<Recruit__c>(); 19 string[] inputvalues; 20 string SwpNumber; 21 22 for(Integer i=1;i<filelines.size();i++) 23 { 24 inputvalues = new string[]{}; 25 inputvalues = filelines[i].split(','); 26 Recruit__c recruits = new Recruit__c(); 27 recruits.Name = inputvalues[0]; 28 recruits.Position_Name__c = inputvalues[1]; 29 recruits.Recruit_Department__c = [SELECT Id 30 FROM Recruit_Department__c 31 WHERE Name =:inputvalues[2] LIMIT 1].Id; 32 recruits.Recruit_Type__c = inputvalues[3]; 33 SwpNumber = inputvalues[4]; 34 recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim()); 35 RecruitList.add(recruits); 36 } 37 } 38 catch(exception e){ 39 ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,'An error has occured reading the CSV file: '+e.getMessage()); 40 ApexPages.addMessage(errormsg); 41 } 42 try{ 43 // insert RecruitList; 44 // ApexPages.Message successMsg = new ApexPages.Message(ApexPages.severity.INFO,'import success'); 45 // ApexPages.addMessage(successMsg); 46 } 47 catch(Exception e) 48 { 49 //ApexPages.Message errormsg = new ApexPages.Message(ApexPages.severity.ERROR,'An error has occured inserting the records'+e.getMessage()); 50 //ApexPages.addMessage(errormsg); 51 } 52 return null; 53 } 54 //blob是二进制存储的,String是16进制存储的,所以使用此种方式加上编码解码等操作肯定会更加适应,包括中文 55 private String bitToString(Blob input, String inCharset){ 56 //转换成16进制 57 String hex = EncodingUtil.convertToHex(input); 58 //一个String类型两个字节 32位(bit),则一个String长度应该为两个16进制的长度,所以此处向右平移一个单位,即除以2 59 //向右平移一个单位在正数情况下等同于除以2,负数情况下不等 60 //eg 9 00001001 >>1 00000100 结果为4 61 final Integer bytesCount = hex.length() >> 1; 62 //声明String数组,长度为16进制转换成字符串的长度 63 String[] bytes = new String[bytesCount]; 64 for(Integer i = 0; i < bytesCount; ++i) { 65 //将相邻两位的16进制字符串放在一个String中 66 bytes[i] = hex.mid(i << 1, 2); 67 } 68 //解码成指定charset的字符串 69 return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), inCharset); 70 } 71 //筛选空值 72 public PageReference LoadBlankList() 73 { 74 try 75 { 76 BlankList=new list<Recruit__c>(); 77 DataWashingSetting dws=new DataWashingSetting(); 78 string[] flines = dws.AddQuestionsData(filelines); 79 string[] inputvalues; 80 string SwpNumber; 81 82 for(Integer i=0;i<flines.size();i++) 83 { 84 inputvalues = new string[]{}; 85 inputvalues = flines[i].split(','); 86 Recruit__c recruits = new Recruit__c(); 87 recruits.Name = inputvalues[0]; 88 recruits.Position_Name__c = inputvalues[1]; 89 recruits.Recruit_Department__c = [SELECT Id 90 FROM Recruit_Department__c 91 WHERE Name =:inputvalues[2] LIMIT 1].Id; 92 recruits.Recruit_Type__c = inputvalues[3]; 93 SwpNumber = inputvalues[4]; 94 recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim()); 95 BlankList.add(recruits); 96 } 97 ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, 'blank num:'+BlankList.size()); 98 ApexPages.addMessage(msgs); 99 } 100 catch(Exception e) 101 { 102 ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,'An error has occured reading the CSV file: '+e.getMessage()); 103 ApexPages.addMessage(errormsg); 104 } 105 return null; 106 } 107 public PageReference ExportBlankToCSV() 108 { 109 return new PageReference('/apex/ExportCSV'); 110 } 111 }
后台调用的验证清洗代码,可以根据需要任意添加
1 public class DataWashingSetting { 2 3 //消除重复数据 4 public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList) 5 { 6 set<Recruit__c> myset= new set<Recruit__c>(); 7 List<Recruit__c> result = new List<Recruit__c>(); 8 9 myset.addAll(OriginalList); 10 result.addAll(myset); 11 12 return result; 13 } 14 //筛选为空数据 15 public string[] AddQuestionsData(string[] filelines) 16 { 17 string[] result =new string[]{}; 18 string[] inputvalues; 19 for(Integer i=1;i<filelines.size();i++) 20 { 21 inputvalues = new string[]{}; 22 inputvalues = filelines[i].split(','); 23 if(inputvalues[0] == ''||inputvalues[1] == '' ||inputvalues[2] == '' 24 ||inputvalues[3] == '' ||inputvalues[4] == '') 25 { 26 result.add(filelines[i]); 27 } 28 } 29 return result; 30 } 31 //检测各个字段的合理性 32 public string[] CheckFiled(string[] filelines) 33 { 34 //读取规则 35 Data_Washing_Setting__c dws = [select Position_Name_Rule__c, 36 Recruit_End_Number__c,Recruit_Department_Rule__c,Recruit_Start_Number__c from Data_Washing_Setting__c where Active_this_Rule__c = true]; 37 string PositionNameRule = dws.Position_Name_Rule__c; //部门规则是否允许重复 38 decimal startNumber= dws.Recruit_Start_Number__c; //招聘人数底线 39 decimal endNumber= dws.Recruit_End_Number__c; //招聘人数上线 40 string department = dws.Recruit_Department_Rule__c;//部门限制 41 42 string[] result =new string[]{}; 43 string[] inputvalues; 44 for(Integer i=1;i<filelines.size();i++) 45 { 46 inputvalues = new string[]{}; 47 inputvalues = filelines[i].split(','); 48 //填写验证代码 49 } 50 return result; //返回不合格代码 51 } 52 }
出现问题数据直接导出问题数据到Excel,手动处理后再导入。
1 <apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/x-excel# BlankList.xls" showHeader="false"> 2 <head> 3 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" /> 4 </head> 5 <apex:pageBlock > 6 <apex:pageblocktable value="{!BlankList}" var="BList"> 7 <apex:column value="{!BList.Name}" /> 8 <apex:column value="{!BList.Position_Name__c}" /> 9 <apex:column value="{!BList.Recruit_Department__c}" /> 10 <apex:column value="{!BList.Recruit_Type__c}" /> 11 <apex:column value="{!BList.Recruit_Number__c}" /> 12 </apex:pageblocktable> 13 </apex:pageBlock> 14 </apex:page>
下面就是最终效果:
1,导入数据,自动筛选有缺失值的数据,并支持Excel导出
2,后台清洗的规则设置。