    Data conversion – the first step towards data processing

    Convert every string value to an integer in the range 0 to n. The attributes and their possible values are:

    age: continuous.

    workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.

    fnlwgt: continuous.

    education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.

    education-num: continuous.

    marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.

    occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.

    relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.

    race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.

    sex: Female, Male.

    capital-gain: continuous.

    capital-loss: continuous.

    hours-per-week: continuous.

    native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
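    The conversion can be pictured as replacing each categorical value with its position in that attribute's value list. The sketch below is only an illustration of that idea, not the program used in this post: the helper name `encode` is hypothetical, and reserving index 0 for the missing-value marker `?` is an assumption.

```python
# Value lists for two attributes; '?' (missing value) is placed first
# so that it maps to 0 (an assumption made for this sketch).
SEX = ['?', 'Female', 'Male']
RACE = ['?', 'White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black']


def encode(value, values):
    """Return the position of `value` in `values` (hypothetical helper)."""
    # Fields in a comma-separated record often carry a leading space, so trim first.
    return values.index(value.strip())


print(encode('Male', SEX))     # 2
print(encode(' Black', RACE))  # 5
```

    Looking a value up in its own attribute's list keeps the columns independent, so a name in one list cannot be confused with a similar name in another.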

    I used a Python program to do the conversion. While writing the code, especially the arrays, I found that adding the quotation marks by hand was a waste of time, so I wrote a small program to add them for me.

    The conversion program:

    import time

    # Start timing
    t1 = time.time()

    # Value lists for conversion; '?' (missing value) comes first so it maps to 0
    workclass = ['?', 'Private', 'Self-emp-not-inc', 'Self-emp-inc', 'Federal-gov', 'Local-gov', 'State-gov', 'Without-pay', 'Never-worked']

    education = ['?', 'Bachelors', 'Some-college', '11th', 'HS-grad', 'Prof-school', 'Assoc-acdm', 'Assoc-voc', '9th', '7th-8th', '12th', 'Masters', '1st-4th', '10th', 'Doctorate', '5th-6th', 'Preschool']

    marital_status = ['?', 'Married-civ-spouse', 'Divorced', 'Never-married', 'Separated', 'Widowed', 'Married-spouse-absent', 'Married-AF-spouse']

    occupation = ['?', 'Tech-support', 'Craft-repair', 'Other-service', 'Sales', 'Exec-managerial', 'Prof-specialty', 'Handlers-cleaners', 'Machine-op-inspct', 'Adm-clerical', 'Farming-fishing', 'Transport-moving', 'Priv-house-serv', 'Protective-serv', 'Armed-Forces']

    relationship = ['?', 'Wife', 'Own-child', 'Husband', 'Not-in-family', 'Other-relative', 'Unmarried']

    race = ['?', 'White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black']

    sex = ['?', 'Female', 'Male']

    native_country = ['?', 'United-States', 'Cambodia', 'England', 'Puerto-Rico', 'Canada', 'Germany', 'Outlying-US(Guam-USVI-etc)', 'India', 'Japan', 'Greece', 'South', 'China', 'Cuba', 'Iran', 'Honduras', 'Philippines', 'Italy', 'Poland', 'Jamaica', 'Vietnam', 'Mexico', 'Portugal', 'Ireland', 'France', 'Dominican-Republic', 'Laos', 'Ecuador', 'Taiwan', 'Haiti', 'Columbia', 'Hungary', 'Guatemala', 'Nicaragua', 'Scotland', 'Thailand', 'Yugoslavia', 'El-Salvador', 'Trinadad&Tobago', 'Peru', 'Hong', 'Holand-Netherlands']

    is_over_50k = ['?', '>50K', '<=50K']

    # Collect the lists. Order matters because replacement is by substring:
    # 'Other-service' must be replaced before 'Other', and 'Female' before 'Male'.
    items = [workclass, education, marital_status, occupation, relationship, race, sex, native_country, is_over_50k]

    # Open files; the with statement closes them automatically
    with open('../resource/adult.data', 'r') as filereader, \
         open('../resource/converted_data.data', 'w') as filewriter:

        # Read the file line by line
        for eachline in filereader:

            # Iterate over the value lists
            for item in items:

                # Replace each string with its index in the list
                for count, element in enumerate(item):
                    eachline = eachline.replace(element, str(count))

            # Write the converted line
            filewriter.write(eachline)

    # End timing
    t2 = time.time()

    print('done')
    print(str(t2 - t1))
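
    The helper that adds the quotation marks is not shown in this post. A minimal sketch of what such a helper could look like, assuming it takes one comma-separated value list as written in the attribute list above and returns a Python list literal. The function name `quote_values` is hypothetical, and prepending `?` for missing values is an assumption.

```python
def quote_values(text):
    """Turn 'Female, Male.' into a quoted Python list literal (hypothetical helper)."""
    # Drop the trailing period, split on commas, and trim surrounding spaces.
    names = [name.strip() for name in text.strip().rstrip('.').split(',')]
    # Prepend '?' so the missing-value marker gets index 0 (an assumption).
    return repr(['?'] + names)


print(quote_values('Female, Male.'))  # ['?', 'Female', 'Male']
```

    The printed text can be pasted straight into the conversion program as the right-hand side of an assignment such as `sex = ...`.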
  • Original article: https://www.cnblogs.com/johnpher/p/2583634.html