  • 92. R language analysis case study

    1. Read the data

    > bank=read.table("bank-full.csv",header=TRUE,sep=";")
    > 
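The separator passed to read.table has to match the file's actual delimiter; note that the call here uses ";" while the call in step 2 uses ",". In addition, on R 4.0 or later text columns are read as character unless stringsAsFactors=TRUE is given, so the Factor columns shown by str() in step 2 need that argument on a current R. A minimal sketch of both points, using a small temporary file with invented rows (the file contents and the names demo and wrong are assumptions for illustration, not the real bank-full.csv):

```r
# Hypothetical three-row sample written to a temporary file; the values are
# illustrative only and are not taken from bank-full.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c("age;job;y",
             "30;admin.;no",
             "45;technician;yes",
             "52;admin.;no"), tmp)

# sep matches the file's delimiter; stringsAsFactors = TRUE keeps text
# columns as factors (this was the default before R 4.0).
demo <- read.table(tmp, header = TRUE, sep = ";", stringsAsFactors = TRUE)
str(demo)

# With the wrong separator every line is read as a single field,
# so the result has only one column.
wrong <- read.table(tmp, header = TRUE, sep = ",", stringsAsFactors = TRUE)
ncol(wrong)
```

If str(bank) reports a single variable instead of 21, the separator is the first thing to check.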

    2. Inspect the data structure

    > bank=read.table("bank-full.csv",header=TRUE,sep=",")
    > str(bank)
    'data.frame':    41188 obs. of  21 variables:
     $ age           : int  56 57 37 40 56 45 59 41 24 25 ...
     $ job           : Factor w/ 12 levels "admin.","blue-collar",..: 4 8 8 1 8 8 1 2 10 8 ...
     $ marital       : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
     $ education     : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
     $ default       : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
     $ housing       : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
     $ loan          : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
     $ contact       : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
     $ month         : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
     $ day_of_week   : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
     $ duration      : int  261 149 226 151 307 198 139 217 380 50 ...
     $ campaign      : int  1 1 1 1 1 1 1 1 1 1 ...
     $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
     $ previous      : int  0 0 0 0 0 0 0 0 0 0 ...
     $ poutcome      : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 2 2 2 2 2 ...
     $ emp.var.rate  : num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
     $ cons.price.idx: num  94 94 94 94 94 ...
     $ cons.conf.idx : num  -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
     $ euribor3m     : num  4.86 4.86 4.86 4.86 4.86 ...
     $ nr.employed   : num  5191 5191 5191 5191 5191 ...
     $ y             : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

    3. View summary statistics

    > summary(bank)
          age                 job            marital                    education    
     Min.   :17.00   admin.     :10422   divorced: 4612   university.degree  :12168  
     1st Qu.:32.00   blue-collar: 9254   married :24928   high.school        : 9515  
     Median :38.00   technician : 6743   single  :11568   basic.9y           : 6045  
     Mean   :40.02   services   : 3969   unknown :   80   professional.course: 5243  
     3rd Qu.:47.00   management : 2924                    basic.4y           : 4176  
     Max.   :98.00   retired    : 1720                    basic.6y           : 2292  
                     (Other)    : 6156                    (Other)            : 1749  
        default         housing           loan            contact          month      
     no     :32588   no     :18622   no     :33950   cellular :26144   may    :13769  
     unknown: 8597   unknown:  990   unknown:  990   telephone:15044   jul    : 7174  
     yes    :    3   yes    :21576   yes    : 6248                     aug    : 6178  
                                                                       jun    : 5318  
                                                                       nov    : 4101  
                                                                       apr    : 2632  
                                                                       (Other): 2016  
     day_of_week    duration         campaign          pdays          previous    
     fri:7827    Min.   :   0.0   Min.   : 1.000   Min.   :  0.0   Min.   :0.000  
     mon:8514    1st Qu.: 102.0   1st Qu.: 1.000   1st Qu.:999.0   1st Qu.:0.000  
     thu:8623    Median : 180.0   Median : 2.000   Median :999.0   Median :0.000  
     tue:8090    Mean   : 258.3   Mean   : 2.568   Mean   :962.5   Mean   :0.173  
     wed:8134    3rd Qu.: 319.0   3rd Qu.: 3.000   3rd Qu.:999.0   3rd Qu.:0.000  
                 Max.   :4918.0   Max.   :56.000   Max.   :999.0   Max.   :7.000  
                                                                                  
            poutcome      emp.var.rate      cons.price.idx  cons.conf.idx  
     failure    : 4252   Min.   :-3.40000   Min.   :92.20   Min.   :-50.8  
     nonexistent:35563   1st Qu.:-1.80000   1st Qu.:93.08   1st Qu.:-42.7  
     success    : 1373   Median : 1.10000   Median :93.75   Median :-41.8  
                         Mean   : 0.08189   Mean   :93.58   Mean   :-40.5  
                         3rd Qu.: 1.40000   3rd Qu.:93.99   3rd Qu.:-36.4  
                         Max.   : 1.40000   Max.   :94.77   Max.   :-26.9  
                                                                           
       euribor3m      nr.employed     y        
     Min.   :0.634   Min.   :4964   no :36548  
     1st Qu.:1.344   1st Qu.:5099   yes: 4640  
     Median :4.857   Median :5191              
     Mean   :3.621   Mean   :5167              
     3rd Qu.:4.961   3rd Qu.:5228              
     Max.   :5.045   Max.   :5228             
    > psych::describe(bank)
                   vars     n    mean     sd  median trimmed    mad     min     max
    age               1 41188   40.02  10.42   38.00   39.30  10.38   17.00   98.00
    job*              2 41188    4.72   3.59    3.00    4.48   2.97    1.00   12.00
    marital*          3 41188    2.17   0.61    2.00    2.21   0.00    1.00    4.00
    education*        4 41188    4.75   2.14    4.00    4.88   2.97    1.00    8.00
    default*          5 41188    1.21   0.41    1.00    1.14   0.00    1.00    3.00
    housing*          6 41188    2.07   0.99    3.00    2.09   0.00    1.00    3.00
    loan*             7 41188    1.33   0.72    1.00    1.16   0.00    1.00    3.00
    contact*          8 41188    1.37   0.48    1.00    1.33   0.00    1.00    2.00
    month*            9 41188    5.23   2.32    5.00    5.31   2.97    1.00   10.00
    day_of_week*     10 41188    3.00   1.40    3.00    3.01   1.48    1.00    5.00
    duration         11 41188  258.29 259.28  180.00  210.61 139.36    0.00 4918.00
    campaign         12 41188    2.57   2.77    2.00    1.99   1.48    1.00   56.00
    pdays            13 41188  962.48 186.91  999.00  999.00   0.00    0.00  999.00
    previous         14 41188    0.17   0.49    0.00    0.05   0.00    0.00    7.00
    poutcome*        15 41188    1.93   0.36    2.00    2.00   0.00    1.00    3.00
    emp.var.rate     16 41188    0.08   1.57    1.10    0.27   0.44   -3.40    1.40
    cons.price.idx   17 41188   93.58   0.58   93.75   93.58   0.56   92.20   94.77
    cons.conf.idx    18 41188  -40.50   4.63  -41.80  -40.60   6.52  -50.80  -26.90
    euribor3m        19 41188    3.62   1.73    4.86    3.81   0.16    0.63    5.04
    nr.employed      20 41188 5167.04  72.25 5191.00 5178.43  55.00 4963.60 5228.10
    y*               21 41188    1.11   0.32    1.00    1.02   0.00    1.00    2.00
                     range  skew kurtosis   se
    age              81.00  0.78     0.79 0.05
    job*             11.00  0.45    -1.39 0.02
    marital*          3.00 -0.06    -0.34 0.00
    education*        7.00 -0.24    -1.21 0.01
    default*          2.00  1.44     0.07 0.00
    housing*          2.00 -0.14    -1.95 0.00
    loan*             2.00  1.82     1.38 0.00
    contact*          1.00  0.56    -1.69 0.00
    month*            9.00 -0.31    -1.03 0.01
    day_of_week*      4.00  0.01    -1.27 0.01
    duration       4918.00  3.26    20.24 1.28
    campaign         55.00  4.76    36.97 0.01
    pdays           999.00 -4.92    22.23 0.92
    previous          7.00  3.83    20.11 0.00
    poutcome*         2.00 -0.88     3.98 0.00
    emp.var.rate      4.80 -0.72    -1.06 0.01
    cons.price.idx    2.57 -0.23    -0.83 0.00
    cons.conf.idx    23.90  0.30    -0.36 0.02
    euribor3m         4.41 -0.71    -1.41 0.01
    nr.employed     264.50 -1.04     0.00 0.36
    y*                1.00  2.45     4.00 0.00

    4. Check for missing values

    > sapply(bank,anyNA)
               age            job        marital      education        default 
             FALSE          FALSE          FALSE          FALSE          FALSE 
           housing           loan        contact          month    day_of_week 
             FALSE          FALSE          FALSE          FALSE          FALSE 
          duration       campaign          pdays       previous       poutcome 
             FALSE          FALSE          FALSE          FALSE          FALSE 
      emp.var.rate cons.price.idx  cons.conf.idx      euribor3m    nr.employed 
             FALSE          FALSE          FALSE          FALSE          FALSE 
                 y 
             FALSE 
    > 
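anyNA only detects R's NA. The summary in step 3 shows that several factor columns carry an explicit "unknown" level instead, which this check does not count as missing. A sketch of counting those separately, on an invented data frame standing in for bank (the column names follow the source; the rows and the name toy are assumptions):

```r
# Toy stand-in for `bank`; the rows are invented for illustration.
toy <- data.frame(
  age     = c(25L, 40L, 33L, 58L),
  default = factor(c("no", "unknown", "no", "unknown")),
  housing = factor(c("yes", "no", "unknown", "yes")),
  y       = factor(c("no", "no", "yes", "no"))
)

# Number of true NA values per column.
na_counts <- colSums(is.na(toy))

# Number of occurrences of the literal level "unknown" in each factor column.
is_fac <- sapply(toy, is.factor)
unknown_counts <- sapply(toy[is_fac], function(col) sum(col == "unknown"))

na_counts
unknown_counts
```

Whether "unknown" should be treated as missing or kept as its own category depends on the analysis; the sketch only makes the counts visible.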

    5. Frequency table for a single variable

    > table(bank$y)
    
       no   yes 
    36548  4640 
    > 
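The two counts show that the response is strongly imbalanced. Converting them to proportions makes that explicit. The sketch below rebuilds the counts by hand from the output above instead of reading the data file (the names y_counts and y_share are assumptions):

```r
# Counts copied from the table(bank$y) output above.
y_counts <- c(no = 36548, yes = 4640)

# prop.table divides each entry by the overall total.
y_share <- prop.table(y_counts)
round(y_share, 4)
```

With the data loaded, prop.table(table(bank$y)) gives the same shares directly; roughly 11% of the rows are "yes".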

    6. Cross-tabulation of two variables

    > table(bank$y,bank$marital)
         
          divorced married single unknown
      no      4136   22396   9948      68
      yes      476    2532   1620      12
    > 

    > xtabs(~y+marital,data=bank)
         marital
    y     divorced married single unknown
      no      4136   22396   9948      68
      yes      476    2532   1620      12
    > 
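table and xtabs return the same counts; xtabs additionally labels the two dimensions. Adding row and column totals is a quick consistency check against the single-variable counts in step 5. A sketch that rebuilds the table from the counts printed above, so it runs without the data file (the names counts and with_totals are assumptions):

```r
# Counts copied from the xtabs(~y+marital, data=bank) output above.
counts <- as.table(matrix(
  c(4136, 22396, 9948, 68,
     476,  2532, 1620, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(y = c("no", "yes"),
                  marital = c("divorced", "married", "single", "unknown"))
))

# addmargins appends a "Sum" row and a "Sum" column.
with_totals <- addmargins(counts)
with_totals
```

The "Sum" column reproduces the 36548 / 4640 split from table(bank$y), and the grand total is the 41188 rows reported by str(bank).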

    7. Row and column proportions

    > tab=table(bank$y,bank$marital)
    > prop.table(tab,1)
         
             divorced     married      single     unknown
      no  0.113166247 0.612783189 0.272189997 0.001860567
      yes 0.102586207 0.545689655 0.349137931 0.002586207
    > prop.table(tab,2)
         
           divorced   married    single   unknown
      no  0.8967910 0.8984275 0.8599585 0.8500000
      yes 0.1032090 0.1015725 0.1400415 0.1500000
    > 
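The second argument of prop.table selects the margin: 1 normalises within each row, so each row sums to 1, and 2 normalises within each column. A sketch that rebuilds tab as a plain matrix from the counts printed in step 6, so it runs without the data file (by_row and by_col are names chosen for illustration):

```r
# Counts copied from the y-by-marital table in step 6.
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("no", "yes"),
                              c("divorced", "married", "single", "unknown")))

by_row <- prop.table(tab, 1)  # share of each marital status within no / yes
by_col <- prop.table(tab, 2)  # share of no / yes within each marital status

rowSums(by_row)  # every row sums to 1
colSums(by_col)  # every column sums to 1
```

For the question "which marital group says yes more often", the column version is the one to read: it gives the yes rate within each group.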

    8. Build a more complex table

    > ftable(bank[,c(3,4,21)],row.vars = c(1,2),col.vars = "y")
                                 y   no  yes
    marital  education                      
    divorced basic.4y               406   83
             basic.6y               169   13
             basic.9y               534   31
             high.school           1086  107
             illiterate               1    1
             professional.course    596   61
             university.degree     1177  160
             unknown                167   20
    married  basic.4y              2915  313
             basic.6y              1628  139
             basic.9y              3858  298
             high.school           4683  475
             illiterate              12    3
             professional.course   2799  357
             university.degree     5573  821
             unknown                928  126
    single   basic.4y               422   31
             basic.6y               301   36
             basic.9y              1174  142
             high.school           2702  448
             illiterate               1    0
             professional.course   1247  177
             university.degree     3723  683
             unknown                378  103
    unknown  basic.4y                 5    1
             basic.6y                 6    0
             basic.9y                 6    2
             high.school             13    1
             illiterate               0    0
             professional.course      6    0
             university.degree       25    6
             unknown                  7    2
    > 

    9. Chi-squared test

    > tab
         
          divorced married single unknown
      no      4136   22396   9948      68
      yes      476    2532   1620      12
    > chisq.test(tab)
    
        Pearson's Chi-squared test
    
    data:  tab
    X-squared = 122.66, df = 3, p-value < 2.2e-16
    
    > 
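chisq.test compares the observed counts with the counts expected if y and marital status were independent. The sketch below rebuilds the table from the printed counts and also looks at the expected counts and Pearson residuals, which the output above does not show (the name res is an assumption):

```r
# Counts copied from the `tab` output above.
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("no", "yes"),
                              c("divorced", "married", "single", "unknown")))

res <- chisq.test(tab)

res$statistic            # Pearson chi-squared statistic
res$parameter            # degrees of freedom: (2 - 1) * (4 - 1)
res$expected             # counts expected under independence
round(res$residuals, 2)  # (observed - expected) / sqrt(expected)
```

The residuals show which cells drive the result. Here the yes/single cell has the largest positive residual, which agrees with the higher yes share for single clients in the column proportions of step 7.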

    10. Visualize continuous data

    > hist(bank$age)
    > 

    11. Distribution of a continuous variable

    > library(lattice)
    > densityplot(~age,groups=y,data=bank,plot.points=FALSE,auto.key = TRUE)
    > 
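densityplot with groups=y draws one kernel density curve of age per level of y on shared axes. The same idea can be sketched in base R with density(), which makes the per-group estimates available as objects. The ages below are simulated for illustration and are not the bank data; toy and dens are assumed names:

```r
set.seed(1)
# Simulated ages, for illustration only.
toy <- data.frame(
  age = c(round(rnorm(200, mean = 40, sd = 10)),
          round(rnorm(50,  mean = 42, sd = 14))),
  y   = factor(rep(c("no", "yes"), times = c(200, 50)))
)

# One kernel density estimate per level of y.
dens <- lapply(split(toy$age, toy$y), density)

# Overlay the curves on a shared set of axes.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))
plot(dens$no, xlim = xlim, ylim = ylim, main = "age by y", xlab = "age")
lines(dens$yes, lty = 2)
legend("topright", legend = names(dens), lty = 1:2)
```

Because each curve integrates to 1, the comparison is about the shape of the age distribution in each group, not about group size.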

  • Original article: https://www.cnblogs.com/weizhen/p/6933642.html