zoukankan      html  css  js  c++  java
  • 工具变量法(IV)的Stata操作

    Stata操作

    工具变量法的难点在于找到一个合适的工具变量并说明其合理性,Stata操作其实相当简单,只需一行命令就可以搞定,我们通常使用的工具变量法的Stata命令主要就是ivregress命令和ivreg2命令。

    ivregress命令

    ivregress命令是Stata自带的命令,支持两阶段最小二乘(2SLS)、广义矩估计(GMM)和有限信息最大似然估计(LIML)三种工具变量估计方法,我们最常使用的是两阶段最小二乘法(2SLS),因为2SLS最能体现工具变量的实质,并且在球形扰动项的情况下,2SLS是最有效率的工具变量法。

    顾名思义,两阶段最小二乘法(2SLS)需要做两个回归:

    (1)第一阶段回归:用内生解释变量对工具变量和控制变量回归,得到拟合值。

    (2)第二阶段回归:用被解释变量对第一阶段回归的拟合值和控制变量进行回归。

    如果要使用2SLS方法,我们只需在ivregress后面加上2sls即可,然后将内生解释变量lnjinshipop和工具变量bprvdist放在一个小括号中,用=号连接。选项first表示报告第一阶段回归结果,选项cluster()表示使用聚类稳健的标准误。

    ivregress 2sls lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)

    第一阶段回归结果

    First-stage regressions
    -----------------------
    
                                                    Number of obs     =        274
                                                    No. of clusters   =         28
                                                    F(   7,    239)   =      85.27
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.6487
                                                    Adj R-squared     =     0.5988
                                                    Root MSE          =     0.4442
    
    ------------------------------------------------------------------------------
                 |               Robust
     lnjinshipop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    lnnightlight |    .183385   .0682506     2.69   0.008     .0489354    .3178346
     lncoastdist |   .0350333    .077158     0.45   0.650    -.1169634    .1870299
             tri |    1.06676   .5637082     1.89   0.060    -.0437105    2.177231
     suitability |  -.0769726   .0549697    -1.40   0.163    -.1852596    .0313144
    lnpopdensity |    .196144   .0843727     2.32   0.021     .0299349    .3623532
      urbanrates |   3.352916   1.687109     1.99   0.048      .029414    6.676419
                 |
          provid |
             12  |   .2051006   .0551604     3.72   0.000      .096438    .3137632
             13  |  -1.890425   .0951146   -19.88   0.000    -2.077795   -1.703055
    ......
             64  |  -1.301895   .1581021    -8.23   0.000    -1.613346   -.9904433
                 |
        bprvdist |  -.0846917   .0107859    -7.85   0.000    -.1059393   -.0634441
           _cons |   2.126233   .9791046     2.17   0.031     .1974567     4.05501
    ------------------------------------------------------------------------------

    从表中可以看出,工具变量bprvdist的系数为-0.085,标准误为0.011,在1%的水平上显著。

    第二阶段回归结果

    Instrumental variables (2SLS) regression          Number of obs   =        274
                                                      Wald chi2(34)   =      13.62
                                                      Prob > chi2     =     0.9993
                                                      R-squared       =     0.7874
                                                      Root MSE        =     .05434
    
                                    (Std. Err. adjusted for 28 clusters in provid)
    ------------------------------------------------------------------------------
                 |               Robust
       lneduyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     lnjinshipop |   .0834221   .0116773     7.14   0.000      .060535    .1063092
    lnnightlight |   .0592777   .0102138     5.80   0.000     .0392589    .0792964
     lncoastdist |   .0093399   .0118791     0.79   0.432    -.0139426    .0326224
             tri |  -.1044346    .063917    -1.63   0.102    -.2297097    .0208404
     suitability |  -.0008944   .0125824    -0.07   0.943    -.0255555    .0237666
    lnpopdensity |  -.0472648   .0115102    -4.11   0.000    -.0698245   -.0247051
      urbanrates |  -.0353689   .1784687    -0.20   0.843    -.3851612    .3144233
                 |
          provid |
             12  |    -.10732   .0088755   -12.09   0.000    -.1247157   -.0899243
             13  |   .0244675   .0276005     0.89   0.375    -.0296284    .0785634
    ......
             64  |  -.0740796   .0350354    -2.11   0.034    -.1427477   -.0054116
                 |
           _cons |   2.014146   .1490819    13.51   0.000     1.721951    2.306341
    ------------------------------------------------------------------------------

    第二阶段回归结果就是使用工具变量法对内生性问题进行“治疗”之后得到的估计结果,很多论文一般也只报告第二阶段回归结果。从表中可以看出,进士密度lnjinshipop的系数为0.083,标准误为0.012,在1%的水平上显著。

    弱工具变量检验

    检验工具变量的有效性,首先要检验工具变量的相关性。存在弱工具变量问题时,2SLS 估计不仅难以矫正OLS 估计的偏差,反因有更大的标准差而有更低的效率, 导致“治疗比疾病本身更坏”。我们可以使用estat firststage命令对第一阶段的结果进行分析,以判断是否存在弱工具变量问题。

    . estat firststage,forcenonrobust
    
      First-stage regression summary statistics
      --------------------------------------------------------------------------
                   |            Adjusted      Partial       Robust
          Variable |   R-sq.       R-sq.        R-sq.       F(1,27)   Prob > F
      -------------+------------------------------------------------------------
       lnjinshipop |  0.6487      0.5988       0.3276        61.914    0.0000
      --------------------------------------------------------------------------
      (F statistic adjusted for 28 clusters in provid)
    
    
      Minimum eigenvalue statistic = 116.419     
    
      Critical Values                      # of endogenous regressors:    1
      Ho: Instruments are weak             # of excluded instruments:     1
      ---------------------------------------------------------------------
                                         |    5%     10%     20%     30%
      2SLS relative bias                 |         (not available)
      -----------------------------------+---------------------------------
                                         |   10%     15%     20%     25%
      2SLS Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
      LIML Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
      ---------------------------------------------------------------------

    结果显示,[公式]=0.3276(偏[公式]是扣除了其他外生变量后工具变量的解释力度),说明工具变量bprvdist对内生变量lnjinshipop有很强的解释力度。F统计量=61.914>10,根据经验准则可以判断,我们的工具变量不是一个弱工具变量。除此之外,estat firststage命令还会报告一个最小特征值统计量(Minimum eigenvalue statistic),一般大于2SLS Size of nominal 5% Wald test中10%对应的临界值就表明不存在弱工具变量问题

    特别说明:豪斯曼检验(即内生性检验)和过度识别检验(即外生性检验)在此就不予以介绍了,因为这两个检验都是很“鸡肋”的,并且这里只有一个工具变量,也没法做过度识别检验,后面我将专门写一篇推文对这两个检验进行详细说明。

    ivregress 2sls Y X1 X2 X3 X4 (X1= Z1 Z2), robust
    estat overid

    ivreg2命令

    ivreg2命令是对ivregress命令的改进和优化,功能更加强大,支持的估计方法更多(默认使用2SLS),并且会直接报告工具变量的几个统计检验结果。

    ivreg2命令是一个外部命令,所以使用之前需要安装(ssc install ivreg2)。它的语法格式和ivregress基本一致:

    ivreg2 lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)

    第一阶段回归结果

    First-stage regression of lnjinshipop:
    
    Statistics robust to heteroskedasticity and clustering on provid
    Number of obs =                    274
    Number of clusters (provid) =       28
    ------------------------------------------------------------------------------
                 |               Robust
     lnjinshipop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        bprvdist |  -.0846917   .0107633    -7.87   0.000    -.1058948   -.0634886
    lnnightlight |    .183385   .0681077     2.69   0.008     .0492169    .3175531
     lncoastdist |   .0350333   .0769964     0.45   0.650     -.116645    .1867116
             tri |    1.06676   .5625276     1.90   0.059    -.0413849    2.174906
     suitability |  -.0769726   .0548546    -1.40   0.162    -.1850328    .0310876
    lnpopdensity |    .196144    .084196     2.33   0.021      .030283    .3620051
      urbanrates |   3.352916   1.683576     1.99   0.048     .0363742    6.669459
                 |
          provid |
             12  |   .2051006   .0550449     3.73   0.000     .0966656    .3135357
             13  |  -1.890425   .0949154   -19.92   0.000    -2.077402   -1.703447
    ......
             64  |  -1.301895    .157771    -8.25   0.000    -1.612694   -.9910956
                 |
           _cons |   2.126233   .9770541     2.18   0.031      .201496    4.050971
    ------------------------------------------------------------------------------

    第二阶段回归结果

    IV (2SLS) estimation
    --------------------
    
    Estimates efficient for homoskedasticity only
    Statistics robust to heteroskedasticity and clustering on provid
    
    Number of clusters (provid) =       28                Number of obs =      274
                                                          F( 34,    27) =     0.34
                                                          Prob > F      =   0.9984
    Total (centered) SS     =  3.805186838                Centered R2   =   0.7874
    Total (uncentered) SS   =  1282.736705                Uncentered R2 =   0.9994
    Residual SS             =  .8089841945                Root MSE      =   .05434
    
    ------------------------------------------------------------------------------
                 |               Robust
       lneduyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     lnjinshipop |   .0834221   .0116773     7.14   0.000      .060535    .1063092
    lnnightlight |   .0592777   .0102138     5.80   0.000     .0392589    .0792964
     lncoastdist |   .0093399   .0118791     0.79   0.432    -.0139426    .0326224
             tri |  -.1044346    .063917    -1.63   0.102    -.2297097    .0208404
     suitability |  -.0008944   .0125824    -0.07   0.943    -.0255555    .0237666
    lnpopdensity |  -.0472648   .0115102    -4.11   0.000    -.0698245   -.0247051
      urbanrates |  -.0353689   .1784687    -0.20   0.843    -.3851612    .3144233
                 |
          provid |
             12  |    -.10732   .0088755   -12.09   0.000    -.1247157   -.0899243
             13  |   .0244675   .0276005     0.89   0.375    -.0296284    .0785634
    ......
             64  |  -.0740796   .0350354    -2.11   0.034    -.1427477   -.0054116
                 |
           _cons |   2.014146   .1490819    13.51   0.000     1.721951    2.306341
    ------------------------------------------------------------------------------

    弱工具变量检验结果

    ivreg2命令会直接报告Cragg-Donald Wald F 统计量和Kleibergen-Paap Wald rk F统计量两个用于弱工具变量检验的统计量,其中Cragg-Donald Wald F 统计量假设扰动项iid(独立同分布),而Kleibergen-Paap Wald rk F统计量没有做扰动项iid的假设。需要注意的是,这里的原假设是所选工具变量是弱工具变量,当两个F统计量大于Stock-Yogo weak ID test critical values中10%偏误的临界值时,可以拒绝原假设,认为不存在弱工具变量问题

    Weak identification test
    Ho: equation is weakly identified
    Cragg-Donald Wald F statistic                                     116.42
    Kleibergen-Paap Wald rk F statistic                                61.91
    
    Stock-Yogo weak ID test critical values for K1=1 and L1=1:
                                       10% maximal IV size             16.38
                                       15% maximal IV size              8.96
                                       20% maximal IV size              6.66
                                       25% maximal IV size              5.53
    Source: Stock-Yogo (2005).  Reproduced by permission.
    NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

    补充:xtivreg2命令

    ivreg2 和 xtivreg2 之间的差异,与reg 和 xtreg 之间的差异大体类似。

    我们知道,OLS加入虚拟变量(LSDV)等价于FE模型。运行以下命令
    ivreg2允许有时间效应(即适用于混合面板),而xtivreg2只能做固定效应,不能添加时间的固定效应
    xtset id year
    xtivreg2 ys k (n=l2.n l3.n), fe first savefp(first)
    outreg2 [first second] using xxx.doc, tstat bdec(3) tdec(2) replace

    xtivreg2会报告过度检验的结果,同ivreg2

    添加时间固定效应建议使用xtivreg

    xtset stkcd year
    xtreg Ln_geodistance_ew rdls ewdistance_mean $control i.year,fe
    est store first
    outreg2[first] using first.doc,tstat bdec(3) tdec(3) addstat(Ajusted R2,`e(r2_a)') replace
    xtivreg ln_Cash_ratio1 (Ln_geodistance_ew=rdls ewdistance_mean) $control i.year,fe   //xtivreg2 加入不了i.year   
    outreg2 using xxx1.doc,cttop(second) tstat bdec(3) tdec(2)

    如果进行过度检验则需要

    xtoverid

    添加字符串类型固定效应则

    xtset stkcd year
    xi:xtreg Ln_geodistance_ew rdls ewdistance_mean $control i.industry2,fe
    est store first
    outreg2[first] using first.doc,tstat bdec(3) tdec(3) addstat(Ajusted R2,`e(r2_a)') replace
    xi:xtivreg ln_Cash_ratio1 (Ln_geodistance_ew=rdls ewdistance_mean) $control i.industry2,fe   //xtivreg2 加入不了i.year   
    outreg2 using xxx1.doc,cttop(second) tstat bdec(3) tdec(2)
  • 相关阅读:
    Cx的治疗
    [HDU2222] Keywords Search
    楼房
    [51Nod1089] 最长回文子串 V2(Manacher算法)
    [洛谷P3375] 【模板】KMP字符串匹配
    [codeVS1204] 寻找子串位置
    [洛谷P1114] “非常男女”计划
    [BZOJ2132] 圈地计划
    [COGS311] Redundant Paths
    [COGS309] [USACO 3.2] 香甜的黄油
  • 原文地址:https://www.cnblogs.com/celine227/p/15012128.html
Copyright © 2011-2022 走看看