Applied Nonparametric Statistics-lec5

Today we continue with two-sample tests.

Ref: https://onlinecourses.science.psu.edu/stat464/print/book/export/html/6
- Mann-Whitney Test
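Earlier we said this test is equivalent to the Wilcoxon rank-sum test; only the test statistic differs. Now let's look at its statistic U. Note that we are still comparing two independent samples.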

Treatment 1: x₁, x₂, ... , x_m
Treatment 2: y₁, y₂, ... , y_n

U = # of pairs of (X_i, Y_j) for which X_i < Y_j

H₀ : the distributions are the same
H₁ : the distributions are not the same

Table A4 contains lower tailed and upper tailed values for U under the null hypothesis. It can be shown that U_upper = mn - U_lower.
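
To make the definition of U concrete, here is a minimal sketch that counts the pairs directly and compares the result with what wilcox.test reports. The data values are made up for illustration. One thing to watch: as far as I can tell, wilcox.test(a, b) reports W as the number of pairs with a_i > b_j, so the samples are passed in the order (y, x) to match the definition above.

```r
# Hypothetical data: x is treatment 1 (m = 4), y is treatment 2 (n = 5)
x <- c(1.2, 3.4, 2.2, 5.1)
y <- c(2.9, 4.8, 6.0, 3.9, 7.3)
m <- length(x)
n <- length(y)

# U = number of pairs (x_i, y_j) with x_i < y_j
U <- sum(outer(x, y, "<"))

# The same count with the inequality reversed
U_rev <- sum(outer(x, y, ">"))

# wilcox.test(a, b) reports W = number of pairs with a_i > b_j,
# so passing (y, x) gives the U defined above
W <- unname(wilcox.test(y, x)$statistic)

print(c(U = U, U_rev = U_rev, mn = m * n, W = W))
```

With no ties, U and U_rev add up to mn, which is the same symmetry behind U_upper = mn - U_lower.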

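In R:
wilcox.test(new, trad, alternative="greater")
A confidence interval for the location shift between the two groups comes from the same function, so we can just let R do the work :)
wilcox.test(ugrad, grad, conf.level=0.90, conf.int=TRUE)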
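
The calls above use variables that these notes never define, so here is a small runnable sketch with made-up values for ugrad and grad. It only shows which pieces of the result are worth looking at; the numbers themselves mean nothing.

```r
# Hypothetical scores for two independent groups (values are invented)
ugrad <- c(12.1, 14.3, 11.8, 15.2, 13.4, 12.9)
grad  <- c(15.8, 17.1, 14.9, 18.3, 16.2)

res <- wilcox.test(ugrad, grad, conf.level = 0.90, conf.int = TRUE)

res$statistic   # W: number of pairs with ugrad_i > grad_j
res$p.value     # two-sided p-value
res$conf.int    # 90% confidence interval for the shift (ugrad minus grad)
res$estimate    # point estimate of the shift
```

The interval and the estimate both refer to the difference in location between the first and second argument, so swapping the arguments flips their signs.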
　　

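So far we have compared the location of two samples. Next we compare scale, that is, the spread (variability) of the distributions.

Suppose the two samples have equal means but different variances, and we want to decide which variance is larger. If both samples come from normally distributed populations, we can compute the two sample variances and then compare them with the F statistic $F = s_1^2 / s_2^2$.

If the samples do not come from normal populations, this no longer works, and we turn to nonparametric tests.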
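- Siegel-Tukey Test (ST test): tests for a difference in variance
1. Pool the data and sort it from smallest to largest.
2. Assign ranks from the two ends inward: the smallest value gets rank 1, the largest rank 2, the second largest rank 3, the second smallest rank 4, the third smallest rank 5, and so on, alternating between the two ends.
3. Run the Wilcoxon rank-sum test on these ranks. The smaller rank sum is associated with the treatment that has the larger variability.
If the ranks are assigned differently, with the smallest and largest both getting 1, the second smallest and second largest both getting 2, and so on, the result is the Ansari-Bradley test.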
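
The Siegel-Tukey steps can be sketched in base R as follows. This is only an illustration under stated assumptions: siegel_tukey_ranks is a hypothetical helper (not from any package), it uses the standard alternating scheme (rank 1 to the smallest value, ranks 2 and 3 to the two largest, ranks 4 and 5 to the next two smallest, and so on), it does not handle ties, and the data are made up.

```r
# Hypothetical helper: Siegel-Tukey ranks for a vector without ties
siegel_tukey_ranks <- function(values) {
  N <- length(values)
  ord <- order(values)        # positions of the values in sorted order
  st <- integer(N)            # ST rank for each sorted position
  lo <- 1L
  hi <- N
  r <- 1L
  take_low <- TRUE
  first <- TRUE
  while (lo <= hi) {
    k <- if (first) 1L else 2L   # one value at first, then two at a time
    for (i in seq_len(k)) {
      if (lo > hi) break
      if (take_low) {
        st[lo] <- r
        lo <- lo + 1L
      } else {
        st[hi] <- r
        hi <- hi - 1L
      }
      r <- r + 1L
    }
    take_low <- !take_low
    first <- FALSE
  }
  ranks <- integer(N)
  ranks[ord] <- st            # map back to the original order
  ranks
}

# Made-up data: same centre, but y is much more spread out than x
x <- c(4.8, 5.1, 4.9, 5.3, 5.0)
y <- c(2.1, 8.4, 3.7, 6.9, 7.6, 1.5)

st <- siegel_tukey_ranks(c(x, y))
st_x <- st[seq_along(x)]
st_y <- st[-seq_along(x)]

c(rank_sum_x = sum(st_x), rank_sum_y = sum(st_y))
wilcox.test(st_x, st_y)       # rank-sum test on the ST ranks
```

Here y ends up with the smaller rank sum, which matches the rule that the smaller rank sum goes with the more variable treatment.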

In R, the jmuOutlier package provides this function:
siegel.test(x, y, alternative = c("two.sided", "less", "greater"), reverse = FALSE, all.perms = TRUE, num.sim = 20000)
Ansari-Bradley test:
ansari.test(x, y, alternative = c("two.sided", "less", "greater"), exact = NULL, conf.int = FALSE, conf.level = 0.95, ...)
There are in fact other ways to check homogeneity of variance (homoscedasticity); see this article:

http://www.cookbook-r.com/Statistical_analysis/Homogeneity_of_variance/
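1. Bartlett's test: suitable when the data are normally distributed;
2. Levene's test: available in the car package; more robust than Bartlett's test when the data are not normal;
3. Fligner-Killeen test: a nonparametric method.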
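
As a quick illustration, the tests that ship with base R (the stats package) can be run on the same two samples. The data below are made up, and Levene's test is left out because it needs the car package.

```r
# Made-up data: same centre, but y is much more spread out than x
x <- c(4.8, 5.1, 4.9, 5.3, 5.0, 5.2)
y <- c(2.1, 8.4, 3.7, 6.9, 7.6, 1.5)

ab <- ansari.test(x, y)           # Ansari-Bradley, nonparametric
bt <- bartlett.test(list(x, y))   # Bartlett, assumes normal data
fk <- fligner.test(list(x, y))    # Fligner-Killeen, nonparametric

# Collect the p-values side by side
c(ansari = ab$p.value, bartlett = bt$p.value, fligner = fk$p.value)
```

bartlett.test and fligner.test take a list of samples (or a formula), so they also work with more than two groups, while ansari.test compares exactly two.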
- Tests on Deviances
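1. Obtain the deviances for the two treatments (dev_ix and dev_jy) and compute RMD (the ratio of mean deviances) from the original data, denoted RMD_obs.

2. Permute. With large samples, draw a fixed number of random permutations; otherwise enumerate all of them. Compute RMD for each permutation.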
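
A rough sketch of this procedure in R, under assumptions these notes do not spell out: deviances are taken from each sample's own median, RMD is the mean absolute deviance of x divided by that of y, the pooled deviances are what gets permuted, and the p-value is one-sided (x more variable than y). The function name rmd_perm_test and the data are hypothetical.

```r
# Hypothetical helper: permutation test on the ratio of mean deviances
rmd_perm_test <- function(x, y, num_sim = 2000, seed = 1) {
  set.seed(seed)
  m <- length(x)
  # Deviances from each sample's own median (assumption), pooled
  dev <- c(x - median(x), y - median(y))
  rmd <- function(dx, dy) mean(abs(dx)) / mean(abs(dy))
  rmd_obs <- rmd(dev[seq_len(m)], dev[-seq_len(m)])
  # Random relabelling of the pooled deviances
  sims <- replicate(num_sim, {
    idx <- sample(length(dev), m)
    rmd(dev[idx], dev[-idx])
  })
  # One-sided p-value: how often a permuted RMD is at least RMD_obs
  list(rmd_obs = rmd_obs, p_value = mean(sims >= rmd_obs))
}

# Made-up data: x is much more spread out than y
x <- c(2.1, 8.4, 3.7, 6.9, 7.6, 1.5)
y <- c(4.8, 5.1, 4.9, 5.3, 5.0, 5.2)

res_rmd <- rmd_perm_test(x, y)
res_rmd
```

For small samples the random draws could be replaced by an enumeration of all relabellings (for example with combn), which is what step 2 means by looping over everything.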
- Kolmogorov-Smirnov Test
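This test looks for a general difference, so it is sensitive to location, scale, and shape. If it is unknown whether the two samples differ in location, use this test; if the difference is already known to be one of location, use the Wilcoxon test instead.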
1. Calculate the observed test statistic, $KS_{obs} = \max_w |\hat F_1(w) - \hat F_2(w)|$, where $\hat F_1(w)$ and $\hat F_2(w)$ are the sample CDFs of the two samples.

2. Find all the possible permutations of the data and calculate KS for each permutation.

3. The p-value is the proportion of permutations whose KS statistic is at least as large as $KS_{obs}$.

In R this is simple:
ks.test(a, b)
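
For comparison with the three steps above, here is a minimal permutation sketch in base R. The helper names ks_stat and ks_perm_test are hypothetical, the data are made up, and random relabellings stand in for the full enumeration of permutations.

```r
# Hypothetical helper: largest gap between the two sample CDFs
ks_stat <- function(a, b) {
  w <- sort(c(a, b))                      # evaluate at every pooled value
  max(abs(ecdf(a)(w) - ecdf(b)(w)))
}

# Hypothetical helper: permutation p-value for the KS statistic
ks_perm_test <- function(a, b, num_sim = 2000, seed = 1) {
  set.seed(seed)
  pooled <- c(a, b)
  m <- length(a)
  ks_obs <- ks_stat(a, b)
  sims <- replicate(num_sim, {
    idx <- sample(length(pooled), m)      # random relabelling
    ks_stat(pooled[idx], pooled[-idx])
  })
  # Small tolerance guards against floating-point noise in the comparison
  list(ks_obs = ks_obs, p_value = mean(sims >= ks_obs - 1e-12))
}

# Made-up data without ties
a <- c(1.1, 2.3, 2.9, 3.8, 4.4)
b <- c(3.1, 4.9, 5.6, 6.2, 7.0, 7.7)

res_ks <- ks_perm_test(a, b)
res_ks
ks.test(a, b)$statistic                   # should agree with res_ks$ks_obs
```

The observed statistic should match the D reported by ks.test, while the p-values can differ a little because ks.test does not use random relabelling.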
　

Original article: https://www.cnblogs.com/pxy7896/p/6961744.html