1. Reading the data
The three assignments below are equivalent; each one stores the values in jj.
jj = scan("http://www.stat.pitt.edu/stoffer/tsa2/data/jj.dat")
jj <- scan("http://www.stat.pitt.edu/stoffer/tsa2/data/jj.dat")
scan("http://www.stat.pitt.edu/stoffer/tsa2/data/jj.dat") -> jj
> jj<-scan("http://www.stat.pitt.edu/stoffer/tsa2/data/jj.dat")
Read 84 items
> jj
 [1]  0.710000  0.630000  0.850000  0.440000  0.610000  0.690000
 [7]  0.920000  0.550000  0.720000  0.770000  0.920000  0.600000
[13]  0.830000  0.800000  1.000000  0.770000  0.920000  1.000000
[19]  1.240000  1.000000  1.160000  1.300000  1.450000  1.250000
[25]  1.260000  1.380000  1.860000  1.560000  1.530000  1.590000
[31]  1.830000  1.860000  1.530000  2.070000  2.340000  2.250000
[37]  2.160000  2.430000  2.700000  2.250000  2.790000  3.420000
[43]  3.690000  3.600000  3.600000  4.320000  4.320000  4.050000
[49]  4.860000  5.040000  5.040000  4.410000  5.580000  5.850000
[55]  6.570000  5.310000  6.030000  6.390000  6.930000  5.850000
[61]  6.930000  7.740000  7.830000  6.120000  7.740000  8.910000
[67]  8.280000  6.840000  9.540000 10.260000  9.540000  8.729999
[73] 11.880000 12.060000 12.150000  8.910000 14.040000 12.960000
[79] 14.850000  9.990000 16.200000 14.670000 16.020000 11.610000
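If the URL is unreachable, the behaviour of scan() can still be tried offline. The sketch below is only an illustration: it feeds the first eight values from the output above to scan() through its `text` argument, and the variable names `first_two_years` and `x` are made up for the example.

```r
# First eight jj values, copied from the printed output above.
first_two_years <- "0.71 0.63 0.85 0.44
0.61 0.69 0.92 0.55"

# scan() treats spaces and newlines alike as separators between items.
x <- scan(text = first_two_years)

length(x)      # number of items read
is.vector(x)   # scan() returns a plain vector
mode(x)        # numeric is the default type scan() expects
```

Reading from a file works the same way: pass the file path as the first argument instead of `text`.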
scan
scan() reads the data and returns it as a vector.
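Vectors
1. The basic element types are numeric, character, logical, and complex.
2. A vector needs no type declaration; you can assign to it directly.
Create an empty vector: x<-c();
Assign values to a vector: x<-c(0,1,2,3);
3. Vector indices start at 1.
4. If any element of a vector is a character, the whole vector becomes character. Check the type with mode(jj).
5. If logical values are mixed with numbers, they are converted to numbers: TRUE becomes 1 and FALSE becomes 0.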
> mode(jj)
[1] "numeric"
> test<-c(1,2,'a')
> mode(test)
[1] "character"
> test1<-c(1,2,true)
Error: object 'true' not found
> test1<-c(1,2,TRUE)
> mode(test1)
[1] "numeric"
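6. R has no notion of a standalone integer or a standalone character. After X<-2 or X<-'a', R still treats X as a vector; it is simply a vector that holds a single value.
7. Name the elements of a vector with names(x).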
> demo<-1:3
> fix(demo)
> names(demo)<-c('a','b','c','d')
Error in names(demo) <- c("a", "b", "c", "d") :
  'names' attribute [4] must be the same length as the vector [3]
> names(demo)<-c('a','b','c')
> demo
a b c
1 2 3
> names(demo)<-c('d','e','f')
> demo
d e f
1 2 3
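The vector rules listed above (1-based indexing, coercion of logicals, single values as length-one vectors, element names) can be checked in a few lines. This is a small illustrative sketch; the variable names are arbitrary.

```r
x <- c(0, 1, 2, 3)
x[1]                    # indices start at 1, so this is the first element
x[length(x)]            # the last element

mixed <- c(1, 2, TRUE, FALSE)   # logicals are coerced to numbers
mixed
mode(mixed)

single <- 2
length(single)          # a "single value" is a vector of length one

demo <- 1:3
names(demo) <- c("a", "b", "c")
demo[["b"]]             # look an element up by name
```

Looking elements up by name is often more readable than by position once a vector has names attached.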
Convert jj to a time series object
> jj = ts(jj, start=1960, frequency=4)
> jj
          Qtr1      Qtr2      Qtr3      Qtr4
1960  0.710000  0.630000  0.850000  0.440000
1961  0.610000  0.690000  0.920000  0.550000
1962  0.720000  0.770000  0.920000  0.600000
1963  0.830000  0.800000  1.000000  0.770000
1964  0.920000  1.000000  1.240000  1.000000
1965  1.160000  1.300000  1.450000  1.250000
1966  1.260000  1.380000  1.860000  1.560000
1967  1.530000  1.590000  1.830000  1.860000
1968  1.530000  2.070000  2.340000  2.250000
1969  2.160000  2.430000  2.700000  2.250000
1970  2.790000  3.420000  3.690000  3.600000
1971  3.600000  4.320000  4.320000  4.050000
1972  4.860000  5.040000  5.040000  4.410000
1973  5.580000  5.850000  6.570000  5.310000
1974  6.030000  6.390000  6.930000  5.850000
1975  6.930000  7.740000  7.830000  6.120000
1976  7.740000  8.910000  8.280000  6.840000
1977  9.540000 10.260000  9.540000  8.729999
1978 11.880000 12.060000 12.150000  8.910000
1979 14.040000 12.960000 14.850000  9.990000
1980 16.200000 14.670000 16.020000 11.610000
scan and read.table are not the same. scan produces a plain vector with no dimensions, whereas read.table produces a data frame, which does have dimensions.
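The difference is easy to see by reading the same text with both functions. The following is a minimal sketch using eight values from the jj output; the names `txt`, `v` and `df` are made up for the example.

```r
txt <- "0.71 0.63 0.85 0.44
0.61 0.69 0.92 0.55"

v  <- scan(text = txt)         # plain numeric vector
df <- read.table(text = txt)   # data frame, one row per input line

dim(v)       # NULL: a vector has no dim attribute
dim(df)      # rows and columns of the data frame
class(df)
```

Because ts() expects a vector or matrix, the scan() result can be passed to it directly.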
> time(jj)
        Qtr1    Qtr2    Qtr3    Qtr4
1960 1960.00 1960.25 1960.50 1960.75
1961 1961.00 1961.25 1961.50 1961.75
1962 1962.00 1962.25 1962.50 1962.75
1963 1963.00 1963.25 1963.50 1963.75
1964 1964.00 1964.25 1964.50 1964.75
1965 1965.00 1965.25 1965.50 1965.75
1966 1966.00 1966.25 1966.50 1966.75
1967 1967.00 1967.25 1967.50 1967.75
1968 1968.00 1968.25 1968.50 1968.75
1969 1969.00 1969.25 1969.50 1969.75
1970 1970.00 1970.25 1970.50 1970.75
1971 1971.00 1971.25 1971.50 1971.75
1972 1972.00 1972.25 1972.50 1972.75
1973 1973.00 1973.25 1973.50 1973.75
1974 1974.00 1974.25 1974.50 1974.75
1975 1975.00 1975.25 1975.50 1975.75
1976 1976.00 1976.25 1976.50 1976.75
1977 1977.00 1977.25 1977.50 1977.75
1978 1978.00 1978.25 1978.50 1978.75
1979 1979.00 1979.25 1979.50 1979.75
1980 1980.00 1980.25 1980.50 1980.75
> plot(jj, ylab="Earnings per Share", main="J & J")
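Besides time(), a ts object can be inspected with a few other base R helpers. The sketch below builds a short quarterly series from the first eight jj values (the name `x` is arbitrary) so that it runs without the download.

```r
# Two years of quarterly data, starting in the first quarter of 1960.
x <- ts(c(0.71, 0.63, 0.85, 0.44, 0.61, 0.69, 0.92, 0.55),
        start = 1960, frequency = 4)

frequency(x)   # observations per year
start(x)       # year and quarter of the first observation
end(x)         # year and quarter of the last observation
cycle(x)       # quarter number (1 to 4) of each observation

# Extract a sub-period; here, the four quarters of 1961.
window(x, start = c(1961, 1), end = c(1961, 4))
```

window() keeps the time attributes, so the extracted piece is still a ts object and plots with the correct time axis.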
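filter: the "convolution" method applies a linear filter by convolution (equivalent to a moving average)
the "recursive" method applies the filter recursively (equivalent to an autoregression, AR)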
> k = c(.5,1,1,1,.5)
> (k = k/sum(k))
[1] 0.125 0.250 0.250 0.250 0.125
> fjj = filter(jj, sides=2, k)
> fjj
         Qtr1     Qtr2     Qtr3     Qtr4
1960       NA       NA  0.64500  0.64000
1961  0.65625  0.67875  0.70625  0.73000
1962  0.74000  0.74625  0.76625  0.78375
1963  0.79750  0.82875  0.86125  0.89750
1964  0.95250  1.01125  1.07000  1.13750
1965  1.20125  1.25875  1.30250  1.32500
1966  1.38625  1.47625  1.54875  1.60875
1967  1.63125  1.66500  1.70250  1.76250
1968  1.88625  1.99875  2.12625  2.25000
1969  2.34000  2.38500  2.46375  2.66625
1970  2.91375  3.20625  3.47625  3.69000
1971  3.88125  4.01625  4.23000  4.47750
1972  4.65750  4.79250  4.92750  5.11875
1973  5.41125  5.71500  5.88375  6.00750
1974  6.12000  6.23250  6.41250  6.69375
1975  6.97500  7.12125  7.25625  7.50375
1976  7.70625  7.85250  8.16750  8.56125
1977  8.88750  9.28125  9.81000 10.32750
1978 10.87875 11.22750 11.52000 11.90250
1979 12.35250 12.82500 13.23000 13.71375
1980 14.07375 14.42250       NA       NA
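To see what the two filter methods compute, the sketch below reproduces one entry of fjj by hand and then runs a small recursive filter. It uses only the first six jj values; `stats::filter` is written out in full because other packages (dplyr, for example) define their own `filter`. The names `x`, `ma` and `ar` are made up for the example.

```r
x <- c(0.71, 0.63, 0.85, 0.44, 0.61, 0.69)   # first six jj values
k <- c(.5, 1, 1, 1, .5)
k <- k / sum(k)                               # weights sum to 1

# Centred (sides = 2) moving average, as in the transcript above.
ma <- stats::filter(x, filter = k, method = "convolution", sides = 2)
ma[3]                # should agree with the 1960 Qtr3 entry of fjj
sum(k * x[1:5])      # the same weighted average computed by hand

# Recursive filter: y[t] = x[t] + 0.5 * y[t-1], an AR(1)-style recursion.
# The input is a unit impulse, so the output is the impulse response.
ar <- stats::filter(c(1, 0, 0, 0), filter = 0.5, method = "recursive")
ar
```

The first two and last two entries of `ma` are NA because the five-term centred window does not fit at the edges, which is why fjj starts and ends with NA.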
lowess: smoothing by locally weighted polynomial regression
> plot(jj)
> lines(fjj, col="red")
> lines(lowess(jj), col="blue", lty="dashed")
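The amount of smoothing lowess() applies is controlled by its `f` argument (the smoother span, 2/3 by default). The sketch below compares two spans on synthetic data; the series is made up for the example and is not the jj data.

```r
set.seed(1)                               # reproducible noise
tt <- 1:40
y  <- 0.1 * tt + rnorm(40, sd = 0.5)      # synthetic trend plus noise

smooth_default <- lowess(tt, y)           # default span f = 2/3
smooth_tight   <- lowess(tt, y, f = 0.2)  # smaller span follows the data more closely

str(smooth_default)                       # a list with components x and y

plot(tt, y)
lines(smooth_default, col = "blue", lty = "dashed")
lines(smooth_tight, col = "red")
```

Because lowess() returns a list with `x` and `y` components, its result can be passed straight to lines(), as in the transcript above.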
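diff: computes differences (differencing subtracts each value from the one after it, producing a series of increments)
Variance: a measure of how much a variable fluctuates
log: logarithm
First, we take the log of every value in jj.
Second, we difference the logged values: the second value minus the first, the third minus the second, the fourth minus the third, and so on.
If the series has n values before differencing, the result has n-1 values.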
> dljj = diff(log(jj))
> dljj
             Qtr1         Qtr2         Qtr3         Qtr4
1960              -0.119545151  0.299516530 -0.658461623
1961  0.326684230  0.123232640  0.287682072 -0.514455392
1962  0.269332934  0.067139303  0.177983155 -0.427444015
1963  0.324496046 -0.036813973  0.223143551 -0.261364764
1964  0.177983155  0.083381609  0.215111380 -0.215111380
1965  0.148420005  0.113944259  0.109199292 -0.148420005
1966  0.007968170  0.090971778  0.298492989 -0.175890666
1967 -0.019418086  0.038466281  0.140581951  0.016260521
1968 -0.195308752  0.302280872  0.122602322 -0.039220713
1969 -0.040821995  0.117783036  0.105360516 -0.182321557
1970  0.215111380  0.203598955  0.075985907 -0.024692613
1971  0.000000000  0.182321557  0.000000000 -0.064538521
1972  0.182321557  0.036367644  0.000000000 -0.133531393
1973  0.235314087  0.047252885  0.116072171 -0.212921997
1974  0.127155175  0.057987258  0.081125545 -0.169418152
1975  0.169418152  0.110541874  0.011560822 -0.246400413
1976  0.234839591  0.140772554 -0.073331273 -0.191055237
1977  0.332705754  0.072759354 -0.072759354 -0.088728230
1978  0.308091059  0.015037877  0.007434978 -0.310154928
1979  0.454736157 -0.080042708  0.136132174 -0.396415273
1980  0.483426650 -0.099206650  0.088033349 -0.321971146
> plot(dljj)
> shapiro.test(dljj)

        Shapiro-Wilk normality test

data:  dljj
W = 0.9725, p-value = 0.07211
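The two steps (take logs, then difference) can be checked by hand on a few values. The sketch below uses the first five jj values; the names `x` and `dlx` are made up for the example.

```r
x   <- c(0.71, 0.63, 0.85, 0.44, 0.61)   # first five jj values
dlx <- diff(log(x))                       # log first, then difference

length(x)              # n values going in
length(dlx)            # n - 1 values coming out

dlx[1]                 # log(0.63) - log(0.71), the 1960 Qtr2 entry of dljj
log(x[2] / x[1])       # identical: a difference of logs is the log of a ratio

x[2] / x[1] - 1        # simple growth rate, close to dlx[1] when changes are small
```

Because a difference of logs is the log of a ratio, dljj can be read as an approximate quarter-on-quarter growth rate.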
qqnorm() draws a normal probability (Q-Q) plot, and qqline() adds a reference line to it.
hist() draws a histogram.
http://www.stathome.cn/manual/s/10.html
> par(mfrow=c(2,1))
> hist(dljj, prob=TRUE, 12)
> lines(density(dljj))
> qqnorm(dljj)
> qqline(dljj)
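The plots are a visual check; the Shapiro-Wilk test run earlier gives a numerical one. Its p-value of 0.07211 is above 0.05, so normality of dljj is not rejected at the 5% level. The sketch below shows how the statistic and p-value can be pulled out of the test result programmatically. It runs on a synthetic normal sample of the same length as dljj (83 values), not on the real data, and the names `z` and `res` are made up for the example.

```r
set.seed(42)                 # reproducible sample
z   <- rnorm(83)             # synthetic data, same length as dljj
res <- shapiro.test(z)       # returns an object of class "htest"

res$statistic                # the W statistic
res$p.value                  # the p-value

# A common decision rule: do not reject normality when p > 0.05.
res$p.value > 0.05
```

Storing the result in a variable makes it easy to reuse the p-value in a report or in a conditional step of a script.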