3.1.7. Cross validation of time series data


Time series data is characterised by correlation between observations that are close in time (autocorrelation). Classical cross-validation techniques such as KFold and ShuffleSplit, however, assume that samples are independent and identically distributed. Applied to time series data, they produce training and test sets that are unreasonably correlated, which yields poor estimates of generalisation error. It is therefore important to evaluate a time series model on “future” observations, those least like the observations used to train it. TimeSeriesSplit provides one way to do this.
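To make the problem concrete, here is a minimal sketch that checks, split by split, whether a splitter trains on samples that come later in time than the earliest test sample. The 12-sample array and the helper name trains_on_future are assumptions made for illustration only; they are not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Illustrative data (an assumption): 12 samples whose row order is their time order.
X = np.arange(12).reshape(-1, 1)


def trains_on_future(cv, X):
    """Hypothetical helper: one bool per split, True when some training
    index is later than the earliest test index."""
    return [bool(train.max() > test.min()) for train, test in cv.split(X)]


kfold_leaks = trains_on_future(KFold(n_splits=3), X)
tscv_leaks = trains_on_future(TimeSeriesSplit(n_splits=3), X)

print("KFold:          ", kfold_leaks)
print("TimeSeriesSplit:", tscv_leaks)
```

Even without shuffling, KFold trains on later samples in every split except the one whose test block is the final one, whereas TimeSeriesSplit never does.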
3.1.7.1. Time Series Split

TimeSeriesSplit is a variation of k-fold that, in the $k$-th split, returns the first $k$ folds as the training set and the $(k+1)$-th fold as the test set. Unlike standard cross-validation methods, successive training sets are supersets of those that come before them. Any surplus data is added to the first training partition, which is always used to train the model.
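The two properties described above, nested training sets and surplus samples landing in the first training partition, can be checked with a short sketch. The choice of 10 samples is an assumption made for illustration: 10 does not divide evenly into the 4 folds implied by n_splits=3, so 2 samples are left over.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Illustrative data (an assumption): 10 time-ordered samples, 3 splits.
# 10 // (3 + 1) = 2 samples per fold, with 2 surplus samples.
X = np.arange(10).reshape(-1, 1)

splits = [
    (train.tolist(), test.tolist())
    for train, test in TimeSeriesSplit(n_splits=3).split(X)
]
for train, test in splits:
    print(train, test)

# Each training set should be a strict subset of the next one.
nested = all(set(a[0]) < set(b[0]) for a, b in zip(splits, splits[1:]))
train_sizes = [len(train) for train, _ in splits]
print(nested, train_sizes)
```

Here the first training set holds 4 samples rather than 2, because it absorbs the 2 surplus samples; every test set holds exactly 2.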

This class can be used to cross-validate time series data samples that are observed at fixed time intervals.

Example of 3-split time series cross-validation on a dataset with 6 samples:

>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit(n_splits=3)
>>> print(tscv)
TimeSeriesSplit(n_splits=3)
>>> for train, test in tscv.split(X):
...     print("%s %s" % (train, test))
[0 1 2] [3]
[0 1 2 3] [4]
[0 1 2 3 4] [5]
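
In practice the splitter is usually passed to a model-evaluation helper rather than iterated by hand. The following is a minimal sketch of that usage, assuming a synthetic noisy linear trend observed at 40 equally spaced time steps and a LinearRegression model; both the data and the choice of estimator and scoring metric are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Illustrative data (an assumption): a noisy linear trend at 40 fixed time steps.
rng = np.random.default_rng(0)
t = np.arange(40).reshape(-1, 1)
y = 2.0 * t.ravel() + 1.0 + rng.normal(scale=0.5, size=40)

# Each of the 4 splits trains on the past and scores on the next 8 samples.
tscv = TimeSeriesSplit(n_splits=4)
scores = cross_val_score(
    LinearRegression(), t, y, cv=tscv, scoring="neg_mean_absolute_error"
)
print(scores)  # one negated mean absolute error per split
```

A score closer to zero means a smaller error on the held-out "future" block for that split.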
Original article: https://www.cnblogs.com/zle1992/p/6915276.html
