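Original link: http://tecdat.cn/?p=3014
Preface
Forecasting means making statements about what will happen, based on information about past and current states.
Everyone solves forecasting problems every day, with varying degrees of success. For example, we need to forecast the weather, harvests, energy consumption, movements in foreign-exchange (forex) currency pairs or stocks, earthquakes, and many other things. ...
Predictive analytics
Through classification, deep learning can establish correlations between, for example, the pixels in an image and a person's name. You could call this static prediction. By the same token, when exposed to enough of the right data, deep learning can establish correlations between current events and future events. In a sense, the future event is like a label. Deep learning does not necessarily care about time, or about the fact that something has not happened yet. Given a time series, deep learning can read a string of numbers and predict the number most likely to occur next.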
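To make the idea that "the future event is like a label" concrete, here is a minimal sketch that turns a sequence into (input window, next value) pairs, which is the supervised form a sequence model would be trained on. The function name `make_windows`, the window length of 3, and the toy sequence are illustrative assumptions, not part of the original post.

```python
def make_windows(series, window=3):
    """Split a sequence into (inputs, label) pairs for next-step prediction."""
    pairs = []
    for i in range(len(series) - window):
        inputs = series[i:i + window]   # the recent past
        label = series[i + window]      # the value that followed it
        pairs.append((inputs, label))
    return pairs


# Toy sequence (illustrative values only).
toy_series = [10, 5, 4, 5, 13, 5]
for inputs, label in make_windows(toy_series):
    print(inputs, "->", label)
```

Each pair can then be fed to any supervised learner; the model itself is out of scope for this sketch.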
Sample data:
2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5
2011005;6;9;12;14;20;22;13
2011006;1;3;5;13;16;18;5
2011007;1;9;17;24;26;31;5
2011008;10;12;13;17;24;31;15
2011009;17;18;23;24;25;26;4
2011010;1;4;5;9;15;19;13
2011011;1;12;18;19;21;24;10
2011012;7;8;11;13;15;26;13
2011013;1;3;13;16;21;22;8
2011014;5;7;10;11;23;26;1
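Each record appears to hold a draw ID followed by seven semicolon-separated numbers. A minimal parsing sketch follows, assuming the first six numbers are red balls and the last one is the blue ball. The post does not describe the layout, so this field meaning, and the names `parse_record` and `blue_counts`, are assumptions.

```python
from collections import Counter


def parse_record(line):
    """Parse 'id;r1;r2;r3;r4;r5;r6;blue' into (draw_id, reds, blue)."""
    fields = line.strip().split(";")
    draw_id = fields[0]
    numbers = [int(f) for f in fields[1:]]
    return draw_id, numbers[:6], numbers[6]


# First four records of the sample data.
sample = """2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5"""

records = [parse_record(line) for line in sample.splitlines()]

# Count how often each blue-ball number appears.
blue_counts = Counter(blue for _, _, blue in records)
print(blue_counts.most_common())
```

A frequency table like `blue_counts` is the kind of input the plotting code later in the post seems to expect.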
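import random

# Assumption: this excerpt never defines redBalls or blueBalls.
# The ranges 1-33 and 1-16 below are placeholders, not taken from the post.
redBalls = list(range(1, 34))
blueBalls = list(range(1, 17))

# Draw six red balls without replacement.
for x in range(0, 6):  # NUM_OF_RED = 6
    choice_num_red = random.choice(redBalls)
    print(choice_num_red)
    redBalls.remove(choice_num_red)

# Draw one blue ball.
for y in range(0, 1):  # NUM_OF_BLUE = 1
    choice_num_blue = random.choice(blueBalls)
    print(choice_num_blue)

# scipy test code
# matplotlib test
# Disabled: pylab is not imported and b is not defined in this excerpt.
# print(pylab.plot(abs(b)))
# show()
# from matplotlib.mlab import normpdf
# import matplotlib.numerix as nx
# import pylab as p
#
# x = nx.arange(-4, 4, 0.01)
# y = normpdf(x, 0, 1)  # unit normal
# p.plot(x, y, color='red', lw=2)
# p.show()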
import random

import numpy
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import cumfreq

# NOTE: series_blue_balls_value_counts and dfs_blue_balls_count_values are not
# defined in this excerpt. They are assumed to be a pandas Series of blue-ball
# value counts and its integer count values, computed earlier.

# Dot plot
plt.plot(dfs_blue_balls_count_values, 'x', label='Dot plot')
plt.legend()
plt.ylabel('Y-axis,number of blue balls')
plt.xlabel('X-axis,number of duplication')
plt.show()

# Jitter plot
idx_min = int(min(dfs_blue_balls_count_values))
idx_max = int(max(dfs_blue_balls_count_values))
idx_len = idx_max - idx_min
print("min:", idx_min, "max:", idx_max)
num_jitter = 0
samplers = random.sample(range(idx_min, idx_max), idx_len)
while num_jitter < 5:
    samplers += random.sample(range(idx_min, idx_max), idx_len)
    num_jitter += 1
# lots of jitter effect
print("samplers:", samplers)
# plt.plot(samplers, 'ro', label='Jitter plot')
# plt.ylabel('Y-axis,number of blue balls')
# plt.xlabel('X-axis,number of duplication')
# plt.legend()
# plt.show()

# Histograms and Kernel Density Estimates:
# Scott rule,
# This rule assumes that the data follows a Gaussian distribution;
# Plotting the blue balls appear frequency histograms(x-axis:frequency,y-axis:VIPs)
# @see http://pandas.pydata.org/pandas-docs/dev/basics.html#value-counts-histogramming
num_of_bin = len(series_blue_balls_value_counts)
array_of_ball_names = series_blue_balls_value_counts.keys()
print("Blue ball names:", array_of_ball_names)
list_merged_by_ball_id = []
for x in range(0, num_of_bin):  # xrange in the Python 2 original
    num_index = x + 1.5
    # int() keeps list repetition well-defined for NumPy integer counts.
    list_merged_by_ball_id += [num_index] * int(dfs_blue_balls_count_values[x])
print("list_merged_by_ball_id:", list_merged_by_ball_id)

# Histograms plotting (label added so that plt.legend() has an entry to show)
plt.hist(list_merged_by_ball_id, bins=num_of_bin, label='Histogram')
plt.legend()
plt.xlabel('Histograms,number of appear time by blue ball number')
plt.ylabel('Histograms,counter of appear time by blue ball number')
plt.show()

# Gaussian_KDE
# CDF (the Cumulative Distribution Function)
idx_max = int(max(dfs_blue_balls_count_values))
hi = idx_max
a = numpy.arange(hi) ** 2
# for nbins in ( 2, 20, 100 ):
for nbins in dfs_blue_balls_count_values:
    nbins = int(nbins)  # cumfreq and linspace need a plain integer bin count
    cf = cumfreq(a, nbins)  # bin values, lowerlimit, binsize, extrapoints
    w = hi / nbins
    x = numpy.linspace(w / 2, hi - w / 2, nbins)  # care
    # print(x, cf)
    plt.plot(x, cf[0], label=str(nbins))
plt.legend()
plt.xlabel('CDF,number of appear time by blue ball number')
plt.ylabel('CDF,counter of appear time by blue ball number')
plt.show()

# Optional: Comparing Distributions with Probability Plots and QQ Plots
# Quantile plot of the server data. A quantile plot is a graph of the CDF with the x and y axes interchanged.
# Probability plot for the data set shown, a standard normal distribution:
# @see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
prob_measurements = numpy.random.normal(loc=20, scale=5, size=num_of_bin)
stats.probplot(prob_measurements, dist="norm", plot=plt)
plt.show()
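The comments in the code above mention kernel density estimates and the Scott rule without showing them. A minimal sketch of a one-dimensional Gaussian KDE follows. It assumes the variant of Scott's rule used by `scipy.stats.gaussian_kde`, where the bandwidth is the sample standard deviation times n^(-1/5). The function names and the appearance counts are hypothetical and chosen only for illustration.

```python
import math
import statistics


def scott_bandwidth(data):
    """Bandwidth h = s * n**(-1/5): 1-D Scott's rule (assumed variant)."""
    return statistics.stdev(data) * len(data) ** (-1 / 5)


def gaussian_kde(data, x, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `data` at point `x`."""
    h = scott_bandwidth(data) if bandwidth is None else bandwidth
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


# Hypothetical blue-ball appearance counts (illustrative values only).
counts = [3, 5, 4, 6, 5, 7, 4, 5]
for x in range(2, 9):
    print(x, round(gaussian_kde(counts, x), 4))
```

Unlike a histogram, the estimate does not depend on where bin edges fall, only on the bandwidth.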
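Roadmap:
Phase I. Graphics: looking at data;
1. A single variable: shape and distribution; (dot/jitter plots, histograms and kernel density estimates, cumulative distribution function, rank-order plots...)
2. Two variables: establishing relationships; (scatter plots, conquering noise, logarithmic plots, banking...)
3. Time as a variable: time-series analysis; (smoothing, correlation, filters, convolution...)
4. More than two variables: graphical multivariate analysis; (false-color plots, multiplots...)
5. Intermezzo: a data analysis session; (session, gnuplot...)
6 ...
Phase II. Analytics: modeling data;
1. Guesstimation and the back of the envelope;
2. Models from scaling arguments;
3. Arguments from probability models;
4 ...
Phase III. Computation: mining data;
1. Simulations;
2. Finding clusters;
3. Finding decision trees in the forest;
4 ....
Phase IV. Applications: using data;
1. Reporting, BI (business intelligence), dashboards;
2. Financial calculations and modeling;
3. Predictive analytics;
4 ....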
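Item 3 of Phase I lists smoothing as a starting point for time-series analysis. A minimal sketch of a simple moving average follows, applied to the last field of each sample record (assumed here to be the blue ball). The function name `moving_average` and the window length of 3 are illustrative choices.

```python
def moving_average(values, window=3):
    """Return the simple moving average over each full window of `values`."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Last field of the 14 sample records, in draw order.
blue = [10, 5, 4, 5, 13, 5, 5, 15, 4, 13, 10, 13, 8, 1]
print(moving_average(blue, window=3))
```

A wider window gives a smoother curve at the cost of responding more slowly to changes in the series.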
=======
Reference
TensorFlow tutorials for time-series prediction: https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series
References:
http://deeplearning4j.org/usingrnns.html
http://www.scriptol.com/programming/list-algorithms.php
http://www.ipedr.com/vol25/54-ICEME2011-N20032.pdf
http://www.brightpointinc.com/flexdemos/chartslicer/chartslicersample.html
http://blog.lookbackon.com/?page_id=2506
http://stats.stackexchange.com/questions/68662/using-deep-learning-for-time-series-prediction