  • An Idle Attempt at Predicting Double Color Ball (双色球) Lottery Results

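    Tonight, out of nowhere, while I was in the middle of manually labeling 12306 CAPTCHAs, I suddenly wanted to see whether the Double Color Ball results from each draw follow any pattern.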

    Start by downloading the results of every Double Color Ball draw from 2003 through this year.

    Reduce the data to 3 dimensions with t-SNE and plot it to take a look.

    There does not seem to be any pattern.

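    Next I wanted to fit a model with linear regression, and a problem came up immediately: for predicting Double Color Ball, what should the independent variable be? This is a very hard question and may have no answer, because if each draw is a fully independent random event, there is no independent variable to extract and hence no feature space to build. For now, I use the draw number as the input feature and the result (6 dimensions for the red balls, 1 dimension for the blue ball) as the label.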

    # -*- coding: utf-8 -*-

    import os
    import pickle

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)
    from sklearn import linear_model


    def load_historydata():
        # Parse the raw text file once, then cache the array as a pickle.
        if not os.path.isfile("ssq.pkl"):
            # Selected columns: draw number first, then the 6 red balls and the 1 blue ball.
            ori_data = np.loadtxt('ssq.TXT', delimiter=' ', usecols=(0, 2, 3, 4, 5, 6, 7, 8), unpack=False)
            with open("ssq.pkl", "wb") as f:  # pickle files must be opened in binary mode
                pickle.dump(ori_data, f)
            return ori_data
        with open("ssq.pkl", "rb") as f:
            return pickle.load(f)


    def load_tsnedata(ori_data):
        # Compute the 3-D t-SNE embedding once, then cache it as a pickle.
        if not os.path.isfile("ssq_tsne.pkl"):
            tsne = TSNE(n_components=3, random_state=0)
            tsne_data = tsne.fit_transform(ori_data)
            with open("ssq_tsne.pkl", "wb") as f:
                pickle.dump(tsne_data, f)
            return tsne_data
        with open("ssq_tsne.pkl", "rb") as f:
            return pickle.load(f)


    def show_oridata(show_data):
        # 3-D scatter plot of the first three columns.
        fig = plt.figure(1, figsize=(8, 6))
        ax = fig.add_subplot(projection='3d')
        ax.view_init(elev=-150, azim=110)
        ax.scatter(show_data[:, 0], show_data[:, 1], show_data[:, 2], edgecolor='k', s=40)
        plt.show()


    if __name__ == '__main__':
        ori_data = load_historydata()
        np.random.shuffle(ori_data)
        # tsne_data = load_tsnedata(ori_data)
        # show_oridata(tsne_data)

        X_data = ori_data[:, 0].reshape(-1, 1)  # feature: draw number
        Y_data = ori_data[:, 1:]                # labels: 6 red balls + 1 blue ball
        print("X_data[0]: ", X_data[0])
        print("Y_data[0]: ", Y_data[0])

        # Split the data into training/testing sets
        split_len = int(len(X_data) * 0.8)
        X_train = X_data[:split_len]
        X_test = X_data[split_len:]
        print("X_train")
        print(X_train)

        # Split the targets into training/testing sets
        y_train = Y_data[:split_len]
        y_test = Y_data[split_len:]
        print("y_train")
        print(y_train)

        # Create linear regression object
        regr = linear_model.LinearRegression()

        # Train the model using the training sets
        regr.fit(X_train, y_train)

        # Make predictions using the testing set
        # y_pred = regr.predict(X_train).round()
        y_pred = regr.predict(X_test).round()
        print("y_pred")
        print(y_pred)

        # Print each distinct predicted row once
        print("y_pred distinct")
        y_pred_cache = list()
        for line in y_pred:
            line = list(line)
            if line not in y_pred_cache:
                y_pred_cache.append(line)
        for line in y_pred_cache:
            print(line)

        # Prediction accuracy: share of individual numbers predicted exactly
        # (compares against y_test; the original compared against X_test by mistake)
        print("Prediction accuracy: {0}%".format(np.mean(y_test == y_pred) * 100))

    The predictions from the linear regression are as follows:

    y_pred distinct
    [5.0, 9.0, 14.0, 19.0, 24.0, 29.0, 9.0]
    [5.0, 10.0, 15.0, 19.0, 24.0, 29.0, 9.0]
    [5.0, 10.0, 14.0, 19.0, 24.0, 29.0, 9.0]

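    The model produces almost the same prediction for every sample in the test set (only three distinct rows after rounding). This suggests that, as far as the draw number is concerned, the result behaves like a random event: the draw number carries essentially no information that the linear model can use.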
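
    One way to read these numbers, offered as a hedged sketch rather than the author's own analysis: assuming the standard game rules (6 red balls drawn without replacement from 1 to 33, 1 blue ball from 1 to 16) and assuming the data file lists the red balls in ascending order, the expected value of the k-th smallest red ball is k * 34 / 7 and the expected blue ball is 8.5. The simulation below uses synthetic draws (not the real ssq.TXT data) to compare per-position averages with that formula.

```python
import random

RED_MAX, RED_COUNT, BLUE_MAX = 33, 6, 16  # assumed game rules


def expected_sorted_reds():
    # Expected k-th smallest of RED_COUNT numbers drawn without
    # replacement from 1..RED_MAX is k * (RED_MAX + 1) / (RED_COUNT + 1).
    return [k * (RED_MAX + 1) / (RED_COUNT + 1) for k in range(1, RED_COUNT + 1)]


def simulate_means(n_draws=20000, seed=0):
    # Average each sorted red position and the blue ball over synthetic draws.
    rng = random.Random(seed)
    red_sums = [0] * RED_COUNT
    blue_sum = 0
    for _ in range(n_draws):
        reds = sorted(rng.sample(range(1, RED_MAX + 1), RED_COUNT))
        for i, value in enumerate(reds):
            red_sums[i] += value
        blue_sum += rng.randint(1, BLUE_MAX)
    return [s / n_draws for s in red_sums], blue_sum / n_draws


theory = expected_sorted_reds()
red_means, blue_mean = simulate_means()
print("theory        :", [round(v, 2) for v in theory])
print("simulated     :", [round(v, 2) for v in red_means], round(blue_mean, 2))
print("rounded theory:", [round(v) for v in theory])
```

    Rounded, the theoretical red values are 5, 10, 15, 19, 24, 29, the same as the red part of the second predicted row above, and the predicted blue value of 9 is close to the expected 8.5. That is the kind of output one would expect from a model that ignores its input and falls back to per-column averages.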

    If successive draws might be correlated with each other, an RNN could be worth trying, with the draw number still as the input.
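
    A recurrent model needs the draws arranged as ordered sequences. The sketch below shows one possible framing, which differs from using the draw number alone: the previous `window` draws form the input and the next draw is the target. The helper name `make_windows` and the window size are hypothetical, the sample draws are made up, and the data would have to stay in chronological order (not shuffled) for this to make sense.

```python
def make_windows(draws, window=3):
    # Pair each run of `window` consecutive draws with the draw that follows it.
    inputs, targets = [], []
    for start in range(len(draws) - window):
        inputs.append(draws[start:start + window])
        targets.append(draws[start + window])
    return inputs, targets


# Made-up draws in chronological order: 6 red balls followed by 1 blue ball.
history = [
    [1, 5, 12, 20, 27, 33, 4],
    [3, 8, 15, 19, 24, 30, 11],
    [2, 9, 14, 21, 26, 31, 7],
    [6, 10, 13, 18, 25, 29, 16],
    [4, 7, 16, 22, 28, 32, 2],
]

X_seq, y_next = make_windows(history, window=3)
print(len(X_seq), "sequences of", len(X_seq[0]), "draws each")
print("first target:", y_next[0])
```

    Each input here is a short history of results rather than a single draw number, so any draw-to-draw dependence, if it exists at all, would at least be visible to the model.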

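    On the other hand, this gave me a useful reminder. In a machine learning project, when the training or test results are poor or do not match expectations, do not rush to tune parameters or swap models. Step back and ask whether the features fed to the model actually contain a pattern. An algorithm cannot predict random events; only when the raw data really does contain regularities can a suitable model abstract them out. Feature engineering is critical, and it deserves sustained thought.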
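
    One cheap way to act on this advice is to compare a model against a predict-the-mean baseline on held-out data before tuning anything. The sketch below is an illustration on synthetic data, not the author's code; the helper names `fit_line` and `r2_holdout` are hypothetical. It fits a one-feature least-squares line and reports the held-out R², which should sit near 0 (or below) when the feature carries no signal.

```python
import random


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_holdout(xs, ys, train_fraction=0.8):
    # Fit on the first part, then score the rest against the test-set mean.
    split = int(len(xs) * train_fraction)
    slope, intercept = fit_line(xs[:split], ys[:split])
    test_x, test_y = xs[split:], ys[split:]
    mean_test = sum(test_y) / len(test_y)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(test_x, test_y))
    ss_tot = sum((y - mean_test) ** 2 for y in test_y)
    return 1 - ss_res / ss_tot


rng = random.Random(0)
xs = list(range(2000))
rng.shuffle(xs)  # shuffle so the train/test split is random
noise_target = [rng.randint(1, 16) for _ in xs]           # unrelated to x
signal_target = [0.01 * x + rng.gauss(0, 1) for x in xs]  # depends on x

r2_noise = r2_holdout(xs, noise_target)
r2_signal = r2_holdout(xs, signal_target)
print("R^2 with no signal:", round(r2_noise, 3))
print("R^2 with signal   :", round(r2_signal, 3))
```

    If the score for a chosen feature looks like the no-signal case, changing the model is unlikely to help; the feature itself is the thing to revisit.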

    Relevant Link:

    https://datachart.500.com/ssq/history/history.shtml
    http://blog.csdn.net/supperman_009/article/details/40623503
    https://zhuanlan.zhihu.com/p/26341086
    http://ssq.50018.com/zou-shi-tu/default.aspx
    http://www.sohu.com/a/134552307_116235
  • Original article: https://www.cnblogs.com/LittleHann/p/7518410.html