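  • Machine Learning | Using Machine Learning to Predict Who Will Win the World Cup (with Code)

    Originally published by CSDN (WeChat official account ID: CSDNnews); the structure has been slightly rearranged.

    Preface

    The 2018 FIFA World Cup is about to kick off, and fans around the world are eager to know who will lift the coveted trophy. If you are a football fan who also works in tech, you surely know that machine learning and artificial intelligence are buzzwords right now. Let's combine the two and predict which country will win this FIFA World Cup in Russia.

    Author: Gerald Muriuki, economics and data science specialist

    Translator: 弯月; editor: 郭芮

    The complete code is available here: https://github.com/itsmuriuki/FIFA-2018-World-cup-predictions

    Translation

    Football involves far too many factors for me to fit them all into a machine learning model. This article is simply a hacker's attempt to do something cool with data. The goals are:

    1. Use machine learning to predict who will win the 2018 FIFA World Cup;

    2. Predict the results of the whole group stage;

    3. Simulate the quarter-finals, the semi-finals, and the final.

    These goals pose a distinctive real-world machine learning prediction problem and touch on several machine learning tasks: data integration, feature modeling, and outcome prediction.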

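    Data

    I used two datasets from Kaggle. We will use the historical match results of all participating teams since the first World Cup in 1930.

    The FIFA rankings were only introduced in the 1990s, so a large part of the ranking data is missing; we therefore rely on historical match records instead. All the data is available at the following link:

    First, we run an exploratory analysis on the two datasets, then do feature engineering to select the features most relevant to the prediction, process the data, choose a machine learning model, and finally apply the model to the dataset.

    Let's get started

    First, import the required libraries and load the datasets into data frames: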

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import matplotlib.ticker as ticker
    import matplotlib.ticker as plticker
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    Load the datasets:

    # load data
    world_cup = pd.read_csv('C:/Coding/FIFA2018-World-cup/datasets/World Cup 2018 Dataset.csv')
    results = pd.read_csv('C:/Coding/FIFA2018-World-cup/datasets/results.csv')

    Next, make sure that both datasets were loaded into data frames by calling world_cup.head() and results.head().

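    Exploratory analysis

    After analyzing both datasets, the resulting dataset contains data on past matches. This new (resulting) dataset is very useful for analyzing and predicting future matches.

    Exploratory analysis and feature engineering: this means establishing which features are relevant to the machine learning model, and it is the most time-consuming part of any data science project.

    Now we add the goal difference and the match outcome columns to the results dataset: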

    # adding goal difference and establishing who is the winner
    winner = []
    for i in range(len(results['home_team'])):
        if results['home_score'][i] > results['away_score'][i]:
            winner.append(results['home_team'][i])
        elif results['home_score'][i] < results['away_score'][i]:
            winner.append(results['away_team'][i])
        else:
            winner.append('Draw')
    results['winning_team'] = winner

    # adding goal difference column
    results['goal_difference'] = np.absolute(results['home_score'] - results['away_score'])

    results.head()
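
    The same two columns can also be computed without an explicit loop. The following is a minimal sketch on a toy data frame (the team names and scores are made up for illustration, and toy_results is a hypothetical stand-in for results), using np.where instead of the loop above:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `results` data frame; the values are invented for illustration.
toy_results = pd.DataFrame({
    'home_team': ['Nigeria', 'Brazil', 'Spain'],
    'away_team': ['Ghana', 'Chile', 'Italy'],
    'home_score': [2, 1, 0],
    'away_score': [1, 1, 3],
})

home_win = toy_results['home_score'] > toy_results['away_score']
away_win = toy_results['home_score'] < toy_results['away_score']

# Home team if the home side scored more, away team if it scored less, otherwise 'Draw'.
toy_results['winning_team'] = np.where(
    home_win,
    toy_results['home_team'],
    np.where(away_win, toy_results['away_team'], 'Draw'),
)

# Absolute goal difference, computed column-wise.
toy_results['goal_difference'] = (toy_results['home_score'] - toy_results['away_score']).abs()

print(toy_results)
```

    On a data frame with tens of thousands of matches the column-wise version is usually much faster than indexing row by row, and it does not depend on the index being 0, 1, 2, and so on.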

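    Check the new results data frame:

    Next we work with a subset of the data that contains only the matches played by Nigeria. This helps us focus on which features are useful for one country before expanding to all the countries taking part in the World Cup:

    # let's work with a subset of the data: the games played by Nigeria, in a Nigeria data frame
    df = results[(results['home_team'] == 'Nigeria') | (results['away_team'] == 'Nigeria')]
    # take an explicit copy so that adding a column later does not write to a slice of `results`
    nigeria = df.copy()
    nigeria.head()

    The first World Cup was held in 1930. We create a column for the year and select all matches played from 1930 onward:

    # create a column for the year; the first World Cup was held in 1930
    year = []
    for row in nigeria['date']:
        year.append(int(row[:4]))
    nigeria['match_year'] = year
    nigeria_1930 = nigeria[nigeria.match_year >= 1930]
    nigeria_1930.count()

    Now we can plot Nigeria's most common match outcomes over the years: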

    #what is the common game outcome for nigeria visualisation
    wins = []
    for row in nigeria_1930['winning_team']:
        if row != 'Nigeria' and row != 'Draw':
            wins.append('Loss')
        else:
            wins.append(row)
    winsdf= pd.DataFrame(wins, columns=[ 'Nigeria_Results'])
    
    #plotting
    fig, ax = plt.subplots(1)
    fig.set_size_inches(10.7, 6.27)
    sns.set(style='darkgrid')
    sns.countplot(x='Nigeria_Results', data=winsdf)

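    The win rate of each country taking part in the World Cup is a very useful metric; we can use it to predict the most likely outcome of each match.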
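
    As an illustration of what such a win rate looks like, here is a minimal sketch on a toy data frame. The helper win_rate and the match data are hypothetical and are not part of the original code; the column names follow the results data frame used above.

```python
import pandas as pd

# Invented matches, using the same column names as the `results` data frame.
toy_matches = pd.DataFrame({
    'home_team': ['Nigeria', 'Brazil', 'Nigeria', 'Brazil'],
    'away_team': ['Brazil', 'Nigeria', 'Ghana', 'Ghana'],
    'winning_team': ['Brazil', 'Draw', 'Nigeria', 'Brazil'],
})

def win_rate(matches, team):
    """Share of the matches played by `team` (home or away) that it won."""
    played = matches[(matches['home_team'] == team) | (matches['away_team'] == team)]
    if len(played) == 0:
        return float('nan')
    return float((played['winning_team'] == team).mean())

rates = {team: win_rate(toy_matches, team) for team in ['Nigeria', 'Brazil', 'Ghana']}
print(rates)
```

    Draws count as matches played but not as wins here; whether a draw should count as half a win is a modeling choice this sketch leaves open.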

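    Narrowing down to the World Cup teams

    We create a data frame with all matches that involve the teams taking part in the 2018 World Cup and drop the duplicate rows (a match between two World Cup teams would otherwise appear twice):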

    # narrowing to teams participating in the World Cup
    worldcup_teams = ['Australia', 'Iran', 'Japan', 'Korea Republic',
                'Saudi Arabia', 'Egypt', 'Morocco', 'Nigeria',
                'Senegal', 'Tunisia', 'Costa Rica', 'Mexico',
                'Panama', 'Argentina', 'Brazil', 'Colombia',
                'Peru', 'Uruguay', 'Belgium', 'Croatia',
                'Denmark', 'England', 'France', 'Germany',
                'Iceland', 'Poland', 'Portugal', 'Russia',
                'Serbia', 'Spain', 'Sweden', 'Switzerland']
    df_teams_home = results[results['home_team'].isin(worldcup_teams)]
    df_teams_away = results[results['away_team'].isin(worldcup_teams)]
    df_teams = pd.concat((df_teams_home, df_teams_away))
    # drop_duplicates() returns a new data frame, so the result has to be assigned back
    df_teams = df_teams.drop_duplicates()
    df_teams.count()

    Create a column for the year, drop the matches played before 1930, and drop the columns that will not be used as model inputs: date, home_score, away_score, tournament, city, country, goal_difference, and match_year:

    # create a year column to drop games before 1930
    year = []
    for row in df_teams['date']:
        year.append(int(row[:4]))
    df_teams['match_year'] = year
    df_teams_1930 = df_teams[df_teams.match_year >= 1930]
    df_teams_1930.head()
    # drop columns that will not be used to predict match outcomes
    # (drop from df_teams_1930, not df_teams, so the 1930 filter is kept)
    df_teams_1930 = df_teams_1930.drop(['date', 'home_score', 'away_score', 'tournament', 'city', 'country', 'goal_difference', 'match_year'], axis=1)
    df_teams_1930.head()

    To make things simpler for the model, we recode the prediction label.

    The winning_team column is set to 2 if the home team won, 1 for a draw, and 0 if the away team won:

    # Building the model
    # the prediction label: the winning_team column will show "2" if the home team has won, "1" if it was a tie, and "0" if the away team has won.

    df_teams_1930 = df_teams_1930.reset_index(drop=True)
    df_teams_1930.loc[df_teams_1930.winning_team == df_teams_1930.home_team, 'winning_team'] = 2
    df_teams_1930.loc[df_teams_1930.winning_team == 'Draw', 'winning_team'] = 1
    df_teams_1930.loc[df_teams_1930.winning_team == df_teams_1930.away_team, 'winning_team'] = 0

    df_teams_1930.head()

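    By creating dummy variables, we convert home_team and away_team from categorical variables into numeric inputs.

    This can be done with the pandas get_dummies() function, which replaces each categorical column with one-hot columns (made up of the values 1 and 0) so that the data can be fed to a Scikit-learn model.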
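
    To make the effect of get_dummies() concrete, here is a small sketch on a two-row toy data frame (the matches are invented for illustration). Each team name becomes its own indicator column, prefixed with the name of the column it came from:

```python
import pandas as pd

# Two invented matches with an already-encoded label column.
toy_frame = pd.DataFrame({
    'home_team': ['Brazil', 'Spain'],
    'away_team': ['Spain', 'Brazil'],
    'winning_team': [2, 1],
})

# One indicator column per (column, team) pair; winning_team is left untouched.
encoded = pd.get_dummies(
    toy_frame,
    prefix=['home_team', 'away_team'],
    columns=['home_team', 'away_team'],
)

print(encoded.columns.tolist())
print(encoded)
```

    Depending on the pandas version, the indicator columns hold 0/1 integers or True/False booleans; Scikit-learn accepts both.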

    Next, we separate the data into the X and y sets and split it into 70% training data and 30% test data:

    # convert home team and away team from categorical variables to numeric inputs
    # Get dummy variables
    final = pd.get_dummies(df_teams_1930, prefix=['home_team', 'away_team'], columns=['home_team', 'away_team'])

    # Separate X and y sets
    X = final.drop(['winning_team'], axis=1)
    y = final["winning_team"]
    y = y.astype('int')

    # Separate train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

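    Here we use a classification algorithm: logistic regression. How does it work? The algorithm uses the logistic function to estimate probabilities, and in doing so measures the relationship between a categorical dependent variable and one or more independent variables. Specifically, the logistic function is the cumulative logistic distribution.

    In other words, logistic regression tries to predict an outcome (a win or a loss) from a given set of data points (statistics) that may influence that outcome.

    In practice it works like this: using the two sets of data described above together with the actual match results, the matches are fed to the algorithm one at a time. The model then learns whether each input has a positive or a negative effect on the match result, and how strong that effect is.

    Given enough (good) training data, we obtain a model that can predict future results; how good the model is depends on the data it was given.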
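
    As a rough illustration of the logistic function mentioned above, the sketch below maps a score to a probability. Softmax is its usual generalization to more than two classes, which is what a three-way outcome (away win, draw, home win) needs. The scores are hypothetical, and this is a sketch of the idea rather than Scikit-learn's actual implementation.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Generalization to several classes: one probability per class, summing to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for the classes 0 (away win), 1 (draw), 2 (home win).
probs = softmax([0.2, 0.5, 1.4])

print(sigmoid(0.0))
print(probs)
```

    In a fitted model each score is a weighted sum of the one-hot inputs, so the learned weights play the role of per-team strengths for the home and away slots.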

    We then pass the data to the algorithm:

    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    score = logreg.score(X_train, y_train)
    score2 = logreg.score(X_test, y_test)
    
    print("Training set accuracy: ", '%.3f'%(score))
    print("Test set accuracy: ", '%.3f'%(score2))
    Training set accuracy:  0.573
    Test set accuracy:  0.551

    Our model reaches an accuracy of 57% on the training set and 55% on the test set. That is not great, but let's move on to the next step.
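
    One way to put an accuracy figure like this in context is to compare it with a trivial baseline that always predicts the most frequent class. The sketch below computes that baseline on a short list of made-up labels; the values are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical labels: 2 = home win, 1 = draw, 0 = away win.
toy_labels = [2, 2, 2, 1, 0, 2, 0, 1, 2, 0]

counts = Counter(toy_labels)
majority_class, majority_count = counts.most_common(1)[0]

# Accuracy of a "model" that always predicts the majority class.
baseline_accuracy = majority_count / len(toy_labels)

print(majority_class, baseline_accuracy)
```

    Running the same computation on y_test would show how much of the reported accuracy goes beyond always predicting the most common outcome.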

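    Next we build the data frame that the model will be applied to.

    First we load the April 2018 FIFA rankings and the dataset of group-stage fixtures. Since World Cup matches have no "home" and "away" teams, we treat the team ranked higher by FIFA as the "favourite" and place it in the "home_team" column. We then add the teams to the new prediction dataset according to their rankings. The next step is to create the dummy variables and deploy the machine learning model.

    April 2018 FIFA rankings: https://us.soccerway.com/teams/rankings/fifa/?ICID=TN_03_05_01

    Group-stage fixtures dataset: https://fixturedownload.com/results/fifa-world-cup-2018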

    #adding Fifa rankings
    #the team which is positioned higher on the FIFA Ranking will be considered "favourite" for the match
    #and therefore, will be positioned under the "home_teams" column
    #since there are no "home" or "away" teams in World Cup games. 
    
    # Loading new datasets
    ranking = pd.read_csv('C:/Coding/FIFA2018-World-cup/datasets/fifa_rankings.csv') 
    fixtures = pd.read_csv('C:/Coding/FIFA2018-World-cup/datasets/fixtures.csv')
    
    # List for storing the group stage games
    pred_set = []
    
    
    # Create new columns with ranking position of each team
    fixtures.insert(1, 'first_position', fixtures['Home Team'].map(ranking.set_index('Team')['Position']))
    fixtures.insert(2, 'second_position', fixtures['Away Team'].map(ranking.set_index('Team')['Position']))
    
    # We only need the group stage games, so we have to slice the dataset
    fixtures = fixtures.iloc[:48, :]
    
    
    # Loop to add teams to new prediction dataset based on the ranking position of each team
    for index, row in fixtures.iterrows():
        if row['first_position'] < row['second_position']:
            pred_set.append({'home_team': row['Home Team'], 'away_team': row['Away Team'], 'winning_team': None})
        else:
            pred_set.append({'home_team': row['Away Team'], 'away_team': row['Home Team'], 'winning_team': None})
            
    pred_set = pd.DataFrame(pred_set)
    backup_pred_set = pred_set
    
    
    # Get dummy variables and drop winning_team column
    pred_set = pd.get_dummies(pred_set, prefix=['home_team', 'away_team'], columns=['home_team', 'away_team'])
    
    # Add missing columns compared to the model's training dataset
    missing_cols = set(final.columns) - set(pred_set.columns)
    for c in missing_cols:
        pred_set[c] = 0
    pred_set = pred_set[final.columns]
    
    # Remove winning team column
    pred_set = pred_set.drop(['winning_team'], axis=1)
    
    pred_set.head()
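
    The column-alignment step above (adding the missing dummy columns and restoring the training column order) can also be written with DataFrame.reindex. This is a minimal sketch on toy data; train_columns is a hypothetical stand-in for the feature columns of final.

```python
import pandas as pd

# Hypothetical feature columns seen during training.
train_columns = ['home_team_Brazil', 'home_team_Spain',
                 'away_team_Brazil', 'away_team_Spain']

# A single fixture to predict; it only produces two of the four dummy columns.
toy_pred = pd.DataFrame({'home_team': ['Brazil'], 'away_team': ['Spain']})
toy_pred = pd.get_dummies(toy_pred, prefix=['home_team', 'away_team'],
                          columns=['home_team', 'away_team'])

# reindex adds the missing columns filled with 0 and enforces the training order.
aligned = toy_pred.reindex(columns=train_columns, fill_value=0).astype(int)

print(aligned)
```

    reindex also silently drops any column that was not seen during training, which the loop-based version would reject with a KeyError only if such a column were selected explicitly.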

    Predicting the match results

    First, we apply the model to the group-stage matches:

    # group matches
    predictions = logreg.predict(pred_set)
    # compute the class probabilities once; columns follow logreg.classes_: 0 (away win), 1 (draw), 2 (home win)
    probabilities = logreg.predict_proba(pred_set)
    for i in range(fixtures.shape[0]):
        # look the teams up by column name, since positional lookup depends on how pandas orders the columns
        home = backup_pred_set.loc[i, 'home_team']
        away = backup_pred_set.loc[i, 'away_team']
        print(home + " and " + away)
        if predictions[i] == 2:
            print("Winner: " + home)
        elif predictions[i] == 1:
            print("Draw")
        elif predictions[i] == 0:
            print("Winner: " + away)
        print('Probability of ' + home + ' winning: ', '%.3f'%(probabilities[i][2]))
        print('Probability of Draw: ', '%.3f'%(probabilities[i][1]))
        print('Probability of ' + away + ' winning: ', '%.3f'%(probabilities[i][0]))
        print("")
    Russia and Saudi Arabia
    Winner: Russia
    Probability of Russia winning:  0.667
    Probability of Draw:  0.223
    Probability of Saudi Arabia winning:  0.111
    
    Uruguay and Egypt
    Winner: Uruguay
    Probability of Uruguay winning:  0.583
    Probability of Draw:  0.352
    Probability of Egypt winning:  0.065
    
    Iran and Morocco
    Draw
    Probability of Iran winning:  0.217
    Probability of Draw:  0.407
    Probability of Morocco winning:  0.376
    
    Portugal and Spain
    Winner: Spain
    Probability of Portugal winning:  0.302
    Probability of Draw:  0.344
    Probability of Spain winning:  0.354
    
    France and Australia
    Winner: France
    Probability of France winning:  0.628
    Probability of Draw:  0.227
    Probability of Australia winning:  0.145
    
    Argentina and Iceland
    Winner: Argentina
    Probability of Argentina winning:  0.803
    Probability of Draw:  0.161
    Probability of Iceland winning:  0.036
    
    Peru and Denmark
    Winner: Peru
    Probability of Peru winning:  0.439
    Probability of Draw:  0.171
    Probability of Denmark winning:  0.391
    
    Croatia and Nigeria
    Winner: Croatia
    Probability of Croatia winning:  0.590
    Probability of Draw:  0.258
    Probability of Nigeria winning:  0.152
    
    Costa Rica and Serbia
    Winner: Serbia
    Probability of Costa Rica winning:  0.315
    Probability of Draw:  0.324
    Probability of Serbia winning:  0.361
    
    Germany and Mexico
    Winner: Germany
    Probability of Germany winning:  0.567
    Probability of Draw:  0.282
    Probability of Mexico winning:  0.150
    
    Brazil and Switzerland
    Winner: Brazil
    Probability of Brazil winning:  0.775
    Probability of Draw:  0.138
    Probability of Switzerland winning:  0.087
    
    Sweden and Korea Republic
    Winner: Sweden
    Probability of Sweden winning:  0.503
    Probability of Draw:  0.329
    Probability of Korea Republic winning:  0.168
    
    Belgium and Panama
    Winner: Belgium
    Probability of Belgium winning:  0.765
    Probability of Draw:  0.145
    Probability of Panama winning:  0.090
    
    England and Tunisia
    Winner: England
    Probability of England winning:  0.649
    Probability of Draw:  0.292
    Probability of Tunisia winning:  0.059
    
    Colombia and Japan
    Winner: Colombia
    Probability of Colombia winning:  0.511
    Probability of Draw:  0.210
    Probability of Japan winning:  0.280
    
    Poland and Senegal
    Winner: Poland
    Probability of Poland winning:  0.612
    Probability of Draw:  0.223
    Probability of Senegal winning:  0.165
    
    Egypt and Russia
    Winner: Russia
    Probability of Egypt winning:  0.225
    Probability of Draw:  0.297
    Probability of Russia winning:  0.478
    
    Portugal and Morocco
    Winner: Portugal
    Probability of Portugal winning:  0.486
    Probability of Draw:  0.377
    Probability of Morocco winning:  0.138
    
    Uruguay and Saudi Arabia
    Winner: Uruguay
    Probability of Uruguay winning:  0.668
    Probability of Draw:  0.259
    Probability of Saudi Arabia winning:  0.073
    
    Spain and Iran
    Winner: Spain
    Probability of Spain winning:  0.695
    Probability of Draw:  0.247
    Probability of Iran winning:  0.058
    
    Denmark and Australia
    Winner: Denmark
    Probability of Denmark winning:  0.551
    Probability of Draw:  0.241
    Probability of Australia winning:  0.207
    
    France and Peru
    Winner: France
    Probability of France winning:  0.635
    Probability of Draw:  0.215
    Probability of Peru winning:  0.150
    
    Argentina and Croatia
    Winner: Argentina
    Probability of Argentina winning:  0.599
    Probability of Draw:  0.255
    Probability of Croatia winning:  0.146
    
    Brazil and Costa Rica
    Winner: Brazil
    Probability of Brazil winning:  0.800
    Probability of Draw:  0.147
    Probability of Costa Rica winning:  0.053
    
    Iceland and Nigeria
    Winner: Nigeria
    Probability of Iceland winning:  0.278
    Probability of Draw:  0.248
    Probability of Nigeria winning:  0.474
    
    Switzerland and Serbia
    Winner: Switzerland
    Probability of Switzerland winning:  0.402
    Probability of Draw:  0.228
    Probability of Serbia winning:  0.370
    
    Belgium and Tunisia
    Winner: Belgium
    Probability of Belgium winning:  0.619
    Probability of Draw:  0.253
    Probability of Tunisia winning:  0.128
    
    Mexico and Korea Republic
    Winner: Mexico
    Probability of Mexico winning:  0.504
    Probability of Draw:  0.327
    Probability of Korea Republic winning:  0.169
    
    Germany and Sweden
    Winner: Germany
    Probability of Germany winning:  0.571
    Probability of Draw:  0.228
    Probability of Sweden winning:  0.201
    
    England and Panama
    Winner: England
    Probability of England winning:  0.781
    Probability of Draw:  0.178
    Probability of Panama winning:  0.041
    
    Senegal and Japan
    Winner: Senegal
    Probability of Senegal winning:  0.397
    Probability of Draw:  0.278
    Probability of Japan winning:  0.325
    
    Poland and Colombia
    Draw
    Probability of Poland winning:  0.379
    Probability of Draw:  0.391
    Probability of Colombia winning:  0.230
    
    Uruguay and Russia
    Winner: Uruguay
    Probability of Uruguay winning:  0.403
    Probability of Draw:  0.388
    Probability of Russia winning:  0.209
    
    Egypt and Saudi Arabia
    Winner: Egypt
    Probability of Egypt winning:  0.544
    Probability of Draw:  0.216
    Probability of Saudi Arabia winning:  0.240
    
    Portugal and Iran
    Winner: Portugal
    Probability of Portugal winning:  0.548
    Probability of Draw:  0.353
    Probability of Iran winning:  0.099
    
    Spain and Morocco
    Winner: Spain
    Probability of Spain winning:  0.650
    Probability of Draw:  0.267
    Probability of Morocco winning:  0.083
    
    France and Denmark
    Winner: France
    Probability of France winning:  0.621
    Probability of Draw:  0.159
    Probability of Denmark winning:  0.220
    
    Peru and Australia
    Winner: Peru
    Probability of Peru winning:  0.463
    Probability of Draw:  0.250
    Probability of Australia winning:  0.288
    
    Argentina and Nigeria
    Winner: Argentina
    Probability of Argentina winning:  0.708
    Probability of Draw:  0.222
    Probability of Nigeria winning:  0.070
    
    Croatia and Iceland
    Winner: Croatia
    Probability of Croatia winning:  0.734
    Probability of Draw:  0.185
    Probability of Iceland winning:  0.080
    
    Mexico and Sweden
    Winner: Mexico
    Probability of Mexico winning:  0.465
    Probability of Draw:  0.264
    Probability of Sweden winning:  0.271
    
    Germany and Korea Republic
    Winner: Germany
    Probability of Germany winning:  0.598
    Probability of Draw:  0.282
    Probability of Korea Republic winning:  0.120
    
    Brazil and Serbia
    Winner: Brazil
    Probability of Brazil winning:  0.714
    Probability of Draw:  0.165
    Probability of Serbia winning:  0.120
    
    Switzerland and Costa Rica
    Winner: Switzerland
    Probability of Switzerland winning:  0.587
    Probability of Draw:  0.213
    Probability of Costa Rica winning:  0.200
    
    Poland and Japan
    Winner: Poland
    Probability of Poland winning:  0.551
    Probability of Draw:  0.242
    Probability of Japan winning:  0.206
    
    Colombia and Senegal
    Winner: Colombia
    Probability of Colombia winning:  0.577
    Probability of Draw:  0.194
    Probability of Senegal winning:  0.229
    
    Tunisia and Panama
    Winner: Tunisia
    Probability of Tunisia winning:  0.631
    Probability of Draw:  0.257
    Probability of Panama winning:  0.113
    
    Belgium and England
    Winner: England
    Probability of Belgium winning:  0.273
    Probability of Draw:  0.235
    Probability of England winning:  0.492

    Next, we simulate the round of 16:

    # List of tuples before 
    group_16 = [('Uruguay', 'Portugal'),
                ('France', 'Croatia'),
                ('Brazil', 'Mexico'),
                ('England', 'Colombia'),
                ('Spain', 'Russia'),
                ('Argentina', 'Peru'),
                ('Germany', 'Switzerland'),
                ('Poland', 'Belgium')]
    def clean_and_predict(matches, ranking, final, logreg):

        # Initialization of auxiliary list for data cleaning
        positions = []

        # Loop to retrieve each team's position according to FIFA ranking
        for match in matches:
            positions.append(ranking.loc[ranking['Team'] == match[0],'Position'].iloc[0])
            positions.append(ranking.loc[ranking['Team'] == match[1],'Position'].iloc[0])
        
        # Creating the DataFrame for prediction
        pred_set = []

        # Initializing iterators for while loop
        i = 0
        j = 0

        # 'i' will be the iterator for the 'positions' list, and 'j' for the list of matches (list of tuples)
        while i < len(positions):
            dict1 = {}

            # If position of first team is better, he will be the 'home' team, and vice-versa
            if positions[i] < positions[i + 1]:
                dict1.update({'home_team': matches[j][0], 'away_team': matches[j][1]})
            else:
                dict1.update({'home_team': matches[j][1], 'away_team': matches[j][0]})

            # Append updated dictionary to the list, that will later be converted into a DataFrame
            pred_set.append(dict1)
            i += 2
            j += 1

        # Convert list into DataFrame
        pred_set = pd.DataFrame(pred_set)
        backup_pred_set = pred_set

        # Get dummy variables and drop winning_team column
        pred_set = pd.get_dummies(pred_set, prefix=['home_team', 'away_team'], columns=['home_team', 'away_team'])

        # Add missing columns compared to the model's training dataset
        missing_cols2 = set(final.columns) - set(pred_set.columns)
        for c in missing_cols2:
            pred_set[c] = 0
        pred_set = pred_set[final.columns]

        # Remove winning team column
        pred_set = pred_set.drop(['winning_team'], axis=1)

        # Predict!
        predictions = logreg.predict(pred_set)
        # Class probabilities, computed once; columns follow logreg.classes_ (0, 1, 2)
        probabilities = logreg.predict_proba(pred_set)
        for i in range(len(pred_set)):
            # Look the teams up by column name rather than by column position
            home = backup_pred_set.loc[i, 'home_team']
            away = backup_pred_set.loc[i, 'away_team']
            print(home + " and " + away)
            if predictions[i] == 2:
                print("Winner: " + home)
            elif predictions[i] == 1:
                print("Draw")
            elif predictions[i] == 0:
                print("Winner: " + away)
            print('Probability of ' + home + ' winning: ', '%.3f'%(probabilities[i][2]))
            print('Probability of Draw: ', '%.3f'%(probabilities[i][1]))
            print('Probability of ' + away + ' winning: ', '%.3f'%(probabilities[i][0]))
            print("")

    clean_and_predict(group_16, ranking, final, logreg)
    Portugal and Uruguay
    Winner: Portugal
    Probability of Portugal winning:  0.428
    Probability of Draw:  0.285
    Probability of Uruguay winning:  0.287
    
    France and Croatia
    Winner: France
    Probability of France winning:  0.481
    Probability of Draw:  0.252
    Probability of Croatia winning:  0.267
    
    Brazil and Mexico
    Winner: Brazil
    Probability of Brazil winning:  0.695
    Probability of Draw:  0.209
    Probability of Mexico winning:  0.096
    
    England and Colombia
    Winner: England
    Probability of England winning:  0.516
    Probability of Draw:  0.368
    Probability of Colombia winning:  0.116
    
    Spain and Russia
    Winner: Spain
    Probability of Spain winning:  0.529
    Probability of Draw:  0.280
    Probability of Russia winning:  0.191
    
    Argentina and Peru
    Winner: Argentina
    Probability of Argentina winning:  0.713
    Probability of Draw:  0.212
    Probability of Peru winning:  0.075
    
    Germany and Switzerland
    Winner: Germany
    Probability of Germany winning:  0.672
    Probability of Draw:  0.192
    Probability of Switzerland winning:  0.137
    
    Belgium and Poland
    Winner: Belgium
    Probability of Belgium winning:  0.513
    Probability of Draw:  0.202
    Probability of Poland winning:  0.285

    Then we simulate the quarter-finals, the semi-finals, and the final in turn:

    Quarter-finals:

    # List of matches
    quarters = [('Portugal', 'France'),
                ('Spain', 'Argentina'),
                ('Brazil', 'England'),
                ('Germany', 'Belgium')]
    clean_and_predict(quarters, ranking, final, logreg)
    Portugal and France
    Winner: Portugal
    Probability of Portugal winning:  0.437
    Probability of Draw:  0.256
    Probability of France winning:  0.307
    
    Argentina and Spain
    Winner: Argentina
    Probability of Argentina winning:  0.518
    Probability of Draw:  0.262
    Probability of Spain winning:  0.220
    
    Brazil and England
    Winner: Brazil
    Probability of Brazil winning:  0.525
    Probability of Draw:  0.216
    Probability of England winning:  0.260
    
    Germany and Belgium
    Winner: Germany
    Probability of Germany winning:  0.563
    Probability of Draw:  0.269
    Probability of Belgium winning:  0.167

    Semi-finals:

    # List of matches
    semi = [('Portugal', 'Brazil'),
            ('Argentina', 'Germany')]
    clean_and_predict(semi, ranking, final, logreg)
    Brazil and Portugal
    Winner: Brazil
    Probability of Brazil winning:  0.705
    Probability of Draw:  0.152
    Probability of Portugal winning:  0.143
    
    Germany and Argentina
    Winner: Germany
    Probability of Germany winning:  0.441
    Probability of Draw:  0.264
    Probability of Argentina winning:  0.295

    Final:

    # Finals
    finals = [('Brazil', 'Germany')]
    clean_and_predict(finals, ranking, final, logreg)
    Germany and Brazil
    Winner: Brazil
    Probability of Germany winning:  0.359
    Probability of Draw:  0.220
    Probability of Brazil winning:  0.421
    

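    Closing remarks

    According to this model, Brazil is the most likely winner of this World Cup.

    Areas for further research and improvement:

    • To improve the quality of the dataset, FIFA game data could be used to assess the level of each player;
    • A confusion matrix can help us analyze which outcomes the model predicts incorrectly;
    • We could try combining several models to improve prediction accuracy.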
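
    As a sketch of the confusion-matrix idea from the list above, the following counts how often each true class is predicted as each class. The labels are made up for illustration; with the real model one would pass y_test and logreg.predict(X_test) instead. Scikit-learn offers sklearn.metrics.confusion_matrix for the same purpose; the plain-Python helper here is only meant to show the mechanics.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are the true classes, columns the predicted classes, in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix

# Hypothetical labels: 0 = away win, 1 = draw, 2 = home win.
toy_true = [2, 2, 1, 0, 1, 2]
toy_pred = [2, 2, 2, 0, 2, 0]

cm = confusion_matrix(toy_true, toy_pred, labels=[0, 1, 2])
for row in cm:
    print(row)
```

    In this toy example the middle row shows that both draws were predicted as home wins, which is the kind of systematic error a single accuracy number hides.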
  • Original URL: https://www.cnblogs.com/jlutiger/p/9184914.html