  • Kaggle Competition Primer (4): Implementing the Random Forest Algorithm in Python

    First, load the data and split it into training and validation sets:

    import pandas as pd
        
    # Load data
    melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
    melbourne_data = pd.read_csv(melbourne_file_path) 
    # Filter rows with missing values
    melbourne_data = melbourne_data.dropna(axis=0)
    # Choose target and features
    y = melbourne_data.Price
    melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea', 
                            'YearBuilt', 'Lattitude', 'Longtitude']
    X = melbourne_data[melbourne_features]
    
    from sklearn.model_selection import train_test_split
    
    # split data into training and validation data, for both features and target
    # The split is based on a random number generator. Supplying a numeric value to
    # the random_state argument guarantees we get the same split every time we
    # run this script.
    train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
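
    As a side check (not in the original post), it can be worth seeing how many rows the dropna call discards before trusting the remaining sample; a minimal sketch, assuming the same melb_data.csv file:

    import pandas as pd
    
    melbourne_data = pd.read_csv('../input/melbourne-housing-snapshot/melb_data.csv')
    rows_before = len(melbourne_data)
    # dropna(axis=0) removes every row that contains at least one missing value
    melbourne_data = melbourne_data.dropna(axis=0)
    print(f'Dropped {rows_before - len(melbourne_data)} of {rows_before} rows with missing values')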

    Import the random forest implementation from sklearn's ensemble module, fit the model, and compute the mean absolute error on the validation set:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    
    # Fit a random forest on the training split and score it on the validation split
    forest_model = RandomForestRegressor(random_state=1)
    forest_model.fit(train_X, train_y)
    melb_preds = forest_model.predict(val_X)
    print(mean_absolute_error(val_y, melb_preds))

    Output:

    202888.18157951365
    /opt/conda/lib/python3.6/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
      "10 in version 0.20 to 100 in 0.22.", FutureWarning)

    That completes the solution.
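
    As a side note (not in the original post), the FutureWarning above appears because RandomForestRegressor is left with its default n_estimators, whose default changed from 10 to 100 in scikit-learn 0.22. Passing the value explicitly silences the warning and makes the result independent of the library version; a minimal sketch reusing the train_X/val_X split from above:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    
    # Pin the number of trees explicitly so the default-change warning does not fire
    forest_model = RandomForestRegressor(n_estimators=100, random_state=1)
    forest_model.fit(train_X, train_y)
    melb_preds = forest_model.predict(val_X)
    print(mean_absolute_error(val_y, melb_preds))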

  • Original post: https://www.cnblogs.com/geeksongs/p/12637595.html