  • Kaggle: Titanic: Machine Learning from Disaster

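    I had long wanted to scrape stock price movements. By chance, while reading a blog post about scraping stock data, I came across Kaggle, browsed its problems, found them fresh and interesting, and gave one a try.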

    The task: given train.csv, predict whether each passenger in test.csv survived. train.csv contains the following passenger information:

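    PassengerId passenger ID
    Survived whether the passenger survived
    Pclass ticket class
    Name passenger name
    Sex passenger sex
    Age passenger age
    SibSp number of siblings/spouses aboard
    Parch number of parents/children aboard
    Ticket ticket number
    Fare fare paid
    Cabin cabin number
    Embarked port of embarkation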

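    Of these columns, Name, PassengerId, Ticket, and Cabin are treated as irrelevant to the prediction and dropped. SibSp and Parch are not used directly either: the script sums them into a single family_member feature and then drops them.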
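
    To make the preprocessing concrete, here is a minimal sketch on a few made-up rows (the values and the name toy are assumptions for illustration, not real passenger data). It fills missing values, maps string categories to integer codes, and folds SibSp and Parch into a single family_member count, which is roughly what the script further down does on the real data.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for train.csv (values are made up for illustration).
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 2, 0, 0],
    "Embarked": ["S", "C", None, "S"],
})

# Fill missing ages with the median and missing ports with the most common port.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# Map string categories to integer codes (codes follow order of first appearance).
toy["Sex"] = pd.factorize(toy["Sex"])[0]
toy["Embarked"] = pd.factorize(toy["Embarked"])[0]

# Collapse SibSp and Parch into one family-size feature, then drop the originals.
toy["family_member"] = toy["SibSp"] + toy["Parch"]
toy = toy.drop(["SibSp", "Parch"], axis=1)

print(toy)
```

    Imputing before factorizing matters here: pd.factorize assigns the code -1 to missing values, so filling Embarked first keeps every code non-negative.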

    I then used the random forest algorithm.
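
    As a rough illustration of what the classifier offers, the sketch below fits a random forest on synthetic data (generated with make_classification as a stand-in, not the Titanic data) and prints the impurity-based feature importances. The names X_demo, y_demo, and forest are assumptions for this example. Inspecting importances like this is one way to check whether a column really is irrelevant before dropping it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Titanic feature matrix (not real passenger data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

# Same hyperparameter style as the script in this post, with fewer trees to run quickly.
forest = RandomForestClassifier(n_estimators=100, min_samples_leaf=3, random_state=312)
forest.fit(X_demo, y_demo)

# predict() returns the class with the highest probability averaged over the trees.
predictions = forest.predict(X_demo[:5])

# Impurity-based importances: one non-negative weight per feature, summing to 1.
importances = forest.feature_importances_
for i, weight in enumerate(importances):
    print(f"feature {i}: {weight:.3f}")
```

    The full script for the competition follows.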

    #-*- coding:utf-8 -*-
    import numpy as np # linear algebra
    import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
    import csv
    import random as rnd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    # sklearn.cross_validation and sklearn.grid_search were merged into sklearn.model_selection
    from sklearn.model_selection import cross_val_score, GridSearchCV, RandomizedSearchCV
    train_df = pd.read_csv('train.csv', header=0)
    test_df = pd.read_csv('test.csv', header=0)
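    # Stack train and test so both get identical preprocessing; ignore_index gives a fresh 0..n-1 index
    df = pd.concat([train_df, test_df], ignore_index=True)
    # Restore the column order of train.csv (reindex_axis has been removed from pandas)
    df = df.reindex(columns=train_df.columns)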
    # Fill missing Age, Fare, and Embarked values in the combined table
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Fare'] = df['Fare'].fillna(df['Fare'].median())
    df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
    # Map Sex and Embarked to integer codes
    df['Sex'] = pd.factorize(df['Sex'])[0]
    df['Embarked'] = pd.factorize(df['Embarked'])[0]
    # Combine siblings/spouses and parents/children into one family-size feature
    df['family_member'] = df['SibSp'] + df['Parch']
    # Drop the 'Cabin', 'Ticket', 'Name', 'SibSp', 'Parch', and 'PassengerId' columns
    df = df.drop(['Cabin','Ticket','Name','SibSp','Parch','PassengerId'],axis=1)
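    # Rows with a Survived value come from train.csv; rows without one come from test.csv
    survived_member = df[df['Survived'].notnull()].values
    test_message = df[df['Survived'].isnull()].values
    # Labels: the first column (Survived) of the training rows
    Y = survived_member[:, 0].astype(int)
    # Features: every column except the first
    X = survived_member[:, 1:].astype(int)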
    # Train a random forest on the labeled rows
    result = RandomForestClassifier(n_estimators=1000, random_state=312, min_samples_leaf=3).fit(X, Y)
    # Predict survival for the test rows
    pre = result.predict(test_message[:, 1:]).astype(int)
    Id = test_df['PassengerId']
    # Write the submission file; newline='' avoids blank rows between records on Windows
    with open('result1.csv', 'w', newline='') as result_csv:
        result_fd = csv.writer(result_csv)
        result_fd.writerow(['PassengerId','Survived'])
        result_fd.writerows(zip(Id,pre))
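
    The script imports cross_val_score but never calls it. As a hedged sketch of how one might estimate accuracy before submitting, the example below runs 5-fold cross-validation on a random forest. The data is synthetic (make_classification stands in for the X and Y arrays built above), and the names X_cv, y_cv, and model are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the X and Y arrays built in the script above.
X_cv, y_cv = make_classification(n_samples=300, n_features=6, random_state=0)

# Fewer trees than the script's 1000 so the example runs quickly.
model = RandomForestClassifier(n_estimators=100, min_samples_leaf=3, random_state=312)

# 5-fold cross-validation: fit on four folds, score accuracy on the held-out fold.
scores = cross_val_score(model, X_cv, y_cv, cv=5, scoring="accuracy")
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

    On the real data, passing X and Y in place of X_cv and y_cv would give a local accuracy estimate to compare against the Kaggle leaderboard score.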
  • Original article: https://www.cnblogs.com/chenyang920/p/7248138.html