  • A Small End-to-End Project

    Start

    Books and courses are frustrating. They give you lots of recipes and snippets, but you never get to see how they all fit together.

    When you are applying machine learning to your own datasets, you are working on a project.

    A machine learning project may not be linear, but it has a number of well-known steps:

    1. Define Problem.
    2. Prepare Data.
    3. Evaluate Algorithms.
    4. Improve Results.
    5. Present Results.

    The best way to really come to terms with a new platform or tool is to work through a machine learning project end-to-end and cover the key steps: loading data, summarizing data, evaluating algorithms, and making some predictions.

    If you can do that, you have a template that you can use on dataset after dataset. You can fill in the gaps, such as further data preparation and tasks for improving results, later, once you have more confidence.

    Machine Learning in Python: Step-By-Step Tutorial

    Downloading, Installing and Starting Python SciPy

    Install SciPy Libraries

    There are 5 key libraries that you will need to install. Below is a list of the Python SciPy libraries required for this tutorial:

    • scipy
    • numpy
    • matplotlib
    • pandas
    • sklearn

    Start Python and Check Versions

    # Check the versions of libraries
     
    # Python version
    import sys
    print('Python: {}'.format(sys.version))
    # scipy
    import scipy
    print('scipy: {}'.format(scipy.__version__))
    # numpy
    import numpy
    print('numpy: {}'.format(numpy.__version__))
    # matplotlib
    import matplotlib
    print('matplotlib: {}'.format(matplotlib.__version__))
    # pandas
    import pandas
    print('pandas: {}'.format(pandas.__version__))
    # scikit-learn
    import sklearn
    print('sklearn: {}'.format(sklearn.__version__))
    

    Output:

    Python: 2.7.13 |Anaconda 4.4.0 (x86_64)| (default, Dec 20 2016, 23:05:08)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
    scipy: 0.19.0
    numpy: 1.12.1
    matplotlib: 2.0.2
    pandas: 0.20.1
    sklearn: 0.18.1
    

    Load the Data

    Import libraries

    Load Dataset
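
    A minimal sketch of this step, assuming the dataset is the iris flowers data bundled with scikit-learn (the text later mentions predicting species from flower measurements). The `load_iris_frame` helper and the column names are illustrative choices, not taken from the original.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame():
    """Return the iris data as a DataFrame with a 'class' column (hypothetical helper)."""
    iris = load_iris()
    # Column names are assumptions chosen for readability.
    columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
    frame = pd.DataFrame(iris.data, columns=columns)
    # Map the integer targets (0, 1, 2) to their species names.
    frame["class"] = [str(iris.target_names[i]) for i in iris.target]
    return frame


dataset = load_iris_frame()
print(dataset.shape)  # (150, 5): 150 rows, 4 measurements plus the class
```

    Loading from the bundled copy avoids a network download; reading a CSV file with `pandas.read_csv` would work just as well if you have the data on disk.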

    Summarize the Dataset

    In this step we are going to take a look at the data in a few different ways:

    1. Dimensions of the dataset.
    2. Peek at the data itself.
    3. Statistical summary of all attributes.
    4. Breakdown of the data by the class variable.
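
    The four views listed above could be sketched with pandas as follows. This assumes the iris data bundled with scikit-learn; the `class` column name is an illustrative choice.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the iris data from scikit-learn stands in for the tutorial's dataset.
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=iris.feature_names)
dataset["class"] = [str(iris.target_names[i]) for i in iris.target]

# 1. Dimensions: (rows, columns).
print(dataset.shape)

# 2. Peek at the first 20 rows.
print(dataset.head(20))

# 3. Statistical summary (count, mean, std, min, quartiles, max) of numeric columns.
print(dataset.describe())

# 4. Number of rows that belong to each class.
class_counts = dataset.groupby("class").size()
print(class_counts)
```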

    Dimensions of Dataset

    Peek at the Data

    Statistical Summary

    Class Distribution

    Data Visualization

    We now have a basic idea about the data. We need to extend that with some visualizations.

    We are going to look at two types of plots:

    1. Univariate plots to better understand each attribute.
    2. Multivariate plots to better understand the relationships between attributes.
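
    As a rough sketch, both plot types can be produced with pandas and matplotlib. The example assumes the iris data from scikit-learn, and it saves the figures to PNG files (the file names are arbitrary) using a non-interactive backend so that it also runs without a display; in an interactive session you would more likely call `plt.show()`.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, figures go to files
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=iris.feature_names)

# Univariate: one box-and-whisker plot per attribute.
dataset.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)
plt.savefig("iris_box.png")
plt.close("all")

# Univariate: one histogram per attribute.
dataset.hist()
plt.savefig("iris_hist.png")
plt.close("all")

# Multivariate: pairwise scatter plots of all attributes.
scatter_matrix(dataset)
plt.savefig("iris_scatter.png")
plt.close("all")
```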

    Univariate Plots

    Multivariate Plots

    Evaluate Some Algorithms

    Now it is time to create some models of the data and estimate their accuracy on unseen data.

    Here is what we are going to cover in this step:

    1. Separate out a validation dataset.
    2. Set-up the test harness to use 10-fold cross validation.
    3. Build 6 different models to predict species from flower measurements.
    4. Select the best model.

    Create a Validation Dataset
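
    One possible way to hold back a validation set is scikit-learn's `train_test_split`. The 80/20 split and the seed value of 7 are assumptions made for this sketch, as is the use of the bundled iris data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Features (flower measurements) and integer class labels.
X, y = load_iris(return_X_y=True)

# Keep 80% for training and model selection; hold back 20% for validation.
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7
)

print(X_train.shape, X_validation.shape)  # (120, 4) (30, 4)
```

    The held-back rows are never shown to the models during cross validation, so they give an independent check of the final choice.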

    Test Harness

    Build Models

    Let’s evaluate 6 different classification algorithms:

    1. Logistic Regression (LR).
    2. Linear Discriminant Analysis (LDA).
    3. K-Nearest Neighbors (KNN).
    4. Classification and Regression Trees (CART).
    5. Gaussian Naive Bayes (NB).
    6. Support Vector Machines (SVM).
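
    The six algorithms above could be compared with 10-fold cross validation along these lines. This is a sketch under assumptions: the bundled iris data, an 80/20 train/validation split, a seed of 7, stratified shuffled folds, and mostly default hyperparameters are all choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7
)

# Short name and estimator for each algorithm in the list.
models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier(random_state=7)),
    ("NB", GaussianNB()),
    ("SVM", SVC(gamma="auto")),
]

# Test harness: the same 10 folds are reused for every model.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

results = {}
for name, model in models:
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring="accuracy")
    results[name] = scores
    print("%s: %.3f (%.3f)" % (name, scores.mean(), scores.std()))

# Pick the model with the highest mean cross-validated accuracy.
best_name = max(results, key=lambda name: results[name].mean())
print("Best by mean accuracy:", best_name)
```

    Reusing one set of folds for every model keeps the comparison fair, since each algorithm is scored on exactly the same splits.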

    Select Best Model

    Make Predictions
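
    A sketch of the final step, fitting one model on the training data and checking it against the held-back validation set. K-Nearest Neighbors is used here only as a stand-in for whichever model scored best; the data source, split size, and seed are assumptions as well.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7
)

# Assumption: KNN stands in for the selected model.
model = KNeighborsClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_validation)

# Overall accuracy, per-class errors, and precision/recall per class.
accuracy = accuracy_score(y_validation, predictions)
matrix = confusion_matrix(y_validation, predictions)
print(accuracy)
print(matrix)
print(classification_report(y_validation, predictions))
```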

    Reference

    1. Your First Machine Learning Project in Python Step-By-Step
  • Original article: https://www.cnblogs.com/bermaker/p/9384850.html