  • A Small End-to-End Project

    Start

    Books and courses are frustrating. They give you lots of recipes and snippets, but you never get to see how they all fit together.

    When you are applying machine learning to your own datasets, you are working on a project.

    A machine learning project may not be linear, but it has a number of well-known steps:

    1. Define Problem.
    2. Prepare Data.
    3. Evaluate Algorithms.
    4. Improve Results.
    5. Present Results.

    The best way to really come to terms with a new platform or tool is to work through a machine learning project end-to-end and cover the key steps: loading data, summarizing data, evaluating algorithms and making some predictions.

    If you can do that, you have a template that you can use on dataset after dataset. You can fill in the gaps (such as further data preparation and improving results) later, once you have more confidence.

    Machine Learning in Python: Step-By-Step Tutorial

    Downloading, Installing and Starting Python SciPy

    Install SciPy Libraries

    There are 5 key libraries that you will need to install. Below is a list of the Python SciPy libraries required for this tutorial:

    • scipy
    • numpy
    • matplotlib
    • pandas
    • sklearn (scikit-learn)

    Start Python and Check Versions

    # Check the versions of libraries
     
    # Python version
    import sys
    print('Python: {}'.format(sys.version))
    # scipy
    import scipy
    print('scipy: {}'.format(scipy.__version__))
    # numpy
    import numpy
    print('numpy: {}'.format(numpy.__version__))
    # matplotlib
    import matplotlib
    print('matplotlib: {}'.format(matplotlib.__version__))
    # pandas
    import pandas
    print('pandas: {}'.format(pandas.__version__))
    # scikit-learn
    import sklearn
    print('sklearn: {}'.format(sklearn.__version__))
    

    Output:

    Python: 2.7.13 |Anaconda 4.4.0 (x86_64)| (default, Dec 20 2016, 23:05:08)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
    scipy: 0.19.0
    numpy: 1.12.1
    matplotlib: 2.0.2
    pandas: 0.20.1
    sklearn: 0.18.1
    

    Load the Data

    Import Libraries
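    A minimal sketch of the imports this walkthrough relies on, assuming the library set listed above. `load_iris` is an assumption of this sketch: it reads the copy of the Iris flowers dataset bundled with scikit-learn, so no download is needed. If an import fails, the library is missing or older than the module requires (for example, `sklearn.model_selection` first appeared in scikit-learn 0.18).

```python
# Data handling and plotting
import pandas
from pandas.plotting import scatter_matrix
from matplotlib import pyplot

# Dataset (assumption: the copy of Iris bundled with scikit-learn)
from sklearn.datasets import load_iris

# Validation split, cross-validation and evaluation metrics
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# The six classification algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
```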

    Load Dataset
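    The dataset is not named in this outline, but predicting species from flower measurements points to the Iris flowers dataset. A minimal sketch, assuming scikit-learn's bundled copy of Iris rather than a downloaded CSV file; the column names are an illustrative choice. Class labels in this copy are `setosa`, `versicolor` and `virginica`, without the `Iris-` prefix used in the UCI CSV.

```python
import pandas
from sklearn.datasets import load_iris

# Hypothetical column names: four measurements (cm) plus the species label
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# Assumption: scikit-learn's bundled Iris data instead of a remote CSV
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
# Map integer targets (0, 1, 2) to species names
dataset['class'] = iris.target_names[iris.target]
```

    For a header-less CSV file with the same five columns, `pandas.read_csv(path, names=names)` should produce an equivalent DataFrame.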

    Summarize the Dataset

    In this step we are going to take a look at the data in a few different ways:

    1. Dimensions of the dataset.
    2. Peek at the data itself.
    3. Statistical summary of all attributes.
    4. Breakdown of the data by the class variable.

    Dimensions of Dataset
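    The `shape` attribute gives the number of instances (rows) and attributes (columns). A minimal sketch, assuming the Iris data is loaded from scikit-learn's bundled copy:

```python
import pandas
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]

# (number of rows, number of columns)
print(dataset.shape)  # (150, 5)
```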

    Peek at the Data
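    Eyeballing the first rows is a quick check that values and column order look right. A minimal sketch, assuming the same Iris DataFrame layout (scikit-learn's bundled copy, illustrative column names):

```python
import pandas
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]

# First 20 rows of the data
first_rows = dataset.head(20)
print(first_rows)
```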

    Statistical Summary
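    `DataFrame.describe()` reports the count, mean, standard deviation, minimum, quartiles and maximum of each numeric attribute. A minimal sketch, assuming the same Iris DataFrame (scikit-learn's bundled copy, illustrative column names):

```python
import pandas
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]

# count, mean, std, min, 25%, 50%, 75%, max for the numeric columns;
# the text column 'class' is left out by default
summary = dataset.describe()
print(summary)
```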

    Class Distribution
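    Counting the rows in each class shows whether the classes are balanced, which affects how accuracy figures should be read. A minimal sketch, assuming the same Iris DataFrame (scikit-learn's bundled copy, illustrative column names):

```python
import pandas
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]

# Number of rows per species; Iris has 50 of each
class_counts = dataset.groupby('class').size()
print(class_counts)
```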

    Data Visualization

    We now have a basic idea about the data. We need to extend that with some visualizations.

    We are going to look at two types of plots:

    1. Univariate plots to better understand each attribute.
    2. Multivariate plots to better understand the relationships between attributes.

    Univariate Plots
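    Box-and-whisker plots and histograms show the spread and shape of each attribute on its own. A minimal sketch using pandas plotting, assuming the same Iris DataFrame; the `Agg` backend and the PNG file names are assumptions so the sketch runs without a display (drop the `matplotlib.use` line and call `pyplot.show()` to view the plots interactively).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window needed
from matplotlib import pyplot
import pandas
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]
measurements = dataset.iloc[:, 0:4]  # the four numeric input attributes

# Box-and-whisker plot: one subplot per attribute on a 2x2 grid
measurements.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)
box_figure = pyplot.gcf()
box_figure.savefig('iris_box.png')

# Histograms: one per attribute, to get an idea of each distribution
measurements.hist()
hist_figure = pyplot.gcf()
hist_figure.savefig('iris_hist.png')
```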

    Multivariate Plots
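    A scatter plot matrix draws every pair of attributes against each other; strongly correlated pairs show up as points grouped along a diagonal. A minimal sketch with `pandas.plotting.scatter_matrix`, assuming the same Iris DataFrame; the `Agg` backend and file name are assumptions for running without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file
from matplotlib import pyplot
import pandas
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]

# 4x4 grid: scatter plots off the diagonal, histograms on the diagonal
axes = scatter_matrix(dataset.iloc[:, 0:4])
pyplot.gcf().savefig('iris_scatter_matrix.png')
```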

    Evaluate Some Algorithms

    Now it is time to create some models of the data and estimate their accuracy on unseen data.

    Here is what we are going to cover in this step:

    1. Separate out a validation dataset.
    2. Set up the test harness to use 10-fold cross validation.
    3. Build 6 different models to predict species from flower measurements.
    4. Select the best model.

    Create a Validation Dataset
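    Holding back part of the data gives an independent check on the final model. A minimal sketch, assuming an 80/20 split with scikit-learn's `train_test_split`; `random_state=1` is an arbitrary seed chosen so the split is repeatable.

```python
import pandas
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load Iris into a DataFrame (assumption: scikit-learn's bundled copy)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = load_iris()
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = iris.target_names[iris.target]

# Inputs: the four measurements; output: the species label
X = dataset.iloc[:, 0:4].values
y = dataset['class'].values

# Hold back 20% of the rows; models are only trained and compared on the rest
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)
print(X_train.shape, X_validation.shape)  # (120, 4) (30, 4)
```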

    Test Harness
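    10-fold cross validation splits the training data into 10 parts, trains on 9 and tests on the remaining one, and repeats this for every choice of test part. Accuracy is the number of correctly predicted instances divided by the total number of instances. A minimal sketch, assuming stratified folds (each fold keeps the class proportions) and using LDA only as an example model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris measurements and species names (assumption: scikit-learn's bundled copy)
iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)

# 10 stratified folds; shuffling with a fixed seed keeps the folds repeatable
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# One accuracy score per fold, computed on the training portion only
scores = cross_val_score(LinearDiscriminantAnalysis(), X_train, y_train,
                         cv=kfold, scoring='accuracy')
print('LDA: %.3f (%.3f)' % (scores.mean(), scores.std()))
```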

    Build Models

    Let’s evaluate 6 different classification algorithms:

    1. Logistic Regression (LR)
    2. Linear Discriminant Analysis (LDA)
    3. K-Nearest Neighbors (KNN)
    4. Classification and Regression Trees (CART)
    5. Gaussian Naive Bayes (NB)
    6. Support Vector Machines (SVM)
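    The list mixes simple linear algorithms (LR, LDA) with nonlinear ones (KNN, CART, NB, SVM). A minimal sketch that evaluates all six with the same 10-fold harness, assuming scikit-learn's default settings apart from the few parameters explained in the comments:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Iris measurements and species names (assumption: scikit-learn's bundled copy)
iris = load_iris()
X_train, X_validation, y_train, y_validation = train_test_split(
    iris.data, iris.target_names[iris.target], test_size=0.20, random_state=1)

models = [
    ('LR', LogisticRegression(max_iter=1000)),  # extra iterations give the solver room to converge
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier(random_state=1)),  # fixed seed for repeatable trees
    ('NB', GaussianNB()),
    ('SVM', SVC(gamma='auto')),  # gamma = 1 / n_features; the default changed across versions
]

# Same harness for every model: 10 stratified, shuffled folds
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
results = {}
for name, model in models:
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
    results[name] = scores
    print('%s: %.3f (%.3f)' % (name, scores.mean(), scores.std()))
```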

    Select Best Model
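    The mean accuracy alone can hide how much a model varies between folds, so a box plot of the per-fold scores helps when comparing the six algorithms listed above. A minimal sketch, assuming the simplest selection rule (highest mean cross-validation accuracy, ties going to the earlier model); the `Agg` backend and file name are assumptions for running without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file
from matplotlib import pyplot
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Iris measurements and species names (assumption: scikit-learn's bundled copy)
iris = load_iris()
X_train, X_validation, y_train, y_validation = train_test_split(
    iris.data, iris.target_names[iris.target], test_size=0.20, random_state=1)

models = [
    ('LR', LogisticRegression(max_iter=1000)),
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier(random_state=1)),
    ('NB', GaussianNB()),
    ('SVM', SVC(gamma='auto')),
]

# Collect the 10 fold accuracies of every model
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
names, results = [], []
for name, model in models:
    names.append(name)
    results.append(cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy'))

# Highest mean accuracy wins; list.index returns the first maximum on ties
means = [scores.mean() for scores in results]
best_name = names[means.index(max(means))]
print('Best model: %s (%.3f)' % (best_name, max(means)))

# One box per algorithm, showing the spread of its fold accuracies
fig, ax = pyplot.subplots()
ax.boxplot(results)
ax.set_xticks(list(range(1, len(names) + 1)))
ax.set_xticklabels(names)
ax.set_title('Algorithm Comparison')
fig.savefig('algorithm_comparison.png')
```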

    Make Predictions
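    The validation set held back from model selection gives a final, independent check. A minimal sketch, assuming SVM is the chosen model (substitute whichever model scored best in your own comparison) and the same 80/20 split with seed 1. The confusion matrix shows which species get mixed up, and the classification report lists precision, recall and F1 score per class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Iris measurements and species names (assumption: scikit-learn's bundled copy)
iris = load_iris()
X_train, X_validation, y_train, y_validation = train_test_split(
    iris.data, iris.target_names[iris.target], test_size=0.20, random_state=1)

# Assumption: SVM was selected; fit it on the full training portion
model = SVC(gamma='auto')
model.fit(X_train, y_train)
predictions = model.predict(X_validation)

# Evaluate on data the model has never seen
accuracy = accuracy_score(y_validation, predictions)
matrix = confusion_matrix(y_validation, predictions)
print(accuracy)
print(matrix)
print(classification_report(y_validation, predictions))
```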

    Reference

    1. Your First Machine Learning Project in Python Step-By-Step
  • Original post: https://www.cnblogs.com/bermaker/p/9384850.html