  • A Small End-to-End Project

    Start

    Books and courses are frustrating. They give you lots of recipes and snippets, but you never get to see how they all fit together.

    When you are applying machine learning to your own datasets, you are working on a project.

    A machine learning project may not be linear, but it has a number of well-known steps:

    1. Define Problem.
    2. Prepare Data.
    3. Evaluate Algorithms.
    4. Improve Results.
    5. Present Results.

    The best way to come to terms with a new platform or tool is to work through a machine learning project end-to-end and cover the key steps: loading data, summarizing data, evaluating algorithms, and making some predictions.

    If you can do that, you have a template that you can use on dataset after dataset. Once you have more confidence, you can fill in the gaps, such as further data preparation and tasks for improving results.

    Machine Learning in Python: Step-By-Step Tutorial

    Downloading, Installing and Starting Python SciPy

    Install SciPy Libraries

    There are 5 key libraries that you will need to install. Below is a list of the Python SciPy libraries required for this tutorial:

    • scipy
    • numpy
    • matplotlib
    • pandas
    • sklearn (imported as sklearn; the installable package is named scikit-learn)

    Start Python and Check Versions

    # Check the versions of libraries
     
    # Python version
    import sys
    print('Python: {}'.format(sys.version))
    # scipy
    import scipy
    print('scipy: {}'.format(scipy.__version__))
    # numpy
    import numpy
    print('numpy: {}'.format(numpy.__version__))
    # matplotlib
    import matplotlib
    print('matplotlib: {}'.format(matplotlib.__version__))
    # pandas
    import pandas
    print('pandas: {}'.format(pandas.__version__))
    # scikit-learn
    import sklearn
    print('sklearn: {}'.format(sklearn.__version__))
    

    Output:

    Python: 2.7.13 |Anaconda 4.4.0 (x86_64)| (default, Dec 20 2016, 23:05:08)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
    scipy: 0.19.0
    numpy: 1.12.1
    matplotlib: 2.0.2
    pandas: 0.20.1
    sklearn: 0.18.1
    

    Load the Data

    Import libraries

    Load Dataset
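
The loading code is not included here. As a minimal sketch, assuming the dataset is the iris flower data (the later steps mention predicting species from flower measurements), the copy bundled with scikit-learn can be read into a pandas DataFrame. The `load_iris_frame` helper and the short column names are illustrative choices, not part of the original tutorial.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame():
    """Return the iris data as a DataFrame with a text 'species' column."""
    iris = load_iris()
    # Short column names are an illustrative choice, not from the source.
    columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    df = pd.DataFrame(iris.data, columns=columns)
    # Map the integer targets (0, 1, 2) to their species names.
    df["species"] = [str(iris.target_names[i]) for i in iris.target]
    return df


dataset = load_iris_frame()
```

Using the bundled copy avoids a network download; a CSV file could be loaded with `pandas.read_csv` instead.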

    Summarize the Dataset

    In this step we are going to take a look at the data in a few different ways:

    1. Dimensions of the dataset.
    2. Peek at the data itself.
    3. Statistical summary of all attributes.
    4. Breakdown of the data by the class variable.
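
The four views listed above can be sketched with pandas as follows. This assumes the iris data bundled with scikit-learn and a class column named `species`; both are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the bundled iris data stands in for the tutorial's dataset.
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=iris.feature_names)
dataset["species"] = iris.target_names[iris.target]

# 1. Dimensions: (number of rows, number of columns).
print(dataset.shape)

# 2. Peek at the first 20 rows.
print(dataset.head(20))

# 3. Statistical summary: count, mean, std, min, quartiles, max.
print(dataset.describe())

# 4. Class distribution: number of rows per species.
class_counts = dataset.groupby("species").size()
print(class_counts)
```

For the bundled iris data this reports 150 rows and 5 columns, with 50 rows for each of the three species.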

    Dimensions of Dataset

    Peek at the Data

    Statistical Summary

    Class Distribution

    Data Visualization

    We now have a basic idea about the data. We need to extend that with some visualizations.

    We are going to look at two types of plots:

    1. Univariate plots to better understand each attribute.
    2. Multivariate plots to better understand the relationships between attributes.
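
Both kinds of plot can be sketched with the plotting helpers built into pandas. This is one possible choice of plots (box plots and histograms per attribute, then a scatter plot matrix), again assuming the iris data bundled with scikit-learn.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Assumption: the bundled iris data stands in for the tutorial's dataset.
iris = load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)

# Univariate: one box-and-whisker plot per attribute, each on its own axes.
box_axes = features.plot(kind="box", subplots=True, layout=(2, 2),
                         sharex=False, sharey=False)

# Univariate: one histogram per attribute.
hist_axes = features.hist()

# Multivariate: scatter plots for every pair of attributes.
pair_axes = scatter_matrix(features)

# In an interactive session, call plt.show() to display the three figures.
```

Roughly diagonal groupings in the scatter plot matrix suggest correlated attribute pairs.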

    Univariate Plots

    Multivariate Plots

    Evaluate Some Algorithms

    Now it is time to create some models of the data and estimate their accuracy on unseen data.

    Here is what we are going to cover in this step:

    1. Separate out a validation dataset.
    2. Set up the test harness to use 10-fold cross-validation.
    3. Build 6 different models to predict species from flower measurements.
    4. Select the best model.

    Create a Validation Dataset
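
A minimal sketch of holding back a validation set with scikit-learn's `train_test_split`. The 80/20 split, the seed of 7, and the stratified sampling are assumptions; no split size is stated here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for the tutorial's dataset.
X, y = load_iris(return_X_y=True)

# Hold back 20% of the rows; the models never see them during training.
# stratify=y keeps the class proportions the same in both parts.
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

print(X_train.shape, X_validation.shape)
```

The validation rows are used only once, at the end, to check the chosen model.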

    Test Harness
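
A sketch of a 10-fold cross-validation harness scored by accuracy. The use of `StratifiedKFold` with shuffling, the seed, and Linear Discriminant Analysis as the example model are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

# Assumption: the bundled iris data and an 80/20 split, as a stand-in.
X, y = load_iris(return_X_y=True)
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# 10 folds: train on 9 parts, score on the remaining part, repeat 10 times.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

# Accuracy = correctly predicted rows / total rows in the held-out fold.
scores = cross_val_score(LinearDiscriminantAnalysis(), X_train, y_train,
                         cv=kfold, scoring="accuracy")
print("LDA: %.3f (%.3f)" % (scores.mean(), scores.std()))
```

Reusing the same `kfold` object for every model means each one is evaluated on identical splits, which keeps the comparison fair.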

    Build Models

    Let’s evaluate 6 different classification algorithms:

    1. Logistic Regression (LR)
    2. Linear Discriminant Analysis (LDA)
    3. K-Nearest Neighbors (KNN).
    4. Classification and Regression Trees (CART).
    5. Gaussian Naive Bayes (NB).
    6. Support Vector Machines (SVM).
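
The six algorithms above map onto scikit-learn estimators roughly as follows. The hyperparameters (`max_iter=1000`, `gamma="auto"`), the seed, and the split are assumptions chosen so the sketch runs cleanly across library versions.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled iris data and an 80/20 split, as a stand-in.
X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier(random_state=7)),
    ("NB", GaussianNB()),
    ("SVM", SVC(gamma="auto")),
]

# Evaluate every model on the same 10 folds and report mean (std) accuracy.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, model in models:
    results[name] = cross_val_score(model, X_train, y_train,
                                    cv=kfold, scoring="accuracy")
    print("%s: %.3f (%.3f)" % (name, results[name].mean(), results[name].std()))
```

The mix covers simple linear methods (LR, LDA) and nonlinear ones (KNN, CART, NB, SVM).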

    Select Best Model
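
One simple selection rule is to pick the model with the highest mean cross-validated accuracy. The sketch below uses a hypothetical `select_best` helper and only three of the six candidates to stay short; the data, split, and seed are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def select_best(models, X, y, n_splits=10, seed=7):
    """Return (best_name, mean_scores) by mean cross-validated accuracy."""
    kfold = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mean_scores = {
        name: cross_val_score(model, X, y, cv=kfold, scoring="accuracy").mean()
        for name, model in models.items()
    }
    best_name = max(mean_scores, key=mean_scores.get)
    return best_name, mean_scores


# Assumption: the bundled iris data and an 80/20 split, as a stand-in.
X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

candidates = {
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(gamma="auto"),
}
best_name, mean_scores = select_best(candidates, X_train, y_train)
print(best_name, mean_scores)
```

When mean scores are close, the spread across folds is worth checking too, since a small difference in the mean may not be meaningful on a dataset this small.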

    Make Predictions
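
A sketch of the final check: fit one model on the training rows and score it on the held-back validation rows. K-Nearest Neighbors is only a placeholder here; substitute whichever model scored best. The data and split are the same assumptions as before.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled iris data and an 80/20 split, as a stand-in.
X, y = load_iris(return_X_y=True)
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# KNN is a placeholder for the selected model.
model = KNeighborsClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_validation)

# Overall accuracy, per-class errors, and precision/recall per class.
accuracy = accuracy_score(y_validation, predictions)
matrix = confusion_matrix(y_validation, predictions)
print(accuracy)
print(matrix)
print(classification_report(y_validation, predictions))
```

The confusion matrix shows which species are mistaken for which; off-diagonal entries are the errors.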

    Reference

    1. Your First Machine Learning Project in Python Step-By-Step
  • Original article: https://www.cnblogs.com/bermaker/p/9384850.html