    scikit-learn: machine learning in Python — scikit-learn 0.12.1 documentation

    Easy-to-use and general-purpose machine learning in Python

    scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit scientific Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering.

    Supervised learning
    Support vector machines, linear models, naive Bayes, Gaussian processes...

    Unsupervised learning
    Clustering, Gaussian mixture models, manifold learning, matrix factorization, covariance...

    And much more
    Model selection, datasets, feature extraction... See below.

    License: Open source, commercially usable: BSD license (3 clause)

    Documentation for scikit-learn version 0.12.1. For other versions and printable format, see Documentation resources.
    User Guide

        1. Installing scikit-learn
            1.1. Installing an official release
            1.2. Third party distributions of scikit-learn
            1.3. Bleeding Edge
            1.4. Testing
        2. Tutorials: From the bottom up with scikit-learn
            2.1. An Introduction to machine learning with scikit-learn
            2.2. A tutorial on statistical-learning for scientific data processing
        3. Supervised learning
            3.1. Generalized Linear Models
            3.2. Support Vector Machines
            3.3. Stochastic Gradient Descent
            3.4. Nearest Neighbors
            3.5. Gaussian Processes
            3.6. Partial Least Squares
            3.7. Naive Bayes
            3.8. Decision Trees
            3.9. Ensemble methods
            3.10. Multiclass and multilabel algorithms
            3.11. Feature selection
            3.12. Semi-Supervised
            3.13. Linear and Quadratic Discriminant Analysis
        4. Unsupervised learning
            4.1. Gaussian mixture models
            4.2. Manifold learning
            4.3. Clustering
            4.4. Decomposing signals in components (matrix factorization problems)
            4.5. Covariance estimation
            4.6. Novelty and Outlier Detection
            4.7. Hidden Markov Models
        5. Model Selection
            5.1. Cross-Validation: evaluating estimator performance
            5.2. Grid Search: setting estimator parameters
            5.3. Pipeline: chaining estimators
        6. Dataset transformations
            6.1. Preprocessing data
            6.2. Feature extraction
            6.3. Kernel Approximation
        7. Dataset loading utilities
            7.1. General dataset API
            7.2. Toy datasets
            7.3. Sample images
            7.4. Sample generators
            7.5. Datasets in svmlight / libsvm format
            7.6. The Olivetti faces dataset
            7.7. The 20 newsgroups text dataset
            7.8. Downloading datasets from the mldata.org repository
            7.9. The Labeled Faces in the Wild face recognition dataset
        8. Reference
            8.1. sklearn.cluster: Clustering
            8.2. sklearn.covariance: Covariance Estimators
            8.3. sklearn.cross_validation: Cross Validation
            8.4. sklearn.datasets: Datasets
            8.5. sklearn.decomposition: Matrix Decomposition
            8.6. sklearn.ensemble: Ensemble Methods
            8.7. sklearn.feature_extraction: Feature Extraction
            8.8. sklearn.feature_selection: Feature Selection
            8.9. sklearn.gaussian_process: Gaussian Processes
            8.10. sklearn.grid_search: Grid Search
            8.11. sklearn.hmm: Hidden Markov Models
            8.12. sklearn.kernel_approximation: Kernel Approximation
            8.13. sklearn.semi_supervised: Semi-Supervised Learning
            8.14. sklearn.lda: Linear Discriminant Analysis
            8.15. sklearn.linear_model: Generalized Linear Models
            8.16. sklearn.manifold: Manifold Learning
            8.17. sklearn.metrics: Metrics
            8.18. sklearn.mixture: Gaussian Mixture Models
            8.19. sklearn.multiclass: Multiclass and multilabel classification
            8.20. sklearn.naive_bayes: Naive Bayes
            8.21. sklearn.neighbors: Nearest Neighbors
            8.22. sklearn.pls: Partial Least Squares
            8.23. sklearn.pipeline: Pipeline
            8.24. sklearn.preprocessing: Preprocessing and Normalization
            8.25. sklearn.qda: Quadratic Discriminant Analysis
            8.26. sklearn.svm: Support Vector Machines
            8.27. sklearn.tree: Decision Trees
            8.28. sklearn.utils: Utilities

    Example Gallery

            General examples
            Examples based on real world datasets
            Covariance estimation
            Dataset examples
            Ensemble methods
            Tutorial exercises
            Gaussian Process for Machine Learning
            Generalized Linear Models
            Manifold learning
            Gaussian Mixture Models
            Nearest Neighbors
            Semi Supervised Classification
            Support Vector Machines
            Decision Trees

    Development

        Contributing
            Submitting a bug report
            Retrieving the latest code
            Contributing code
            Other ways to contribute
            Coding guidelines
            APIs of scikit-learn objects
        How to optimize for speed
            Python, Cython or C/C++?
            Profiling Python code
            Memory usage profiling
            Performance tips for the Cython developer
            Profiling compiled extensions
            Multi-core parallelism using joblib.Parallel
            A sample algorithmic trick: warm restarts for cross validation
        Utilities for Developers
            Validation Tools
            Efficient Linear Algebra & Array Operations
            Efficient Routines for Sparse Matrices
            Graph Routines
            Testing Functions
            Helper Functions
            Hash Functions
            Warnings and Exceptions
        Developers’ Tips for Debugging
            Memory errors: debugging Cython with valgrind
        About us
            Citing scikit-learn
