  • PyCon China 2015, 丁来强: PyData Ecosystem

    PyData ecosystem: the Python-based data analysis ecosystem

    0. Agenda

    Data Science ecosystem

    Data Wrangling

    Data Analysis

    Data Visualization

    Three Real-Case Demos

    Bigger Data Considerations

    Spark DataFrame Demo

    1. Data Science Process

    Data Collection: databases, applications, third-party data

    Data Wrangling: enrichment, ETL/blending, data integration

    Data Analysis: insights, statistics, visualization, modeling
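    The three stages above can be sketched end to end with pandas. This is a minimal illustration under stated assumptions, not the talk's demo: an in-memory SQLite database stands in for an application database, a small inline CSV stands in for third-party data, and the `orders` table, the `region` and `tax_rate` columns, and all values are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Data collection: an in-memory SQLite database stands in for an application
# database. The "orders" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 60.0)],
)
orders = pd.read_sql_query("SELECT * FROM orders ORDER BY order_id", conn)
conn.close()

# Third-party data: a small inline CSV with hypothetical per-region tax rates.
third_party = pd.read_csv(io.StringIO("region,tax_rate\nnorth,0.10\nsouth,0.05\n"))

# Data wrangling: enrich each order with the third-party attributes
# (a left join), then derive a new column.
enriched = orders.merge(third_party, on="region", how="left")
enriched["tax"] = enriched["amount"] * enriched["tax_rate"]

# Data analysis: aggregate into a simple insight, revenue and tax per region.
summary = enriched.groupby("region")[["amount", "tax"]].sum()
print(summary)
```

    In a real pipeline the SQLite connection and inline CSV would be replaced by the actual sources; the merge-then-aggregate shape stays the same.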

    2. Data Wrangling

    Data scientists spend 80% of their time converting data into a usable form.

    Clean data: handle messy or missing data

    Transform and extract data

    Merge, join, and reshape data

    Time-series resampling
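    A minimal pandas sketch of the wrangling tasks listed above: cleaning, extraction, merging, reshaping, and resampling. The sensor readings, the site metadata, and the choice to fill missing values with 0 are hypothetical; `fillna`, `str.extract`, `merge`, `pivot`, and `resample` are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Hypothetical raw readings: messy labels (stray spaces) and one missing value.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2015-10-01 00:00", "2015-10-01 00:30",
             "2015-10-01 01:00", "2015-10-01 01:30"]
        ),
        "sensor": [" A-1", "B-2 ", " A-1", "B-2"],
        "value": [1.0, np.nan, 3.0, 4.0],
    }
)

# Clean: strip whitespace from labels and fill the missing reading with 0
# (an assumption; dropping or interpolating are common alternatives).
clean = raw.assign(
    sensor=raw["sensor"].str.strip(),
    value=raw["value"].fillna(0.0),
)

# Transform and extract: pull the site letter out of each sensor label.
clean["site"] = clean["sensor"].str.extract(r"^([A-Z])", expand=False)

# Merge/join: attach hypothetical site metadata.
sites = pd.DataFrame({"site": ["A", "B"], "city": ["Beijing", "Shanghai"]})
merged = clean.merge(sites, on="site", how="left")

# Reshape: long format -> wide format, one column per sensor.
wide = merged.pivot(index="timestamp", columns="sensor", values="value")

# Time-series resampling: hourly mean per sensor ("60min" bins).
hourly = wide.resample("60min").mean()
print(hourly)
```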

    3. Data Analysis

    Interactive data exploration

    Rich visualization

    Statistical modeling
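    Exploration followed by a simple statistical model might look like the sketch below. The data are synthetic (a known line plus noise, generated with a fixed seed), and the model is an ordinary least squares fit computed with NumPy's `linalg.lstsq`; a dedicated statistics package could be used instead, and nothing here reflects the talk's own examples.

```python
import numpy as np
import pandas as pd

# Hypothetical data: spend vs. sales, generated from sales = 3 + 2 * spend
# plus small Gaussian noise, so the fitted coefficients can be checked.
rng = np.random.default_rng(seed=0)
spend = np.linspace(0.0, 10.0, 50)
sales = 3.0 + 2.0 * spend + rng.normal(scale=0.1, size=spend.size)
df = pd.DataFrame({"spend": spend, "sales": sales})

# Interactive exploration: summary statistics and pairwise correlation.
print(df.describe())
print(df.corr())

# Statistical modeling: ordinary least squares for sales = a + b * spend.
# The design matrix has a column of ones for the intercept.
X = np.column_stack([np.ones(len(df)), df["spend"].to_numpy()])
coef, *_ = np.linalg.lstsq(X, df["sales"].to_numpy(), rcond=None)
intercept, slope = coef
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```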

    4. Python vs. R

    TIOBE Index

    5. Pros and Cons

    R + visualization = a perfect match

    R is the lingua franca of statistics (developed by statisticians)

    R is slow

    Python is a multi-purpose language

    Python is still a challenger, both in visualization and as a replacement for essential R packages

    6. PyData Ecosystem

    Fundamental libs

    NumPy, SciPy

    Advanced libs

    pandas, SymPy, scikit-learn, xray, Blaze

    7. NumPy

    High-performance library for N-dimensional (multidimensional) array operations
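    To illustrate what N-dimensional array operations look like in practice, here is a small sketch of axis-wise reductions and broadcasting on a 3-D array. The shape and the meaning assigned to each axis (days, sensors, readings) are hypothetical.

```python
import numpy as np

# Hypothetical 3-D array: 2 days x 3 sensors x 4 readings per sensor.
data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

# Vectorized reduction along the last axis: mean reading per (day, sensor).
per_sensor_mean = data.mean(axis=2)  # shape (2, 3)

# Broadcasting: (2, 3, 4) minus (2, 3, 1) subtracts each sensor's own mean.
centered = data - per_sensor_mean[:, :, np.newaxis]

# Element-wise ufuncs apply to every element at once, with no Python loop.
magnitude = np.sqrt(np.abs(centered))

# The same per-sensor mean with plain Python loops, for comparison;
# on large arrays this is far slower than the vectorized version.
loop_mean = [[sum(row) / len(row) for row in day] for day in data.tolist()]
print(per_sensor_mean)
```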

    8. pandas

    Packaging
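    The slide offers only one word for pandas, so the sketch below does not reconstruct the talk; it shows two widely used pandas features, label-aligned arithmetic and split-apply-combine `groupby`, on hypothetical fruit-sales figures.

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical figures).
q1 = pd.Series({"apples": 10, "pears": 5, "plums": 2})
q2 = pd.Series({"apples": 7, "pears": 3, "cherries": 4})

# Arithmetic aligns on labels; fill_value treats a label missing on one side as 0.
total = q1.add(q2, fill_value=0)

# A DataFrame and a split-apply-combine group-by: total units per store.
sales = pd.DataFrame(
    {
        "store": ["north", "south", "north", "south"],
        "fruit": ["apples", "apples", "pears", "pears"],
        "units": [10, 7, 5, 3],
    }
)
units_per_store = sales.groupby("store")["units"].sum()
print(total)
print(units_per_store)
```

    Without `fill_value`, labels present in only one Series (here `plums` and `cherries`) would come out as NaN.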

    9. Blaze

    High-level user interface for databases and array computing systems

    10. Spark

    11. DataFrame

    12. matplotlib
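    A minimal matplotlib sketch, assuming a headless environment: it selects the non-interactive `Agg` backend and renders the figure to an in-memory PNG instead of opening a window. The plotted functions and labels are arbitrary examples.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

# Object-oriented API: create a figure and axes, then draw on the axes.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("A minimal matplotlib line plot")
ax.legend()

# Save to an in-memory buffer as PNG, then release the figure.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
plt.close(fig)
png_bytes = buf.getvalue()
```

    In an interactive session such as IPython, `plt.show()` would display the figure instead of saving it.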

    13. seaborn

    14. Bokeh

    15. IPython

  • Original article: https://www.cnblogs.com/jsben/p/5018477.html