  • PyCon China 2015: 丁来强 (Ding Laiqiang), PyData Ecosystem

    PyData ecosystem: the Python-based data analysis ecosystem

    0. Agenda

    Data Science ecosystem

    Data Wrangling

    Data Analysis

    Data Visualization

    3 Real Case Demo

    Bigger Data Consideration

    Spark Data Frame Demo

    1. Data Science Process

    Data Collection

    Databases

    Applications

    Third-party data

    Data Wrangling

    Enrichment

    ETL/Blending

    Data Integration

    Data Analysis

    Insights

    Statistics

    Visualization

    Modeling

    2. Data Wrangling

    Data scientists spend 80% of their time converting data into a usable form.

    Clean data: handle messy or missing data

    Transform and extract data

    Merge, join, and reshape data

    Time series resampling
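
    The wrangling steps listed above can be sketched with pandas. This is a minimal illustration, not the demo from the talk: the `readings` and `sensors` tables, their column names, and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with one missing value (illustrative data only).
readings = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2015-01-01 00:00",
                "2015-01-01 00:30",
                "2015-01-01 01:00",
                "2015-01-01 01:30",
            ]
        ),
        "sensor_id": [1, 1, 2, 2],
        "value": [10.0, np.nan, 30.0, 50.0],
    }
)
# Hypothetical lookup table mapping each sensor to a site.
sensors = pd.DataFrame({"sensor_id": [1, 2], "site": ["north", "south"]})

# Clean: fill the missing reading with the column mean.
cleaned = readings.copy()
cleaned["value"] = cleaned["value"].fillna(cleaned["value"].mean())

# Transform and extract: derive the hour of day from the timestamp.
cleaned["hour"] = cleaned["time"].dt.hour

# Merge/join: attach the site name to every reading.
merged = cleaned.merge(sensors, on="sensor_id", how="left")

# Reshape: one row per timestamp, one column per site.
wide = merged.pivot(index="time", columns="site", values="value")

# Time series resampling: mean value per 60-minute bucket.
hourly = merged.set_index("time")["value"].resample("60min").mean()

print(hourly)
```

    Filling with the mean is only one possible cleaning policy; dropping the row or interpolating may suit other data better.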

    3. Data Analysis

    Interactive data exploration

    Rich visualization

    Statistical modeling
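
    As a small sketch of exploration followed by modeling, the snippet below summarizes a table and fits a straight line by least squares. The data is made up (y is constructed as exactly 2x + 1 so the fit is easy to check), and `np.polyfit` stands in for whatever modeling tool the talk actually used.

```python
import numpy as np
import pandas as pd

# Illustrative data: y follows 2*x + 1 exactly so the fit is easy to verify.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["y"] = 2.0 * df["x"] + 1.0

# Interactive exploration: summary statistics and the x/y correlation.
summary = df.describe()
correlation = df["x"].corr(df["y"])

# Statistical modeling: least-squares fit of a degree-1 polynomial.
slope, intercept = np.polyfit(df["x"].to_numpy(), df["y"].to_numpy(), deg=1)

print(summary)
print(f"correlation={correlation:.3f} slope={slope:.3f} intercept={intercept:.3f}")
```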

    4. Python vs R

    TIOBE Index

    5. Pros and Cons

    R + visualization = perfect match

    R is the lingua franca of statistics (developed by statisticians)

    R is slow

    Python is a multi-purpose language

    Python is a challenger to R both in visualization and as a replacement for essential R packages

    6. PyData Ecosystem

    Fundamental Libs

    NumPy, SciPy

    Advanced Libs

    pandas, SymPy, scikit-learn, xray, Blaze

    7. NumPy

    High-performance N-dimensional array operations library

    High-performance, multidimensional
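
    A minimal sketch of what N-dimensional array operations look like in practice: whole-array arithmetic and broadcasting replace explicit Python loops. The array contents are arbitrary illustration values.

```python
import numpy as np

# A 3x4 array holding 0.0 .. 11.0 (arbitrary illustrative values).
grid = np.arange(12, dtype=np.float64).reshape(3, 4)

# Reduce along axis 0: one mean per column.
col_means = grid.mean(axis=0)

# Broadcasting: the length-4 vector is subtracted from every row.
centered = grid - col_means

# Vectorized arithmetic: scale every element, then sum, with no Python loop.
total = (grid * 2.0).sum()

print(col_means)
print(total)
```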

    8. pandas

    Packaging

    9. Blaze

    High-level user interface for databases and array computing systems

    10. Spark

    11. DataFrame

    12. matplotlib

    13. seaborn

    14. Bokeh

    15. IPython

  • Original article: https://www.cnblogs.com/jsben/p/5018477.html