  • Patsy: a package for describing statistical models

    http://patsy.readthedocs.io/en/latest/overview.html

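    What Patsy is used for: specifying the factors in linear models, for example in factor analysis (analysis of variance, ANOVA).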

    You write the model as a formula string, and Patsy takes care of building appropriate matrices. Furthermore, it:

    • Allows data transformations to be specified using arbitrary Python code: instead of x, we could have written log(x), (x > 0), or even log(x) if x > 1e-5 else log(1e-5),
    • Provides a range of convenient options for coding categorical variables, including automatic detection and removal of redundancies,
    • Knows how to apply ‘the same’ transformation used on original data to new data, even for tricky transformations like centering or standardization (critical if you want to use your model to make predictions),
    • Has an incremental mode to handle data sets which are too large to fit into memory at one time,
    • Provides a language for symbolic, human-readable specification of linear constraint matrices,
    • Has a thorough test suite (>97% statement coverage) and solid underlying theory, allowing it to correctly handle corner cases that even R gets wrong, and
    • Features a simple API for integration into statistical packages.
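
    A few of the features listed above can be sketched as follows. This is a minimal illustration, not part of the Patsy documentation quoted here: the toy data, the variable names and the particular formula are assumptions made up for the example. It uses arbitrary Python (np.log), a categorical factor (C(group)), and the stateful center() transform, whose original mean is reused when the same design is applied to new data.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Hypothetical toy data, made up for illustration only.
data = {
    "x": np.array([1.0, 2.0, 3.0, 4.0]),
    "group": ["a", "b", "a", "c"],
}

# np.log(x) is ordinary Python evaluated by Patsy; C(group) is dummy-coded;
# center(x) is a stateful transform that remembers the mean of x (2.5 here).
design = dmatrix("np.log(x) + center(x) + C(group)", data)
names = design.design_info.column_names
print(names)

# Reuse the stored design_info so the new row is centered with the
# ORIGINAL mean and coded with the ORIGINAL category levels.
new_data = {"x": np.array([10.0]), "group": ["b"]}
(new_design,) = build_design_matrices([design.design_info], new_data)
new_row = np.asarray(new_design)[0]
print(dict(zip(names, new_row)))
```

    The new value x = 10 is centered against the mean of the original data rather than against itself, which is the behavior the list describes as critical for prediction.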

    Patsy does not fit models itself; it only provides a high-level interface for describing statistical models.

    What Patsy won’t do is, well, statistics — it just lets you describe models in general terms. It doesn’t know or care whether you ultimately want to do linear regression, time-series analysis, or fit a forest of decision trees, and it certainly won’t do any of those things for you — it just gives a high-level language for describing which factors you want your underlying model to take into account. It’s not suitable for implementing arbitrary non-linear models from scratch; for that, you’ll be better off with something like Theano, SymPy, or just plain Python. But if you’re using a statistical package that requires you to provide a raw model matrix, then you can use Patsy to painlessly construct that model matrix; and if you’re the author of a statistics package, then I hope you’ll consider integrating Patsy as part of your front-end.
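
    The division of labor described above can be sketched like this: Patsy builds the raw model matrix, and some other code does the fitting. Here the fitting step is plain NumPy least squares, chosen only for illustration; the data are invented so that y = 1 + 2x exactly.

```python
import numpy as np
from patsy import dmatrices

# Hypothetical data that follows y = 1 + 2*x exactly.
data = {
    "x": np.array([0.0, 1.0, 2.0, 3.0]),
    "y": np.array([1.0, 3.0, 5.0, 7.0]),
}

# Patsy only builds the response vector and the design matrix.
y, X = dmatrices("y ~ x", data)

# The statistics happen elsewhere; here, ordinary least squares via NumPy.
coef, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
fitted = dict(zip(X.design_info.column_names, coef.ravel()))
print(fitted)
```

    Any library that accepts a raw design matrix could take the place of np.linalg.lstsq in this sketch.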

    Patsy’s goal is to become the standard high-level interface to describing statistical models in Python, regardless of what particular model or library is being used underneath.

    Patsy formulas can call user-defined functions

    I() makes + mean arithmetic addition

    Arithmetic transformations are also possible, but you’ll need to “protect” them by wrapping them in I(), so that Patsy knows that you really do want + to mean addition:

    In [23]: dmatrix("I(x1 + x2)", data)  # compare to "x1 + x2"
    Out[23]: 
    DesignMatrix with shape (8, 2)
      Intercept  I(x1 + x2)
              1     1.66083
              1     0.81076
              1     1.12278
              1     3.69517
              1     2.62860
              1    -0.85560
              1     1.39395
              1     0.18232
      Terms:
        'Intercept' (column 0)
        'I(x1 + x2)' (column 1)
    
    In [24]: dmatrix("I(x1 + x2)", {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])})
    Out[24]: 
    DesignMatrix with shape (3, 2)
      Intercept  I(x1 + x2)
              1           5
              1           7
              1           9
      Terms:
        'Intercept' (column 0)
        'I(x1 + x2)' (column 1)
    
    In [25]: dmatrix("I(x1 + x2)", {"x1": [1, 2, 3], "x2": [4, 5, 6]})
    Out[25]: 
    DesignMatrix with shape (6, 2)
      Intercept  I(x1 + x2)
              1           1
              1           2
              1           3
              1           4
              1           5
              1           6
      Terms:
        'Intercept' (column 0)
        'I(x1 + x2)' (column 1)
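
    To make the comparison with "x1 + x2" concrete, the short sketch below builds both design matrices from the same small arrays as In [24]. Note that inside I() the + is ordinary Python, so with plain lists it concatenates instead of adding, which is why In [25] above has six rows.

```python
import numpy as np
from patsy import dmatrix

data = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}

# Without I(): "+" is the formula operator, so x1 and x2 are separate terms.
two_terms = dmatrix("x1 + x2", data)
print(two_terms.design_info.column_names)

# With I(): "+" is arithmetic, so there is a single column holding x1 + x2.
summed = dmatrix("I(x1 + x2)", data)
print(summed.design_info.column_names)
print(np.asarray(summed)[:, 1])
```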

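    # ---------------------------------------------------------------
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm


    def anova_statsmodels():
        '''Run a one-way ANOVA of height on sex with statsmodels.'''

        # Get the data
        data = pd.read_csv('galton.csv')
        # sex is a categorical variable, so it is wrapped in C()
        anova_results = anova_lm(ols('height ~ C(sex)', data).fit())
        print(' ANOVA with "statsmodels" ------------------------------')
        print(anova_results)

        # Return the F statistic of the C(sex) term (first row of the table)
        return anova_results['F'].iloc[0]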
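
    The function above depends on a local galton.csv file. For readers without that file, the following variant runs the same kind of one-way ANOVA on synthetic data. The group means, spread and sample sizes are invented for the example and are not the Galton data.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in for galton.csv: 20 heights per sex, made-up parameters.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sex": ["M"] * 20 + ["F"] * 20,
    "height": np.concatenate([
        rng.normal(69.0, 2.5, 20),
        rng.normal(64.0, 2.5, 20),
    ]),
})

# C(sex) tells Patsy to treat sex as categorical when building the design matrix.
table = anova_lm(ols("height ~ C(sex)", data=df).fit())
print(table)

# Look up the F statistic by row label instead of by position.
f_value = table.loc["C(sex)", "F"]
print(f_value)
```

    Selecting the row by its label, "C(sex)", avoids relying on the order of rows in the ANOVA table.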

  • Original article: https://www.cnblogs.com/webRobot/p/6907860.html