zoukankan      html  css  js  c++  java
  • 统计与概率论

    首先,介绍几个概念:

    1.  IQR:interquartile range. = Q3- Q1

        outlier: 异常值。

    2.  如何判断一个值是否为outlier?? ------ 使用1.5* IQR rule。

    例如: 一系列数据为: 1,1,6,13,13,14,14,14,15,15,16,18,18,18,19。 

      step 1:  找到1/4位数,1/2即中位数,3/4位数。此题中,Q1=13, median=14, Q3=18.

      step 2: calculate IQR = Q3-Q1 = 18-13=5. 

      step 3:  judge.  If a number x< Q1-1.5*IQR   ,   OR    x > Q3+1.5*IQR,    x is an outlier. For example, given a nunmber 30. Because  30 >18+1.5*5=25.5,   30 is judged to be a outlier. 

       以上。


    3. population variance 的另一种计算公式:

          ∑ (xi^2)/n  - µ^2 

    4. 什么是z-score?

      z-score, is also called z-value, standard score, normal score...   In normal distribution, given a value x,   z-score is equal to  (x - µ)/sigma .   Z- score shows how far away a single data is from the mean relatively

       For example, a normal distribution with µ =81, sigma=6.3.    for x=65,  Z-score= (65-81)/6.3 = -2.54 3. empirical rule?

        68-95-99.7 rule

     5. how to calculate correlation coefficient r. 

          ∑ (Zxi * Zyi) / (n-1)  ,  in which Zxi means Z-score of variate x. 


    6. In linear regression, formula is as follows:

               y_pred= mx + b   (1)

       in (1), m is calculated by 

                m = r * Sy/Sx.    (2)

     例如, 4 scatters, namely, (1,1), (2,2), (2,3), (3,6), giving a linear regression formula. 

      Step 1, By calculating, we get  x(mean) = 2, Sx=0.816;  y(mean)=3, Sy=2.16. 

          Step 2, calculating r,  r= ∑ (Zxi * Zyi) / (n-1) = 0.946 

          Step 3, calculating m,  m= r * Sy/Sx = 2.5 

          Step 4, 将(x_mean, y_mean), 也就是(2,3) 带入 (1),  we get the result: 

                y_pred=2.5 x -2  


    5. what's coefficient of determination ? 

       r^2 is called  coefficient of determination. 

      (1)  SE(y_mean) = (y1-y_mean)^2+ (y2-y_mean)^2 + (y3-y_mean)^2+ .....

      (2)  SE(line) = (y1-y1_pred) ^2 + (y2-y2_pred)^2 + (y3-y3_pred)^3 + ....

      (3)  r^2  =  1 -  SE(line)/SE(y_mean). 

    例如,         1⃣️ 对于非线性回归,SE(y_mean) is 41.1879

                2⃣️ 对于linear regression,SE(line)  is  13.7627.

           so, according to 1⃣️2⃣️,r^2 = 1-  13.762/ 41.187966.59%

           thus, 0.6659 is the coefficient of determination, 66.59%,  也表示 how well this line could fit these data.

     


    7. What's Root-mean-square deviation (RMSD) ,

       It's also called "standard deviation of residuals "

      Ri is residual, which calculated as follows, 

        Ri = y - y_pred       (1)       

     so,

       RMSD =   ∑ ( Ri^2 ) / n-1    (2)


     8. 另一种求linear regression的方法:分别求m跟b的偏导,然后令等于0,解二元一次方程组,结果如下:

     

    参考:可汗学院 :https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data  

                             https://www.khanacademy.org/math/statistics-probability/modeling-distributions-of-data

            https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/scatterplots-and-correlation/v/calculating-correlation-coefficient-r

            https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/assessing-the-fit-in-least-squares-regression/v/r-squared-or-coefficient-of-determination

  • 相关阅读:
    CSS--盒子模型详解
    html元素分类
    HTML语义化(2016/3/16更新)
    如何在线预览github上的html页面?
    【鬼脸原创】谷歌扩展--知乎V2.0
    CSS选择器详解
    HTML基础知识
    python- 日志学习
    python-ddt 数据驱动测试
    python
  • 原文地址:https://www.cnblogs.com/yyagrt/p/11408190.html
Copyright © 2011-2022 走看看